[SPARK-54353][SQL] Make CollectMetricsExec.collectedMetrics thread-safe #53064
+39
−9
What changes were proposed in this pull request?
This PR makes the CollectMetricsExec.collectedMetrics method thread-safe by adding proper synchronization. The changes include:
- Modified CollectMetricsExec.collectedMetrics method: Added synchronization blocks around both the accumulator.value access and the toRowConverter call to prevent race conditions when multiple threads access the collected metrics concurrently.
- Added concurrency test: Added a new test case "SPARK-54353: concurrent CollectMetricsExec.collect()" in DatasetSuite that verifies thread-safety by spawning multiple threads that concurrently access CollectMetricsExec.collect().
- Removed the coarse-grained lock in QueryExecution.observedMetrics.
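The synchronization pattern described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in this PR: StatefulConverter and MetricsHolder are hypothetical stand-ins for the generated row converter and for the operator that owns the accumulator, and a single lock is used for brevity, whereas the PR may use separate synchronized blocks.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for a generated converter that keeps internal
// mutable state and is therefore unsafe to call from several threads.
final class StatefulConverter {
  private val scratch = new ArrayBuffer[Long]()

  def apply(input: Seq[Long]): Seq[Long] = {
    scratch.clear()
    input.foreach(v => scratch += v * 2)
    scratch.toList // copy the result out of the shared scratch buffer
  }
}

// Hypothetical holder: one lock guards both the read of the accumulated
// value and the conversion step, so callers never interleave inside either.
final class MetricsHolder {
  private val lock = new Object
  private var accumulated: Seq[Long] = Seq.empty
  private val converter = new StatefulConverter

  def update(values: Seq[Long]): Unit = lock.synchronized {
    accumulated = values
  }

  def collected: Seq[Long] = lock.synchronized {
    converter(accumulated)
  }
}
```

Scoping the lock to the object that owns the unsafe state keeps callers free of locking concerns, which is presumably why a coarser lock further up the call chain can then be dropped.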
The implementation ensures that:
Why are the changes needed?
The collectedMetrics method in CollectMetricsExec can be accessed concurrently by multiple threads, particularly when using the Observation API. The previous implementation had two thread-safety issues:
- The accumulator.value call could be accessed by multiple threads simultaneously
- toRowConverter (an InternalRow => Row converter generated by code generation) is not thread-safe and can cause race conditions when accessed concurrently
Without proper synchronization, concurrent access could lead to:
This is especially problematic in scenarios where multiple threads query the execution plan or access observations simultaneously.
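A scenario of this kind can be exercised with a small harness like the one below. It is only a sketch under stated assumptions: ConcurrentReadCheck, guardedSum, and countMismatches are hypothetical names, and the guarded computation is a toy replacement for the real metrics access, not the test added to DatasetSuite.

```scala
import java.util.concurrent.{CountDownLatch, Executors, TimeUnit}
import java.util.concurrent.atomic.AtomicInteger
import scala.collection.mutable.ArrayBuffer

object ConcurrentReadCheck {
  private val lock = new Object
  // Shared mutable scratch space; unsafe without the lock below.
  private val scratch = new ArrayBuffer[Int]()

  // Hypothetical guarded computation standing in for the metrics access.
  def guardedSum(values: Seq[Int]): Int = lock.synchronized {
    scratch.clear()
    scratch ++= values
    scratch.sum
  }

  // Starts `readers` threads at the same moment, lets each call guardedSum
  // `iterations` times, and returns how many calls gave a wrong result.
  def countMismatches(readers: Int, iterations: Int): Int = {
    val input = (1 to 100).toList
    val expected = input.sum
    val pool = Executors.newFixedThreadPool(readers)
    val start = new CountDownLatch(1)
    val mismatches = new AtomicInteger(0)
    try {
      val futures = (1 to readers).map { _ =>
        pool.submit(new Runnable {
          override def run(): Unit = {
            start.await() // hold every reader until all are ready
            var i = 0
            while (i < iterations) {
              if (guardedSum(input) != expected) mismatches.incrementAndGet()
              i += 1
            }
          }
        })
      }
      start.countDown()
      futures.foreach(_.get(30, TimeUnit.SECONDS))
    } finally {
      pool.shutdown()
    }
    mismatches.get()
  }
}
```

With the lock in place, ConcurrentReadCheck.countMismatches(8, 1000) returns 0. Removing lock.synchronized from guardedSum would allow readers to interleave on the shared buffer, although such a race is not guaranteed to show up on every run.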
Does this PR introduce any user-facing change?
No. This is an internal bug fix that improves thread-safety. Users should not observe any behavioral changes except for the elimination of potential race conditions and incorrect results when accessing metrics concurrently.
How was this patch tested?
build/sbt "sql/testOnly *DatasetSuite -- -z SPARK-54353"
Was this patch authored or co-authored using generative AI tooling?
Generated-by: Cursor 1.7.54