[SPARK-22537][CORE] Aggregation of map output statistics on driver faces single point bottleneck
## What changes were proposed in this pull request?

In adaptive execution, the map output statistics of all mappers are aggregated after the previous stage completes successfully. The driver performs this aggregation on a single thread, so it becomes slow when `mapper * shuffle partitions` is large. This PR uses multiple threads to remove this single-point bottleneck.

## How was this patch tested?

Test cases are in `MapOutputTrackerSuite.scala`.

Author: GuoChenzhao <chenzhao.guo@intel.com>
Author: gczsjdy <gczsjdy1994@gmail.com>

Closes #19763 from gczsjdy/single_point_mapstatistics.
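To illustrate the idea described above, here is a minimal, self-contained sketch of multi-threaded aggregation. It is not the code from this PR: the object name `ParallelAggregationSketch`, the helpers `splitRange` and `aggregate`, and the plain `Array[Array[Long]]` input (one row of per-partition sizes for each mapper) are all assumptions made for illustration. Each task sums a disjoint range of shuffle partitions across all mappers, so no two threads write to the same array slot.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

// Hypothetical sketch; names and input layout are assumptions, not Spark's API.
object ParallelAggregationSketch {

  // Split the index range [0, n) into at most `parts` contiguous, near-equal ranges.
  def splitRange(n: Int, parts: Int): Seq[Range] = {
    val p = math.max(1, math.min(parts, n))
    (0 until p).map { i =>
      val start = (i.toLong * n / p).toInt
      val end = ((i + 1).toLong * n / p).toInt
      start until end
    }
  }

  // sizes(m)(r) is the number of bytes mapper m wrote for shuffle partition r.
  // Returns the total bytes per shuffle partition.
  def aggregate(sizes: Array[Array[Long]], numPartitions: Int, parallelism: Int): Array[Long] = {
    val totals = new Array[Long](numPartitions)
    if (parallelism <= 1) {
      // Single-threaded path: the behaviour the commit message calls a bottleneck.
      for (row <- sizes; r <- 0 until numPartitions) totals(r) += row(r)
    } else {
      val pool = Executors.newFixedThreadPool(parallelism)
      implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(pool)
      try {
        // Each task owns a disjoint range of partitions, so writes never overlap.
        val tasks = splitRange(numPartitions, parallelism).map { range =>
          Future {
            for (row <- sizes; r <- range) totals(r) += row(r)
          }
        }
        Await.result(Future.sequence(tasks), Duration.Inf)
      } finally {
        pool.shutdown()
      }
    }
    totals
  }
}
```

Splitting by shuffle partition rather than by mapper avoids any locking or per-thread partial arrays, at the cost of every task scanning every mapper's row. Whether the parallel path pays off depends on the size of `mapper * shuffle partitions`, which is why a threshold-based switch between the two paths is a natural design.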
Changed files:
- core/src/main/scala/org/apache/spark/MapOutputTracker.scala (57 additions, 3 deletions)
- core/src/main/scala/org/apache/spark/internal/config/package.scala (11 additions, 0 deletions)
- core/src/test/scala/org/apache/spark/MapOutputTrackerSuite.scala (23 additions, 0 deletions)