Remove map-side combining from ShuffleMapTask.
This separation of concerns simplifies the ShuffleDependency and ShuffledRDD interfaces. Map-side combining can instead be performed in a mapPartitions() call before the RDD is shuffled. I don't expect this to have much performance impact: in both approaches, each tuple is hashed twice, once for bucket partitioning and once in the combiner's hash table. The same steps are performed, just in a different order and through one extra Iterator.
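A minimal standalone sketch of the reordered pipeline, assuming plain Scala collections in place of Spark classes. The names `MapSideCombineSketch`, `combineLocally`, `bucketFor` and `partitionIntoBuckets` are hypothetical and are not part of Spark's API. `combineLocally` stands in for the function a mapPartitions() call would apply to each partition, and `partitionIntoBuckets` stands in for the shuffle's bucket assignment. The two hashing steps the message mentions appear as the combiner's hash-table lookup and the bucket hash.

```scala
import scala.collection.mutable

// Hypothetical sketch: no Spark classes are used; names are illustrative only.
object MapSideCombineSketch {
  // Per-partition combining, as a mapPartitions() function might do it.
  // Hashing step 1: every input pair is looked up in the combiner's hash table.
  def combineLocally[K, V, C](
      iter: Iterator[(K, V)],
      createCombiner: V => C,
      mergeValue: (C, V) => C): Iterator[(K, C)] = {
    val combiners = mutable.HashMap.empty[K, C]
    for ((k, v) <- iter) {
      combiners.get(k) match {
        case Some(c) => combiners(k) = mergeValue(c, v)
        case None    => combiners(k) = createCombiner(v)
      }
    }
    combiners.iterator
  }

  // Hashing step 2: map a key to a bucket index in [0, numBuckets).
  // hashCode may be negative, so shift negative remainders into range.
  def bucketFor(key: Any, numBuckets: Int): Int = {
    val mod = key.hashCode % numBuckets
    if (mod < 0) mod + numBuckets else mod
  }

  // Stand-in for the shuffle write: group combined pairs by bucket index.
  def partitionIntoBuckets[K, C](
      iter: Iterator[(K, C)],
      numBuckets: Int): Map[Int, Vector[(K, C)]] =
    iter.toVector.groupBy(pair => bucketFor(pair._1, numBuckets))

  def main(args: Array[String]): Unit = {
    val partition = Iterator(("a", 1), ("b", 2), ("a", 3))
    val combined = combineLocally[String, Int, Int](partition, v => v, _ + _)
    val buckets = partitionIntoBuckets(combined, 2)
    for ((i, pairs) <- buckets.toSeq.sortBy(_._1))
      println(s"bucket $i: ${pairs.sortBy(_._1)}")
  }
}
```

Because the combiner's output is itself an Iterator, the bucket step consumes it lazily. This mirrors the "one extra Iterator" the message refers to, under the assumptions above.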
- core/src/main/scala/spark/Dependency.scala: 1 addition, 3 deletions
- core/src/main/scala/spark/PairRDDFunctions.scala: 6 additions, 5 deletions
- core/src/main/scala/spark/rdd/CoGroupedRDD.scala: 7 additions, 10 deletions
- core/src/main/scala/spark/rdd/ShuffledRDD.scala: 5 additions, 10 deletions
- core/src/main/scala/spark/scheduler/DAGScheduler.scala: 5 additions, 5 deletions
- core/src/main/scala/spark/scheduler/ShuffleMapTask.scala: 12 additions, 31 deletions
- core/src/main/scala/spark/scheduler/Stage.scala: 1 addition, 1 deletion
- core/src/test/scala/spark/ShuffleSuite.scala: 0 additions, 29 deletions