Merge pull request #41 from pwendell/shuffle-benchmark
Provide Instrumentation for Shuffle Write Performance

Shuffle write performance can have a major impact on the performance of jobs. This patch adds a few pieces of instrumentation related to shuffle writes. They are:

1. A listing of the time spent performing blocking writes for each task. This is implemented by keeping track of the aggregate delay seen by many individual writes.
2. An undocumented option `spark.shuffle.sync` which forces shuffle data to sync to disk. This is necessary for measuring shuffle performance in the absence of the OS buffer cache.
3. An internal utility which micro-benchmarks write throughput for simulated shuffle outputs.

I'm going to do some performance testing on this to see whether these small timing calls add overhead. From a feature perspective, however, I consider this complete. Any feedback is appreciated.
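
To illustrate items 1 and 2, here is a minimal sketch of accumulating the delay of many individual blocking writes and optionally forcing the data to disk. It is not the code from this patch: `TimedOutputStream`, `ShuffleWriteTimingSketch`, `writeChunks`, and the `syncToDisk` parameter are hypothetical names made up for the example, and it uses only the JDK (`System.nanoTime`, `FileDescriptor.sync`). The sketch leaves the sync call outside the timed region; whether the patch counts it is not stated here.

```scala
import java.io.{File, FileOutputStream, OutputStream}

// Hypothetical helper, not Spark's actual class: wraps a stream and
// accumulates the wall-clock time spent inside blocking write calls.
class TimedOutputStream(underlying: OutputStream) extends OutputStream {
  private var nanosWriting = 0L

  // Runs `body` and adds its elapsed time to the running total,
  // even if the write throws.
  private def timed(body: => Unit): Unit = {
    val start = System.nanoTime()
    try body finally nanosWriting += System.nanoTime() - start
  }

  override def write(b: Int): Unit = timed(underlying.write(b))
  override def write(b: Array[Byte], off: Int, len: Int): Unit =
    timed(underlying.write(b, off, len))
  override def flush(): Unit = underlying.flush()
  override def close(): Unit = underlying.close()

  // Aggregate delay seen across all individual writes so far.
  def timeWritingNanos: Long = nanosWriting
}

object ShuffleWriteTimingSketch {
  // Writes `numChunks` zero-filled chunks of `chunkSize` bytes to `file`.
  // Returns (bytes written, nanoseconds spent blocked in write calls).
  // `syncToDisk` is an assumed stand-in for an option like spark.shuffle.sync.
  def writeChunks(file: File, numChunks: Int, chunkSize: Int,
                  syncToDisk: Boolean): (Long, Long) = {
    val fos = new FileOutputStream(file)
    val out = new TimedOutputStream(fos)
    val chunk = new Array[Byte](chunkSize)
    try {
      var i = 0
      while (i < numChunks) { out.write(chunk); i += 1 }
      out.flush()
      // Force the data past the OS buffer cache so the measurement
      // reflects the disk rather than memory.
      if (syncToDisk) fos.getFD.sync()
    } finally out.close()
    (numChunks.toLong * chunkSize, out.timeWritingNanos)
  }
}
```

Dividing the returned byte count by the elapsed time gives a rough throughput figure, which is the kind of number a micro-benchmark such as the one in item 3 would report.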
Changed files:
- core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala (5 additions, 0 deletions)
- core/src/main/scala/org/apache/spark/scheduler/ShuffleMapTask.scala (3 additions, 0 deletions)
- core/src/main/scala/org/apache/spark/storage/BlockObjectWriter.scala (5 additions, 0 deletions)
- core/src/main/scala/org/apache/spark/storage/DiskStore.scala (41 additions, 3 deletions)
- core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala (3 additions, 1 deletion)
- core/src/main/scala/spark/storage/StoragePerfTester.scala (84 additions, 0 deletions)