-
- Downloads
SPARK-1235: manage the DAGScheduler EventProcessActor with supervisor and...
SPARK-1235: manage the DAGScheduler EventProcessActor with supervisor and refactor the DAGScheduler with Akka https://spark-project.atlassian.net/browse/SPARK-1235 In the current implementation, the running job will hang if the DAGScheduler crashes for some reason (eventProcessActor throws exception in receive() ) The reason is that the actor will automatically restart when the exception is thrown during the running but is not captured properly (Akka behaviour), and the JobWaiters are still waiting there for the completion of the tasks In this patch, I refactored the DAGScheduler with Akka and manage the eventProcessActor with supervisor, so that upon the failure of a eventProcessActor, the supervisor will terminate the EventProcessActor and close the SparkContext thanks for @kayousterhout and @markhamstra to give the hints in JIRA Author: CodingCat <zhunansjtu@gmail.com> Author: Xiangrui Meng <meng@databricks.com> Author: Nan Zhu <CodingCat@users.noreply.github.com> Closes #186 from CodingCat/SPARK-1235 and squashes the following commits: a7fb0ee [CodingCat] throw Exception on failure of creating DAG 124d82d [CodingCat] blocking the constructor until event actor is ready baf2d38 [CodingCat] fix the issue brought by non-blocking actorOf 35c886a [CodingCat] fix bug 82d08b3 [CodingCat] calling actorOf on system to ensure it is blocking 310a579 [CodingCat] style fix cd02d9a [Nan Zhu] small fix 561cfbc [CodingCat] recover doCheckpoint c048d0e [CodingCat] call submitWaitingStages for every event a9eea039 [CodingCat] address Matei's comments ac878ab [CodingCat] typo fix 5d1636a [CodingCat] re-trigger the test..... 9dfb033 [CodingCat] remove unnecessary changes a7a2a97 [CodingCat] add StageCancelled message fdf3b17 [CodingCat] just to retrigger the test...... 089bc2f [CodingCat] address andrew's comments 228f4b0 [CodingCat] address comments from Mark b68c1c7 [CodingCat] refactor DAGScheduler with Akka 810efd8 [Xiangrui Meng] akka solution
Showing
- core/src/main/scala/org/apache/spark/SparkContext.scala 12 additions, 8 deletionscore/src/main/scala/org/apache/spark/SparkContext.scala
- core/src/main/scala/org/apache/spark/rdd/RDD.scala 3 additions, 3 deletionscore/src/main/scala/org/apache/spark/rdd/RDD.scala
- core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala 236 additions, 183 deletions.../main/scala/org/apache/spark/scheduler/DAGScheduler.scala
- core/src/main/scala/org/apache/spark/scheduler/DAGSchedulerEvent.scala 1 addition, 3 deletions.../scala/org/apache/spark/scheduler/DAGSchedulerEvent.scala
- core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala 1 addition, 1 deletion...ain/scala/org/apache/spark/scheduler/TaskSetManager.scala
- core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala 36 additions, 22 deletions.../scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala
- core/src/test/scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala 1 addition, 1 deletion...a/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala
Loading
Please register or sign in to comment