Simplify checkpointing code and RDD class a little:
- RDD's getDependencies and getSplits methods are now guaranteed to be called only once, so subclasses can safely do computation in there without worrying about caching the results.
- The management of a "splits_" variable that is cleared out when we checkpoint an RDD is now done in the RDD class.
- A few of the RDD subclasses are simpler.
- CheckpointRDD's compute() method no longer assumes that it is given a CheckpointRDDSplit -- it can work just as well on a split from the original RDD, because it only looks at its index. This is important because things like UnionRDD and ZippedRDD remember the parent's splits as part of their own and wouldn't work on checkpointed parents.
- RDD.iterator can now reuse cached data if an RDD is computed before it is checkpointed. It seems like it wouldn't do this before (it always called iterator() on the CheckpointRDD, which read from HDFS).
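The behaviour described in the bullets above can be illustrated with a small toy model. This is only a sketch and not the code from this commit: ToyRDD, ToyCheckpointRDD, RangeRDD, SimpleSplit, markCheckpointed and the in-memory cache are all hypothetical names made up for the illustration, and the real RDD class involves TaskContext, the CacheManager and HDFS, which are left out here.

```scala
import scala.collection.mutable

// Toy model only: none of these are Spark's real classes or signatures.
trait Split { def index: Int }
case class SimpleSplit(index: Int) extends Split

abstract class ToyRDD[T] {
  // Subclass hooks. The base class calls getSplits at most once per instance,
  // so subclasses do not need to cache the result themselves.
  protected def getSplits: Array[Split]
  def compute(split: Split): Iterator[T]

  private var splits_ : Array[Split] = null          // cleared on checkpoint
  private var checkpoint: Option[ToyRDD[T]] = None
  private val cache = mutable.Map.empty[Int, Vector[T]]
  var cached = false                                 // stands in for a storage level

  final def splits: Array[Split] = checkpoint match {
    case Some(cp) => cp.splits
    case None =>
      if (splits_ == null) splits_ = getSplits
      splits_
  }

  // Look in the cache first; only fall through to the checkpoint (or to
  // compute) when the partition is not cached yet.
  final def iterator(split: Split): Iterator[T] =
    if (cached) cache.getOrElseUpdate(split.index, computeOrReadCheckpoint(split).toVector).iterator
    else computeOrReadCheckpoint(split)

  private def computeOrReadCheckpoint(split: Split): Iterator[T] = checkpoint match {
    case Some(cp) => cp.compute(split)               // original split is passed through
    case None     => compute(split)
  }

  def markCheckpointed(cp: ToyRDD[T]): Unit = {
    checkpoint = Some(cp)
    splits_ = null                                   // managed here, not in subclasses
  }
}

// Stand-in for CheckpointRDD: compute() looks only at split.index, so it
// accepts a split that was created by the original RDD.
class ToyCheckpointRDD[T](saved: Vector[Vector[T]]) extends ToyRDD[T] {
  var reads = 0
  protected def getSplits: Array[Split] =
    Array.tabulate[Split](saved.length)(i => SimpleSplit(i))
  def compute(split: Split): Iterator[T] = { reads += 1; saved(split.index).iterator }
}

// A trivial source RDD that counts how often getSplits is invoked.
class RangeRDD(n: Int) extends ToyRDD[Int] {
  var getSplitsCalls = 0
  protected def getSplits: Array[Split] = {
    getSplitsCalls += 1
    Array.tabulate[Split](n)(i => SimpleSplit(i))
  }
  def compute(split: Split): Iterator[Int] =
    Iterator(split.index * 10, split.index * 10 + 1)
}
```

In this model, a caller that kept splits obtained before markCheckpointed can still pass them to iterator afterwards, which mirrors the UnionRDD and ZippedRDD situation mentioned above. A partition that was cached before the checkpoint is served from the cache without touching the ToyCheckpointRDD.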
Changed files:
- core/src/main/scala/spark/CacheManager.scala: 3 additions, 3 deletions
- core/src/main/scala/spark/PairRDDFunctions.scala: 1 addition, 3 deletions
- core/src/main/scala/spark/RDD.scala: 73 additions, 57 deletions
- core/src/main/scala/spark/RDDCheckpointData.scala: 10 additions, 9 deletions
- core/src/main/scala/spark/api/java/JavaRDDLike.scala: 1 addition, 1 deletion
- core/src/main/scala/spark/rdd/CartesianRDD.scala: 3 additions, 9 deletions
- core/src/main/scala/spark/rdd/CheckpointRDD.scala: 31 additions, 30 deletions
- core/src/main/scala/spark/rdd/CoalescedRDD.scala: 4 additions, 10 deletions
- core/src/main/scala/spark/rdd/MappedRDD.scala: 2 additions, 4 deletions
- core/src/main/scala/spark/rdd/PartitionPruningRDD.scala: 4 additions, 9 deletions
- core/src/main/scala/spark/rdd/ShuffledRDD.scala: 1 addition, 7 deletions
- core/src/main/scala/spark/rdd/UnionRDD.scala: 4 additions, 10 deletions
- core/src/main/scala/spark/rdd/ZippedRDD.scala: 1 addition, 6 deletions
- core/src/main/scala/spark/util/MetadataCleaner.scala: 2 additions, 2 deletions
- core/src/test/scala/spark/CheckpointSuite.scala: 13 additions, 8 deletions