[SPARK-2617] Correct doc and usages of preservesPartitioning
The name `preservesPartitioning` is ambiguous. It could mean either:

1) preserves the indices of partitions, or
2) preserves the partitioner.

The latter is correct, and `preservesPartitioning` should really be called `preservesPartitioner` to avoid confusion. Unfortunately, the name is already part of the API and we cannot change it. We should be clear in the doc and fix wrong usages.

This PR

1. adds notes in `mapPartitions*`,
2. makes `RDD.sample` preserve the partitioner,
3. changes `preservesPartitioning` to false in `RDD.zip`, because the keys of the first RDD are no longer the keys of the zipped RDD,
4. fixes some wrong usages in MLlib.

Author: Xiangrui Meng <meng@databricks.com>

Closes #1526 from mengxr/preserve-partitioner and squashes the following commits:

b361e65 [Xiangrui Meng] update doc based on pwendell's comments
3b1ba19 [Xiangrui Meng] update doc
357575c [Xiangrui Meng] fix unit test
20b4816 [Xiangrui Meng] Merge branch 'master' into preserve-partitioner
d1caa65 [Xiangrui Meng] add doc to explain preservesPartitioning; fix wrong usage of preservesPartitioning; make sample preserve partitioning
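To illustrate the distinction described in the commit message, here is a minimal sketch of how the flag affects the `partitioner` of the resulting RDD. It is not code from this PR. The object name `PreservesPartitioningSketch`, the sample data, and the local master setting are assumptions made for illustration, and the sketch assumes a Spark version in which pair-RDD functions such as `partitionBy` are available without an extra implicit import.

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

// Hypothetical object name; illustrative only, not part of the PR.
object PreservesPartitioningSketch {

  // Returns three flags:
  //   1. the source RDD has a partitioner,
  //   2. mapPartitions with preservesPartitioning = true keeps it,
  //   3. mapPartitions with the default (false) drops it.
  def run(sc: SparkContext): (Boolean, Boolean, Boolean) = {
    val pairs = sc
      .parallelize(Seq((1, "a"), (2, "b"), (3, "c"), (4, "d")))
      .partitionBy(new HashPartitioner(2))

    // Keys are untouched, so the existing partitioner still describes
    // where each key lives; it is safe to declare that it is preserved.
    val upper = pairs.mapPartitions(
      iter => iter.map { case (k, v) => (k, v.toUpperCase) },
      preservesPartitioning = true)

    // Keys are modified, so the old partitioner no longer applies.
    // Leaving the flag at its default (false) drops the partitioner.
    val shifted = pairs.mapPartitions(
      iter => iter.map { case (k, v) => (k + 1, v) })

    (pairs.partitioner.isDefined,
     upper.partitioner == pairs.partitioner,
     shifted.partitioner.isEmpty)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setMaster("local[2]") // assumption: run locally for the demo
      .setAppName("preserves-partitioning-sketch")
    val sc = new SparkContext(conf)
    try println(run(sc)) finally sc.stop()
  }
}
```

The flag is a promise from the caller that the function does not change keys. Spark does not verify it, so setting it to true on a key-changing function would leave a partitioner attached that no longer matches the data.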
Showing 8 changed files:
- core/src/main/scala/org/apache/spark/rdd/PartitionwiseSampledRDD.scala (4 additions, 0 deletions)
- core/src/main/scala/org/apache/spark/rdd/RDD.scala (13 additions, 4 deletions)
- core/src/test/scala/org/apache/spark/rdd/PartitionwiseSampledRDDSuite.scala (2 additions, 2 deletions)
- core/src/test/scala/org/apache/spark/rdd/RDDSuite.scala (9 additions, 0 deletions)
- mllib/src/main/scala/org/apache/spark/mllib/evaluation/BinaryClassificationMetrics.scala (2 additions, 2 deletions)
- mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala (4 additions, 4 deletions)
- mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala (1 addition, 1 deletion)
- mllib/src/main/scala/org/apache/spark/mllib/util/MLUtils.scala (2 additions, 2 deletions)