Fixing SPARK-602: PythonPartitioner
Currently, PythonPartitioner determines the partition ID by hashing a byte-array representation of the PySpark key. This PR lets PythonPartitioner use the actual partition ID instead, which is required, e.g., for sorting via PySpark.
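To illustrate why the distinction matters, here is a minimal, self-contained sketch in plain Python. It is not the Spark code touched by this PR: all function names are hypothetical, and CRC32 over pickled bytes is only an assumed stand-in for hashing a key's byte-array representation. It shows that a range-style partition function only yields a global sort if the partition ID it computes is used as-is.

```python
import bisect
import pickle
import zlib


def partition_by_serialized_bytes(key, num_partitions):
    """Sketch of the old behaviour: hash the serialized key bytes.

    CRC32 is an assumption made for this example; the point is only that
    the result ignores whatever partition function the user chose.
    """
    data = pickle.dumps(key, protocol=2)
    return zlib.crc32(data) % num_partitions


def partition_by_id(partition_id, num_partitions):
    """Sketch of the new behaviour: use the supplied partition ID directly.

    Python's % already returns a non-negative result for a positive modulus.
    """
    return partition_id % num_partitions


def make_range_partition_func(boundaries):
    """Hypothetical range partitioner: keys <= boundaries[0] go to 0, etc."""
    def func(key):
        return bisect.bisect_left(boundaries, key)
    return func


keys = [5, 42, 17, 99, 63, 8]
num_partitions = 3
range_func = make_range_partition_func([20, 60])

# Route each key to the partition chosen by the range function.
partitions = [[] for _ in range(num_partitions)]
for k in keys:
    partitions[partition_by_id(range_func(k), num_partitions)].append(k)

# Sorting within each partition and concatenating gives a global sort,
# because the partition IDs respect the key ordering.
result = [k for part in partitions for k in sorted(part)]
```

With byte hashing, the same keys would land in partitions that bear no relation to their order, so concatenating the sorted partitions would not, in general, produce sorted output.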
Showing
- core/src/main/scala/org/apache/spark/api/python/PythonPartitioner.scala: 7 additions, 3 deletions
- core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala: 3 additions, 3 deletions
- core/src/main/scala/org/apache/spark/util/Utils.scala: 13 additions, 0 deletions
- core/src/test/scala/org/apache/spark/util/UtilsSuite.scala: 11 additions, 0 deletions
- python/pyspark/rdd.py: 6 additions, 4 deletions
- python/pyspark/serializers.py: 4 additions, 0 deletions