Merge pull request #33 from AndreSchumacher/pyspark_partition_key_change
Fixing SPARK-602: PythonPartitioner

Currently, PythonPartitioner determines the partition ID by hashing a byte-array representation of PySpark's key. This PR lets PythonPartitioner use the actual partition ID, which is required, for example, for sorting via PySpark.
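The difference between the two strategies can be sketched in plain Python. This is only an illustration under stated assumptions, not the code from this PR: the function names, the CRC32 hash, and the range boundaries are all hypothetical stand-ins. It shows why hashing the serialized key bytes scatters keys arbitrarily, while passing through a partition ID computed on the Python side preserves a range-based layout suitable for sorting.

```python
import bisect
import pickle
import zlib

NUM_PARTITIONS = 3  # hypothetical partition count for the illustration


def partition_by_hashed_bytes(key, num_partitions=NUM_PARTITIONS):
    """Old approach (illustrative): hash the serialized bytes of the key.

    CRC32 is a stand-in hash; the resulting ID has no relation to key order.
    """
    data = pickle.dumps(key, protocol=2)
    return zlib.crc32(data) % num_partitions


def range_partition_id(key, boundaries):
    """Hypothetical Python-side choice of partition ID for a range-based sort."""
    return bisect.bisect_left(boundaries, key)


def partition_by_id(partition_id, num_partitions=NUM_PARTITIONS):
    """New approach (illustrative): use the supplied ID directly.

    Python's % with a positive divisor keeps the result in [0, num_partitions).
    """
    return partition_id % num_partitions


keys = [5, 42, 17, 99, 63, 8]
boundaries = [20, 60]  # assumed split points: <=20, 21..60, >60

# Place each key in the partition chosen on the Python side.
buckets = [[] for _ in range(NUM_PARTITIONS)]
for k in keys:
    buckets[partition_by_id(range_partition_id(k, boundaries))].append(k)

# Sorting within each partition and concatenating gives a globally sorted result.
sorted_output = [k for bucket in buckets for k in sorted(bucket)]
```

With ID pass-through, concatenating the per-partition sorted buckets yields a globally ordered sequence. With `partition_by_hashed_bytes`, the same keys would land in partitions unrelated to their order, so no such guarantee holds.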
Changed files:
- core/src/main/scala/org/apache/spark/api/python/PythonPartitioner.scala (7 additions, 3 deletions)
- core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala (3 additions, 3 deletions)
- core/src/main/scala/org/apache/spark/util/Utils.scala (13 additions, 0 deletions)
- core/src/test/scala/org/apache/spark/util/UtilsSuite.scala (11 additions, 0 deletions)
- python/pyspark/rdd.py (6 additions, 4 deletions)
- python/pyspark/serializers.py (4 additions, 0 deletions)