Skip to content
Snippets Groups Projects
Commit 2fd781d3 authored by Josh Rosen's avatar Josh Rosen
Browse files

Merge pull request #249 from ngbinh/partitionInJavaSortByKey

Expose numPartitions parameter in JavaPairRDD.sortByKey()

This change makes Java and Scala API on sortByKey() the same.
parents 97ac0601 0b494f7d
No related branches found
No related tags found
No related merge requests found
......@@ -588,6 +588,20 @@ class JavaPairRDD[K, V](val rdd: RDD[(K, V)])(implicit val kClassTag: ClassTag[K
fromRDD(new OrderedRDDFunctions[K, V, (K, V)](rdd).sortByKey(ascending))
}
/**
* Sort the RDD by key, so that each partition contains a sorted range of the elements. Calling
* `collect` or `save` on the resulting RDD will return or output an ordered list of records
* (in the `save` case, they will be written to multiple `part-X` files in the filesystem, in
* order of the keys).
*/
def sortByKey(comp: Comparator[K], ascending: Boolean, numPartitions: Int): JavaPairRDD[K, V] = {
class KeyOrdering(val a: K) extends Ordered[K] {
override def compare(b: K) = comp.compare(a, b)
}
implicit def toOrdered(x: K): Ordered[K] = new KeyOrdering(x)
fromRDD(new OrderedRDDFunctions[K, V, (K, V)](rdd).sortByKey(ascending, numPartitions))
}
/**
* Return an RDD with the keys of each tuple.
*/
......
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment