Skip to content
Snippets Groups Projects
Commit 2fda84fe authored by Patrick Wendell's avatar Patrick Wendell
Browse files

Always use a shuffle

parent 08c1a42d
No related branches found
No related tags found
No related merge requests found
...@@ -268,22 +268,14 @@ abstract class RDD[T: ClassManifest]( ...@@ -268,22 +268,14 @@ abstract class RDD[T: ClassManifest](
/** /**
* Return a new RDD that has exactly numPartitions partitions. * Return a new RDD that has exactly numPartitions partitions.
* *
* Used to increase or decrease the level of parallelism in this RDD. By default, this will use * Used to increase or decrease the level of parallelism in this RDD. This will use
* a shuffle to redistribute data. If you are shrinking the RDD into fewer partitions, you can * a shuffle to redistribute data.
* set skipShuffle = false to avoid a shuffle. Skipping shuffles is not supported when
* increasing the number of partitions.
* *
* Similar to `coalesce`, but shuffles by default, allowing you to call this safely even * If you are decreasing the number of partitions in this RDD, consider using `coalesce`,
* if you don't know the number of partitions. * which can avoid performing a shuffle.
*/ */
def repartition(numPartitions: Int, skipShuffle: Boolean = false): RDD[T] = { def repartition(numPartitions: Int): RDD[T] = {
if (skipShuffle && numPartitions > this.partitions.size) { coalesce(numPartitions, true)
val msg = "repartition must grow %s from %s to %s partitions, cannot skip shuffle.".format(
this.name, this.partitions.size, numPartitions
)
throw new IllegalArgumentException(msg)
}
coalesce(numPartitions, !skipShuffle)
} }
/** /**
......
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment