[SPARK-19601][SQL] Fix CollapseRepartition rule to preserve shuffle-enabled Repartition
### What changes were proposed in this pull request?

Observed by felixcheung in https://github.com/apache/spark/pull/16739, when users use the shuffle-enabled `repartition` API, they expect the partition they got should be the exact number they provided, even if they call shuffle-disabled `coalesce` later. Currently, `CollapseRepartition` rule does not consider whether shuffle is enabled or not. Thus, we got the following unexpected result.

```Scala
val df = spark.range(0, 10000, 1, 5)
val df2 = df.repartition(10)
assert(df2.coalesce(13).rdd.getNumPartitions == 5)
assert(df2.coalesce(7).rdd.getNumPartitions == 5)
assert(df2.coalesce(3).rdd.getNumPartitions == 3)
```

This PR is to fix the issue. We preserve shuffle-enabled Repartition.

### How was this patch tested?

Added a test case

Author: Xiao Li <gatorsmile@gmail.com>

Closes #16933 from gatorsmile/CollapseRepartition.
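To make "preserve shuffle-enabled Repartition" concrete, here is a toy model of the intended behavior in plain Scala, with no Spark dependency. It is a sketch under assumptions, not the code in `Optimizer.scala`: `Plan`, `Source`, `Repartition`, `collapse`, and `partitions` are hypothetical stand-ins, and the decision to drop a `coalesce` that asks for at least as many partitions as the shuffle below it produces is my reading of the description above.

```scala
// Hypothetical stand-ins for logical plan nodes (not Spark classes).
sealed trait Plan
case class Source(numPartitions: Int) extends Plan
// shuffle = true models repartition(n); shuffle = false models coalesce(n).
case class Repartition(numPartitions: Int, shuffle: Boolean, child: Plan) extends Plan

object CollapseModel {
  // Collapse adjacent Repartition nodes bottom-up.
  def collapse(plan: Plan): Plan = plan match {
    case s: Source => s
    case Repartition(n, shuffle, child) =>
      collapse(child) match {
        // coalesce on top of a shuffle-enabled repartition:
        // the shuffle-enabled child is never thrown away.
        case c @ Repartition(cn, true, _) if !shuffle =>
          if (n >= cn) c                 // coalesce cannot add partitions, so it is a no-op
          else Repartition(n, shuffle, c) // keep both nodes
        // Any other adjacent pair: the top node wins and the child is dropped.
        case Repartition(_, _, grandChild) => Repartition(n, shuffle, grandChild)
        case other => Repartition(n, shuffle, other)
      }
  }

  // Partition count this toy plan would produce.
  def partitions(plan: Plan): Int = plan match {
    case Source(n)                    => n
    case Repartition(n, true, _)      => n                              // shuffle yields exactly n
    case Repartition(n, false, child) => math.min(n, partitions(child)) // coalesce only shrinks
  }
}
```

In this model, a 5-partition source followed by `repartition(10)` and then `coalesce(13)`, `coalesce(7)`, or `coalesce(3)` evaluates to 10, 7, and 3 partitions respectively, instead of the 5, 5, and 3 reported above.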
Showing
- R/pkg/inst/tests/testthat/test_sparkSQL.R (2 additions, 2 deletions)
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala (3 additions, 0 deletions)
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala (14 additions, 18 deletions)
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala (12 additions, 4 deletions)
- sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/CollapseRepartitionSuite.scala (137 additions, 16 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala (5 additions, 5 deletions)
- sql/core/src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala (5 additions, 4 deletions)