-
- Downloads
[SPARK-19846][SQL] Add a flag to disable constraint propagation
## What changes were proposed in this pull request? Constraint propagation can be computation expensive and block the driver execution for long time. For example, the below benchmark needs 30mins. Compared with previous PRs #16998, #16785, this is a much simpler option: add a flag to disable constraint propagation. ### Benchmark Run the following codes locally. import org.apache.spark.ml.{Pipeline, PipelineStage} import org.apache.spark.ml.feature.{OneHotEncoder, StringIndexer, VectorAssembler} import org.apache.spark.sql.internal.SQLConf spark.conf.set(SQLConf.CONSTRAINT_PROPAGATION_ENABLED.key, false) val df = (1 to 40).foldLeft(Seq((1, "foo"), (2, "bar"), (3, "baz")).toDF("id", "x0"))((df, i) => df.withColumn(s"x$i", $"x0")) val indexers = df.columns.tail.map(c => new StringIndexer() .setInputCol(c) .setOutputCol(s"${c}_indexed") .setHandleInvalid("skip")) val encoders = indexers.map(indexer => new OneHotEncoder() .setInputCol(indexer.getOutputCol) .setOutputCol(s"${indexer.getOutputCol}_encoded") .setDropLast(true)) val stages: Array[PipelineStage] = indexers ++ encoders val pipeline = new Pipeline().setStages(stages) val startTime = System.nanoTime pipeline.fit(df).transform(df).show val runningTime = System.nanoTime - startTime Before this patch: 1786001 ms ~= 30 mins After this patch: 26392 ms = less than half of a minute Related PRs: #16998, #16785. ## How was this patch tested? Jenkins tests. Please review http://spark.apache.org/contributing.html before opening a pull request. Author: Liang-Chi Hsieh <viirya@gmail.com> Closes #17186 from viirya/add-flag-disable-constraint-propagation.
Showing
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SimpleCatalystConf.scala 2 additions, 1 deletion...la/org/apache/spark/sql/catalyst/SimpleCatalystConf.scala
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala 15 additions, 7 deletions...a/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala 4 additions, 2 deletions...scala/org/apache/spark/sql/catalyst/optimizer/joins.scala
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala 11 additions, 0 deletions...scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala
- sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala 11 additions, 0 deletions...rc/main/scala/org/apache/spark/sql/internal/SQLConf.scala
- sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/BinaryComparisonSimplificationSuite.scala 3 additions, 2 deletions...alyst/optimizer/BinaryComparisonSimplificationSuite.scala
- sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/BooleanSimplificationSuite.scala 3 additions, 2 deletions...k/sql/catalyst/optimizer/BooleanSimplificationSuite.scala
- sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/InferFiltersFromConstraintsSuite.scala 18 additions, 1 deletion...catalyst/optimizer/InferFiltersFromConstraintsSuite.scala
- sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/OuterJoinEliminationSuite.scala 29 additions, 1 deletion...rk/sql/catalyst/optimizer/OuterJoinEliminationSuite.scala
- sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/PropagateEmptyRelationSuite.scala 3 additions, 2 deletions.../sql/catalyst/optimizer/PropagateEmptyRelationSuite.scala
- sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/PruneFiltersSuite.scala 39 additions, 1 deletion...ache/spark/sql/catalyst/optimizer/PruneFiltersSuite.scala
- sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/SetOperationSuite.scala 2 additions, 1 deletion...ache/spark/sql/catalyst/optimizer/SetOperationSuite.scala
- sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/plans/ConstraintPropagationSuite.scala 18 additions, 0 deletions...spark/sql/catalyst/plans/ConstraintPropagationSuite.scala
Loading
Please register or sign in to comment