[SPARK-9372] [SQL] Filter nulls in join keys
This PR adds an optimization rule, `FilterNullsInJoinKey`, that adds a `Filter` before join operators to remove rows with null values in their join keys. This optimization is guarded by a new SQL conf, `spark.sql.advancedOptimization`.

The code in this PR was authored by yhuai; I'm opening this PR to factor this change out of #7685, a larger pull request that contains two other optimizations.

Author: Yin Huai <yhuai@databricks.com>
Author: Josh Rosen <joshrosen@databricks.com>

Closes #7768 from JoshRosen/filter-nulls-in-join-key and squashes the following commits:

- c02fc3f [Yin Huai] Address Josh's comments.
- 0a8e096 [Yin Huai] Update comments.
- ea7d5a6 [Yin Huai] Make sure we do not keep adding filters.
- be88760 [Yin Huai] Make it clear that FilterNullsInJoinKeySuite.scala is used to test FilterNullsInJoinKey.
- 8bb39ad [Yin Huai] Fix non-deterministic tests.
- 303236b [Josh Rosen] Revert changes that are unrelated to null join key filtering
- 40eeece [Josh Rosen] Merge remote-tracking branch 'origin/master' into filter-nulls-in-join-key
- c57a954 [Yin Huai] Bug fix.
- d3d2e64 [Yin Huai] First round of cleanup.
- f9516b0 [Yin Huai] Style
- c6667e7 [Yin Huai] Add PartitioningCollection.
- e616d3b [Yin Huai] wip
- 7c2d2d8 [Yin Huai] Bug fix and refactoring.
- 69bb072 [Yin Huai] Introduce NullSafeHashPartitioning and NullUnsafePartitioning.
- d5b84c3 [Yin Huai] Do not add unnecessary filters.
- 2201129 [Yin Huai] Filter out rows that will not be joined in equal joins early.
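The rule is safe because, in an equi-join, SQL `=` returns NULL (not true) whenever either operand is NULL, so a row with a null join key can never find a match. The following is a minimal, self-contained Scala sketch under a toy model: `Row`, `JoinType`, and `NullKeyFilterSketch` are hypothetical names, not Spark classes, and the per-join-type choice of which side to filter is an assumption about how such a rule would behave, not a copy of the actual `FilterNullsInJoinKey` code. It checks that dropping null-key rows before the join leaves the join result unchanged.

```scala
// Hypothetical toy model, not Spark's API: None stands for a SQL NULL join key.
final case class Row(key: Option[Int], payload: String)

sealed trait JoinType
case object Inner extends JoinType
case object LeftOuter extends JoinType
case object RightOuter extends JoinType
case object FullOuter extends JoinType

object NullKeyFilterSketch {
  type Joined = (Option[Row], Option[Row])

  // SQL equality: NULL = anything is not true, so a None key never matches
  // (Scala's None == None would be true, hence the explicit isDefined check).
  private def matches(l: Row, r: Row): Boolean = l.key.isDefined && l.key == r.key

  // Naive nested-loop equi-join on `key` with SQL outer-join semantics.
  def join(left: Seq[Row], right: Seq[Row], jt: JoinType): Seq[Joined] = {
    val matched: Seq[Joined] =
      for (l <- left; r <- right if matches(l, r)) yield (Option(l), Option(r))
    val unmatchedLeft: Seq[Joined] =
      if (jt == LeftOuter || jt == FullOuter)
        left.filterNot(l => right.exists(r => matches(l, r))).map(l => (Option(l), Option.empty[Row]))
      else Seq.empty
    val unmatchedRight: Seq[Joined] =
      if (jt == RightOuter || jt == FullOuter)
        right.filterNot(r => left.exists(l => matches(l, r))).map(r => (Option.empty[Row], Option(r)))
      else Seq.empty
    matched ++ unmatchedLeft ++ unmatchedRight
  }

  private def dropNullKeys(rows: Seq[Row]): Seq[Row] = rows.filter(_.key.isDefined)

  // Filter only sides whose null-key rows can never reach the output:
  // both sides of an inner join, and the non-preserved side of an outer join.
  def joinWithNullFilter(left: Seq[Row], right: Seq[Row], jt: JoinType): Seq[Joined] = {
    val (l, r) = jt match {
      case Inner      => (dropNullKeys(left), dropNullKeys(right))
      case LeftOuter  => (left, dropNullKeys(right))
      case RightOuter => (dropNullKeys(left), right)
      case FullOuter  => (left, right) // both sides are preserved; nothing to drop
    }
    join(l, r, jt)
  }
}
```

The outer-join cases show the design constraint: a left outer join must still emit left rows with null keys (padded with nulls), so in this sketch only the right input is filtered, and a full outer join gets no filter at all.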
Changed files:
- `sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/nullFunctions.scala`: 46 additions, 2 deletions
- `sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala`: 42 additions, 22 deletions
- `sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala`: 31 additions, 1 deletion
- `sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala`: 3 additions, 1 deletion
- `sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/MathFunctionsSuite.scala`: 1 addition, 2 deletions
- `sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/NullFunctionsSuite.scala`: 42 additions, 7 deletions
- `sql/core/src/main/scala/org/apache/spark/sql/DataFrameNaFunctions.scala`: 1 addition, 1 deletion
- `sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala`: 6 additions, 0 deletions
- `sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala`: 4 additions, 1 deletion
- `sql/core/src/main/scala/org/apache/spark/sql/optimizer/extendedOperatorOptimizations.scala`: 160 additions, 0 deletions
- `sql/core/src/test/scala/org/apache/spark/sql/optimizer/FilterNullsInJoinKeySuite.scala`: 236 additions, 0 deletions