[SPARK-12700] [SQL] embed condition into SMJ and BroadcastHashJoin
Currently SortMergeJoin and BroadcastHashJoin do not support a join condition; they need a Filter operator after them to apply it. The result projection that generates UnsafeRow can be very expensive when the join produces many rows, most of which the condition then filters out.

This PR adds condition support to SortMergeJoin and BroadcastHashJoin, just like the outer joins already have. This could improve the performance of Q72 by 7x (from 120s to 16.5s).

Author: Davies Liu <davies@databricks.com>

Closes #10653 from davies/filter_join.
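The cost difference described above can be illustrated with a minimal sketch in plain Scala collections. It is not the code from this PR and uses no Spark classes: `LeftRow`, `RightRow`, `JoinedRow`, `Result`, and both join functions are hypothetical names invented for illustration, and building a `JoinedRow` stands in for the result projection to UnsafeRow.

```scala
object JoinConditionSketch {
  // Hypothetical row types for illustration only; these are not Spark classes.
  final case class LeftRow(key: Int, a: Int)
  final case class RightRow(key: Int, b: Int)
  final case class JoinedRow(key: Int, a: Int, b: Int)

  // Output rows plus the number of JoinedRow objects that had to be built.
  final case class Result(rows: Seq[JoinedRow], materialized: Int)

  // Join followed by a separate filter: every key match is projected
  // into a JoinedRow before the condition is evaluated.
  def joinThenFilter(
      left: Seq[LeftRow],
      right: Seq[RightRow],
      cond: (Int, Int) => Boolean): Result = {
    val build: Map[Int, Seq[RightRow]] = right.groupBy(_.key)
    val joined = for {
      l <- left
      r <- build.getOrElse(l.key, Seq.empty[RightRow])
    } yield JoinedRow(l.key, l.a, r.b)
    Result(joined.filter(j => cond(j.a, j.b)), joined.size)
  }

  // Condition embedded in the join: the condition is checked on the
  // input rows, so only surviving pairs are projected.
  def joinWithCondition(
      left: Seq[LeftRow],
      right: Seq[RightRow],
      cond: (Int, Int) => Boolean): Result = {
    val build: Map[Int, Seq[RightRow]] = right.groupBy(_.key)
    val rows = for {
      l <- left
      r <- build.getOrElse(l.key, Seq.empty[RightRow])
      if cond(l.a, r.b)
    } yield JoinedRow(l.key, l.a, r.b)
    Result(rows, rows.size)
  }
}
```

Both functions return the same rows; they differ only in how many joined rows are built along the way. The more selective the condition, the larger that gap, which is the effect the Q72 numbers above point to.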
Showing
- sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala (7 additions, 17 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoin.scala (1 addition, 0 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashJoin.scala (49 additions, 32 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashOuterJoin.scala (4 additions, 1 deletion)
- sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoin.scala (31 additions, 15 deletions)
- sql/core/src/test/scala/org/apache/spark/sql/execution/joins/InnerJoinSuite.scala (4 additions, 7 deletions)