[SPARK-12610][SQL] Left Anti Join
### What changes were proposed in this pull request?

This PR adds support for `LEFT ANTI JOIN` to Spark SQL. A `LEFT ANTI JOIN` is the exact opposite of a `LEFT SEMI JOIN` and can be used to identify rows in one dataset that are not in another dataset. Note that a `null` key on the left side of the join cannot match any row on the right-hand side of the join; as a result, a left anti join always selects a left row that has a `null` in one or more of its keys.

We currently add support for the following SQL join syntax:

`SELECT * FROM tbl1 A LEFT ANTI JOIN tbl2 B ON A.Id = B.Id`

Or using a DataFrame:

`tbl1.as("a").join(tbl2.as("b"), $"a.id" === $"b.id", "left_anti")`

This PR serves as the basis for implementing `NOT EXISTS` and `NOT IN (...)` correlated sub-queries. It would also serve as a good basis for implementing a more efficient `EXCEPT` operator.

The PR has been (loosely) based on PRs by both davies (https://github.com/apache/spark/pull/10706) and chenghao-intel (https://github.com/apache/spark/pull/10563); credit should be given where credit is due.

This PR adds support for `LEFT ANTI JOIN` to `BroadcastHashJoin` (including code generation), `ShuffledHashJoin` and `BroadcastNestedLoopJoin`.

### How was this patch tested?

Added tests to `JoinSuite` and ported `ExistenceJoinSuite` from https://github.com/apache/spark/pull/10563.

cc davies chenghao-intel rxin

Author: Herman van Hovell <hvanhovell@questtec.nl>

Closes #12214 from hvanhovell/SPARK-12610.
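
The null-key behaviour described in the commit message can be illustrated with a small plain-Scala model. This is only a sketch of the join semantics on a single equality key, not the code added by this PR: the `Row` case class, the `leftAnti` and `leftSemi` helpers, and the sample data are all hypothetical names introduced for illustration, and `Option[Int]` is assumed as a stand-in for a nullable SQL column (`None` models `null`).

```scala
// Plain-Scala model of LEFT ANTI JOIN semantics on one equality key.
// Hypothetical illustration only; this is not Spark's implementation.
object LeftAntiJoinSketch {
  // Option[Int] models a nullable key column: None stands for SQL NULL.
  final case class Row(id: Option[Int], name: String)

  // LEFT ANTI JOIN: keep every left row that has no matching right row.
  // A None key never matches anything, so such left rows are always kept.
  def leftAnti(left: Seq[Row], right: Seq[Row]): Seq[Row] = {
    // Collect the non-null right-side keys into a set for fast lookup.
    val rightKeys: Set[Int] = right.flatMap(_.id).toSet
    left.filter(row => !row.id.exists(rightKeys.contains))
  }

  // LEFT SEMI JOIN, shown for contrast: keep left rows that do have a match.
  def leftSemi(left: Seq[Row], right: Seq[Row]): Seq[Row] = {
    val rightKeys: Set[Int] = right.flatMap(_.id).toSet
    left.filter(row => row.id.exists(rightKeys.contains))
  }

  def main(args: Array[String]): Unit = {
    val tbl1 = Seq(Row(Some(1), "a"), Row(Some(2), "b"), Row(None, "c"))
    val tbl2 = Seq(Row(Some(2), "x"), Row(None, "y"))

    // Rows with id 1 and with a null id survive the anti join.
    println(leftAnti(tbl1, tbl2))
    // Only the row with id 2 survives the semi join.
    println(leftSemi(tbl1, tbl2))
  }
}
```

In this model the anti and semi results partition the left input: every left row lands in exactly one of the two, and the row with the `None` key always lands on the anti side, even though the right side also contains a `None` key.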
Showing
- sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 (2 additions, 0 deletions)
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala (1 addition, 1 deletion)
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala (4 additions, 4 deletions)
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala (1 addition, 0 deletions)
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/joinTypes.scala (15 additions, 2 deletions)
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala (2 additions, 2 deletions)
- sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala (4 additions, 1 deletion)
- sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlanner.scala (1 addition, 1 deletion)
- sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala (6 additions, 5 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoin.scala (71 additions, 28 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastNestedLoopJoin.scala (35 additions, 22 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashJoin.scala (17 additions, 1 deletion)
- sql/core/src/main/scala/org/apache/spark/sql/execution/joins/ShuffledHashJoin.scala (1 addition, 0 deletions)
- sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala (18 additions, 18 deletions)
- sql/core/src/test/scala/org/apache/spark/sql/execution/joins/ExistenceJoinSuite.scala (52 additions, 22 deletions)
- sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionState.scala (1 addition, 1 deletion)