[SPARK-23203][SQL] DataSourceV2: Use immutable logical plans.
## What changes were proposed in this pull request? SPARK-23203: DataSourceV2 should use immutable catalyst trees instead of wrapping a mutable DataSourceV2Reader. This commit updates DataSourceV2Relation and consolidates much of the DataSourceV2 API requirements for the read path in it. Instead of wrapping a reader that changes, the relation lazily produces a reader from its configuration. This commit also updates the predicate and projection push-down. Instead of the implementation from SPARK-22197, this reuses the rule matching from the Hive and DataSource read paths (using `PhysicalOperation`) and copies most of the implementation of `SparkPlanner.pruneFilterProject`, with updates for DataSourceV2. By reusing the implementation from other read paths, this should have fewer regressions from other read paths and is less code to maintain. The new push-down rules also supports the following edge cases: * The output of DataSourceV2Relation should be what is returned by the reader, in case the reader can only partially satisfy the requested schema projection * The requested projection passed to the DataSourceV2Reader should include filter columns * The push-down rule may be run more than once if filters are not pushed through projections ## How was this patch tested? Existing push-down and read tests. Author: Ryan Blue <blue@apache.org> Closes #20387 from rdblue/SPARK-22386-push-down-immutable-trees.
Showing
- external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaContinuousSourceSuite.scala 4 additions, 15 deletions...pache/spark/sql/kafka010/KafkaContinuousSourceSuite.scala
- external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaContinuousTest.scala 2 additions, 2 deletions...a/org/apache/spark/sql/kafka010/KafkaContinuousTest.scala
- external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala 2 additions, 2 deletions...pache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala
- sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala 9 additions, 32 deletions...src/main/scala/org/apache/spark/sql/DataFrameReader.scala
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala 200 additions, 12 deletions...k/sql/execution/datasources/v2/DataSourceV2Relation.scala
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala 5 additions, 2 deletions...k/sql/execution/datasources/v2/DataSourceV2Strategy.scala
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/PushDownOperatorsToDataSource.scala 42 additions, 117 deletions...cution/datasources/v2/PushDownOperatorsToDataSource.scala
- sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousExecution.scala 1 addition, 1 deletion.../execution/streaming/continuous/ContinuousExecution.scala
- sql/core/src/test/scala/org/apache/spark/sql/sources/v2/DataSourceV2Suite.scala 1 addition, 1 deletion...a/org/apache/spark/sql/sources/v2/DataSourceV2Suite.scala
- sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamTest.scala 3 additions, 3 deletions...est/scala/org/apache/spark/sql/streaming/StreamTest.scala
Loading
Please register or sign in to comment