[SPARK-2883] [SQL] ORC data source for Spark SQL
This PR updates PR #6135 authored by zhzhan from Hortonworks.

----

This PR implements a Spark SQL data source for accessing ORC files.

> **NOTE**
>
> Although ORC is now an Apache TLP, the codebase is still tightly coupled with Hive. That's why the new ORC data source is under the `org.apache.spark.sql.hive` package, and must be used with `HiveContext`. However, it doesn't require an existing Hive installation to access ORC files.

1. Saving/loading ORC files without contacting Hive metastore
1. Support for complex data types (i.e. array, map, and struct)
1. Aware of common optimizations provided by Spark SQL:
   - Column pruning
   - Partitioning pruning
   - Filter push-down
1. Schema evolution support
1. Hive metastore table conversion

This PR also includes initial work done by scwf from Huawei (PR #3753).

Author: Zhan Zhang <zhazhan@gmail.com>
Author: Cheng Lian <lian@databricks.com>

Closes #6194 from liancheng/polishing-orc and squashes the following commits:

55ecd96 [Cheng Lian] Reorganizes ORC test suites
d4afeed [Cheng Lian] Addresses comments
21ada22 [Cheng Lian] Adds @since and @Experimental annotations
128bd3b [Cheng Lian] ORC filter bug fix
d734496 [Cheng Lian] Polishes the ORC data source
2650a42 [Zhan Zhang] resolve review comments
3c9038e [Zhan Zhang] resolve review comments
7b3c7c5 [Zhan Zhang] save mode fix
f95abfd [Zhan Zhang] reuse test suite
7cc2c64 [Zhan Zhang] predicate fix
4e61c16 [Zhan Zhang] minor change
305418c [Zhan Zhang] orc data source support
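The save/load path described in the commit message can be exercised roughly as in the sketch below. It is illustrative only and is not taken from the PR: it assumes the Spark 1.4-era `DataFrame` reader/writer API, refers to the data source by its package name `org.apache.spark.sql.hive.orc` (whether a shorter alias is registered by this commit is not stated here), and uses a hypothetical output path and sample data.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.hive.HiveContext

object OrcRoundTrip {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("orc-round-trip").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // The ORC data source lives in the Hive module, so a HiveContext is required.
    val hiveContext = new HiveContext(sc)
    import hiveContext.implicits._

    // Assumption: data source referenced by package name; path is hypothetical.
    val orcSource = "org.apache.spark.sql.hive.orc"
    val path = "/tmp/people_orc"

    // Hypothetical sample data.
    val people = sc.parallelize(Seq(("Alice", 29), ("Bob", 35))).toDF("name", "age")

    // Write ORC files directly to the path, without creating a metastore table.
    people.write.format(orcSource).mode(SaveMode.Overwrite).save(path)

    // Read them back. Selecting one column and filtering on another gives
    // column pruning and filter push-down a chance to apply.
    val loaded = hiveContext.read.format(orcSource).load(path)
    loaded.filter($"age" > 30).select("name").show()

    sc.stop()
  }
}
```

Running this needs the `spark-hive` module on the classpath; no separate Hive installation should be necessary, in line with the note in the commit message.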
Showing 14 changed files
- sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala (6 additions, 1 deletion)
- sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTest.scala (4 additions, 57 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/sources/ddl.scala (12 additions, 6 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/test/SQLTestUtils.scala (81 additions, 0 deletions)
- sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveInspectors.scala (32 additions, 8 deletions)
- sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileOperator.scala (69 additions, 0 deletions)
- sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFilters.scala (144 additions, 0 deletions)
- sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcRelation.scala (290 additions, 0 deletions)
- sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcHadoopFsRelationSuite.scala (59 additions, 0 deletions)
- sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcPartitionDiscoverySuite.scala (256 additions, 0 deletions)
- sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcQuerySuite.scala (294 additions, 0 deletions)
- sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcSourceSuite.scala (146 additions, 0 deletions)
- sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcTest.scala (82 additions, 0 deletions)
- sql/hive/src/test/scala/org/apache/spark/sql/sources/hadoopFsRelationSuites.scala (2 additions, 4 deletions)