[SPARK-8501] [SQL] Avoids reading schema from empty ORC files
ORC writes an empty schema (`struct<>`) to ORC files containing zero rows. This is fine for Hive, since the table schema is managed by the metastore, but it causes trouble when reading raw ORC files via Spark SQL, because the schema has to be discovered from the files themselves. Note that the ORC data source always avoids writing empty ORC files, but the problem remains when reading Hive tables that contain empty part-files.

Author: Cheng Lian <lian@databricks.com>

Closes #7199 from liancheng/spark-8501 and squashes the following commits:

bb8cd95 [Cheng Lian] Addresses comments
a290221 [Cheng Lian] Avoids reading schema from empty ORC files
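The sketch below illustrates the general idea behind the fix rather than the actual patch: when inferring a schema from raw part-files, skip any ORC file whose object inspector describes an empty struct. It assumes Hive's ORC reader API (`OrcFile.createReader`, `StructObjectInspector`); the object `OrcSchemaDiscovery` and the helper `findSchemaCarryingFile` are hypothetical names used only for illustration.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.hive.ql.io.orc.OrcFile
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector

object OrcSchemaDiscovery {
  // Hypothetical helper: scan the part-files under `dir` and return the first
  // file whose ORC metadata describes a non-empty struct, i.e. skip files that
  // were written with the empty schema `struct<>` because they contain no rows.
  def findSchemaCarryingFile(dir: String, conf: Configuration): Option[Path] = {
    val dirPath = new Path(dir)
    val fs = dirPath.getFileSystem(conf)

    fs.listStatus(dirPath).iterator
      .map(_.getPath)
      // Ignore hidden files and metadata files such as _SUCCESS.
      .filter(p => !p.getName.startsWith("_") && !p.getName.startsWith("."))
      .find { p =>
        val reader = OrcFile.createReader(p, OrcFile.readerOptions(conf))
        reader.getObjectInspector match {
          // Only accept files whose top-level struct has at least one field.
          case s: StructObjectInspector => !s.getAllStructFieldRefs.isEmpty
          case _                        => false
        }
      }
  }
}
```

Once a schema-carrying file is found, its schema can be used for the whole table, while the empty part-files are still read for data (they simply contribute zero rows).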
Showing 4 changed files with 135 additions and 52 deletions
- sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileOperator.scala (50 additions, 10 deletions)
- sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcRelation.scala (26 additions, 18 deletions)
- sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcQuerySuite.scala (48 additions, 7 deletions)
- sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcSourceSuite.scala (11 additions, 17 deletions)