    [SPARK-2406][SQL] Initial support for using ParquetTableScan to read HiveMetaStore tables. · 3abd0c1c
    Michael Armbrust authored
    This PR adds an experimental flag, `spark.sql.hive.convertMetastoreParquet`. When it is set to true, the planner detects tables that use Hive's Parquet SerDe and plans them using Spark SQL's native `ParquetTableScan` instead.
    
    Author: Michael Armbrust <michael@databricks.com>
    Author: Yin Huai <huai@cse.ohio-state.edu>
    
    Closes #1819 from marmbrus/parquetMetastore and squashes the following commits:
    
    1620079 [Michael Armbrust] Revert "remove hive parquet bundle"
    cc30430 [Michael Armbrust] Merge remote-tracking branch 'origin/master' into parquetMetastore
    4f3d54f [Michael Armbrust] fix style
    41ebc5f [Michael Armbrust] remove hive parquet bundle
    a43e0da [Michael Armbrust] Merge remote-tracking branch 'origin/master' into parquetMetastore
    4c4dc19 [Michael Armbrust] Fix bug with tree splicing.
    ebb267e [Michael Armbrust] include parquet hive to tests pass (Remove this later).
    c0d9b72 [Michael Armbrust] Avoid creating a HadoopRDD per partition.  Add dirty hacks to retrieve partition values from the InputSplit.
    8cdc93c [Michael Armbrust] Merge pull request #8 from yhuai/parquetMetastore
    a0baec7 [Yin Huai] Partitioning columns can be resolved.
    1161338 [Michael Armbrust] Add a test to make sure conversion is actually happening
    212d5cd [Michael Armbrust] Initial support for using ParquetTableScan to read HiveMetaStore tables.
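    
    As a rough illustration of how the flag might be switched on, the fragment below uses Spark SQL's `SET key=value` statement from a Hive-enabled session. The table name `parquet_events` is hypothetical, and the fragment assumes that table is registered in the Hive metastore with Hive's Parquet SerDe; it is a sketch of usage, not part of this commit.
    
    ```sql
    -- Enable the experimental conversion described above (assumed to be off unless set).
    SET spark.sql.hive.convertMetastoreParquet=true;
    
    -- `parquet_events` is a hypothetical metastore table stored with Hive's Parquet SerDe.
    -- With the flag on, this scan should be planned with the native ParquetTableScan.
    SELECT COUNT(*) FROM parquet_events;
    ```
    
    Setting the flag back to false should restore the regular Hive SerDe read path, which gives a simple way to compare the two plans.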