-
- Downloads
[SPARK-21031][SQL] Add `alterTableStats` to store spark's stats and let...
[SPARK-21031][SQL] Add `alterTableStats` to store spark's stats and let `alterTable` keep existing stats ## What changes were proposed in this pull request? Currently, hive's stats are read into `CatalogStatistics`, while spark's stats are also persisted through `CatalogStatistics`. As a result, hive's stats can be unexpectedly propagated into spark' stats. For example, for a catalog table, we read stats from hive, e.g. "totalSize" and put it into `CatalogStatistics`. Then, by using "ALTER TABLE" command, we will store the stats in `CatalogStatistics` into metastore as spark's stats (because we don't know whether it's from spark or not). But spark's stats should be only generated by "ANALYZE" command. This is unexpected from this command. Secondly, now that we have spark's stats in metastore, after inserting new data, although hive updated "totalSize" in metastore, we still cannot get the right `sizeInBytes` in `CatalogStatistics`, because we respect spark's stats (should not exist) over hive's stats. A running example is shown in [JIRA](https://issues.apache.org/jira/browse/SPARK-21031). To fix this, we add a new method `alterTableStats` to store spark's stats, and let `alterTable` keep existing stats. ## How was this patch tested? Added new tests. Author: Zhenhua Wang <wzh_zju@163.com> Closes #18248 from wzhfy/separateHiveStats.
Showing
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala 2 additions, 0 deletions...g/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/InMemoryCatalog.scala 9 additions, 0 deletions...g/apache/spark/sql/catalyst/catalog/InMemoryCatalog.scala
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala 13 additions, 0 deletions...rg/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
- sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogSuite.scala 10 additions, 1 deletion...che/spark/sql/catalyst/catalog/ExternalCatalogSuite.scala
- sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalogSuite.scala 12 additions, 0 deletions...ache/spark/sql/catalyst/catalog/SessionCatalogSuite.scala
- sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeColumnCommand.scala 1 addition, 1 deletion...he/spark/sql/execution/command/AnalyzeColumnCommand.scala
- sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeTableCommand.scala 1 addition, 1 deletion...che/spark/sql/execution/command/AnalyzeTableCommand.scala
- sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala 39 additions, 29 deletions...scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala
- sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala 45 additions, 35 deletions...est/scala/org/apache/spark/sql/hive/StatisticsSuite.scala
Loading
Please register or sign in to comment