[SPARK-22160][SQL] Make sample points per partition (in range partitioner) configurable and bump the default value up to 100

## What changes were proposed in this pull request?

Spark's RangePartitioner hard codes the number of sampling points per partition to be 20. This is sometimes too low. This ticket makes it configurable, via spark.sql.execution.rangeExchange.sampleSizePerPartition, and raises the default in Spark SQL to 100.

## How was this patch tested?

Added a pretty sophisticated test based on chi square test ...

Author: Reynold Xin <rxin@databricks.com>

Closes #19387 from rxin/SPARK-22160.
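The new setting is a plain SQL configuration, so it can be supplied when building the session. A minimal sketch of how a user might raise the sample size (the config key is the one introduced by this change; the value `200` and the session setup around it are illustrative, not from the patch):

```scala
import org.apache.spark.sql.SparkSession

object RangeSampleSizeDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("RangeSampleSizeDemo")
      // Raise the per-partition sample size used when computing range
      // boundaries; 100 is the default after this change, 20 before it.
      .config("spark.sql.execution.rangeExchange.sampleSizePerPartition", "200")
      .getOrCreate()

    // A global sort triggers a range exchange, which samples each input
    // partition to estimate the range boundaries.
    val sorted = spark.range(0, 1000000).toDF("id").orderBy("id")
    sorted.count()

    spark.stop()
  }
}
```

A larger sample costs a little more at planning time but gives the partitioner a better picture of the key distribution, which is why the default was raised.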
Showing 4 changed files:

- core/src/main/scala/org/apache/spark/Partitioner.scala (13 additions, 2 deletions)
- sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala (10 additions, 0 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala (6 additions, 1 deletion)
- sql/core/src/test/scala/org/apache/spark/sql/ConfigBehaviorSuite.scala (66 additions, 0 deletions)
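The test described above checks partition balance with a chi-square statistic. The core idea can be sketched in a few lines of plain Scala (this is a hypothetical, self-contained illustration, not the actual code in ConfigBehaviorSuite): compare observed partition sizes against a uniform expectation, where a smaller statistic means better balance.

```scala
// Hypothetical sketch of a chi-square balance check: compare observed
// partition sizes against a uniform expectation. Better sampling in the
// range partitioner should drive this statistic toward zero.
object ChiSquareBalance {
  def chiSquare(partitionSizes: Seq[Long]): Double = {
    val total = partitionSizes.sum.toDouble
    val expected = total / partitionSizes.length
    partitionSizes.map { observed =>
      val d = observed - expected
      d * d / expected
    }.sum
  }

  def main(args: Array[String]): Unit = {
    println(chiSquare(Seq(250L, 250L, 250L, 250L))) // perfectly balanced: 0.0
    println(chiSquare(Seq(400L, 100L, 300L, 200L))) // skewed: 200.0
  }
}
```

With more sample points per partition, the estimated range boundaries track the true key distribution more closely, so the resulting partition sizes cluster near the uniform expectation and the statistic stays small.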