Commit 323806e6 authored by Reynold Xin

[SPARK-22160][SQL] Make sample points per partition (in range partitioner) configurable and bump the default value up to 100

## What changes were proposed in this pull request?
Spark's RangePartitioner hard-codes the number of sampling points per partition at 20. This is sometimes too low, which can leave the computed range boundaries uneven. This ticket makes the value configurable via `spark.sql.execution.rangeExchange.sampleSizePerPartition` and raises the default in Spark SQL to 100.
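
To illustrate why the number of sample points matters, here is a minimal sketch in plain Python. It is not Spark's implementation: the helper names `range_bounds` and `partition_sizes` are hypothetical, and the sampling is a simple uniform sample over the whole dataset rather than a per-input-partition sample. It only shows the general idea that range boundaries estimated from a larger sample tend to split the data more evenly.

```python
import bisect
import random


def range_bounds(data, num_partitions, sample_size_per_partition, rng):
    """Estimate num_partitions - 1 split points from a random sample.

    Simplified assumption: one uniform sample over all data, sized as
    num_partitions * sample_size_per_partition.
    """
    sample_size = min(len(data), num_partitions * sample_size_per_partition)
    sample = sorted(rng.sample(data, sample_size))
    step = len(sample) / num_partitions
    # Take evenly spaced elements of the sorted sample as boundaries.
    return [sample[int(step * i)] for i in range(1, num_partitions)]


def partition_sizes(data, bounds):
    """Count how many records fall into each range defined by bounds."""
    sizes = [0] * (len(bounds) + 1)
    for x in data:
        sizes[bisect.bisect_left(bounds, x)] += 1
    return sizes


if __name__ == "__main__":
    rng = random.Random(0)
    data = [rng.random() for _ in range(100_000)]
    ideal = len(data) / 10
    for per_partition in (20, 100):
        sizes = partition_sizes(data, range_bounds(data, 10, per_partition, rng))
        # Ratio of the largest partition to the ideal size; closer to 1 is better.
        print(per_partition, max(sizes) / ideal)
```

The printed ratio compares the largest partition with a perfectly even split, so values closer to 1 indicate less skew.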

## How was this patch tested?
Added a pretty sophisticated test based on chi square test ...
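
The details of that test are not given here. As a rough illustration of the general idea only, and not the test added in this patch, a chi-square statistic can measure how far observed partition sizes deviate from a perfectly uniform split. The function name `chi_square_statistic` is hypothetical.

```python
def chi_square_statistic(observed):
    """Chi-square statistic of observed counts against a uniform expectation."""
    expected = sum(observed) / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)


# Perfectly even partitions give a statistic of 0; skew increases it.
even = chi_square_statistic([10, 10, 10, 10])
skewed = chi_square_statistic([5, 15])
```

A smaller statistic means the partition sizes are closer to uniform, so a test could compare the statistic obtained with different sample sizes.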

Author: Reynold Xin <rxin@databricks.com>

Closes #19387 from rxin/SPARK-22160.
parent d29d1e87