Commit 9b2c877b authored by Rishabh Bhardwaj, committed by Sean Owen

[SPARK-21039][SPARK CORE] Use treeAggregate instead of aggregate in DataFrame.stat.bloomFilter

## What changes were proposed in this pull request?
Use treeAggregate instead of aggregate in DataFrame.stat.bloomFilter so that the per-partition bloom filters are merged in parallel instead of all being merged on the driver.
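
For illustration only, here is a minimal sketch of the same pattern on a plain RDD of longs, using the public `RDD.treeAggregate` and `org.apache.spark.util.sketch.BloomFilter` APIs. The object name, the `buildFilter` helper, the input data, and the sizing parameters are assumptions made for this example and are not part of the patch, which operates on `InternalRow`s inside `DataFrameStatFunctions`.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession
import org.apache.spark.util.sketch.BloomFilter

// Hypothetical standalone example; not code from the patch.
object TreeAggregateBloomFilterSketch {

  // Builds one bloom filter per partition, then merges the partial filters
  // in a multi-level tree instead of sending every one straight to the driver.
  def buildFilter(ids: RDD[Long], expectedItems: Long, fpp: Double): BloomFilter = {
    val zero = BloomFilter.create(expectedItems, fpp)
    ids.treeAggregate(zero)(
      // seqOp: add one element to the partition-local filter
      (filter: BloomFilter, id: Long) => { filter.putLong(id); filter },
      // combOp: merge two partial filters
      (a: BloomFilter, b: BloomFilter) => a.mergeInPlace(b)
    )
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[4]")
      .appName("tree-aggregate-sketch")
      .getOrCreate()

    // Assumed input: 100,000 ids spread over 16 partitions.
    val ids = spark.sparkContext.parallelize(0L until 100000L, numSlices = 16)
    val filter = buildFilter(ids, expectedItems = 100000L, fpp = 0.03)

    // Bloom filters have no false negatives, so an inserted id is always reported.
    assert(filter.mightContainLong(42L))

    spark.stop()
  }
}
```

`treeAggregate` also accepts an optional `depth` argument (default 2) that controls how many levels the merge tree may have. The sketch leaves it at the default, as the patch does.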

## How was this patch tested?
Existing unit tests pass.


Author: Rishabh Bhardwaj <rbnext29@gmail.com>
Author: Rishabh Bhardwaj <admin@rishabh.local>
Author: Rishabh Bhardwaj <r0b00ko@rishabh.Dlink>
Author: Rishabh Bhardwaj <admin@Admins-MacBook-Pro.local>
Author: Rishabh Bhardwaj <r0b00ko@rishabh.local>

Closes #18263 from rishabhbhardwaj/SPARK-21039.
parent 2aaed0a4
@@ -551,7 +551,7 @@ final class DataFrameStatFunctions private[sql](df: DataFrame) {
       )
     }
-    singleCol.queryExecution.toRdd.aggregate(zero)(
+    singleCol.queryExecution.toRdd.treeAggregate(zero)(
       (filter: BloomFilter, row: InternalRow) => {
         updater(filter, row)
         filter