-
- Downloads
[SPARK-13930] [SQL] Apply fast serialization on collect limit operator
## What changes were proposed in this pull request? JIRA: https://issues.apache.org/jira/browse/SPARK-13930 Recently the fast serialization has been introduced to collecting DataFrame/Dataset (#11664). The same technology can be used on collect limit operator too. ## How was this patch tested? Add a benchmark for collect limit to `BenchmarkWholeStageCodegen`. Without this patch: model name : Westmere E56xx/L56xx/X56xx (Nehalem-C) collect limit: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------- collect limit 1 million 3413 / 3768 0.3 3255.0 1.0X collect limit 2 millions 9728 / 10440 0.1 9277.3 0.4X With this patch: model name : Westmere E56xx/L56xx/X56xx (Nehalem-C) collect limit: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------- collect limit 1 million 833 / 1284 1.3 794.4 1.0X collect limit 2 millions 3348 / 4005 0.3 3193.3 0.2X Author: Liang-Chi Hsieh <simonh@tw.ibm.com> Closes #11759 from viirya/execute-take.
Showing
- sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala 50 additions, 28 deletions...main/scala/org/apache/spark/sql/execution/SparkPlan.scala
- sql/core/src/test/scala/org/apache/spark/sql/execution/BenchmarkWholeStageCodegen.scala 21 additions, 0 deletions...ache/spark/sql/execution/BenchmarkWholeStageCodegen.scala
Loading
Please register or sign in to comment