-
- Downloads
[SPARK-20979][SS] Add RateSource to generate values for tests and benchmark
## What changes were proposed in this pull request? This PR adds RateSource for Structured Streaming so that the user can use it to generate data for tests and benchmark easily. This source generates increment long values with timestamps. Each generated row has two columns: a timestamp column for the generated time and an auto increment long column starting with 0L. It supports the following options: - `rowsPerSecond` (e.g. 100, default: 1): How many rows should be generated per second. - `rampUpTime` (e.g. 5s, default: 0s): How long to ramp up before the generating speed becomes `rowsPerSecond`. Using finer granularities than seconds will be truncated to integer seconds. - `numPartitions` (e.g. 10, default: Spark's default parallelism): The partition number for the generated rows. The source will try its best to reach `rowsPerSecond`, but the query may be resource constrained, and `numPartitions` can be tweaked to help reach the desired speed. Here is a simple example that prints 10 rows per seconds: ``` spark.readStream .format("rate") .option("rowsPerSecond", "10") .load() .writeStream .format("console") .start() ``` The idea came from marmbrus and he did the initial work. ## How was this patch tested? The added tests. Author: Shixiong Zhu <shixiong@databricks.com> Author: Michael Armbrust <michael@databricks.com> Closes #18199 from zsxwing/rate.
Showing
- sql/core/src/main/resources/META-INF/services/org.apache.spark.sql.sources.DataSourceRegister 1 addition, 0 deletions.../services/org.apache.spark.sql.sources.DataSourceRegister
- sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/RateSourceProvider.scala 243 additions, 0 deletions...he/spark/sql/execution/streaming/RateSourceProvider.scala
- sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/RateSourceSuite.scala 182 additions, 0 deletions...pache/spark/sql/execution/streaming/RateSourceSuite.scala
- sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamTest.scala 3 additions, 0 deletions...est/scala/org/apache/spark/sql/streaming/StreamTest.scala
Please register or sign in to comment