[SPARK-14031][SQL] speedup CSV writer
## What changes were proposed in this pull request?

Currently, we create a CSVWriter for every row, which is very expensive and memory-hungry: writing out 1 million rows (two columns) took about 15 seconds. This PR writes the rows in batch mode, creating one CSVWriter for every 1k rows, which writes out 1 million rows in about 1 second (15X faster).

## How was this patch tested?

Manually benchmarked.

Author: Davies Liu <davies@databricks.com>

Closes #13229 from davies/csv_writer.
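The batching idea described above (one writer per batch of rows instead of one per row) can be sketched in plain Scala without Spark. This is a minimal illustration under stated assumptions, not the code from the patch: `BatchEncoder` and `BatchedCsvWriter` are hypothetical names, the naive quoting stands in for the real CSV library Spark uses, and the default batch size of 1000 simply mirrors the "every 1k rows" figure in the description.

```scala
import java.io.{StringWriter, Writer}

// Hypothetical per-batch encoder: stands in for the costly writer object
// that the original code constructed once per row.
final class BatchEncoder {
  private val buffer = new StringWriter()

  // Naive CSV quoting (an assumption, not Spark's logic): quote a field if it
  // contains a comma, a double quote or a newline, doubling embedded quotes.
  private def quote(field: String): String =
    if (field.exists(c => c == ',' || c == '"' || c == '\n'))
      "\"" + field.replace("\"", "\"\"") + "\""
    else field

  def writeRow(row: Seq[String]): Unit = {
    buffer.write(row.map(quote).mkString(","))
    buffer.write("\n")
  }

  def contents: String = buffer.toString
}

// Hypothetical batching writer: reuses one encoder for up to `batchSize` rows,
// then hands the encoded text to `out` and starts a fresh encoder lazily.
final class BatchedCsvWriter(out: Writer, batchSize: Int = 1000) {
  require(batchSize > 0, "batchSize must be positive")

  private var encoder: Option[BatchEncoder] = None
  private var rowsInBatch = 0
  private var created = 0

  // Number of encoders constructed so far (one per batch, not one per row).
  def encodersCreated: Int = created

  def write(row: Seq[String]): Unit = {
    val enc = encoder.getOrElse {
      created += 1
      val fresh = new BatchEncoder
      encoder = Some(fresh)
      fresh
    }
    enc.writeRow(row)
    rowsInBatch += 1
    if (rowsInBatch == batchSize) flushBatch()
  }

  // Emit the current batch (if any) and drop its encoder.
  private def flushBatch(): Unit = encoder.foreach { enc =>
    out.write(enc.contents)
    encoder = None
    rowsInBatch = 0
  }

  // Write out any partial final batch.
  def close(): Unit = {
    flushBatch()
    out.flush()
  }
}
```

Writing 2,500 two-column rows with `batchSize = 1000` constructs three encoders (1,000 + 1,000 + 500 rows) rather than 2,500. The encoder is created lazily so that a batch ending exactly on the boundary does not leave an unused instance behind.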
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVParser.scala: 14 additions, 5 deletions
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVRelation.scala: 15 additions, 8 deletions