[SPARK-22125][PYSPARK][SQL] Enable Arrow Stream format for vectorized UDF.
## What changes were proposed in this pull request?

Currently we use the Arrow File format to communicate with the Python worker when invoking a vectorized UDF, but we can use the Arrow Stream format instead. This PR replaces the Arrow File format with the Arrow Stream format.

## How was this patch tested?

Existing tests.

Author: Takuya UESHIN <ueshin@databricks.com>

Closes #19349 from ueshin/issues/SPARK-22125.
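
For readers unfamiliar with the distinction: the Arrow File format ends with a footer that indexes the record batches, while the Arrow Stream format is a sequence of messages (a schema, then record batches) that a reader can consume one by one as they arrive. The sketch below is a toy model of that streaming idea only. It uses a made-up length-prefixed framing rather than Arrow's actual encoding, and the names `write_stream` and `read_stream` are hypothetical and do not appear in this patch.

```python
import io
import struct


def write_stream(out, batches):
    """Write each batch as a 4-byte big-endian length followed by its payload.

    Toy framing for illustration only; this is NOT the Arrow wire format.
    """
    for payload in batches:
        if not payload:
            # A zero length is reserved as the end marker in this toy framing.
            raise ValueError("empty batches are not supported")
        out.write(struct.pack(">i", len(payload)))
        out.write(payload)
    out.write(struct.pack(">i", 0))  # end-of-stream marker (toy convention)


def read_stream(inp):
    """Yield each batch as soon as it has been fully read.

    No footer is needed: the reader never has to see the end of the data
    before it can hand the first batch to the consumer.
    """
    while True:
        header = inp.read(4)
        if len(header) < 4:
            raise EOFError("stream ended before the end marker")
        (length,) = struct.unpack(">i", header)
        if length == 0:
            return
        payload = inp.read(length)
        if len(payload) < length:
            raise EOFError("stream ended in the middle of a batch")
        yield payload


# Round trip through an in-memory buffer standing in for the worker socket.
buf = io.BytesIO()
write_stream(buf, [b"batch-0", b"batch-1"])
buf.seek(0)
received = list(read_stream(buf))
```

Because `read_stream` is a generator, a consumer could start processing the first batch while later batches are still being written, which is the property that makes a streaming layout a natural fit for a JVM-to-Python pipe.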

Changed files:
- core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala (4 additions, 321 deletions)
- core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala (441 additions, 0 deletions)
- python/pyspark/serializers.py (44 additions, 26 deletions)
- python/pyspark/worker.py (2 additions, 2 deletions)
- sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnarBatch.java (5 additions, 0 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/python/ArrowEvalPythonExec.scala (33 additions, 21 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/python/ArrowPythonRunner.scala (181 additions, 0 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/python/BatchEvalPythonExec.scala (2 additions, 2 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/python/PythonUDFRunner.scala (113 additions, 0 deletions)