[SPARK-22001][ML][SQL] ImputerModel can do withColumn for all input columns at one pass
## What changes were proposed in this pull request?

SPARK-21690 made `Imputer` one-pass by parallelizing the computation for all input columns. However, when a dataset is transformed with `ImputerModel`, `withColumn` is still called on each input column sequentially. This change applies all input columns at once instead, by adding a `withColumns` API to `Dataset`. For now, the new `withColumns` API is for internal use only.

## How was this patch tested?

Existing `ImputerModel` tests cover the change to `ImputerModel`. New tests were added for the `withColumns` API.

Author: Liang-Chi Hsieh <viirya@gmail.com>

Closes #19229 from viirya/SPARK-22001.
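The difference between the two transform styles can be sketched with public `Dataset` APIs. The commit message states that `withColumns` is internal, so this sketch uses a single `select` to add all output columns in one projection. A chain of `withColumn` calls produces one projection per column. The object `ImputeOnePassSketch`, its helpers, the column names and the surrogate values are all hypothetical. The sketch also assumes the output column names are new: unlike a column-replacing API, `select(col("*") +: ...)` does not overwrite existing columns with the same name.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, isnan, lit, when}

// Hypothetical sketch; not the code from this commit.
object ImputeOnePassSketch {
  // Replace null or NaN in `inputCol` with the given surrogate value.
  def imputeColumn(inputCol: String, surrogate: Double): Column =
    when(col(inputCol).isNull || isnan(col(inputCol)), lit(surrogate))
      .otherwise(col(inputCol))

  // Sequential style: one withColumn call, i.e. one new projection, per column.
  def imputeSequential(df: DataFrame, specs: Seq[(String, String, Double)]): DataFrame =
    specs.foldLeft(df) { case (acc, (in, out, s)) =>
      acc.withColumn(out, imputeColumn(in, s))
    }

  // One-pass style: all output columns are appended in a single projection.
  // Assumes every output name is new; existing columns are kept unchanged.
  def imputeAtOnce(df: DataFrame, specs: Seq[(String, String, Double)]): DataFrame = {
    val newCols = specs.map { case (in, out, s) => imputeColumn(in, s).as(out) }
    df.select(col("*") +: newCols: _*)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("impute-sketch").getOrCreate()
    import spark.implicits._

    val df = Seq((1.0, Double.NaN), (Double.NaN, 4.0), (3.0, 6.0)).toDF("a", "b")
    // (input column, output column, surrogate); the surrogates are made-up values.
    val specs = Seq(("a", "a_out", 2.0), ("b", "b_out", 5.0))

    imputeSequential(df, specs).show()
    imputeAtOnce(df, specs).show() // same rows, built from one projection
    spark.stop()
  }
}
```

Both functions return the same rows. The one-pass version hands the analyzer a single projection regardless of how many input columns there are, which reflects the motivation given for adding `withColumns`.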
- mllib/src/main/scala/org/apache/spark/ml/feature/Imputer.scala: 4 additions, 6 deletions
- sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala: 30 additions, 12 deletions
- sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala: 52 additions, 0 deletions