[SPARK-2361][MLLIB] Use broadcast instead of serializing data directly into task closure
We saw task serialization problems with large feature dimensions, which could be avoided if we don't serialize data directly into the task closure but use broadcast variables instead. This PR uses broadcast in both training and prediction and adds tests to make sure the task size is small.

Author: Xiangrui Meng <meng@databricks.com>

Closes #1427 from mengxr/broadcast-new and squashes the following commits:

b9a1228 [Xiangrui Meng] style update
b97c184 [Xiangrui Meng] minimal change to LBFGS
9ebadcc [Xiangrui Meng] add task size test to RowMatrix
9427bf0 [Xiangrui Meng] add task size tests to linear methods
e0a5cf2 [Xiangrui Meng] add task size test to GD
28a8411 [Xiangrui Meng] add test for NaiveBayes
380778c [Xiangrui Meng] update KMeans test
bccab92 [Xiangrui Meng] add task size test to LBFGS
02103ba [Xiangrui Meng] remove print
e73d68e [Xiangrui Meng] update tests for k-means
174cb15 [Xiangrui Meng] use local-cluster for test with a small akka.frameSize
1928a5a [Xiangrui Meng] add test for KMeans task size
e00c2da [Xiangrui Meng] use broadcast in GD, KMeans
010d076 [Xiangrui Meng] modify NaiveBayesModel and GLM to use broadcast
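The idea in the description above can be illustrated with a minimal sketch. It is not the code from this PR: `BroadcastSketch`, `dot`, `predictWithClosure`, and `predictWithBroadcast` are hypothetical names chosen for illustration, and the data sizes are arbitrary. It only assumes the standard `SparkContext.broadcast` API and a local master.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.broadcast.Broadcast
import org.apache.spark.rdd.RDD

// Hypothetical example object; not part of MLlib.
object BroadcastSketch {
  // Dot product of a feature vector with a weight vector of the same length.
  def dot(x: Array[Double], w: Array[Double]): Double = {
    var sum = 0.0
    var i = 0
    while (i < x.length) { sum += x(i) * w(i); i += 1 }
    sum
  }

  // Closure capture: the whole `weights` array is serialized into every task.
  def predictWithClosure(data: RDD[Array[Double]], weights: Array[Double]): RDD[Double] =
    data.map(x => dot(x, weights))

  // Broadcast: the closure captures only the Broadcast handle, and
  // executors fetch the array itself through the broadcast mechanism.
  def predictWithBroadcast(
      sc: SparkContext,
      data: RDD[Array[Double]],
      weights: Array[Double]): RDD[Double] = {
    val bcWeights: Broadcast[Array[Double]] = sc.broadcast(weights)
    data.map(x => dot(x, bcWeights.value))
  }

  // Runs both variants on the same small input and returns the two result arrays.
  def run(): (Array[Double], Array[Double]) = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("broadcast-sketch")
    val sc = new SparkContext(conf)
    try {
      val weights = Array.fill(1000)(0.5)
      val data = sc.parallelize(Seq.fill(4)(Array.fill(1000)(1.0)), 2)
      val viaClosure = predictWithClosure(data, weights).collect()
      val viaBroadcast = predictWithBroadcast(sc, data, weights).collect()
      (viaClosure, viaBroadcast)
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    val (viaClosure, viaBroadcast) = run()
    // Both variants compute the same predictions; only the task payload differs.
    assert(viaClosure.sameElements(viaBroadcast))
  }
}
```

Both variants return identical results. The difference is in what travels with each task, which is why the size of the weight vector matters for task serialization when the feature dimension is large.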
Showing 19 changed files:
- mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala (7 additions, 1 deletion)
- mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala (12 additions, 7 deletions)
- mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeansModel.scala (4 additions, 2 deletions)
- mllib/src/main/scala/org/apache/spark/mllib/optimization/GradientDescent.scala (4 additions, 2 deletions)
- mllib/src/main/scala/org/apache/spark/mllib/optimization/LBFGS.scala (4 additions, 3 deletions)
- mllib/src/main/scala/org/apache/spark/mllib/regression/GeneralizedLinearAlgorithm.scala (5 additions, 2 deletions)
- mllib/src/test/java/org/apache/spark/mllib/classification/JavaLogisticRegressionSuite.java (0 additions, 2 deletions)
- mllib/src/test/scala/org/apache/spark/mllib/classification/LogisticRegressionSuite.scala (17 additions, 1 deletion)
- mllib/src/test/scala/org/apache/spark/mllib/classification/NaiveBayesSuite.scala (19 additions, 1 deletion)
- mllib/src/test/scala/org/apache/spark/mllib/classification/SVMSuite.scala (20 additions, 5 deletions)
- mllib/src/test/scala/org/apache/spark/mllib/clustering/KMeansSuite.scala (49 additions, 26 deletions)
- mllib/src/test/scala/org/apache/spark/mllib/linalg/distributed/RowMatrixSuite.scala (27 additions, 2 deletions)
- mllib/src/test/scala/org/apache/spark/mllib/optimization/GradientDescentSuite.scala (28 additions, 6 deletions)
- mllib/src/test/scala/org/apache/spark/mllib/optimization/LBFGSSuite.scala (26 additions, 4 deletions)
- mllib/src/test/scala/org/apache/spark/mllib/regression/LassoSuite.scala (20 additions, 1 deletion)
- mllib/src/test/scala/org/apache/spark/mllib/regression/LinearRegressionSuite.scala (20 additions, 1 deletion)
- mllib/src/test/scala/org/apache/spark/mllib/regression/RidgeRegressionSuite.scala (21 additions, 2 deletions)
- mllib/src/test/scala/org/apache/spark/mllib/util/LocalClusterSparkContext.scala (42 additions, 0 deletions)
- mllib/src/test/scala/org/apache/spark/mllib/util/LocalSparkContext.scala (5 additions, 2 deletions)