    [SPARK-7879] [MLLIB] KMeans API for spark.ml Pipelines · 34a889db
    Yu ISHIKAWA authored
    I implemented the KMeans API for spark.ml Pipelines. It does not include the clustering abstractions for spark.ml (SPARK-7610), which fit better in a separate issue. I'll work on those later, since we are also trying to add hierarchical clustering algorithms in another issue. Thanks.
    
    [SPARK-7879] KMeans API for spark.ml Pipelines - ASF JIRA https://issues.apache.org/jira/browse/SPARK-7879
    
    Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>
    
    Closes #6756 from yu-iskw/SPARK-7879 and squashes the following commits:
    
    be752de [Yu ISHIKAWA] Add assertions
    a14939b [Yu ISHIKAWA] Fix the dashed line's length in pyspark.ml.rst
    4c61693 [Yu ISHIKAWA] Remove the test about whether "features" and "prediction" columns exist or not in Python
    fb2417c [Yu ISHIKAWA] Use getInt, instead of get
    f397be4 [Yu ISHIKAWA] Switch the comparisons.
    ca78b7d [Yu ISHIKAWA] Add the Scala docs about the constraints of each parameter.
    effc650 [Yu ISHIKAWA] Using expertSetParam and expertGetParam
    c8dc6e6 [Yu ISHIKAWA] Remove an unnecessary test
    19a9d63 [Yu ISHIKAWA] Include spark.ml.clustering to python tests
    1abb19c [Yu ISHIKAWA] Add the statements about spark.ml.clustering into pyspark.ml.rst
    f8338bc [Yu ISHIKAWA] Add the placeholders in Python
    4a03003 [Yu ISHIKAWA] Test for contains in Python
    6566c8b [Yu ISHIKAWA] Use `get`, instead of `apply`
    288e8d5 [Yu ISHIKAWA] Using `contains` to check the column names
    5a7d574 [Yu ISHIKAWA] Rename `validateInitializationMode` to `validateInitMode` and remove throwing exception
    97cfae3 [Yu ISHIKAWA] Fix the type of return value of `KMeans.copy`
    e933723 [Yu ISHIKAWA] Remove the default value of seed from the Model class
    978ee2c [Yu ISHIKAWA] Modify the docs of KMeans, according to mllib's KMeans
    2ec80bc [Yu ISHIKAWA] Fit on 1 line
    e186be1 [Yu ISHIKAWA] Make a few variables, setters and getters be expert ones
    b2c205c [Yu ISHIKAWA] Rename the method `getInitializationSteps` to `getInitSteps` and `setInitializationSteps` to `setInitSteps` in Scala and Python
    f43f5b4 [Yu ISHIKAWA] Rename the method `getInitializationMode` to `getInitMode` and `setInitializationMode` to `setInitMode` in Scala and Python
    3cb5ba4 [Yu ISHIKAWA] Modify the description about epsilon and the validation
    4fa409b [Yu ISHIKAWA] Add a comment about the default value of epsilon
    2f392e1 [Yu ISHIKAWA] Make some variables `final` and Use `IntParam` and `DoubleParam`
    19326f8 [Yu ISHIKAWA] Use `udf`, instead of callUDF
    4d2ad1e [Yu ISHIKAWA] Modify the indentations
    0ae422f [Yu ISHIKAWA] Add a test for `setParams`
    4ff7913 [Yu ISHIKAWA] Add "ml.clustering" to `javacOptions` in SparkBuild.scala
    11ffdf1 [Yu ISHIKAWA] Use `===` and the variable
    220a176 [Yu ISHIKAWA] Set a random seed in the unit testing
    92c3efc [Yu ISHIKAWA] Make the points for a test be fewer
    c758692 [Yu ISHIKAWA] Modify the parameters of KMeans in Python
    6aca147 [Yu ISHIKAWA] Add some unit testings to validate the setter methods
    687cacc [Yu ISHIKAWA] Alias mllib.KMeans as MLlibKMeans in KMeansSuite.scala
    a4dfbef [Yu ISHIKAWA] Modify the last brace and indentations
    5bedc51 [Yu ISHIKAWA] Remove an extra new line
    444c289 [Yu ISHIKAWA] Add the validation for `runs`
    e41989c [Yu ISHIKAWA] Modify how to validate `initStep`
    7ea133a [Yu ISHIKAWA] Change how to validate `initMode`
    7991e15 [Yu ISHIKAWA] Add a validation for `k`
    c2df35d [Yu ISHIKAWA] Make `predict` private
    93aa2ff [Yu ISHIKAWA] Use `withColumn` in `transform`
    d3a79f7 [Yu ISHIKAWA] Remove the inherited docs
    e9532e1 [Yu ISHIKAWA] make `parentModel` of KMeansModel private
    8559772 [Yu ISHIKAWA] Remove the `paramMap` parameter of KMeans
    6684850 [Yu ISHIKAWA] Rename `initializationSteps` to `initSteps`
    99b1b96 [Yu ISHIKAWA] Rename `initializationMode` to `initMode`
    79ea82b [Yu ISHIKAWA] Modify the parameters of KMeans docs
    6569bcd [Yu ISHIKAWA] Change how to set the default values with `setDefault`
    20a795a [Yu ISHIKAWA] Change how to set the default values with `setDefault`
    11c2a12 [Yu ISHIKAWA] Limit the imports
    badb481 [Yu ISHIKAWA] Alias spark.mllib.{KMeans, KMeansModel}
    f80319a [Yu ISHIKAWA] Rebase master branch and add copy methods
    85d92b1 [Yu ISHIKAWA] Add `KMeans.setPredictionCol`
    aa9469d [Yu ISHIKAWA] Fix a python test suite error caused by python 3.x
    c2d6bcb [Yu ISHIKAWA] Add Java test suites of the KMeans API for spark.ml Pipeline
    598ed2e [Yu ISHIKAWA] Implement the KMeans API for spark.ml Pipelines in Python
    63ad785 [Yu ISHIKAWA] Implement the KMeans API for spark.ml Pipelines in Scala
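
Several of the commits above add validation for `k`, `initMode`, `initSteps` and `runs`, and one removes exception throwing from `validateInitMode`. As a rough illustration only, the checks could look like the Spark-free Python sketch below. The function name `validate_kmeans_params`, the exact bounds, and the idea of returning a list of problems are assumptions made for this example; they are not taken from the patch itself. The two init mode names are the ones used by mllib's KMeans.

```python
# Hypothetical sketch: plain-Python parameter checks, not the code from this patch.
# Init mode names as used by mllib's KMeans.
VALID_INIT_MODES = ("random", "k-means||")


def validate_kmeans_params(k, init_mode, init_steps, runs, epsilon):
    """Return a list of problems; an empty list means every check passed.

    The bounds below are assumptions for illustration.
    """
    problems = []
    if k < 2:
        problems.append("k must be at least 2, got %d" % k)
    if init_mode not in VALID_INIT_MODES:
        problems.append("initMode must be one of %s, got %r"
                        % (", ".join(VALID_INIT_MODES), init_mode))
    if init_steps < 1:
        problems.append("initSteps must be positive, got %d" % init_steps)
    if runs < 1:
        problems.append("runs must be at least 1, got %d" % runs)
    if epsilon < 0:
        problems.append("epsilon must be non-negative, got %r" % epsilon)
    return problems


if __name__ == "__main__":
    # Valid settings produce no problems; invalid ones are reported, not raised.
    print(validate_kmeans_params(2, "k-means||", 5, 1, 1e-4))
    for problem in validate_kmeans_params(1, "bogus", 0, 0, -1.0):
        print(problem)
```

Collecting problems instead of raising mirrors the non-throwing style mentioned in commit 5a7d574, but the actual Scala implementation may validate differently.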