    [SPARK-1969][MLlib] Online summarizer APIs for mean, variance, min, and max · 55960869
    DB Tsai authored
    This change moves the private ColumnStatisticsAggregator class out of RowMatrix and exposes it as a publicly available DeveloperApi, with documentation and unit tests.
    
    Changes:
    1) Moved the private implementation from org.apache.spark.mllib.linalg.ColumnStatisticsAggregator to org.apache.spark.mllib.stat.MultivariateOnlineSummarizer.
    2) When creating the online summarizer object, the number of columns is not needed in the constructor; it is determined when the user adds the first sample.
    3) Added API documentation for MultivariateOnlineSummarizer.
    4) Added unit tests for MultivariateOnlineSummarizer.
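
To illustrate change 2, here is a minimal, dependency-free sketch of an online summarizer whose column count is fixed by the first sample rather than by the constructor. It is not the Spark code: the class name `OnlineSummarizer`, its method names, and the Welford-style update are assumptions made for illustration, and the actual MultivariateOnlineSummarizer may differ in its formulas and in details such as sparse-vector handling.

```python
class OnlineSummarizer:
    """Hypothetical stand-in for illustration; not Spark's implementation."""

    def __init__(self):
        # No column count is passed in; it stays unknown until the first add().
        self.n = 0        # number of columns
        self.count = 0    # number of samples seen so far
        self.mean = []
        self.m2 = []      # per-column sum of squared deviations from the mean
        self.min = []
        self.max = []

    def add(self, sample):
        if self.count == 0:
            # The first sample determines the number of columns.
            self.n = len(sample)
            self.mean = [0.0] * self.n
            self.m2 = [0.0] * self.n
            self.min = [float("inf")] * self.n
            self.max = [float("-inf")] * self.n
        elif len(sample) != self.n:
            raise ValueError(
                "expected %d columns, got %d" % (self.n, len(sample)))

        self.count += 1
        for i, x in enumerate(sample):
            # Welford-style single-pass update (assumed here, see lead-in).
            delta = x - self.mean[i]
            self.mean[i] += delta / self.count
            self.m2[i] += delta * (x - self.mean[i])
            self.min[i] = min(self.min[i], x)
            self.max[i] = max(self.max[i], x)
        return self

    def variance(self):
        # Unbiased sample variance; zero when fewer than two samples exist.
        if self.count < 2:
            return [0.0] * self.n
        return [m / (self.count - 1) for m in self.m2]


summarizer = OnlineSummarizer()
for row in ([1.0, 10.0], [2.0, 20.0], [3.0, 30.0]):
    summarizer.add(row)
```

After the loop, `summarizer.n` is 2 even though no size was ever given to the constructor, and adding a later sample of a different length raises a `ValueError`.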
    
    Author: DB Tsai <dbtsai@dbtsai.com>
    
    Closes #955 from dbtsai/dbtsai-summarizer and squashes the following commits:
    
    b13ac90 [DB Tsai] dbtsai-summarizer