  1. Jun 02, 2016
    • [SPARK-15092][SPARK-15139][PYSPARK][ML] Pyspark TreeEnsemble missing methods · 72353311
      Holden Karau authored
      ## What changes were proposed in this pull request?
      
      Add `toDebugString` and `totalNumNodes` to `TreeEnsembleModels` and add `toDebugString` to `DecisionTreeModel`
      
      ## How was this patch tested?
      
      Extended doc tests.
      
      Author: Holden Karau <holden@us.ibm.com>
      
      Closes #12919 from holdenk/SPARK-15139-pyspark-treeEnsemble-missing-methods.
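      
      A quick illustration of the added accessors (a sketch; the classifier choice and training DataFrame are assumptions, not from the PR):
      ```python
      from pyspark.ml.classification import RandomForestClassifier

      # train_df: a hypothetical DataFrame with "label" and "features" columns
      model = RandomForestClassifier(numTrees=3).fit(train_df)
      print(model.totalNumNodes)  # total node count summed over all trees
      print(model.toDebugString)  # full text dump of every tree in the ensemble
      ```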
  2. May 23, 2016
    • [SPARK-15464][ML][MLLIB][SQL][TESTS] Replace SQLContext and SparkContext with... · a15ca553
      WeichenXu authored
      [SPARK-15464][ML][MLLIB][SQL][TESTS] Replace SQLContext and SparkContext with SparkSession using builder pattern in python test code
      
      ## What changes were proposed in this pull request?
      
      Replace SQLContext and SparkContext with SparkSession using builder pattern in python test code.
      
      ## How was this patch tested?
      
      Existing test.
      
      Author: WeichenXu <WeichenXu123@outlook.com>
      
      Closes #13242 from WeichenXu123/python_doctest_update_sparksession.
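      
      The builder pattern the tests now use (a sketch; the master and app name values are illustrative):
      ```python
      from pyspark.sql import SparkSession

      spark = SparkSession.builder \
          .master("local[2]") \
          .appName("ml.tests") \
          .getOrCreate()
      sc = spark.sparkContext  # still available where a raw SparkContext is needed
      ```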
  3. May 17, 2016
    • [SPARK-14615][ML] Use the new ML Vector and Matrix in the ML pipeline based algorithms · e2efe052
      DB Tsai authored
      ## What changes were proposed in this pull request?
      
      Once SPARK-14487 and SPARK-14549 are merged, we will migrate to using the new vector and matrix types in the new ML pipeline based APIs.
      
      ## How was this patch tested?
      
      Unit tests
      
      Author: DB Tsai <dbt@netflix.com>
      Author: Liang-Chi Hsieh <simonh@tw.ibm.com>
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #12627 from dbtsai/SPARK-14615-NewML.
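      
      After the migration, the pipeline-based APIs consume the new types from `pyspark.ml.linalg` rather than `pyspark.mllib.linalg` (an illustrative sketch, not from the PR):
      ```python
      from pyspark.ml.linalg import Vectors, Matrices

      dense = Vectors.dense([1.0, 2.0, 3.0])
      sparse = Vectors.sparse(3, {0: 1.0, 2: 3.0})   # size, {index: value}
      identity = Matrices.dense(2, 2, [1.0, 0.0, 0.0, 1.0])
      ```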
  4. May 09, 2016
    • [SPARK-15136][PYSPARK][DOC] Fix links to sphinx style and add a default param doc note · 12fe2ecd
      Holden Karau authored
      ## What changes were proposed in this pull request?
      
      PyDoc links in ml are in a non-standard format. Switch to the standard Sphinx link format for better-formatted documentation. Also add a note about a default value in one place, and copy some extended docs from Scala for GBT.
      
      ## How was this patch tested?
      
      Built docs locally.
      
      Author: Holden Karau <holden@us.ibm.com>
      
      Closes #12918 from holdenk/SPARK-15137-linkify-pyspark-ml-classification.
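      
      For context, the standardized cross-reference style looks roughly like this inside a docstring (the excerpt is illustrative, not the exact diff):
      ```python
      def _example_docstring():
          """
          See :py:class:`pyspark.ml.classification.DecisionTreeClassifier`
          for more information on the base algorithm.

          .. note:: this default value may change between releases.
          """
      ```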
  5. May 03, 2016
    • [SPARK-14971][ML][PYSPARK] PySpark ML Params setter code clean up · d26f7cb0
      Yanbo Liang authored
      ## What changes were proposed in this pull request?
      PySpark ML Params setter code clean up.
      For examples,
      ```setInputCol``` can be simplified from
      ```
      self._set(inputCol=value)
      return self
      ```
      to:
      ```
      return self._set(inputCol=value)
      ```
      This is a pretty big sweep, and we cleaned up wherever possible.
      ## How was this patch tested?
      Existing unit tests.
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #12749 from yanboliang/spark-14971.
  6. May 01, 2016
    • [SPARK-14931][ML][PYTHON] Mismatched default values between pipelines in Spark and PySpark - update · a6428292
      Xusen Yin authored
      ## What changes were proposed in this pull request?
      
      This PR is an update for [https://github.com/apache/spark/pull/12738] which:
      * Adds a generic unit test for JavaParams wrappers in pyspark.ml for checking default Param values vs. the defaults in the Scala side
      * Various fixes for bugs found
        * This includes changing classes taking weightCol to treat unset and empty String Param values the same way.
      
      Defaults changed:
      * Scala
       * LogisticRegression: weightCol defaults to not set (instead of empty string)
       * StringIndexer: labels default to not set (instead of empty array)
       * GeneralizedLinearRegression:
         * maxIter always defaults to 25 (simpler than defaulting to 25 for a particular solver)
         * weightCol defaults to not set (instead of empty string)
       * LinearRegression: weightCol defaults to not set (instead of empty string)
      * Python
       * MultilayerPerceptron: layers default to not set (instead of [1,1])
       * ChiSqSelector: numTopFeatures defaults to 50 (instead of not set)
      
      ## How was this patch tested?
      
      Generic unit test. Manually tested that unit test by changing defaults and verifying that this broke the test.
      
      Author: Joseph K. Bradley <joseph@databricks.com>
      Author: yinxusen <yinxusen@gmail.com>
      
      Closes #12816 from jkbradley/yinxusen-SPARK-14931.
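      
      Two of the changed defaults, observable from a PySpark session (a sketch, not the PR's generic test):
      ```python
      from pyspark.ml.feature import ChiSqSelector
      from pyspark.ml.classification import MultilayerPerceptronClassifier

      assert ChiSqSelector().getNumTopFeatures() == 50  # previously unset
      mlp = MultilayerPerceptronClassifier()
      assert not mlp.isDefined(mlp.layers)              # no longer defaults to [1, 1]
      ```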
  7. Apr 30, 2016
    • [SPARK-14952][CORE][ML] Remove methods that were deprecated in 1.6.0 · e5fb78ba
      Herman van Hovell authored
      #### What changes were proposed in this pull request?
      
      This PR removes three methods that were deprecated in 1.6.0:
      - `PortableDataStream.close()`
      - `LinearRegression.weights`
      - `LogisticRegression.weights`
      
      The rationale for doing this is that the impact is small and that Spark 2.0 is a major release.
      
      #### How was this patch tested?
      Compilation succeeded.
      
      Author: Herman van Hovell <hvanhovell@questtec.nl>
      
      Closes #12732 from hvanhovell/SPARK-14952.
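      
      Callers of the removed `weights` accessors use `coefficients` instead; a PySpark sketch (the Scala API changed the same way, and the training setup here is assumed):
      ```python
      from pyspark.ml.regression import LinearRegression

      model = LinearRegression(maxIter=5).fit(train_df)  # train_df assumed
      print(model.coefficients)  # formerly exposed as model.weights
      ```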
  8. Apr 20, 2016
    • [SPARK-14555] First cut of Python API for Structured Streaming · 80bf48f4
      Burak Yavuz authored
      ## What changes were proposed in this pull request?
      
      This patch provides a first cut of python APIs for structured streaming. This PR provides the new classes:
       - ContinuousQuery
       - Trigger
       - ProcessingTime
      in pyspark under `pyspark.sql.streaming`.
      
      In addition, it contains the new methods added under:
       -  `DataFrameWriter`
           a) `startStream`
           b) `trigger`
           c) `queryName`
      
       -  `DataFrameReader`
           a) `stream`
      
       - `DataFrame`
          a) `isStreaming`
      
      This PR doesn't contain all methods exposed for `ContinuousQuery`, for example:
       - `exception`
       - `sourceStatuses`
       - `sinkStatus`
      
      They may be added in a follow up.
      
      This PR also contains some very minor doc fixes in the Scala side.
      
      ## How was this patch tested?
      
      Python doc tests
      
      TODO:
       - [ ] verify Python docs look good
      
      Author: Burak Yavuz <brkyvz@gmail.com>
      Author: Burak Yavuz <burak@databricks.com>
      
      Closes #12320 from brkyvz/stream-python.
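      
      Putting the new pieces together (a sketch of the API as described above; the paths, format choices, and the `spark` handle are assumptions):
      ```python
      df = spark.read.format("text").stream("/tmp/stream-in")  # DataFrameReader.stream
      assert df.isStreaming                                    # new DataFrame property

      query = (df.write
                 .format("parquet")
                 .option("checkpointLocation", "/tmp/checkpoints")
                 .queryName("file_sink")                # names the ContinuousQuery
                 .trigger(processingTime="10 seconds")  # ProcessingTime trigger
                 .startStream(path="/tmp/stream-out"))  # returns a ContinuousQuery
      query.stop()
      ```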
  9. Apr 15, 2016
    • [SPARK-7861][ML] PySpark OneVsRest · 90b46e01
      Xusen Yin authored
      ## What changes were proposed in this pull request?
      
      https://issues.apache.org/jira/browse/SPARK-7861
      
      Add PySpark OneVsRest. I implemented it in Python since it's a meta-pipeline.
      
      ## How was this patch tested?
      
      Test with doctest.
      
      Author: Xusen Yin <yinxusen@gmail.com>
      
      Closes #12124 from yinxusen/SPARK-14306-7861.
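      
      Basic usage of the new meta-estimator (a sketch; the base classifier and DataFrames are illustrative):
      ```python
      from pyspark.ml.classification import LogisticRegression, OneVsRest

      lr = LogisticRegression(maxIter=10, regParam=0.01)
      ovr = OneVsRest(classifier=lr)  # fits one binary model per class
      model = ovr.fit(train_df)       # train_df: hypothetical labeled DataFrame
      predictions = model.transform(test_df)
      ```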
    • [SPARK-14104][PYSPARK][ML] All Python param setters should use the `_set` method · 129f2f45
      sethah authored
      ## What changes were proposed in this pull request?
      
      Param setters in Python previously accessed `_paramMap` directly to update values. The `_set` method now implements type checking, so it should be used to update all parameters. This PR eliminates all direct accesses to `_paramMap` besides the one in the `_set` method, to ensure type checking happens.
      
      Additional changes:
      * [SPARK-13068](https://github.com/apache/spark/pull/11663) missed adding type converters in evaluation.py so those are done here
      * An incorrect `toBoolean` type converter was used for the StringIndexer `handleInvalid` param in a previous PR. This is fixed here.
      
      ## How was this patch tested?
      
      Existing unit tests verify that parameters are still set properly. No new functionality is actually added in this PR.
      
      Author: sethah <seth.hendrickson16@gmail.com>
      
      Closes #11939 from sethah/SPARK-14104.
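      
      The enforced pattern, sketched (the param name is illustrative): every setter funnels through `_set`, which applies type checking instead of writing to `_paramMap` directly.
      ```python
      def setHandleInvalid(self, value):
          """Sets the value of the handleInvalid param."""
          return self._set(handleInvalid=value)  # type-checked, may raise
      ```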
  10. Apr 13, 2016
    • [SPARK-14472][PYSPARK][ML] Cleanup ML JavaWrapper and related class hierarchy · fc3cd2f5
      Bryan Cutler authored
      Currently, JavaWrapper is only a wrapper class for pipeline classes that have Params, and JavaCallable is a separate mixin that provides methods to make Java calls. This change simplifies the class structure by defining the Java wrapper in a plain base class along with the methods to make Java calls. It also renames the Java wrapper classes to better reflect their purpose.
      
      Ran existing Python ml tests and generated documentation to test this change.
      
      Author: Bryan Cutler <cutlerb@gmail.com>
      
      Closes #12304 from BryanCutler/pyspark-cleanup-JavaWrapper-SPARK-14472.
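      
      Roughly, the shape of the hierarchy after the cleanup (a sketch, not the exact Spark source):
      ```python
      from pyspark.ml.param import Params

      class JavaWrapper(object):
          """Plain base class: holds the Java peer and the helpers for making
          Java calls (previously split out into the JavaCallable mixin)."""

      class JavaParams(JavaWrapper, Params):
          """Pipeline components with Params, renamed to reflect their purpose."""
      ```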
  11. Apr 08, 2016
    • [SPARK-14498][ML][PYTHON][SQL] Many cleanups to ML and ML-related docs · d7af736b
      Joseph K. Bradley authored
      ## What changes were proposed in this pull request?
      
      Cleanups to documentation.  No changes to code.
      * GBT docs: Move Scala doc for private object GradientBoostedTrees to public docs for GBTClassifier and GBTRegressor
      * GLM regParam: needs doc saying it is for L2 only
      * TrainValidationSplitModel: add `.. versionadded:: 2.0.0`
      * Rename `_transformer_params_from_java` to `_transfer_params_from_java`
      * LogReg Summary classes: the "probability" column should not say "calibrated"
      * LR summaries: coefficientStandardErrors -> document that the intercept stderr comes last; same for t- and p-values
      * approxCountDistinct: Document the meaning of the "rsd" argument
      * LDA: note which params are for online LDA only
      
      ## How was this patch tested?
      
      Doc build
      
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #12266 from jkbradley/ml-doc-cleanups.
    • [SPARK-14373][PYSPARK] PySpark RandomForestClassifier, Regressor support export/import · e5d8d6e0
      Kai Jiang authored
      ## What changes were proposed in this pull request?
      Support `RandomForest{Classifier, Regressor}` save/load in the Python API.
      [JIRA](https://issues.apache.org/jira/browse/SPARK-14373)
      ## How was this patch tested?
      doctest
      
      Author: Kai Jiang <jiangkai@gmail.com>
      
      Closes #12238 from vectorijk/spark-14373.
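      
      The newly supported round-trip (a sketch; the path and training setup are illustrative):
      ```python
      from pyspark.ml.classification import (RandomForestClassifier,
                                             RandomForestClassificationModel)

      model = RandomForestClassifier(numTrees=10).fit(train_df)  # train_df assumed
      model.save("/tmp/rf-model")
      restored = RandomForestClassificationModel.load("/tmp/rf-model")
      ```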
  12. Apr 06, 2016
    • [SPARK-13430][PYSPARK][ML] Python API for training summaries of linear and logistic regression · 9c6556c5
      Bryan Cutler authored
      ## What changes were proposed in this pull request?
      
      Adding Python API for training summaries of LogisticRegression and LinearRegression in PySpark ML.
      
      ## How was this patch tested?
      Added unit tests to exercise the API calls for the summary classes. Also manually verified that values are as expected and match those from Scala directly.
      
      Author: Bryan Cutler <cutlerb@gmail.com>
      
      Closes #11621 from BryanCutler/pyspark-ml-summary-SPARK-13430.
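      
      Accessing a training summary from Python (a sketch; the training data is assumed, and the ROC fields apply to binary problems):
      ```python
      from pyspark.ml.classification import LogisticRegression

      lr_model = LogisticRegression(maxIter=10).fit(train_df)
      summary = lr_model.summary        # new in this PR
      print(summary.objectiveHistory)   # loss at each iteration
      print(summary.areaUnderROC)
      summary.roc.show()                # ROC curve as a DataFrame
      ```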
  13. Apr 01, 2016
    • [SPARK-11262][ML] Unit test for gradient, loss layers, memory management for multilayer perceptron · 26867ebc
      Alexander Ulanov authored
      1. Implement LossFunction trait and implement squared error and cross entropy loss with it
      2. Implement unit test for gradient and loss
      3. Implement InPlace trait and in-place layer evaluation
      4. Refactor interface for ActivationFunction
      5. Update of Layer and LayerModel interfaces
      6. Fix random weights assignment
      7. Implement memory allocation by MLP model instead of individual layers
      
      These features decreased the memory usage and increased the flexibility of the internal API.
      
      Author: Alexander Ulanov <nashb@yandex.ru>
      Author: avulanov <avulanov@gmail.com>
      
      Closes #9229 from avulanov/mlp-refactoring.
  14. Mar 31, 2016
    • [SPARK-14264][PYSPARK][ML] Add feature importance for GBTs in pyspark · b11887c0
      sethah authored
      ## What changes were proposed in this pull request?
      
      Feature importances are exposed in the python API for GBTs.
      
      Other changes:
      * Update the random forest feature importance documentation to not repeat decision tree docstring and instead place a reference to it.
      
      ## How was this patch tested?
      
      Python doc tests were updated to validate GBT feature importance.
      
      Author: sethah <seth.hendrickson16@gmail.com>
      
      Closes #12056 from sethah/Pyspark_GBT_feature_importance.
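      
      Reading the new property (a sketch; the training setup is assumed):
      ```python
      from pyspark.ml.classification import GBTClassifier

      gbt_model = GBTClassifier(maxIter=5).fit(train_df)  # train_df assumed
      print(gbt_model.featureImportances)  # a Vector, one weight per feature
      ```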
  15. Mar 24, 2016
    • [SPARK-13949][ML][PYTHON] PySpark ml DecisionTreeClassifier, Regressor support export/import · 0874ff3a
      GayathriMurali authored
      ## What changes were proposed in this pull request?
      
      Added MLReadable and MLWritable to Decision Tree Classifier and Regressor. Added doctests.
      
      ## How was this patch tested?
      
      Python Unit tests. Tests added to check persistence in DecisionTreeClassifier and DecisionTreeRegressor.
      
      Author: GayathriMurali <gayathri.m.softie@gmail.com>
      
      Closes #11892 from GayathriMurali/SPARK-13949.
    • [SPARK-14107][PYSPARK][ML] Add seed as named argument to GBTs in pyspark · 58509771
      sethah authored
      ## What changes were proposed in this pull request?
      
      GBTs in pyspark previously had seed parameters, but they could not be passed as keyword arguments through the class constructor. This patch adds seed as a keyword argument and also sets a default value.
      
      ## How was this patch tested?
      
      Doc tests were updated to pass a random seed through the GBTClassifier and GBTRegressor constructors.
      
      Author: sethah <seth.hendrickson16@gmail.com>
      
      Closes #11944 from sethah/SPARK-14107.
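      
      The constructors now accept the keyword (the values are illustrative):
      ```python
      from pyspark.ml.classification import GBTClassifier
      from pyspark.ml.regression import GBTRegressor

      gbt_c = GBTClassifier(maxIter=5, seed=42)
      gbt_r = GBTRegressor(maxIter=5, seed=42)
      ```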
  16. Mar 23, 2016
    • [SPARK-13068][PYSPARK][ML] Type conversion for Pyspark params · 30bdb5cb
      sethah authored
      ## What changes were proposed in this pull request?
      
      This patch adds type conversion functionality for parameters in Pyspark. A `typeConverter` field is added to the constructor of `Param` class. This argument is a function which converts values passed to this param to the appropriate type if possible. This is beneficial so that the params can fail at set time if they are given inappropriate values, but even more so because coherent error messages are now provided when Py4J cannot cast the python type to the appropriate Java type.
      
      This patch also adds a `TypeConverters` class with factory methods for common type conversions. Most of the changes involve adding these factory type converters to existing params. The previous solution to this issue, `expectedType`, is deprecated and can be removed in 2.1.0 as discussed on the Jira.
      
      ## How was this patch tested?
      
      Unit tests were added in python/pyspark/ml/tests.py to test parameter type conversion. These tests check that values that should be convertible are converted correctly, and that the appropriate errors are thrown when invalid values are provided.
      
      Author: sethah <seth.hendrickson16@gmail.com>
      
      Closes #11663 from sethah/SPARK-13068-tc.
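      
      Declaring a `Param` with a converter, mirroring the new pattern (a sketch; the param shown is illustrative):
      ```python
      from pyspark.ml.param import Param, Params, TypeConverters

      maxIter = Param(Params._dummy(), "maxIter",
                      "max number of iterations (>= 0).",
                      typeConverter=TypeConverters.toInt)
      # Setting "5" is now coerced to int 5; an unconvertible value fails at
      # set time with a clear error rather than a Py4J cast error later.
      ```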
  17. Mar 22, 2016
    • [SPARK-13951][ML][PYTHON] Nested Pipeline persistence · 7e3423b9
      Joseph K. Bradley authored
      Adds support for saving and loading nested ML Pipelines from Python.  Pipeline and PipelineModel do not extend JavaWrapper, but they are able to utilize the JavaMLWriter, JavaMLReader implementations.
      
      Also:
      * Separates out interfaces from Java wrapper implementations for MLWritable, MLReadable, MLWriter, MLReader.
      * Moves methods _stages_java2py, _stages_py2java into Pipeline, PipelineModel as _transfer_stage_from_java, _transfer_stage_to_java
      
      Added new unit test for nested Pipelines.  Abstracted validity check into a helper method for the 2 unit tests.
      
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #11866 from jkbradley/nested-pipeline-io.
      Closes #11835
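      
      A nested pipeline now round-trips from Python (a sketch; the stages and path are illustrative):
      ```python
      from pyspark.ml import Pipeline
      from pyspark.ml.classification import LogisticRegression
      from pyspark.ml.feature import HashingTF, Tokenizer

      inner = Pipeline(stages=[Tokenizer(inputCol="text", outputCol="words"),
                               HashingTF(inputCol="words", outputCol="features")])
      outer = Pipeline(stages=[inner, LogisticRegression(maxIter=10)])
      outer.save("/tmp/nested-pipeline")
      restored = Pipeline.load("/tmp/nested-pipeline")
      ```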
  18. Mar 16, 2016
    • [SPARK-13034] PySpark ml.classification support export/import · 27e1f388
      GayathriMurali authored
      ## What changes were proposed in this pull request?
      
      Add export/import for all estimators and transformers (which have a Scala implementation) under pyspark/ml/classification.py.
      
      ## How was this patch tested?
      
      ./python/run-tests
      ./dev/lint-python
      Unit tests added to check persistence in Logistic Regression
      
      Author: GayathriMurali <gayathri.m.softie@gmail.com>
      
      Closes #11707 from GayathriMurali/SPARK-13034.
  19. Mar 11, 2016
    • [SPARK-13787][ML][PYSPARK] Pyspark feature importances for decision tree and random forest · 234f781a
      sethah authored
      ## What changes were proposed in this pull request?
      
      This patch adds a `featureImportance` property to the Pyspark API for `DecisionTreeRegressionModel`, `DecisionTreeClassificationModel`, `RandomForestRegressionModel` and `RandomForestClassificationModel`.
      
      ## How was this patch tested?
      
      Python doc tests for the affected classes were updated to check feature importances.
      
      Author: sethah <seth.hendrickson16@gmail.com>
      
      Closes #11622 from sethah/SPARK-13787.
  20. Mar 04, 2016
    • [SPARK-13676] Fix mismatched default values for regParam in LogisticRegression · c8f25459
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      The default value of regularization parameter for `LogisticRegression` algorithm is different in Scala and Python. We should provide the same value.
      
      **Scala**
      ```
      scala> new org.apache.spark.ml.classification.LogisticRegression().getRegParam
      res0: Double = 0.0
      ```
      
      **Python**
      ```
      >>> from pyspark.ml.classification import LogisticRegression
      >>> LogisticRegression().getRegParam()
      0.1
      ```
      
      ## How was this patch tested?
      Manual. Check the following in `pyspark`.
      ```
      >>> from pyspark.ml.classification import LogisticRegression
      >>> LogisticRegression().getRegParam()
      0.0
      ```
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #11519 from dongjoon-hyun/SPARK-13676.
  21. Mar 01, 2016
    • [SPARK-13008][ML][PYTHON] Put one alg per line in pyspark.ml all lists · 9495c40f
      Joseph K. Bradley authored
      This is to fix a long-time annoyance: Whenever we add a new algorithm to pyspark.ml, we have to add it to the ```__all__``` list at the top.  Since we keep it alphabetized, it often creates a lot more changes than needed.  It is also easy to add the Estimator and forget the Model.  I'm going to switch it to have one algorithm per line.
      
      This also alphabetizes a few out-of-place classes in pyspark.ml.feature.  No changes have been made to the moved classes.
      
      CC: thunterdb
      
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #10927 from jkbradley/ml-python-all-list.
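      
      The resulting layout, one algorithm per line with the Estimator and Model adjacent (the excerpt is illustrative):
      ```python
      __all__ = [
          "LogisticRegression",
          "LogisticRegressionModel",
          "GBTClassifier",
          "GBTClassificationModel",
      ]
      ```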
  22. Jan 26, 2016
    • [SPARK-10509][PYSPARK] Reduce excessive param boiler plate code · eb917291
      Holden Karau authored
      The current Python ml params require cut-and-pasting the param setup and description between the class & ```__init__``` methods. Remove this possible source of errors & simplify the use of custom params by adding a ```_copy_new_parent``` method to Param, so as to avoid cut-and-pasting (and cut-and-pasting at different indentation levels, ugh).
      
      Author: Holden Karau <holden@us.ibm.com>
      
      Closes #10216 from holdenk/SPARK-10509-excessive-param-boiler-plate-code.
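      
      The pattern this enables, sketched (the class and param names are illustrative): a Param is declared once at class level with a dummy parent, and `_copy_new_parent` re-binds it to each instance, so `__init__` no longer repeats the declaration.
      ```python
      from pyspark.ml.param import Param, Params

      class HasThreshold(Params):
          # declared once here; no longer re-created inside __init__
          threshold = Param(Params._dummy(), "threshold",
                            "threshold in binary classification prediction")
      ```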
  23. Jan 06, 2016
    • [SPARK-11815][ML][PYSPARK] PySpark DecisionTreeClassifier &... · 3aa34882
      Yanbo Liang authored
      [SPARK-11815][ML][PYSPARK] PySpark DecisionTreeClassifier & DecisionTreeRegressor should support setSeed
      
      PySpark ```DecisionTreeClassifier``` & ```DecisionTreeRegressor``` should support ```setSeed``` like we do on the Scala side.
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #9807 from yanboliang/spark-11815.