Skip to content
Snippets Groups Projects
Commit 4f4721a2 authored by Joseph K. Bradley's avatar Joseph K. Bradley
Browse files

[SPARK-14862][ML] Updated Classifiers to not require labelCol metadata

## What changes were proposed in this pull request?

Updated Classifier, DecisionTreeClassifier, RandomForestClassifier, GBTClassifier to not require input column metadata.
* They first check for metadata.
* If numClasses is not specified in metadata, they identify the largest label value (up to a limit).

This functionality is implemented in a new Classifier.getNumClasses method.

Also
* Updated Classifier.extractLabeledPoints to (a) check label values and (b) include a second version which takes a numClasses value for validity checking.

## How was this patch tested?

* Unit tests in ClassifierSuite for helper methods
* Unit tests for DecisionTreeClassifier, RandomForestClassifier, GBTClassifier with toy datasets lacking label metadata

Author: Joseph K. Bradley <joseph@databricks.com>

Closes #12663 from jkbradley/trees-no-metadata.
parent dae538a4
No related branches found
No related tags found
Loading
Showing with 245 additions and 31 deletions
Loading
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment