Commit da60b34d authored by Yong Tang, committed by Nick Pentreath

[SPARK-3724][ML] RandomForest: More options for feature subset size.

## What changes were proposed in this pull request?

This PR adds more options for the feature subset size in the RandomForest implementation. Previously, RandomForest only supported "auto", "all", "sqrt", "log2", and "onethird". This PR allows an arbitrary value to be given, which makes the subset size usable as a parameter in model search.

With this PR, `featureSubsetStrategy` can also be set to:
a) a real number in the range `(0.0, 1.0]`, which represents the fraction of the total number of features used in each subset,
b) an integer (`> 0`), which represents the number of features in each subset.
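
As a rough illustration of how the option strings listed above could be resolved into a concrete feature count, here is a minimal standalone sketch. It is not the code from this PR: the function name `resolve_num_features` is hypothetical, and the sketch assumes that an integer parse is tried before a fractional one, that integer values are capped at the total number of features, and that fractions are rounded up. The "auto" option is left out because its meaning depends on the task and the number of trees.

```python
import math


def resolve_num_features(strategy: str, num_features: int) -> int:
    """Hypothetical helper: map a featureSubsetStrategy string to a feature count."""
    # Named strategies mentioned in the PR description ("auto" is omitted here).
    named = {
        "all": num_features,
        "sqrt": math.ceil(math.sqrt(num_features)),
        "log2": max(1, math.ceil(math.log2(num_features))),
        "onethird": math.ceil(num_features / 3.0),
    }
    if strategy in named:
        return named[strategy]

    # Assumption: try the integer form first, so "1" means one feature
    # while "1.0" means the full fraction of features.
    try:
        count = int(strategy)
    except ValueError:
        count = None
    if count is not None:
        if count > 0:
            # Assumption: cap at the number of available features.
            return min(count, num_features)
        raise ValueError(f"integer strategy must be > 0: {strategy!r}")

    # Otherwise expect a fraction in (0.0, 1.0].
    try:
        fraction = float(strategy)
    except ValueError:
        raise ValueError(f"unsupported strategy: {strategy!r}") from None
    if 0.0 < fraction <= 1.0:
        # Assumption: round up so at least one feature is always selected.
        return math.ceil(fraction * num_features)
    raise ValueError(f"fraction must be in (0.0, 1.0]: {strategy!r}")
```

Under these assumptions, `resolve_num_features("0.5", 10)` selects half of ten features, while `resolve_num_features("3", 10)` selects exactly three; values such as `"0"`, `"0.0"`, or `"1.5"` are rejected.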

## How was this patch tested?

Two tests `JavaRandomForestClassifierSuite` and `JavaRandomForestRegressorSuite` have been updated to check the additional options for params in this PR.
An additional test has been added to `org.apache.spark.mllib.tree.RandomForestSuite` to cover the cases in this PR.

Author: Yong Tang <yong.tang.github@outlook.com>

Closes #11989 from yongtang/SPARK-3724.
parent 124cbfb6