Commit 2447b1c4 authored by Matei Zaharia

Merge pull request #910 from mateiz/ml-doc-tweaks

Small tweaks to MLlib docs
parents 7d3204b0 7a5c4b64
@@ -4,7 +4,7 @@ title: Machine Learning Library (MLlib)
 ---
 
 MLlib is a Spark implementation of some common machine learning (ML)
-functionality, as well associated unit tests and data generators. MLlib
+functionality, as well associated tests and data generators. MLlib
 currently supports four common types of machine learning problem settings,
 namely, binary classification, regression, clustering and collaborative
 filtering, as well as an underlying gradient descent optimization primitive.
@@ -44,22 +44,20 @@ import org.apache.spark.mllib.regression.LabeledPoint
 // Load and parse the data file
 val data = sc.textFile("mllib/data/sample_svm_data.txt")
-val parsedData = data.map(line => {
+val parsedData = data.map { line =>
   val parts = line.split(' ')
   LabeledPoint(parts(0).toDouble, parts.tail.map(x => x.toDouble).toArray)
-})
+}
 
 // Run training algorithm
 val numIterations = 20
-val model = SVMWithSGD.train(
-  parsedData,
-  numIterations)
+val model = SVMWithSGD.train(parsedData, numIterations)
 
 // Evaluate model on training examples and compute training error
-val labelAndPreds = parsedData.map(r => {
-  val prediction = model.predict(r.features)
-  (r.label, prediction)
-})
+val labelAndPreds = parsedData.map { point =>
+  val prediction = model.predict(point.features)
+  (point.label, prediction)
+}
 
 val trainErr = labelAndPreds.filter(r => r._1 != r._2).count.toDouble / parsedData.count
 println("trainError = " + trainErr)
 {% endhighlight %}
...
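The guide snippet above ends by computing the training error over the training set. As a small follow-up sketch (not part of this commit, and assuming the same MLlib API the example targets, where the model exposes predict on a plain Array[Double]), the trained model can also score a new feature vector; the sample values here are hypothetical:

// Sketch only: newFeatures is made-up sample data with the same length as the training features.
val newFeatures = Array(1.0, 0.0, 3.0)
val predictedLabel = model.predict(newFeatures)  // 0.0 or 1.0 for an SVMWithSGD model
println("prediction for new point = " + predictedLabel)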
@@ -29,7 +29,7 @@ import org.apache.spark.mllib.util.DataValidators
 import org.jblas.DoubleMatrix
 
 /**
- * Model built using SVM.
+ * Model for Support Vector Machines (SVMs).
  *
  * @param weights Weights computed for every feature.
  * @param intercept Intercept computed for this model.
@@ -48,8 +48,8 @@ class SVMModel(
 }
 
 /**
- * Train an SVM using Stochastic Gradient Descent.
- * NOTE: Labels used in SVM should be {0, 1}
+ * Train a Support Vector Machine (SVM) using Stochastic Gradient Descent.
+ * NOTE: Labels used in SVM should be {0, 1}.
  */
 class SVMWithSGD private (
     var stepSize: Double,
@@ -79,7 +79,7 @@ class SVMWithSGD private (
 }
 
 /**
- * Top-level methods for calling SVM. NOTE: Labels used in SVM should be {0, 1}
+ * Top-level methods for calling SVM. NOTE: Labels used in SVM should be {0, 1}.
  */
 object SVMWithSGD {
 
@@ -88,14 +88,15 @@ object SVMWithSGD {
    * of iterations of gradient descent using the specified step size. Each iteration uses
    * `miniBatchFraction` fraction of the data to calculate the gradient. The weights used in
    * gradient descent are initialized using the initial weights provided.
-   * NOTE: Labels used in SVM should be {0, 1}
+   *
+   * NOTE: Labels used in SVM should be {0, 1}.
    *
    * @param input RDD of (label, array of features) pairs.
    * @param numIterations Number of iterations of gradient descent to run.
    * @param stepSize Step size to be used for each iteration of gradient descent.
    * @param regParam Regularization parameter.
    * @param miniBatchFraction Fraction of data to be used per iteration.
    * @param initialWeights Initial set of weights to be used. Array should be equal in size to
    *        the number of features in the data.
    */
  def train(
...
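The last hunk above documents the fully parameterized SVMWithSGD.train overload. A minimal sketch of calling it, assuming the parameter order matches the @param list in this hunk (input, numIterations, stepSize, regParam, miniBatchFraction, initialWeights) and reusing parsedData from the guide example; the numeric values are illustrative only:

import org.apache.spark.mllib.classification.SVMWithSGD

// parsedData: RDD[LabeledPoint] built as in the guide example; labels must be {0, 1}.
val numFeatures = parsedData.first().features.length
val initialWeights = Array.fill(numFeatures)(0.0)   // one initial weight per feature
val svmModel = SVMWithSGD.train(
  parsedData,      // input
  200,             // numIterations
  1.0,             // stepSize
  0.1,             // regParam (illustrative value)
  1.0,             // miniBatchFraction: use the full data set each iteration
  initialWeights)  // array size equal to the number of features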
@@ -43,7 +43,7 @@ class LinearRegressionModel(
 }
 
 /**
- * Train a regression model with no regularization using Stochastic Gradient Descent.
+ * Train a linear regression model with no regularization using Stochastic Gradient Descent.
  */
 class LinearRegressionWithSGD private (
     var stepSize: Double,
@@ -83,7 +83,7 @@ object LinearRegressionWithSGD {
    * @param numIterations Number of iterations of gradient descent to run.
    * @param stepSize Step size to be used for each iteration of gradient descent.
    * @param miniBatchFraction Fraction of data to be used per iteration.
    * @param initialWeights Initial set of weights to be used. Array should be equal in size to
    *        the number of features in the data.
    */
  def train(
...
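For comparison, a similar sketch for the LinearRegressionWithSGD.train overload documented above; since this model is unregularized there is no regParam, and labels are real-valued. The points RDD and the numeric values are hypothetical:

import org.apache.spark.mllib.regression.LinearRegressionWithSGD

// points: hypothetical RDD[LabeledPoint] with real-valued labels (data loading omitted).
val lrInitialWeights = Array.fill(points.first().features.length)(0.0)
val lrModel = LinearRegressionWithSGD.train(
  points,            // input
  100,               // numIterations
  0.01,              // stepSize (illustrative value)
  1.0,               // miniBatchFraction
  lrInitialWeights)  // one initial weight per feature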