@@ -21,6 +21,9 @@ Spark uses [Simple Build Tool](http://www.scala-sbt.org), which is bundled with
For its Scala API, Spark {{site.SPARK_VERSION}} depends on Scala {{site.SCALA_VERSION}}. If you write applications in Scala, you will need to use this same version of Scala in your own program -- newer major versions may not work. You can get the right version of Scala from [scala-lang.org](http://www.scala-lang.org/download/).
Note: if you are building a binary distribution using `./make-distribution.sh`, you will not need to run
`sbt/sbt assembly`.
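For illustration, the two build paths mentioned above look roughly like this (a sketch, assuming the default script locations in the Spark source tree):

```bash
# Build the Spark assembly JAR with the bundled sbt (used for development and testing)
sbt/sbt assembly

# Or create a binary distribution instead; this runs the assembly step itself,
# so a separate `sbt/sbt assembly` is not needed
./make-distribution.sh
```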
# Testing the Build
Spark comes with several sample programs in the `examples` directory.
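As a quick smoke test of the build, one of the samples can be run locally. The `run-example` script and the `SparkPi` class name below are assumptions based on the examples shipped with Spark; check the `examples` directory for the exact names:

```bash
# Run the SparkPi example on a local "cluster" to verify the build
./run-example org.apache.spark.examples.SparkPi local
```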
...
@@ -46,6 +49,11 @@ Spark supports several options for deployment:
* [Apache Mesos](running-on-mesos.html)
* [Hadoop YARN](running-on-yarn.html)
There is a script, `./make-distribution.sh`, which will create a binary distribution of Spark for deployment
to any machine with only the Java runtime as a necessary dependency.
Running the script creates a distribution directory in `dist/`; pass the `-tgz` option to create a .tgz file instead.
Check the script for additional options.
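A typical invocation, following the options described above (a sketch; check the script itself for the exact flags it accepts):

```bash
# Build the distribution into dist/ and also package it as a .tgz archive
./make-distribution.sh -tgz
```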
# A Note About Hadoop Versions
Spark uses the Hadoop-client library to talk to HDFS and other Hadoop-supported storage systems.
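Because HDFS protocols differ between Hadoop versions, Spark is typically built against the same Hadoop version the cluster runs. A minimal sketch, assuming the `SPARK_HADOOP_VERSION` build variable and an illustrative version string (check the Hadoop versions documentation for the settings your Spark release actually supports):

```bash
# Build the assembly against a specific Hadoop client version (version string is illustrative)
SPARK_HADOOP_VERSION=2.0.5-alpha sbt/sbt assembly
```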
In addition to running on the Mesos or YARN cluster managers, Spark also provides a simple standalone deploy mode. You can launch a standalone cluster either manually, by starting a master and workers by hand, or use our provided [launch scripts](#cluster-launch-scripts). It is also possible to run these daemons on a single machine for testing.
# Deploying Spark Standalone to a Cluster
The easiest way to deploy Spark is by running the `./make-distribution.sh` script to create a binary distribution.
This distribution can be deployed to any machine with the Java runtime installed; there is no need to install Scala.
The recommended procedure is to deploy and start the master on one node first, get the master's Spark URL,
then modify `conf/spark-env.sh` in the `dist/` directory before deploying it to all the other nodes.
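A hypothetical walkthrough of that procedure; the host names, target paths, and script locations below are assumptions for illustration only:

```bash
# Build the binary distribution into dist/
./make-distribution.sh

# Copy it to the master node and start the master; note the spark://host:port URL it reports
rsync -a dist/ master-node:spark/
ssh master-node 'cd spark && ./bin/start-master.sh'

# Edit dist/conf/spark-env.sh with the settings the workers need,
# then copy dist/ to each worker node in the same way
```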
It is also possible to deploy the source directory once you have built it with `sbt/sbt assembly`. In that case, Scala 2.9.3
will need to be installed on all the machines as well, and `SCALA_HOME` will need to point to the Scala installation.
# Starting a Cluster Manually
You can start a standalone master server by executing:
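The script path below is an assumption; look for `start-master.sh` in your Spark directory:

```bash
# Launch the standalone master daemon on this machine
./bin/start-master.sh
```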