diff --git a/docs/_config.yml b/docs/_config.yml
index 98784866ce7d28ccb8bdd0ac7550b4c87935f72e..9e5a95fe53af69150c13a9879e7180589e0059e2 100644
--- a/docs/_config.yml
+++ b/docs/_config.yml
@@ -5,7 +5,8 @@ markdown: kramdown
 # of Spark, Scala, and Mesos.
 SPARK_VERSION: 1.0.0-incubating-SNAPSHOT
 SPARK_VERSION_SHORT: 1.0.0
-SCALA_VERSION: "2.10"
+SCALA_BINARY_VERSION: "2.10"
+SCALA_VERSION: "2.10.3"
 MESOS_VERSION: 0.13.0
 SPARK_ISSUE_TRACKER_URL: https://spark-project.atlassian.net
 SPARK_GITHUB_URL: https://github.com/apache/incubator-spark
diff --git a/docs/bagel-programming-guide.md b/docs/bagel-programming-guide.md
index cffa55ee952b0e4ed414499d166a1890fd31a22f..b070d8e73a38b6447bb52224504b9b244adbe655 100644
--- a/docs/bagel-programming-guide.md
+++ b/docs/bagel-programming-guide.md
@@ -16,7 +16,7 @@ This guide shows the programming model and features of Bagel by walking through
 To use Bagel in your program, add the following SBT or Maven dependency:

     groupId = org.apache.spark
-    artifactId = spark-bagel_{{site.SCALA_VERSION}}
+    artifactId = spark-bagel_{{site.SCALA_BINARY_VERSION}}
     version = {{site.SPARK_VERSION}}

 # Programming Model
diff --git a/docs/building-with-maven.md b/docs/building-with-maven.md
index 6a9a8d681742fb6245029cbea7af189e0092a1bb..ded12926885b94fcbd44ff0e491dc8f42c542b32 100644
--- a/docs/building-with-maven.md
+++ b/docs/building-with-maven.md
@@ -17,10 +17,10 @@ You'll need to configure Maven to use more memory than usual by setting `MAVEN_O
 If you don't run this, you may see errors like the following:

-    [INFO] Compiling 203 Scala sources and 9 Java sources to /Users/me/Development/spark/core/target/scala-{{site.SCALA_VERSION}}/classes...
+    [INFO] Compiling 203 Scala sources and 9 Java sources to /Users/me/Development/spark/core/target/scala-{{site.SCALA_BINARY_VERSION}}/classes...
     [ERROR] PermGen space -> [Help 1]

-    [INFO] Compiling 203 Scala sources and 9 Java sources to /Users/me/Development/spark/core/target/scala-{{site.SCALA_VERSION}}/classes...
+    [INFO] Compiling 203 Scala sources and 9 Java sources to /Users/me/Development/spark/core/target/scala-{{site.SCALA_BINARY_VERSION}}/classes...
     [ERROR] Java heap space -> [Help 1]

 You can fix this by setting the `MAVEN_OPTS` variable as discussed before.
diff --git a/docs/index.md b/docs/index.md
index 7fea73024a8a05dda6d24668c9a48fb6e3c3f5b3..aa9c8666e7d75109f92f90d5cf736efe6432c2af 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -19,7 +19,7 @@ Spark uses [Simple Build Tool](http://www.scala-sbt.org), which is bundled with

     sbt/sbt assembly

-For its Scala API, Spark {{site.SPARK_VERSION}} depends on Scala {{site.SCALA_VERSION}}. If you write applications in Scala, you will need to use this same version of Scala in your own program -- newer major versions may not work. You can get the right version of Scala from [scala-lang.org](http://www.scala-lang.org/download/).
+For its Scala API, Spark {{site.SPARK_VERSION}} depends on Scala {{site.SCALA_BINARY_VERSION}}. If you write applications in Scala, you will need to use a compatible Scala version (e.g. {{site.SCALA_BINARY_VERSION}}.X) -- newer major versions may not work. You can get the right version of Scala from [scala-lang.org](http://www.scala-lang.org/download/).
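
For readers of this patch, the practical effect of the split is easiest to see in an SBT build definition: the compiler is pinned to the full Scala release, while published artifact names carry only the binary-version suffix. A minimal `build.sbt` sketch (not part of this patch; version values taken from the `_config.yml` hunk above) would look roughly like:

    // Sketch only: the full Scala release drives compilation, while "%%"
    // appends the binary version to the artifact ID, so this resolves
    // spark-bagel_2.10 rather than spark-bagel_2.10.3.
    scalaVersion := "2.10.3"

    libraryDependencies += "org.apache.spark" %% "spark-bagel" % "1.0.0-incubating-SNAPSHOT"

Hence the docs render artifact IDs with `SCALA_BINARY_VERSION` and reserve `SCALA_VERSION` for the full compiler release.
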
 # Running the Examples and Shell
diff --git a/docs/quick-start.md b/docs/quick-start.md
index 153081bdaa2863d732956e28dc6e9ad588769f27..13df6beea16e8a2d9fab9c05a90ed238123bf0b2 100644
--- a/docs/quick-start.md
+++ b/docs/quick-start.md
@@ -115,7 +115,7 @@ object SimpleApp {
   def main(args: Array[String]) {
     val logFile = "$YOUR_SPARK_HOME/README.md" // Should be some file on your system
     val sc = new SparkContext("local", "Simple App", "YOUR_SPARK_HOME",
-      List("target/scala-{{site.SCALA_VERSION}}/simple-project_{{site.SCALA_VERSION}}-1.0.jar"))
+      List("target/scala-{{site.SCALA_BINARY_VERSION}}/simple-project_{{site.SCALA_BINARY_VERSION}}-1.0.jar"))
     val logData = sc.textFile(logFile, 2).cache()
     val numAs = logData.filter(line => line.contains("a")).count()
     val numBs = logData.filter(line => line.contains("b")).count()
@@ -214,7 +214,7 @@ To build the program, we also write a Maven `pom.xml` file that lists Spark as a
     <dependencies>
       <dependency> <!-- Spark dependency -->
         <groupId>org.apache.spark</groupId>
-        <artifactId>spark-core_{{site.SCALA_VERSION}}</artifactId>
+        <artifactId>spark-core_{{site.SCALA_BINARY_VERSION}}</artifactId>
         <version>{{site.SPARK_VERSION}}</version>
       </dependency>
     </dependencies>
diff --git a/docs/running-on-yarn.md b/docs/running-on-yarn.md
index 5dadd54492dcaf38dbec4a52a46361e2ca336fdf..cd4509ede735a6dd6f40b2dc7fa95a3b23e0441e 100644
--- a/docs/running-on-yarn.md
+++ b/docs/running-on-yarn.md
@@ -15,7 +15,7 @@ This can be built by setting the Hadoop version and `SPARK_YARN` environment var
     SPARK_HADOOP_VERSION=2.0.5-alpha SPARK_YARN=true sbt/sbt assembly

 The assembled JAR will be something like this:
-`./assembly/target/scala-{{site.SCALA_VERSION}}/spark-assembly_{{site.SPARK_VERSION}}-hadoop2.0.5.jar`.
+`./assembly/target/scala-{{site.SCALA_BINARY_VERSION}}/spark-assembly_{{site.SPARK_VERSION}}-hadoop2.0.5.jar`.

 The build process now also supports new YARN versions (2.2.x). See below.

@@ -25,7 +25,7 @@ The build process now also supports new YARN versions (2.2.x). See below.
 - The assembled jar can be installed into HDFS or used locally.
 - Your application code must be packaged into a separate JAR file.

-If you want to test out the YARN deployment mode, you can use the current Spark examples. A `spark-examples_{{site.SCALA_VERSION}}-{{site.SPARK_VERSION}}` file can be generated by running `sbt/sbt assembly`. NOTE: since the documentation you're reading is for Spark version {{site.SPARK_VERSION}}, we are assuming here that you have downloaded Spark {{site.SPARK_VERSION}} or checked it out of source control. If you are using a different version of Spark, the version numbers in the jar generated by the sbt package command will obviously be different.
+If you want to test out the YARN deployment mode, you can use the current Spark examples. A `spark-examples_{{site.SCALA_BINARY_VERSION}}-{{site.SPARK_VERSION}}` file can be generated by running `sbt/sbt assembly`. NOTE: since the documentation you're reading is for Spark version {{site.SPARK_VERSION}}, we are assuming here that you have downloaded Spark {{site.SPARK_VERSION}} or checked it out of source control. If you are using a different version of Spark, the version numbers in the jar generated by the sbt package command will obviously be different.
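
Stepping back to the quick-start hunk above: the jar path passed to the `SparkContext` constructor is simply what `sbt package` produces for the project, so the binary-version suffix has to line up there as well. A hypothetical `simple.sbt` for that example (illustrative only, not part of this patch) would be:

    // Illustrative sketch: with these settings, `sbt package` emits
    // target/scala-2.10/simple-project_2.10-1.0.jar -- the path listed in
    // the SparkContext constructor of the SimpleApp example.
    name := "Simple Project"

    version := "1.0"

    scalaVersion := "2.10.3"

    libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.0-incubating-SNAPSHOT"
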
 # Configuration
@@ -78,9 +78,9 @@ For example:
     $ cp conf/log4j.properties.template conf/log4j.properties

     # Submit Spark's ApplicationMaster to YARN's ResourceManager, and instruct Spark to run the SparkPi example
-    $ SPARK_JAR=./assembly/target/scala-{{site.SCALA_VERSION}}/spark-assembly-{{site.SPARK_VERSION}}-hadoop2.0.5-alpha.jar \
+    $ SPARK_JAR=./assembly/target/scala-{{site.SCALA_BINARY_VERSION}}/spark-assembly-{{site.SPARK_VERSION}}-hadoop2.0.5-alpha.jar \
         ./bin/spark-class org.apache.spark.deploy.yarn.Client \
-          --jar examples/target/scala-{{site.SCALA_VERSION}}/spark-examples-assembly-{{site.SPARK_VERSION}}.jar \
+          --jar examples/target/scala-{{site.SCALA_BINARY_VERSION}}/spark-examples-assembly-{{site.SPARK_VERSION}}.jar \
           --class org.apache.spark.examples.SparkPi \
           --args yarn-standalone \
           --num-workers 3 \
@@ -117,13 +117,13 @@ In order to tune worker core/number/memory etc. You need to export environment v
 For example:

-    SPARK_JAR=./assembly/target/scala-{{site.SCALA_VERSION}}/spark-assembly-{{site.SPARK_VERSION}}-hadoop2.0.5-alpha.jar \
-    SPARK_YARN_APP_JAR=examples/target/scala-{{site.SCALA_VERSION}}/spark-examples-assembly-{{site.SPARK_VERSION}}.jar \
+    SPARK_JAR=./assembly/target/scala-{{site.SCALA_BINARY_VERSION}}/spark-assembly-{{site.SPARK_VERSION}}-hadoop2.0.5-alpha.jar \
+    SPARK_YARN_APP_JAR=examples/target/scala-{{site.SCALA_BINARY_VERSION}}/spark-examples-assembly-{{site.SPARK_VERSION}}.jar \
     ./bin/run-example org.apache.spark.examples.SparkPi yarn-client

-    SPARK_JAR=./assembly/target/scala-{{site.SCALA_VERSION}}/spark-assembly-{{site.SPARK_VERSION}}-hadoop2.0.5-alpha.jar \
-    SPARK_YARN_APP_JAR=examples/target/scala-{{site.SCALA_VERSION}}/spark-examples-assembly-{{site.SPARK_VERSION}}.jar \
+    SPARK_JAR=./assembly/target/scala-{{site.SCALA_BINARY_VERSION}}/spark-assembly-{{site.SPARK_VERSION}}-hadoop2.0.5-alpha.jar \
+    SPARK_YARN_APP_JAR=examples/target/scala-{{site.SCALA_BINARY_VERSION}}/spark-examples-assembly-{{site.SPARK_VERSION}}.jar \
     MASTER=yarn-client ./bin/spark-shell
diff --git a/docs/scala-programming-guide.md b/docs/scala-programming-guide.md
index 7c0f67bc99e8363d6a44e797354fba7f6227578c..cd847e07f94abe7a2b068fb325806c2051a40112 100644
--- a/docs/scala-programming-guide.md
+++ b/docs/scala-programming-guide.md
@@ -17,12 +17,12 @@ This guide shows each of these features and walks through some samples. It assum
 # Linking with Spark

-Spark {{site.SPARK_VERSION}} uses Scala {{site.SCALA_VERSION}}. If you write applications in Scala, you'll need to use this same version of Scala in your program -- newer major versions may not work.
+Spark {{site.SPARK_VERSION}} uses Scala {{site.SCALA_BINARY_VERSION}}. If you write applications in Scala, you will need to use a compatible Scala version (e.g. {{site.SCALA_BINARY_VERSION}}.X) -- newer major versions may not work.

 To write a Spark application, you need to add a dependency on Spark.
 If you use SBT or Maven, Spark is available through Maven Central at:

     groupId = org.apache.spark
-    artifactId = spark-core_{{site.SCALA_VERSION}}
+    artifactId = spark-core_{{site.SCALA_BINARY_VERSION}}
     version = {{site.SPARK_VERSION}}

 In addition, if you wish to access an HDFS cluster, you need to add a dependency on `hadoop-client` for your version of HDFS:
@@ -31,7 +31,7 @@ In addition, if you wish to access an HDFS cluster, you need to add a dependency
     artifactId = hadoop-client
     version = <your-hdfs-version>

-For other build systems, you can run `sbt/sbt assembly` to pack Spark and its dependencies into one JAR (`assembly/target/scala-{{site.SCALA_VERSION}}/spark-assembly-{{site.SPARK_VERSION}}-hadoop*.jar`), then add this to your CLASSPATH. Set the HDFS version as described [here](index.html#a-note-about-hadoop-versions).
+For other build systems, you can run `sbt/sbt assembly` to pack Spark and its dependencies into one JAR (`assembly/target/scala-{{site.SCALA_BINARY_VERSION}}/spark-assembly-{{site.SPARK_VERSION}}-hadoop*.jar`), then add this to your CLASSPATH. Set the HDFS version as described [here](index.html#a-note-about-hadoop-versions).

 Finally, you need to import some Spark classes and implicit conversions into your program. Add the following lines:
diff --git a/docs/streaming-programming-guide.md b/docs/streaming-programming-guide.md
index 924f0f4306bc259433533dafbed28be7147d57fa..57e88581616a242f8e115bac9b4f9ccd5c61a3e1 100644
--- a/docs/streaming-programming-guide.md
+++ b/docs/streaming-programming-guide.md
@@ -275,23 +275,23 @@ To write your own Spark Streaming program, you will have to add the following de
 SBT or Maven project:

     groupId = org.apache.spark
-    artifactId = spark-streaming_{{site.SCALA_VERSION}}
+    artifactId = spark-streaming_{{site.SCALA_BINARY_VERSION}}
     version = {{site.SPARK_VERSION}}

 For ingesting data from sources like Kafka and Flume that are not present in the Spark
 Streaming core API, you will have to add the corresponding
-artifact `spark-streaming-xyz_{{site.SCALA_VERSION}}` to the dependencies. For example,
+artifact `spark-streaming-xyz_{{site.SCALA_BINARY_VERSION}}` to the dependencies. For example,
 some of the common ones are as follows.

 <table class="table">
 <tr><th>Source</th><th>Artifact</th></tr>
-<tr><td> Kafka </td><td> spark-streaming-kafka_{{site.SCALA_VERSION}} </td></tr>
-<tr><td> Flume </td><td> spark-streaming-flume_{{site.SCALA_VERSION}} </td></tr>
-<tr><td> Twitter </td><td> spark-streaming-twitter_{{site.SCALA_VERSION}} </td></tr>
-<tr><td> ZeroMQ </td><td> spark-streaming-zeromq_{{site.SCALA_VERSION}} </td></tr>
-<tr><td> MQTT </td><td> spark-streaming-mqtt_{{site.SCALA_VERSION}} </td></tr>
+<tr><td> Kafka </td><td> spark-streaming-kafka_{{site.SCALA_BINARY_VERSION}} </td></tr>
+<tr><td> Flume </td><td> spark-streaming-flume_{{site.SCALA_BINARY_VERSION}} </td></tr>
+<tr><td> Twitter </td><td> spark-streaming-twitter_{{site.SCALA_BINARY_VERSION}} </td></tr>
+<tr><td> ZeroMQ </td><td> spark-streaming-zeromq_{{site.SCALA_BINARY_VERSION}} </td></tr>
+<tr><td> MQTT </td><td> spark-streaming-mqtt_{{site.SCALA_BINARY_VERSION}} </td></tr>
 <tr><td> </td><td></td></tr>
 </table>

@@ -410,7 +410,7 @@ Scala and [JavaStreamingContext](api/streaming/index.html#org.apache.spark.strea
 Additional functionality for creating DStreams from sources such as Kafka, Flume, and Twitter
 can be imported by adding the right dependencies as explained in an
 [earlier](#linking) section.
 To take the
-case of Kafka, after adding the artifact `spark-streaming-kafka_{{site.SCALA_VERSION}}` to the
+case of Kafka, after adding the artifact `spark-streaming-kafka_{{site.SCALA_BINARY_VERSION}}` to the
 project dependencies, you can create a DStream from Kafka as

 <div class="codetabs">
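
Since the Kafka hunk above stops at the prose, here is a rough, self-contained sketch of what creating that DStream looks like in Scala for this era of the API. It assumes the `spark-streaming` and `spark-streaming-kafka` artifacts are on the classpath, that `KafkaUtils.createStream(ssc, zkQuorum, groupId, topics)` is the relevant entry point, and uses placeholder values for the ZooKeeper quorum, consumer group, and topic:

    // Rough sketch only; connection details below are placeholders, and the
    // KafkaUtils.createStream signature is assumed for this Spark/Kafka integration.
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    object KafkaSketch {
      def main(args: Array[String]) {
        // Two local threads (one receiver, one for processing), 1-second batches.
        val ssc = new StreamingContext("local[2]", "KafkaSketch", Seconds(1))

        // (ZooKeeper quorum, consumer group, topic -> receiver threads) are placeholders.
        val messages = KafkaUtils.createStream(ssc, "localhost:2181", "sketch-group", Map("events" -> 1))

        // Each element is a (key, message) pair; print the message payloads.
        messages.map(_._2).print()

        ssc.start()
        ssc.awaitTermination()
      }
    }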