From 9ddad0dcb47e3326151a53e270448b5135805ae5 Mon Sep 17 00:00:00 2001
From: Matei Zaharia <matei@eecs.berkeley.edu>
Date: Sat, 31 Aug 2013 17:40:33 -0700
Subject: [PATCH] Fixes suggested by Patrick

---
 conf/spark-env.sh.template    |  2 +-
 docs/hardware-provisioning.md |  1 -
 docs/index.md                 |  9 +++++----
 docs/quick-start.md           | 10 ++--------
 4 files changed, 8 insertions(+), 14 deletions(-)

diff --git a/conf/spark-env.sh.template b/conf/spark-env.sh.template
index a367d59d64..d92d2e2ae3 100755
--- a/conf/spark-env.sh.template
+++ b/conf/spark-env.sh.template
@@ -4,7 +4,7 @@
 # spark-env.sh and edit that to configure Spark for your site.
 #
 # The following variables can be set in this file:
-# - SPARK_LOCAL_IP, to override the IP address binds to
+# - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
 # - MESOS_NATIVE_LIBRARY, to point to your libmesos.so if you use Mesos
 # - SPARK_JAVA_OPTS, to set node-specific JVM options for Spark. Note that
 #   we recommend setting app-wide options in the application's driver program.
diff --git a/docs/hardware-provisioning.md b/docs/hardware-provisioning.md
index d21e2a3d70..e5f054cb14 100644
--- a/docs/hardware-provisioning.md
+++ b/docs/hardware-provisioning.md
@@ -21,7 +21,6 @@ Hadoop and Spark on a common cluster manager like [Mesos](running-on-mesos.html)
 [Hadoop YARN](running-on-yarn.html).
 
 * If this is not possible, run Spark on different nodes in the same local-area network as HDFS.
-If your cluster spans multiple racks, include some Spark nodes on each rack.
 
 * For low-latency data stores like HBase, it may be preferrable to run computing jobs on
 different nodes than the storage system to avoid interference.
diff --git a/docs/index.md b/docs/index.md
index bcd7dad6ae..0ea0e103e4 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -40,12 +40,13 @@ Python interpreter (`./pyspark`). These are a great way to learn Spark.
 Spark uses the Hadoop-client library to talk to HDFS and other Hadoop-supported
 storage systems. Because the HDFS protocol has changed in different versions of
 Hadoop, you must build Spark against the same version that your cluster uses.
-You can do this by setting the `SPARK_HADOOP_VERSION` variable when compiling:
+By default, Spark links to Hadoop 1.0.4. You can change this by setting the
+`SPARK_HADOOP_VERSION` variable when compiling:
 
     SPARK_HADOOP_VERSION=1.2.1 sbt/sbt assembly
 
-In addition, if you wish to run Spark on [YARN](running-on-yarn.md), you should also
-set `SPARK_YARN`:
+In addition, if you wish to run Spark on [YARN](running-on-yarn.md), set
+`SPARK_YARN` to `true`:
 
     SPARK_HADOOP_VERSION=2.0.5-alpha SPARK_YARN=true sbt/sbt assembly
 
@@ -94,7 +95,7 @@ set `SPARK_YARN`:
 exercises about Spark, Shark, Mesos, and more. [Videos](http://ampcamp.berkeley.edu/agenda-2012),
 [slides](http://ampcamp.berkeley.edu/agenda-2012) and [exercises](http://ampcamp.berkeley.edu/exercises-2012) are
 available online for free.
-* [Code Examples](http://spark.incubator.apache.org/examples.html): more are also available in the [examples subfolder](https://github.com/mesos/spark/tree/master/examples/src/main/scala/spark/examples) of Spark
+* [Code Examples](http://spark.incubator.apache.org/examples.html): more are also available in the [examples subfolder](https://github.com/mesos/spark/tree/master/examples/src/main/scala/) of Spark
 * [Paper Describing Spark](http://www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf)
 * [Paper Describing Spark Streaming](http://www.eecs.berkeley.edu/Pubs/TechRpts/2012/EECS-2012-259.pdf)
 
diff --git a/docs/quick-start.md b/docs/quick-start.md
index bac5d690a6..11d4370a1d 100644
--- a/docs/quick-start.md
+++ b/docs/quick-start.md
@@ -126,7 +126,7 @@ object SimpleJob {
 
 This job simply counts the number of lines containing 'a' and the number containing 'b' in the Spark README. Note that you'll need to replace $YOUR_SPARK_HOME with the location where Spark is installed. Unlike the earlier examples with the Spark shell, which initializes its own SparkContext, we initialize a SparkContext as part of the job. We pass the SparkContext constructor four arguments, the type of scheduler we want to use (in this case, a local scheduler), a name for the job, the directory where Spark is installed, and a name for the jar file containing the job's sources. The final two arguments are needed in a distributed setting, where Spark is running across several nodes, so we include them for completeness. Spark will automatically ship the jar files you list to slave nodes.
 
-This file depends on the Spark API, so we'll also include an sbt configuration file, `simple.sbt` which explains that Spark is a dependency. This file also adds two repositories which host Spark dependencies:
+This file depends on the Spark API, so we'll also include an sbt configuration file, `simple.sbt` which explains that Spark is a dependency. This file also adds a repository that Spark depends on:
 
 {% highlight scala %}
 name := "Simple Project"
@@ -137,9 +137,7 @@ scalaVersion := "{{site.SCALA_VERSION}}"
 
 libraryDependencies += "org.spark-project" %% "spark-core" % "{{site.SPARK_VERSION}}"
 
-resolvers ++= Seq(
-  "Akka Repository" at "http://repo.akka.io/releases/",
-  "Spray Repository" at "http://repo.spray.cc/")
+resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
 {% endhighlight %}
 
 If you also wish to read data from Hadoop's HDFS, you will also need to add a dependency on `hadoop-client` for your version of HDFS:
@@ -210,10 +208,6 @@ To build the job, we also write a Maven `pom.xml` file that lists Spark as a dep
   <packaging>jar</packaging>
   <version>1.0</version>
   <repositories>
-    <repository>
-      <id>Spray.cc repository</id>
-      <url>http://repo.spray.cc</url>
-    </repository>
     <repository>
       <id>Akka repository</id>
       <url>http://repo.akka.io/releases</url>
--
GitLab
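
For reference, a minimal sketch of the quick-start `simple.sbt` as it reads once these hunks are applied, assembled only from lines visible in the patch above; any `simple.sbt` lines outside the hunks are omitted, and the `{{site.*}}` tokens are Jekyll placeholders substituted when the docs are built:

    name := "Simple Project"

    scalaVersion := "{{site.SCALA_VERSION}}"

    libraryDependencies += "org.spark-project" %% "spark-core" % "{{site.SPARK_VERSION}}"

    resolvers += "Akka Repository" at "http://repo.akka.io/releases/"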