diff --git a/docs/img/spark-webui-accumulators.png b/docs/img/spark-webui-accumulators.png
new file mode 100644
index 0000000000000000000000000000000000000000..237052d7b5db60539c2259a1ddb5c949dc9cec7b
Binary files /dev/null and b/docs/img/spark-webui-accumulators.png differ
diff --git a/docs/programming-guide.md b/docs/programming-guide.md
index 2f0ed5eca2b2be5e01cb914772117eebfa62070d..f398e38fbb32741173df8b6044f0a7fdf972af57 100644
--- a/docs/programming-guide.md
+++ b/docs/programming-guide.md
@@ -1328,12 +1328,18 @@ value of the broadcast variable (e.g. if the variable is shipped to a new node l
 Accumulators are variables that are only "added" to through an associative and commutative operation and
 can therefore be efficiently supported in parallel. They can be used to implement counters (as in
 MapReduce) or sums. Spark natively supports accumulators of numeric types, and programmers
-can add support for new types. If accumulators are created with a name, they will be
+can add support for new types.
+
+If accumulators are created with a name, they will be
 displayed in Spark's UI. This can be useful for understanding the progress of
 running stages (NOTE: this is not yet supported in Python).
 
+<p style="text-align: center;">
+  <img src="img/spark-webui-accumulators.png" title="Accumulators in the Spark UI" alt="Accumulators in the Spark UI" />
+</p>
+
 An accumulator is created from an initial value `v` by calling `SparkContext.accumulator(v)`. Tasks
-running on the cluster can then add to it using the `add` method or the `+=` operator (in Scala and Python).
+running on a cluster can then add to it using the `add` method or the `+=` operator (in Scala and Python).
 However, they cannot read its value.
 Only the driver program can read the accumulator's value, using its `value` method.
 
@@ -1345,7 +1351,7 @@ The code below shows an accumulator being used to add up the elements of an arra
 
 {% highlight scala %}
 scala> val accum = sc.accumulator(0, "My Accumulator")
-accum: spark.Accumulator[Int] = 0
+accum: org.apache.spark.Accumulator[Int] = 0
 
 scala> sc.parallelize(Array(1, 2, 3, 4)).foreach(x => accum += x)
 ...
@@ -1466,11 +1472,11 @@ Accumulators do not change the lazy evaluation model of Spark. If they are being
 
 <div class="codetabs">
 
-<div data-lang="scala"  markdown="1">
+<div data-lang="scala" markdown="1">
 {% highlight scala %}
 val accum = sc.accumulator(0)
-data.map { x => accum += x; f(x) }
-// Here, accum is still 0 because no actions have caused the <code>map</code> to be computed.
+data.map { x => accum += x; x }
+// Here, accum is still 0 because no actions have caused the map operation to be computed.
 {% endhighlight %}
 </div>
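
For readers of this patch, here is a minimal, self-contained sketch of the accumulator lifecycle the revised prose describes: create with a name (so it appears in the web UI), update from tasks with `+=`, and read on the driver with `value`. It uses the same pre-2.0 `SparkContext.accumulator` API shown in the hunks above; the object name, app name, and `local[*]` master are illustrative assumptions, not part of the patch.

{% highlight scala %}
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical standalone driver; only the accumulator calls come from the patch.
object AccumulatorSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("AccumulatorSketch").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Named accumulators are displayed in Spark's web UI.
    val accum = sc.accumulator(0, "My Accumulator")

    // Tasks running on the cluster add to it with `+=` (or `add`) ...
    sc.parallelize(Array(1, 2, 3, 4)).foreach(x => accum += x)

    // ... but only the driver can read it, via `value`.
    println(accum.value) // 10

    sc.stop()
  }
}
{% endhighlight %}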
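The last hunk's fix (replacing the undefined `f(x)` with `x`) also lends itself to a runnable illustration of the lazy-evaluation caveat. A sketch, assuming an existing `sc` and using `count()` as the forcing action (both assumptions, not from the patch):

{% highlight scala %}
val accum = sc.accumulator(0)
val data = sc.parallelize(1 to 4)

// map is lazy: no task has run yet, so the driver still sees 0.
val mapped = data.map { x => accum += x; x }
println(accum.value) // 0

// An action forces the map to execute, applying the updates.
mapped.count()
// 10, assuming each task ran exactly once (task retries can
// re-apply updates made inside transformations).
println(accum.value)
{% endhighlight %}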