From 069bb94206530a30d095c4920ecfc3b4a8635a72 Mon Sep 17 00:00:00 2001
From: Andrew Ash <andrew@andrewash.com>
Date: Tue, 21 Jan 2014 14:49:35 -0800
Subject: [PATCH] Clarify spark.default.parallelism

It's the task count across the cluster, not per worker, per machine, per
core, or anything else.
---
 docs/configuration.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/configuration.md b/docs/configuration.md
index be548e372d..3bb655075f 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -98,7 +98,7 @@ Apart from these, the following properties are also available, and may be useful
 <td>spark.default.parallelism</td>
 <td>8</td>
 <td>
-  Default number of tasks to use for distributed shuffle operations (<code>groupByKey</code>,
+  Default number of tasks to use across the cluster for distributed shuffle operations (<code>groupByKey</code>,
   <code>reduceByKey</code>, etc) when not set by user.
 </td>
 </tr>
--
GitLab
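
Not part of the patch itself, but for context: a minimal Scala sketch of the behavior the patch clarifies, assuming a standard SparkConf setup contemporary with this change. The app and object names here are illustrative, not from the Spark docs. The point is that the value set for spark.default.parallelism is the total number of shuffle tasks across the whole cluster, not a per-worker or per-core figure.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical example app; names are illustrative.
object DefaultParallelismExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("DefaultParallelismExample")
      // 16 tasks total across the cluster, regardless of worker count.
      .set("spark.default.parallelism", "16")

    val sc = new SparkContext(conf)

    val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))

    // No partition count passed to reduceByKey, so the shuffle falls back
    // to spark.default.parallelism.
    val counts = pairs.reduceByKey(_ + _)
    println(counts.partitions.length) // prints 16

    sc.stop()
  }
}
```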