Commit 41e0a21b authored by Thomas Graves

SPARK-1680: use configs for specifying environment variables on YARN

Note that this also documents spark.executorEnv.*, which to me means it's public. If we don't want that please speak up.

Author: Thomas Graves <tgraves@apache.org>

Closes #1512 from tgravescs/SPARK-1680 and squashes the following commits:

11525df [Thomas Graves] more doc changes
553bad0 [Thomas Graves] fix documentation
152bf7c [Thomas Graves] fix docs
5382326 [Thomas Graves] try fix docs
32f86a4 [Thomas Graves] use configs for specifying environment variables on YARN
parent 74f82c71
...@@ -206,6 +206,14 @@ Apart from these, the following properties are also available, and may be useful
used during aggregation goes above this amount, it will spill the data into disks.
</td>
</tr>
<tr>
<td><code>spark.executorEnv.[EnvironmentVariableName]</code></td>
<td>(none)</td>
<td>
Add the environment variable specified by <code>EnvironmentVariableName</code> to the Executor
process. The user can specify multiple of these to set multiple environment variables.
(An illustrative snippet follows this table.)
</tr>
</table>
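For illustration only (an editor's sketch, not part of the commit), one way such a property could be supplied is from application code through `SparkConf`; the variable names below are made up:

```scala
import org.apache.spark.SparkConf

// Hypothetical variable names (MY_LOG_LEVEL, MY_DATA_DIR) chosen purely for illustration.
val conf = new SparkConf()
  .setAppName("executor-env-example")
  .set("spark.executorEnv.MY_LOG_LEVEL", "DEBUG")    // exported as MY_LOG_LEVEL in each executor
  .set("spark.executorEnv.MY_DATA_DIR", "/tmp/data") // a second entry sets a second variable
// Pass `conf` to SparkContext as usual when the application starts.
```

The same keys can equally be given on the command line or in a properties file; the `SparkConf` form is only the most compact way to show them.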
#### Shuffle Behavior
......
...@@ -17,10 +17,6 @@ To build Spark yourself, refer to the [building with Maven guide](building-with-
Most of the configs are the same for Spark on YARN as for other deployment modes. See the [configuration page](configuration.html) for more information on those. These are configs that are specific to Spark on YARN.
#### Environment Variables
* `SPARK_YARN_USER_ENV`, to add environment variables to the Spark processes launched on YARN. This can be a comma separated list of environment variables, e.g. `SPARK_YARN_USER_ENV="JAVA_HOME=/jdk64,FOO=bar"`.
#### Spark Properties
<table class="table">
...@@ -110,7 +106,23 @@ Most of the configs are the same for Spark on YARN as for other deployment modes
<td><code>spark.yarn.access.namenodes</code></td>
<td>(none)</td>
<td>
A list of secure HDFS namenodes your Spark application is going to access. For
example, `spark.yarn.access.namenodes=hdfs://nn1.com:8032,hdfs://nn2.com:8032`.
The Spark application must have access to the namenodes listed and Kerberos must
be properly configured to be able to access them (either in the same realm or in
a trusted realm). Spark acquires security tokens for each of the namenodes so that
the Spark application can access those remote HDFS clusters.
</td>
</tr>
<tr>
<td><code>spark.yarn.appMasterEnv.[EnvironmentVariableName]</code></td>
<td>(none)</td>
<td>
Add the environment variable specified by <code>EnvironmentVariableName</code> to the
Application Master process launched on YARN. The user can specify multiple of
these to set multiple environment variables. In yarn-cluster mode this controls
the environment of the Spark driver, and in yarn-client mode it only controls
the environment of the executor launcher. (An illustrative snippet follows this table.)
</td>
</tr>
</table>
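As with `spark.executorEnv.*`, a hedged sketch (editor-added, with made-up property values) of how these keys might be set:

```scala
import org.apache.spark.SparkConf

// Illustrative values only; the JAVA_HOME path and EXTRA_OPTS variable are invented here.
val conf = new SparkConf()
  .set("spark.yarn.appMasterEnv.JAVA_HOME", "/usr/lib/jdk64") // env var for the AM process
  .set("spark.yarn.appMasterEnv.EXTRA_OPTS", "-verbose")      // additional entries add more variables
```

In practice these keys are often supplied through `--conf` on `spark-submit` or in a properties file rather than in code; the `SparkConf` form is shown only because it is the most compact way to illustrate the key names.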
......
...@@ -259,6 +259,14 @@ trait ClientBase extends Logging {
    localResources
  }
  /** Get all application master environment variables set on this SparkConf */
  def getAppMasterEnv: Seq[(String, String)] = {
    val prefix = "spark.yarn.appMasterEnv."
    sparkConf.getAll.filter{case (k, v) => k.startsWith(prefix)}
      .map{case (k, v) => (k.substring(prefix.length), v)}
  }
  def setupLaunchEnv(
      localResources: HashMap[String, LocalResource],
      stagingDir: String): HashMap[String, String] = {
...@@ -276,6 +284,11 @@ trait ClientBase extends Logging {
    distCacheMgr.setDistFilesEnv(env)
    distCacheMgr.setDistArchivesEnv(env)
    getAppMasterEnv.foreach { case (key, value) =>
      YarnSparkHadoopUtil.addToEnvironment(env, key, value, File.pathSeparator)
    }

    // Keep this for backwards compatibility but users should move to the config
sys.env.get("SPARK_YARN_USER_ENV").foreach { userEnvs => sys.env.get("SPARK_YARN_USER_ENV").foreach { userEnvs =>
// Allow users to specify some environment variables. // Allow users to specify some environment variables.
YarnSparkHadoopUtil.setEnvFromInputString(env, userEnvs, File.pathSeparator) YarnSparkHadoopUtil.setEnvFromInputString(env, userEnvs, File.pathSeparator)
......
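To make the prefix filtering in `getAppMasterEnv` above concrete, here is a small self-contained sketch (editor-added; the property values are made up and this is not the commit's code):

```scala
import org.apache.spark.SparkConf

val sparkConf = new SparkConf(loadDefaults = false)
  .set("spark.yarn.appMasterEnv.JAVA_HOME", "/usr/lib/jdk64") // picked up: key starts with the prefix
  .set("spark.yarn.appMasterEnv.MY_FLAG", "on")               // picked up: key starts with the prefix
  .set("spark.executor.memory", "2g")                         // ignored: different prefix

val prefix = "spark.yarn.appMasterEnv."
val appMasterEnv: Seq[(String, String)] = sparkConf.getAll.toSeq
  .filter { case (k, _) => k.startsWith(prefix) }
  .map { case (k, v) => (k.substring(prefix.length), v) }
// appMasterEnv now contains (JAVA_HOME, /usr/lib/jdk64) and (MY_FLAG, on),
// which setupLaunchEnv then copies into the Application Master's environment.
```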
...@@ -171,7 +171,11 @@ trait ExecutorRunnableUtil extends Logging {
    val extraCp = sparkConf.getOption("spark.executor.extraClassPath")
    ClientBase.populateClasspath(null, yarnConf, sparkConf, env, extraCp)
    // Allow users to specify some environment variables
    sparkConf.getExecutorEnv.foreach { case (key, value) =>
      YarnSparkHadoopUtil.addToEnvironment(env, key, value, File.pathSeparator)
    }

    // Keep this for backwards compatibility but users should move to the config
    YarnSparkHadoopUtil.setEnvFromInputString(env, System.getenv("SPARK_YARN_USER_ENV"),
      File.pathSeparator)
......
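For context, an editor's sketch of the append-with-separator semantics assumed for `YarnSparkHadoopUtil.addToEnvironment` above (the behavior is assumed to mirror Hadoop's usual environment handling; the helper name `addToEnv` and the values are invented):

```scala
import java.io.File
import scala.collection.mutable.HashMap

object EnvAppendSketch {
  // Hypothetical stand-in for the assumed behavior: if the key already exists,
  // append the new value after the separator; otherwise set it directly.
  def addToEnv(env: HashMap[String, String], key: String, value: String, sep: String): Unit = {
    env(key) = env.get(key) match {
      case Some(existing) => existing + sep + value
      case None => value
    }
  }

  def main(args: Array[String]): Unit = {
    val env = new HashMap[String, String]()
    addToEnv(env, "CLASSPATH", "/opt/jars/a.jar", File.pathSeparator)
    addToEnv(env, "CLASSPATH", "/opt/jars/b.jar", File.pathSeparator)
    println(env("CLASSPATH")) // "/opt/jars/a.jar:/opt/jars/b.jar" on Unix-like systems
  }
}
```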