Commit 0368ff30 authored by pshearer, committed by Sean Owen

[SPARK-13973][PYSPARK] Make pyspark fail noisily if IPYTHON or IPYTHON_OPTS are set

## What changes were proposed in this pull request?

https://issues.apache.org/jira/browse/SPARK-13973

Following discussion with srowen, the IPYTHON and IPYTHON_OPTS variables are removed. If either is set in the user's environment, pyspark will refuse to start and will print an error message. Failing noisily forces users to remove these options and learn the new configuration scheme, which is more sustainable and less confusing.
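For example, a launch command that previously relied on the removed variables would migrate as follows (the `notebook` option here is only illustrative):

```
# Fails noisily in Spark 2.0+:
IPYTHON=1 IPYTHON_OPTS="notebook" ./bin/pyspark

# Supported equivalent using the new variables:
PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook" ./bin/pyspark
```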

## How was this patch tested?

Manual testing: set IPYTHON=1 and verified that the error message prints.
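For reference, the check can be reproduced with something like the following; the expected output is the error text added in this patch:

```
$ IPYTHON=1 ./bin/pyspark
Error in pyspark startup:
IPYTHON and IPYTHON_OPTS are removed in Spark 2.0+. Remove these from the environment and set PYSPARK_DRIVER_PYTHON and PYSPARK_DRIVER_PYTHON_OPTS instead.
```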

Author: pshearer <pshearer@massmutual.com>
Author: shearerp <shearerp@umich.edu>

Closes #12528 from shearerp/master.
parent 8dc3987d
bin/pyspark:

@@ -24,17 +24,11 @@ fi
 source "${SPARK_HOME}"/bin/load-spark-env.sh
 export _SPARK_CMD_USAGE="Usage: ./bin/pyspark [options]"
 
-# In Spark <= 1.1, setting IPYTHON=1 would cause the driver to be launched using the `ipython`
-# executable, while the worker would still be launched using PYSPARK_PYTHON.
-#
-# In Spark 1.2, we removed the documentation of the IPYTHON and IPYTHON_OPTS variables and added
-# PYSPARK_DRIVER_PYTHON and PYSPARK_DRIVER_PYTHON_OPTS to allow IPython to be used for the driver.
-# Now, users can simply set PYSPARK_DRIVER_PYTHON=ipython to use IPython and set
-# PYSPARK_DRIVER_PYTHON_OPTS to pass options when starting the Python driver
+# In Spark 2.0, IPYTHON and IPYTHON_OPTS are removed and pyspark fails to launch if either option
+# is set in the user's environment. Instead, users should set PYSPARK_DRIVER_PYTHON=ipython
+# to use IPython and set PYSPARK_DRIVER_PYTHON_OPTS to pass options when starting the Python driver
 # (e.g. PYSPARK_DRIVER_PYTHON_OPTS='notebook'). This supports full customization of the IPython
 # and executor Python executables.
-#
-# For backwards-compatibility, we retain the old IPYTHON and IPYTHON_OPTS variables.
 
 # Determine the Python executable to use if PYSPARK_PYTHON or PYSPARK_DRIVER_PYTHON isn't set:
 if hash python2.7 2>/dev/null; then
@@ -44,17 +38,15 @@ else
   DEFAULT_PYTHON="python"
 fi
 
-# Determine the Python executable to use for the driver:
-if [[ -n "$IPYTHON_OPTS" || "$IPYTHON" == "1" ]]; then
-  # If IPython options are specified, assume user wants to run IPython
-  # (for backwards-compatibility)
-  PYSPARK_DRIVER_PYTHON_OPTS="$PYSPARK_DRIVER_PYTHON_OPTS $IPYTHON_OPTS"
-  if [ -x "$(command -v jupyter)" ]; then
-    PYSPARK_DRIVER_PYTHON="jupyter"
-  else
-    PYSPARK_DRIVER_PYTHON="ipython"
-  fi
-elif [[ -z "$PYSPARK_DRIVER_PYTHON" ]]; then
+# Fail noisily if removed options are set
+if [[ -n "$IPYTHON" || -n "$IPYTHON_OPTS" ]]; then
+  echo "Error in pyspark startup:"
+  echo "IPYTHON and IPYTHON_OPTS are removed in Spark 2.0+. Remove these from the environment and set PYSPARK_DRIVER_PYTHON and PYSPARK_DRIVER_PYTHON_OPTS instead."
+  exit 1
+fi
+
+# Default to standard python interpreter unless told otherwise
+if [[ -z "$PYSPARK_DRIVER_PYTHON" ]]; then
   PYSPARK_DRIVER_PYTHON="${PYSPARK_PYTHON:-"$DEFAULT_PYTHON"}"
 fi
docs/programming-guide.md:

@@ -240,16 +240,17 @@ use IPython, set the `PYSPARK_DRIVER_PYTHON` variable to `ipython` when running
 $ PYSPARK_DRIVER_PYTHON=ipython ./bin/pyspark
 {% endhighlight %}
 
-You can customize the `ipython` command by setting `PYSPARK_DRIVER_PYTHON_OPTS`. For example, to launch
-the [IPython Notebook](http://ipython.org/notebook.html) with PyLab plot support:
+To use the Jupyter notebook (previously known as the IPython notebook),
 
 {% highlight bash %}
-$ PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook" ./bin/pyspark
+$ PYSPARK_DRIVER_PYTHON=jupyter ./bin/pyspark
 {% endhighlight %}
 
-After the IPython Notebook server is launched, you can create a new "Python 2" notebook from
+You can customize the `ipython` or `jupyter` commands by setting `PYSPARK_DRIVER_PYTHON_OPTS`.
+
+After the Jupyter Notebook server is launched, you can create a new "Python 2" notebook from
 the "Files" tab. Inside the notebook, you can input the command `%pylab inline` as part of
-your notebook before you start to try Spark from the IPython notebook.
+your notebook before you start to try Spark from the Jupyter notebook.
 
 </div>