    [SPARK-2470] PEP8 fixes to PySpark · 5d16d5bb
    Nicholas Chammas authored
    This pull request aims to resolve all outstanding PEP8 violations in PySpark.
    
    Author: Nicholas Chammas <nicholas.chammas@gmail.com>
    Author: nchammas <nicholas.chammas@gmail.com>
    
    Closes #1505 from nchammas/master and squashes the following commits:
    
    98171af [Nicholas Chammas] [SPARK-2470] revert PEP 8 fixes to cloudpickle
    cba7768 [Nicholas Chammas] [SPARK-2470] wrap expression list in parentheses
    e178dbe [Nicholas Chammas] [SPARK-2470] style - change position of line break
    9127d2b [Nicholas Chammas] [SPARK-2470] wrap expression lists in parentheses
    22132a4 [Nicholas Chammas] [SPARK-2470] wrap conditionals in parentheses
    24639bc [Nicholas Chammas] [SPARK-2470] fix whitespace for doctest
    7d557b7 [Nicholas Chammas] [SPARK-2470] PEP8 fixes to tests.py
    8f8e4c0 [Nicholas Chammas] [SPARK-2470] PEP8 fixes to storagelevel.py
    b3b96cf [Nicholas Chammas] [SPARK-2470] PEP8 fixes to statcounter.py
    d644477 [Nicholas Chammas] [SPARK-2470] PEP8 fixes to worker.py
    aa3a7b6 [Nicholas Chammas] [SPARK-2470] PEP8 fixes to sql.py
    1916859 [Nicholas Chammas] [SPARK-2470] PEP8 fixes to shell.py
    95d1d95 [Nicholas Chammas] [SPARK-2470] PEP8 fixes to serializers.py
    a0fec2e [Nicholas Chammas] [SPARK-2470] PEP8 fixes to mllib
    c85e1e5 [Nicholas Chammas] [SPARK-2470] PEP8 fixes to join.py
    d14f2f1 [Nicholas Chammas] [SPARK-2470] PEP8 fixes to __init__.py
    81fcb20 [Nicholas Chammas] [SPARK-2470] PEP8 fixes to resultiterable.py
    1bde265 [Nicholas Chammas] [SPARK-2470] PEP8 fixes to java_gateway.py
    7fc849c [Nicholas Chammas] [SPARK-2470] PEP8 fixes to daemon.py
    ca2d28b [Nicholas Chammas] [SPARK-2470] PEP8 fixes to context.py
    f4e0039 [Nicholas Chammas] [SPARK-2470] PEP8 fixes to conf.py
    a6d5e4b [Nicholas Chammas] [SPARK-2470] PEP8 fixes to cloudpickle.py
    f0a7ebf [Nicholas Chammas] [SPARK-2470] PEP8 fixes to rddsampler.py
    4dd148f [nchammas] Merge pull request #5 from apache/master
    f7e4581 [Nicholas Chammas] unrelated pep8 fix
    a36eed0 [Nicholas Chammas] name ec2 instances and security groups consistently
    de7292a [nchammas] Merge pull request #4 from apache/master
    2e4fe00 [nchammas] Merge pull request #3 from apache/master
    89fde08 [nchammas] Merge pull request #2 from apache/master
    69f6e22 [Nicholas Chammas] PEP8 fixes
    2627247 [Nicholas Chammas] broke up lines before they hit 100 chars
    6544b7e [Nicholas Chammas] [SPARK-2065] give launched instances names
    69da6cf [nchammas] Merge pull request #1 from apache/master
java_gateway.py 3.86 KiB
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

import os
import sys
import signal
import shlex
import platform
from subprocess import Popen, PIPE
from threading import Thread
from py4j.java_gateway import java_import, JavaGateway, GatewayClient


def launch_gateway():
    """Launch (or connect to) the Py4J gateway JVM and return a JavaGateway.

    If PYSPARK_GATEWAY_PORT is set, connect to that port; otherwise start a
    JVM through spark-submit and read the port from its standard output.
    """
    SPARK_HOME = os.environ["SPARK_HOME"]

    gateway_port = -1
    if "PYSPARK_GATEWAY_PORT" in os.environ:
        gateway_port = int(os.environ["PYSPARK_GATEWAY_PORT"])
    else:
        # Launch the Py4j gateway using Spark's run command so that we pick up the
        # proper classpath and settings from spark-env.sh
        on_windows = platform.system() == "Windows"
        script = "./bin/spark-submit.cmd" if on_windows else "./bin/spark-submit"
        submit_args = os.environ.get("PYSPARK_SUBMIT_ARGS", "")
        submit_args = shlex.split(submit_args)
        command = [os.path.join(SPARK_HOME, script), "pyspark-shell"] + submit_args
        if not on_windows:
            # Don't send ctrl-c / SIGINT to the Java gateway:
            def preexec_func():
                signal.signal(signal.SIGINT, signal.SIG_IGN)
            proc = Popen(command, stdout=PIPE, stdin=PIPE, preexec_fn=preexec_func)
        else:
            # preexec_fn not supported on Windows
            proc = Popen(command, stdout=PIPE, stdin=PIPE)

        try:
            # Determine which ephemeral port the server started on:
            gateway_port = proc.stdout.readline()
            gateway_port = int(gateway_port)
        except ValueError:
            (stdout, _) = proc.communicate()
            exit_code = proc.poll()
            error_msg = "Launching GatewayServer failed"
            error_msg += " with exit code %d! " % exit_code if exit_code else "! "
            error_msg += "(Warning: unexpected output detected.)\n\n"
            error_msg += gateway_port + stdout
            raise Exception(error_msg)

        # Create a thread to echo output from the GatewayServer, which is required
        # for Java log output to show up:
        class EchoOutputThread(Thread):
            def __init__(self, stream):
                Thread.__init__(self)
                self.daemon = True
                self.stream = stream

            def run(self):
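                while True:
                    line = self.stream.readline()
                    # readline() returns an empty string at EOF; stop then
                    # instead of spinning once the gateway closes its stdout.
                    if not line:
                        break
                    sys.stderr.write(line)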
        EchoOutputThread(proc.stdout).start()

    # Connect to the gateway
    gateway = JavaGateway(GatewayClient(port=gateway_port), auto_convert=False)

    # Import the classes used by PySpark
    java_import(gateway.jvm, "org.apache.spark.SparkConf")
    java_import(gateway.jvm, "org.apache.spark.api.java.*")
    java_import(gateway.jvm, "org.apache.spark.api.python.*")
    java_import(gateway.jvm, "org.apache.spark.mllib.api.python.*")
    java_import(gateway.jvm, "org.apache.spark.sql.SQLContext")
    java_import(gateway.jvm, "org.apache.spark.sql.hive.HiveContext")
    java_import(gateway.jvm, "org.apache.spark.sql.hive.LocalHiveContext")
    java_import(gateway.jvm, "org.apache.spark.sql.hive.TestHiveContext")
    java_import(gateway.jvm, "scala.Tuple2")

    return gateway
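
The port handshake in launch_gateway() can be tried in isolation. The sketch below is not part of PySpark: it assumes Python 3, and it replaces spark-submit with a hypothetical stand-in child process that prints a port number on its first line of output. The helper name read_port_from_child and the port value 25333 are illustrative assumptions.

```python
import subprocess
import sys


def read_port_from_child(command):
    """Start `command` and parse the first line of its stdout as a port."""
    proc = subprocess.Popen(command, stdout=subprocess.PIPE,
                            stdin=subprocess.PIPE, universal_newlines=True)
    first_line = proc.stdout.readline()
    try:
        # int() tolerates the trailing newline left by readline()
        return proc, int(first_line)
    except ValueError:
        # Collect whatever else the child printed, to include in the error
        rest, _ = proc.communicate()
        exit_code = proc.poll()
        msg = "Launching child failed"
        msg += " with exit code %d! " % exit_code if exit_code else "! "
        msg += "(Warning: unexpected output detected.)\n\n"
        msg += first_line + rest
        raise RuntimeError(msg)


# Hypothetical stand-in for the gateway process: it only prints a port.
fake_gateway = [sys.executable, "-c", "print(25333)"]
proc, port = read_port_from_child(fake_gateway)
proc.communicate()  # wait for the stand-in child to exit
print(port)
```

If the child prints anything that is not an integer on its first line, the helper raises with the exit code and the captured output, which mirrors the error path in launch_gateway() above.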