    [SPARK-2313] Use socket to communicate GatewayServer port back to Python driver · 0cfda846
    Josh Rosen authored
    This patch changes PySpark so that the GatewayServer's port is communicated back to the Python process that launches it over a local socket instead of a pipe.  The old pipe-based approach was brittle and could fail if `spark-submit` printed unexpected output to stdout.
    
    To accomplish this, I wrote a custom `PythonGatewayServer.main()` function to use in place of Py4J's `GatewayServer.main()`.
    
    Closes #3424.
    
    Author: Josh Rosen <joshrosen@databricks.com>
    
    Closes #4603 from JoshRosen/SPARK-2313 and squashes the following commits:
    
    6a7740b [Josh Rosen] Remove EchoOutputThread since it's no longer needed
    0db501f [Josh Rosen] Use select() so that we don't block if GatewayServer dies.
    9bdb4b6 [Josh Rosen] Handle case where getListeningPort returns -1
    3fb7ed1 [Josh Rosen] Remove stdout=PIPE
    2458934 [Josh Rosen] Use underscore to mark env var. as private
    d12c95d [Josh Rosen] Use Logging and Utils.tryOrExit()
    e5f9730 [Josh Rosen] Wrap everything in a giant try-block
    2f70689 [Josh Rosen] Use stdin PIPE to share fate with driver
    8bf956e [Josh Rosen] Initial cut at passing Py4J gateway port back to driver via socket
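
The socket handoff described above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the code from the patch: the function names (`report_port`, `wait_for_port`, `demo`), the 4-byte big-endian wire format, and the use of a thread to stand in for the launched JVM are all hypothetical. The launcher listens on an ephemeral loopback port, the child connects back and writes its gateway port, and the launcher polls with `select()` so it does not block forever if the child dies first.

```python
import select
import socket
import struct
import threading


def report_port(callback_host, callback_port, gateway_port):
    """Child side (hypothetical): connect back and send the port as a 4-byte int."""
    with socket.create_connection((callback_host, callback_port)) as conn:
        conn.sendall(struct.pack("!i", gateway_port))


def recv_exact(conn, n):
    """Read exactly n bytes, or raise if the peer closes early."""
    buf = b""
    while len(buf) < n:
        chunk = conn.recv(n - len(buf))
        if not chunk:
            raise EOFError("peer closed before sending the full port")
        buf += chunk
    return buf


def wait_for_port(listener, child_alive, poll_seconds=1.0):
    """Launcher side (hypothetical): wait for the port without blocking forever."""
    while True:
        # select() returns after poll_seconds even if nobody has connected,
        # which gives us a chance to notice that the child has exited.
        readable, _, _ = select.select([listener], [], [], poll_seconds)
        if listener in readable:
            conn, _ = listener.accept()
            with conn:
                (port,) = struct.unpack("!i", recv_exact(conn, 4))
            if port < 0:
                # Assumption: a negative value means the server never bound a port.
                raise RuntimeError("child reported an invalid port: %d" % port)
            return port
        if not child_alive():
            raise RuntimeError("child exited before reporting its port")


def demo(fake_gateway_port=25333):
    """Run both sides in one process; a thread stands in for the launched child."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    listener.listen(1)
    host, port = listener.getsockname()
    child = threading.Thread(target=report_port, args=(host, port, fake_gateway_port))
    child.start()
    try:
        return wait_for_port(listener, child.is_alive)
    finally:
        child.join()
        listener.close()
```

Because the port travels over a dedicated connection, anything the child prints to stdout cannot be mistaken for the port number, which is the failure mode the commit message attributes to the pipe-based approach. In a real launcher the callback address would have to reach the child some other way, for example through an environment variable.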