Skip to content
Snippets Groups Projects
  • Shixiong Zhu's avatar
    ee913e6e
    [SPARK-13697] [PYSPARK] Fix the missing module name of TransformFunctionSerializer.loads · ee913e6e
    Shixiong Zhu authored
    ## What changes were proposed in this pull request?
    
    Set the function's module name to `__main__` if it's missing in `TransformFunctionSerializer.loads`.
    
    ## How was this patch tested?
    
    Manually test in the shell.
    
    Before this patch:
    ```
    >>> from pyspark.streaming import StreamingContext
    >>> from pyspark.streaming.util import TransformFunction
    >>> ssc = StreamingContext(sc, 1)
    >>> func = TransformFunction(sc, lambda x: x, sc.serializer)
    >>> func.rdd_wrapper(lambda x: x)
    TransformFunction(<function <lambda> at 0x106ac8b18>)
    >>> bytes = bytearray(ssc._transformerSerializer.serializer.dumps((func.func, func.rdd_wrap_func, func.deserializers)))
    >>> func2 = ssc._transformerSerializer.loads(bytes)
    >>> print(func2.func.__module__)
    None
    >>> print(func2.rdd_wrap_func.__module__)
    None
    >>>
    ```
    After this patch:
    ```
    >>> from pyspark.streaming import StreamingContext
    >>> from pyspark.streaming.util import TransformFunction
    >>> ssc = StreamingContext(sc, 1)
    >>> func = TransformFunction(sc, lambda x: x, sc.serializer)
    >>> func.rdd_wrapper(lambda x: x)
    TransformFunction(<function <lambda> at 0x108bf1b90>)
    >>> bytes = bytearray(ssc._transformerSerializer.serializer.dumps((func.func, func.rdd_wrap_func, func.deserializers)))
    >>> func2 = ssc._transformerSerializer.loads(bytes)
    >>> print(func2.func.__module__)
    __main__
    >>> print(func2.rdd_wrap_func.__module__)
    __main__
    >>>
    ```
    
    Author: Shixiong Zhu <shixiong@databricks.com>
    
    Closes #11535 from zsxwing/loads-module.
    ee913e6e
    History
    [SPARK-13697] [PYSPARK] Fix the missing module name of TransformFunctionSerializer.loads
    Shixiong Zhu authored
    ## What changes were proposed in this pull request?
    
    Set the function's module name to `__main__` if it's missing in `TransformFunctionSerializer.loads`.
    
    ## How was this patch tested?
    
    Manually test in the shell.
    
    Before this patch:
    ```
    >>> from pyspark.streaming import StreamingContext
    >>> from pyspark.streaming.util import TransformFunction
    >>> ssc = StreamingContext(sc, 1)
    >>> func = TransformFunction(sc, lambda x: x, sc.serializer)
    >>> func.rdd_wrapper(lambda x: x)
    TransformFunction(<function <lambda> at 0x106ac8b18>)
    >>> bytes = bytearray(ssc._transformerSerializer.serializer.dumps((func.func, func.rdd_wrap_func, func.deserializers)))
    >>> func2 = ssc._transformerSerializer.loads(bytes)
    >>> print(func2.func.__module__)
    None
    >>> print(func2.rdd_wrap_func.__module__)
    None
    >>>
    ```
    After this patch:
    ```
    >>> from pyspark.streaming import StreamingContext
    >>> from pyspark.streaming.util import TransformFunction
    >>> ssc = StreamingContext(sc, 1)
    >>> func = TransformFunction(sc, lambda x: x, sc.serializer)
    >>> func.rdd_wrapper(lambda x: x)
    TransformFunction(<function <lambda> at 0x108bf1b90>)
    >>> bytes = bytearray(ssc._transformerSerializer.serializer.dumps((func.func, func.rdd_wrap_func, func.deserializers)))
    >>> func2 = ssc._transformerSerializer.loads(bytes)
    >>> print(func2.func.__module__)
    __main__
    >>> print(func2.rdd_wrap_func.__module__)
    __main__
    >>>
    ```
    
    Author: Shixiong Zhu <shixiong@databricks.com>
    
    Closes #11535 from zsxwing/loads-module.