    [SPARK-21193][PYTHON] Specify Pandas version in setup.py · 5dca10b8
    hyukjinkwon authored
    ## What changes were proposed in this pull request?
    
    It looks like we missed specifying the Pandas version in `setup.py`. Based on my tests, the current code needs Pandas 0.13.0, so this PR proposes specifying 0.13.0 as the required version.
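As a rough illustration of what such a constraint could look like, here is a minimal sketch in the style of a setuptools `extras_require` entry. The `sql` extra name, the `>=` specifier, and the names `MINIMUM_PANDAS_VERSION` and `EXTRAS_REQUIRE` are assumptions made for this example, not a quote from the actual change.

```python
# Hypothetical excerpt in the style of a setuptools-based setup.py.
# The "sql" extra name and the ">=" specifier are assumptions for illustration.
MINIMUM_PANDAS_VERSION = "0.13.0"

EXTRAS_REQUIRE = {
    "sql": ["pandas>=%s" % MINIMUM_PANDAS_VERSION],
}

# In a real setup.py this dict would be passed to setuptools as
# setup(..., extras_require=EXTRAS_REQUIRE).
print(EXTRAS_REQUIRE["sql"][0])
```

Keeping the version in one constant means the same value could also be reused for a runtime check.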
    
    Running the code below:
    
    ```python
    from pyspark.sql.types import *
    
    schema = StructType().add("a", IntegerType()).add("b", StringType())\
                         .add("c", BooleanType()).add("d", FloatType())
    data = [
        (1, "foo", True, 3.0), (2, "foo", True, 5.0),
        (3, "bar", False, -1.0), (4, "bar", False, 6.0),
    ]
    spark.createDataFrame(data, schema).toPandas().dtypes
    ```
    
    prints ...
    
    **With Pandas 0.13.0** - released, 2014-01
    
    ```
    a      int32
    b     object
    c       bool
    d    float32
    dtype: object
    ```
    
    **With Pandas 0.12.0** - released, 2013-06
    
    ```
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File ".../spark/python/pyspark/sql/dataframe.py", line 1734, in toPandas
        pdf[f] = pdf[f].astype(t, copy=False)
    TypeError: astype() got an unexpected keyword argument 'copy'
    ```
    
    Without the `copy` argument:
    
    ```
    a      int32
    b     object
    c       bool
    d    float32
    dtype: object
    ```
    
    **With Pandas 0.11.0** - released, 2013-03
    
    ```
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File ".../spark/python/pyspark/sql/dataframe.py", line 1734, in toPandas
        pdf[f] = pdf[f].astype(t, copy=False)
    TypeError: astype() got an unexpected keyword argument 'copy'
    ```
    
    Without the `copy` argument:
    
    ```
    a      int32
    b     object
    c       bool
    d    float32
    dtype: object
    ```
    
    **With Pandas 0.10.0** - released, 2012-12
    
    ```
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File ".../spark/python/pyspark/sql/dataframe.py", line 1734, in toPandas
        pdf[f] = pdf[f].astype(t, copy=False)
    TypeError: astype() got an unexpected keyword argument 'copy'
    ```
    
    Without the `copy` argument:
    
    ```
    a      int64  # <- this should be 'int32'
    b     object
    c       bool
    d    float64  # <- this should be 'float32'
    ```
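The results above boil down to a simple rule: in these tests, versions before 0.13.0 fail and 0.13.0 works. A minimal sketch of that rule as a version comparison is shown below; `parse_version` and `supports_astype_copy` are hypothetical helper names, and the parsing is deliberately naive (numeric dotted components only).

```python
def parse_version(version):
    """Turn a dotted version such as '0.13.0' into a tuple like (0, 13, 0).

    Parsing stops at the first non-numeric component, and the result is
    padded with zeros to at least three components.
    """
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def supports_astype_copy(pandas_version, minimum="0.13.0"):
    """Return True if `pandas_version` is at least `minimum`."""
    return parse_version(pandas_version) >= parse_version(minimum)


for candidate in ["0.10.0", "0.11.0", "0.12.0", "0.13.0"]:
    print(candidate, supports_astype_copy(candidate))
```

Comparing integer tuples rather than raw strings matters here, because a plain string comparison would order "0.9.0" after "0.13.0".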
    
    ## How was this patch tested?
    
    Manually tested with Pandas from 0.10.0 to 0.13.0.
    
    Author: hyukjinkwon <gurwls223@gmail.com>
    
    Closes #18403 from HyukjinKwon/SPARK-21193.