[SPARK-21897][PYTHON][R] Add unionByName API to DataFrame in Python and R · 07fd68a2
hyukjinkwon authored
## What changes were proposed in this pull request?

This PR proposes adding wrappers for the existing `unionByName` API to R and Python as well.

**Python**

```python
df1 = spark.createDataFrame([[1, 2, 3]], ["col0", "col1", "col2"])
df2 = spark.createDataFrame([[4, 5, 6]], ["col1", "col2", "col0"])
df1.unionByName(df2).show()
```
    
```
+----+----+----+
|col0|col1|col2|
+----+----+----+
|   1|   2|   3|
|   6|   4|   5|
+----+----+----+
```
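
In the output above, the row `[4, 5, 6]` (columns `col1, col2, col0`) becomes `6, 4, 5`. The sketch below shows this reordering without Spark. It is a minimal pure-Python sketch of by-name union semantics, not Spark's implementation. The function `union_by_name` is hypothetical, and it models a DataFrame as a pair of column names and a list of row tuples. The result keeps the left side's column order. A name mismatch raises `ValueError` here, standing in for the analysis error Spark reports.

```python
from typing import List, Sequence, Tuple

Row = Tuple[object, ...]


def union_by_name(
    left_cols: Sequence[str], left_rows: List[Row],
    right_cols: Sequence[str], right_rows: List[Row],
) -> Tuple[List[str], List[Row]]:
    """Hypothetical helper: append right_rows to left_rows, matching columns by name."""
    if sorted(left_cols) != sorted(right_cols):
        # Assumption: both sides must expose the same column names;
        # this sketch raises ValueError instead of Spark's own error.
        raise ValueError(f"column mismatch: {list(left_cols)} vs {list(right_cols)}")
    # For each left column, find where that column sits in the right rows.
    positions = [list(right_cols).index(name) for name in left_cols]
    # Reorder every right row into the left column order, then append.
    reordered = [tuple(row[i] for i in positions) for row in right_rows]
    return list(left_cols), list(left_rows) + reordered


cols, rows = union_by_name(
    ["col0", "col1", "col2"], [(1, 2, 3)],
    ["col1", "col2", "col0"], [(4, 5, 6)],
)
print(cols)  # ['col0', 'col1', 'col2']
print(rows)  # [(1, 2, 3), (6, 4, 5)]
```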
    
**R**

```R
df1 <- select(createDataFrame(mtcars), "carb", "am", "gear")
df2 <- select(createDataFrame(mtcars), "am", "gear", "carb")
head(unionByName(limit(df1, 2), limit(df2, 2)))
```

```
  carb am gear
1    4  1    4
2    4  1    4
3    4  1    4
4    4  1    4
```
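
Spark's plain `union` resolves columns by position, not by name. With the R example's column orders (`carb, am, gear` versus `am, gear, carb`), a positional union would misalign values. The sketch below shows the difference in pure Python. The row values are hard-coded from the R output above. Both helper functions are hypothetical stand-ins and are not the Spark or SparkR APIs.

```python
from typing import List, Sequence, Tuple

Row = Tuple[object, ...]


def union_positional(left_rows: List[Row], right_rows: List[Row]) -> List[Row]:
    # Positional union: stack rows as-is; column names are ignored.
    return list(left_rows) + list(right_rows)


def union_by_name(left_cols: Sequence[str], left_rows: List[Row],
                  right_cols: Sequence[str], right_rows: List[Row]) -> List[Row]:
    # By-name union: reorder each right row into the left column order.
    idx = [list(right_cols).index(c) for c in left_cols]
    return list(left_rows) + [tuple(r[i] for i in idx) for r in right_rows]


# carb, am, gear values as shown in the R output above.
df1_cols, df1_rows = ["carb", "am", "gear"], [(4, 1, 4), (4, 1, 4)]
df2_cols, df2_rows = ["am", "gear", "carb"], [(1, 4, 4), (1, 4, 4)]

print(union_positional(df1_rows, df2_rows))
# [(4, 1, 4), (4, 1, 4), (1, 4, 4), (1, 4, 4)]  <- last two rows misaligned
print(union_by_name(df1_cols, df1_rows, df2_cols, df2_rows))
# [(4, 1, 4), (4, 1, 4), (4, 1, 4), (4, 1, 4)]
```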
    
## How was this patch tested?

Doctests were added for Python, and a unit test was added in `test_sparkSQL.R` for R.

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #19105 from HyukjinKwon/unionByName-r-python.