[SPARK-21266][R][PYTHON] Support schema in a DDL-formatted string in dapply/gapply/from_json · 2bfd5acc
hyukjinkwon authored
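## What changes were proposed in this pull request?

This PR adds support for specifying the schema as a DDL-formatted string in `from_json` for R and Python, and in `dapply` and `gapply` for R. These APIs are commonly used and/or this change makes them consistent with the Scala APIs.

Additionally, this PR exposes `structType` for character input in R (`structType.character`), so that a DDL-formatted string can be converted into a schema explicitly as a workaround for other possible corner cases.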
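
To make the format concrete: a DDL-formatted schema is a comma-separated list of `name TYPE` fields, and commas inside nested types such as `MAP<STRING, INT>` belong to the type, not to the field list. The sketch below is a toy illustration only, not Spark's implementation (Spark parses these strings with its own SQL parser). The helper `split_ddl_fields` is hypothetical, and it deliberately ignores backquoted names, `NOT NULL`, and `COMMENT` clauses.

```python
from typing import List, Tuple


def split_ddl_fields(ddl: str) -> List[Tuple[str, str]]:
    """Hypothetical toy splitter for a flat DDL schema such as "a STRING, b INT".

    Returns (column name, type) pairs. Commas nested inside <...> or (...),
    e.g. in MAP<STRING, INT> or DECIMAL(10, 2), stay part of the type.
    """
    fields: List[str] = []
    current: List[str] = []
    depth = 0  # nesting level of <...> and (...) brackets
    for ch in ddl:
        if ch in "<(":
            depth += 1
        elif ch in ">)":
            depth -= 1
        if ch == "," and depth == 0:
            # A top-level comma ends the current field.
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))

    pairs: List[Tuple[str, str]] = []
    for field in fields:
        field = field.strip()
        if not field:
            continue
        # The first space separates the column name from its type.
        name, _, type_ = field.partition(" ")
        pairs.append((name, type_.strip()))
    return pairs


print(split_ddl_fields("a STRING, b INT"))
# [('a', 'STRING'), ('b', 'INT')]
print(split_ddl_fields("m MAP<STRING, INT>, d DECIMAL(10, 2)"))
# [('m', 'MAP<STRING, INT>'), ('d', 'DECIMAL(10, 2)')]
```

Tracking bracket depth rather than splitting on every comma is what keeps nested types intact; a plain `ddl.split(",")` would break `MAP<STRING, INT>` into two bogus fields.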
    
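**Python**

`from_json`

```python
from pyspark.sql.functions import from_json

# `spark` is an active SparkSession (as in the PySpark shell).
data = [(1, '''{"a": 1}''')]
df = spark.createDataFrame(data, ("key", "value"))
# The schema is passed as a DDL-formatted string instead of a StructType.
df.select(from_json(df.value, "a INT").alias("json")).show()
```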
    
**R**

`from_json`

```R
df <- sql("SELECT named_struct('name', 'Bob') as people")
df <- mutate(df, people_json = to_json(df$people))
head(select(df, from_json(df$people_json, "name STRING")))
```

`structType.character`

```R
structType("a STRING, b INT")
```

`dapply`

```R
dapply(createDataFrame(list(list(1.0)), "a"), function(x) {x}, "a DOUBLE")
```

`gapply`

```R
gapply(createDataFrame(list(list(1.0)), "a"), "a", function(key, x) { x }, "a DOUBLE")
```
    
    ## How was this patch tested?
    
    Doc tests for `from_json` in Python and unit tests `test_sparkSQL.R` in R.
    
    Author: hyukjinkwon <gurwls223@gmail.com>
    
    Closes #18498 from HyukjinKwon/SPARK-21266.