ecd877e8
[SPARK-12224][SPARKR] R support for JDBC source
felixcheung authored
    Add R API for `read.jdbc`, `write.jdbc`.
    
Tested this quite a bit manually with different combinations of parameters. It's not clear whether we could have automated tests in R for this, since the Scala `JDBCSuite` depends on the Java H2 in-memory database.
    
Refactored some code into utility functions so they could be tested.
    
Core's R SerDe code needs to be updated to allow access to `java.util.Properties` as a `jobj` handle, which is required by `DataFrameReader`/`DataFrameWriter`'s `jdbc` method. Alternatively, a `sql/r/SQLUtils` helper function could be added, though that would require more code.
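As a rough illustration of the SerDe requirement above, the `jobj`-handle approach amounts to constructing the `java.util.Properties` instance from R via SparkR's internal JVM-call helpers. This is a hedged sketch, not the actual patch; it assumes the internal `SparkR:::newJObject`/`SparkR:::callJMethod` helpers can round-trip a `Properties` reference once the SerDe change is in place:

```r
# Sketch only: build a java.util.Properties on the JVM side and hold it
# as a jobj handle in R, then set connection properties on it.
props <- SparkR:::newJObject("java.util.Properties")
SparkR:::callJMethod(props, "setProperty", "user", "user")
SparkR:::callJMethod(props, "setProperty", "password", "12345")
# The jobj `props` could then be passed to DataFrameReader's jdbc method.
```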
    
    Tested:
    ```
    # with postgresql
    ../bin/sparkR --driver-class-path /usr/share/java/postgresql-9.4.1207.jre7.jar
    
    # read.jdbc
    df <- read.jdbc(sqlContext, "jdbc:postgresql://localhost/db", "films2", user = "user", password = "12345")
    df <- read.jdbc(sqlContext, "jdbc:postgresql://localhost/db", "films2", user = "user", password = 12345)
    
    # partitionColumn and numPartitions test
    df <- read.jdbc(sqlContext, "jdbc:postgresql://localhost/db", "films2", partitionColumn = "did", lowerBound = 0, upperBound = 200, numPartitions = 4, user = "user", password = 12345)
    a <- SparkR:::toRDD(df)
    SparkR:::getNumPartitions(a)
    [1] 4
    SparkR:::collectPartition(a, 2L)
    
    # defaultParallelism test
df <- read.jdbc(sqlContext, "jdbc:postgresql://localhost/db", "films2", partitionColumn = "did", lowerBound = 0, upperBound = 200, user = "user", password = 12345)
a <- SparkR:::toRDD(df)
SparkR:::getNumPartitions(a)
    [1] 2
    
    # predicates test
    df <- read.jdbc(sqlContext, "jdbc:postgresql://localhost/db", "films2", predicates = list("did<=105"), user = "user", password = 12345)
    count(df) == 1
    
    # write.jdbc, default save mode "error"
    irisDf <- as.DataFrame(sqlContext, iris)
    write.jdbc(irisDf, "jdbc:postgresql://localhost/db", "films2", user = "user", password = "12345")
    "error, already exists"
    
    write.jdbc(irisDf, "jdbc:postgresql://localhost/db", "iris", user = "user", password = "12345")
    ```
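Since the default save mode errors out when the target table already exists (as the `films2` write above shows), the existing table case can be handled by passing an explicit save mode. A hedged sketch, assuming `write.jdbc` accepts the same `mode` values ("append", "overwrite", "error", "ignore") as the other SparkR writers:

```r
# Sketch: overwrite the existing films2 table instead of erroring.
write.jdbc(irisDf, "jdbc:postgresql://localhost/db", "films2", mode = "overwrite", user = "user", password = "12345")
```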
    
    Author: felixcheung <felixcheung_m@hotmail.com>
    
    Closes #10480 from felixcheung/rreadjdbc.