    [SPARK-21780][R] Simpler Dataset.sample API in R · a8d9ec8a
    hyukjinkwon authored
    ## What changes were proposed in this pull request?
    
    This PR allows `sample(...)` to be called without `withReplacement`, which now defaults to `FALSE`.
    
    In short, calls such as the following are now allowed:
    
    ```r
    > df <- createDataFrame(as.list(seq(10)))
    > count(sample(df, fraction=0.5, seed=3))
    [1] 4
    > count(sample(df, fraction=1.0))
    [1] 10
    ```
    
    In addition, this PR adds type checks for the arguments, as shown below:
    
    ```r
    > sample(df, fraction = "a")
    Error in sample(df, fraction = "a") :
      fraction must be numeric; however, got character
    > sample(df, fraction = 1, seed = NULL)
    Error in sample(df, fraction = 1, seed = NULL) :
      seed must not be NULL or NA; however, got NULL
    > sample(df, list(1), 1.0)
    Error in sample(df, list(1), 1) :
      withReplacement must be logical; however, got list
    > sample(df, fraction = -1.0)
    ...
    Error in sample : illegal argument - requirement failed: Sampling fraction (-1.0) must be on interval [0, 1] without replacement
    ```
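
    The checks above can be approximated in plain R. This is only a minimal sketch, not the code in this patch: `check_sample_args` is a hypothetical helper that does not call SparkR, and its argument order (`withReplacement`, `fraction`, `seed`) is assumed from the positional call `sample(df, list(1), 1.0)` shown above. The range check on `fraction` is left out because, judging by the last error message, it appears to be raised on the JVM side.

    ```r
    # Hypothetical stand-in for the argument checks described above.
    # Plain R only; the real checks live inside SparkR's sample() method.
    check_sample_args <- function(withReplacement = FALSE, fraction, seed) {
      if (!is.numeric(fraction)) {
        stop("fraction must be numeric; however, got ", class(fraction)[[1]])
      }
      if (!is.logical(withReplacement)) {
        stop("withReplacement must be logical; however, got ",
             class(withReplacement)[[1]])
      }
      # seed is optional, but if it is supplied it must be a usable value.
      if (!missing(seed)) {
        if (is.null(seed) || is.na(seed)) {
          stop("seed must not be NULL or NA; however, got ",
               if (is.null(seed)) "NULL" else "NA")
        }
      }
      invisible(TRUE)
    }

    # withReplacement is omitted here and falls back to FALSE.
    check_sample_args(fraction = 0.5, seed = 3)

    # Capture the error message instead of aborting the session.
    msg <- tryCatch(check_sample_args(fraction = "a"),
                    error = function(e) conditionMessage(e))
    print(msg)  # "fraction must be numeric; however, got character"
    ```

    Giving `withReplacement` a default while leaving `fraction` without one is what lets callers write `sample(df, fraction = 0.5)` by name, while older positional calls keep working.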
    
    ## How was this patch tested?
    
    Tested manually; unit tests were added in `R/pkg/tests/fulltests/test_sparkSQL.R`.
    
    Author: hyukjinkwon <gurwls223@gmail.com>
    
    Closes #19243 from HyukjinKwon/SPARK-21780.