hyukjinkwon authored
## What changes were proposed in this pull request?

This PR makes the `withReplacement` argument of `sample(...)` optional; when omitted, it defaults to `FALSE`.

In short, the following calls are now allowed:

```r
> df <- createDataFrame(as.list(seq(10)))
> count(sample(df, fraction=0.5, seed=3))
[1] 4
> count(sample(df, fraction=1.0))
[1] 10
```

In addition, this PR adds type checks on the arguments, as shown below:

```r
> sample(df, fraction = "a")
Error in sample(df, fraction = "a") :
  fraction must be numeric; however, got character
> sample(df, fraction = 1, seed = NULL)
Error in sample(df, fraction = 1, seed = NULL) :
  seed must not be NULL or NA; however, got NULL
> sample(df, list(1), 1.0)
Error in sample(df, list(1), 1) :
  withReplacement must be logical; however, got list
> sample(df, fraction = -1.0)
...
Error in sample : illegal argument - requirement failed: Sampling fraction (-1.0) must be on interval [0, 1] without replacement
```
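
The R-side checks above can be sketched as a stand-alone function. This is a minimal illustration only, not the SparkR source: the name `validate_sample_args` is hypothetical, and the order of the checks is an assumption. It relies on base R's `missing()` to tell "seed not supplied" apart from "seed = NULL". The range check on `fraction` is left out because, judging by the last error message above, that one appears to be raised on the JVM side rather than in R.

```r
# Hypothetical stand-alone helper mirroring the error messages above;
# it is an illustration and not the actual SparkR implementation.
validate_sample_args <- function(withReplacement = FALSE, fraction, seed) {
  if (!is.numeric(fraction)) {
    stop(paste("fraction must be numeric; however, got",
               class(fraction)[[1]]))
  }
  if (!is.logical(withReplacement)) {
    stop(paste("withReplacement must be logical; however, got",
               class(withReplacement)[[1]]))
  }
  # missing() distinguishes an omitted seed from an explicit NULL or NA.
  if (!missing(seed)) {
    if (is.null(seed)) {
      stop("seed must not be NULL or NA; however, got NULL")
    }
    if (is.na(seed)) {
      stop("seed must not be NULL or NA; however, got NA")
    }
  }
  invisible(TRUE)
}

# Both calls pass; the second relies on withReplacement defaulting to FALSE.
validate_sample_args(fraction = 0.5, seed = 3)
validate_sample_args(fraction = 1.0)

# A non-logical second positional argument is rejected.
msg <- tryCatch(validate_sample_args(list(1), 1.0),
                error = function(e) conditionMessage(e))
print(msg)
```

Because `withReplacement` stays the first parameter after the data frame, existing positional calls keep working, while new calls can pass `fraction` and `seed` by name.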

## How was this patch tested?

Manually tested; unit tests were added in `R/pkg/tests/fulltests/test_sparkSQL.R`.

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #19243 from HyukjinKwon/SPARK-21780.
a8d9ec8a