[SPARK-18678][ML] Skewed reservoir sampling in SamplingUtils
## What changes were proposed in this pull request?

Fix reservoir sampling bias for small k. An off-by-one error meant that the probability of replacement was slightly too high: k/(l-1) after l elements instead of k/l. The difference matters for small k.

## How was this patch tested?

Existing test plus new test case.

Author: Sean Owen <sowen@cloudera.com>

Closes #16129 from srowen/SPARK-18678.
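For illustration, the corrected replacement probability can be sketched as a small standalone reservoir sampler. This is not the code from the patch: `ReservoirSketch`, `reservoirSample`, and the `seen` counter are hypothetical names, and the sketch only assumes the standard "Algorithm R" formulation, in which the element counter is incremented before the replacement index is drawn so that the l-th element is kept with probability k/l.

```scala
import scala.collection.mutable.ArrayBuffer
import scala.util.Random

// Hypothetical standalone sketch; not the SamplingUtils implementation.
object ReservoirSketch {
  def reservoirSample[T](input: Iterator[T], k: Int, rng: Random): Vector[T] = {
    require(k > 0, "k must be positive")
    val reservoir = ArrayBuffer.empty[T]
    var seen = 0L // number of elements consumed so far (l in the description)
    while (input.hasNext) {
      val item = input.next()
      seen += 1 // count the current element BEFORE drawing the index
      if (reservoir.size < k) {
        // The first k elements always fill the reservoir.
        reservoir += item
      } else {
        // Uniform index in [0, seen); it is below k with probability k/seen.
        // Drawing over [0, seen - 1) instead would give k/(seen - 1),
        // which is the kind of off-by-one bias described above.
        val j = (rng.nextDouble() * seen).toLong
        if (j < k) reservoir(j.toInt) = item
      }
    }
    reservoir.toVector
  }
}
```

With k = 1 and a two-element input, an unbiased sampler should return each element about half the time, whereas a replacement probability of k/(l-1) would always replace the first element with the second. A check of that shape makes the bias easy to see for small k.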
Changed files:
- R/pkg/inst/tests/testthat/test_mllib.R (5 additions, 4 deletions)
- core/src/main/scala/org/apache/spark/util/random/SamplingUtils.scala (4 additions, 1 deletion)
- core/src/test/scala/org/apache/spark/util/random/SamplingUtilsSuite.scala (13 additions, 0 deletions)