    79f5f281
    [SPARK-18678][ML] Skewed reservoir sampling in SamplingUtils · 79f5f281
    Sean Owen authored
    ## What changes were proposed in this pull request?
    
    Fix reservoir sampling bias for small k. An off-by-one error made the probability of replacement slightly too high: k/(l-1) after l elements instead of k/l. The skew is most noticeable when k is small.
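To illustrate the off-by-one described above, here is a minimal Python sketch of textbook reservoir sampling (Algorithm R). It is not the SamplingUtils code; the function name `reservoir_sample` and its parameters are hypothetical. The key point is that the element counter is incremented before the replacement index is drawn, so the l-th element is kept with probability k/l. Drawing the index first would give k/(l-1), which for k = 1 means the second element always replaces the first.

```python
import random
from itertools import islice


def reservoir_sample(items, k, seed=None):
    """Return (sample, count): a uniform sample of up to k items and the input size.

    Hypothetical illustration of Algorithm R, not the Spark implementation.
    """
    rng = random.Random(seed)
    it = iter(items)

    # Fill the reservoir with the first k items.
    reservoir = list(islice(it, k))
    if len(reservoir) < k:
        # Input had fewer than k items: return all of them.
        return reservoir, len(reservoir)

    seen = k
    for item in it:
        # Count the current item BEFORE drawing the index, so that the
        # replacement probability for the l-th item is k/l, not k/(l-1).
        seen += 1
        j = rng.randrange(seen)  # uniform integer in [0, seen)
        if j < k:
            reservoir[j] = item
    return reservoir, seen
```

With k = 1 and a two-element input, each element should be returned about half the time; a version that draws the index before incrementing the counter would never return the first element.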
    
    ## How was this patch tested?
    
    Existing test plus new test case.
    
    Author: Sean Owen <sowen@cloudera.com>
    
    Closes #16129 from srowen/SPARK-18678.