    SPARK-1057 (alternative) Remove fastutil · 165e06a7
    Sean Owen authored
    (This is for discussion at this point -- I'm not suggesting this should be committed.)
    
    This is what removing fastutil looks like. Much of it is straightforward, like using `java.io` buffered stream classes, and Guava for murmurhash3.
    
    Uses of `FastByteArrayOutputStream` were a little trickier. I think the change to `java.io` actually entails an extra array copy in only one case, though.
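
As a rough illustration of where an extra copy can come from (a sketch, not code from this change): `java.io.ByteArrayOutputStream.toByteArray` returns a freshly allocated copy of the buffer on every call, whereas a stream that exposes its backing array lets callers read the bytes in place. The object and method names below are hypothetical.

```scala
import java.io.ByteArrayOutputStream

// Hypothetical demo object; not part of Spark or of this commit.
object ByteArrayCopySketch {
  /** Writes `data` to a java.io stream and takes two snapshots of it. */
  def snapshots(data: Array[Byte]): (Array[Byte], Array[Byte]) = {
    val out = new ByteArrayOutputStream(data.length)
    out.write(data, 0, data.length)
    // Each toByteArray call allocates and fills a new array.
    (out.toByteArray, out.toByteArray)
  }
}
```

The two snapshots hold equal bytes but are distinct arrays, so a caller that only wants to read the buffer once still pays for one allocation and copy.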
    
    The rest is using `OpenHashMap` and `OpenHashSet`. These are now written in terms of more Scala-like operations.
    
    `OpenHashMap` is where I made three non-trivial changes to make it work, and they need review:
    
    - It is no longer private
    - The key type must have a `ClassTag`
    - Unless a lot of other code changes, the key type can't enforce being a supertype of `Null`
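
To make the last two points concrete, here is a minimal sketch of what the two type constraints mean in Scala. It is not Spark's `OpenHashMap`; the object and method names are hypothetical and only illustrate the bounds being discussed.

```scala
import scala.reflect.ClassTag

// Hypothetical helpers; they only illustrate the type bounds, not Spark's code.
object KeyBoundsSketch {
  // The `K: ClassTag` context bound supplies the runtime class needed to
  // allocate an Array[K]; without it, `new Array[K](n)` does not compile.
  def newKeyArray[K: ClassTag](n: Int): Array[K] = new Array[K](n)

  // The lower bound `K >: Null` lets `null` stand in for a key of type K,
  // but it restricts K to types that admit null (so not Int, Long, etc.).
  def emptySlot[K >: Null]: K = null
}
```

With these definitions, `KeyBoundsSketch.newKeyArray[Int](4)` yields a primitive `int[]`, while `KeyBoundsSketch.emptySlot[Int]` is rejected by the compiler because `Int` is not a supertype of `Null`. That is the trade-off behind dropping the bound: primitive keys become possible, but the type system no longer guarantees that `null` is a valid key value.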
    
    It all works and tests pass, and I think there is reason to believe it's OK from a speed perspective.
    
    But what about those last changes?
    
    Author: Sean Owen <sowen@cloudera.com>
    
    Closes #266 from srowen/SPARK-1057-alternate and squashes the following commits:
    
    2601129 [Sean Owen] Fix Map return type error not previously caught
    ec65502 [Sean Owen] Updates from matei's review
    00bc81e [Sean Owen] Remove use of fastutil and replace with use of java.io, spark.util and Guava classes