  1. Nov 04, 2013
  2. Nov 03, 2013
    • Merge pull request #70 from rxin/hash1 · b5dc3393
      Reynold Xin authored
      Fast, memory-efficient hash set and hash table implementations optimized for primitive data types.
      
      This pull request adds two hash table implementations optimized for primitive data types. For primitive types, the new hash tables are much faster than the current Spark AppendOnlyMap (3X faster; note that the current AppendOnlyMap is already much better than the Java map) while using much less space (1/4 of the space).
      
      Details:
      
      This PR first adds an open hash set implementation (OpenHashSet) optimized for primitive types (using Scala's specialization feature). OpenHashSet is designed to serve as a building block for more advanced structures. It is currently used to build the following two hash tables, and could later be used to build multi-valued hash tables as well (GraphX has this use case). Note that the code contains some peculiarities that work around Scala compiler bugs.
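
To make the idea concrete, here is a minimal sketch of an open-addressing hash set that uses `@specialized` so that `Int` and `Long` keys can be stored in primitive arrays. It is an illustration only, not the code from this PR: the class name `SketchOpenHashSet`, the linear probing scheme, the 0.7 load factor, and the plain `Boolean` occupancy array are all assumptions chosen for brevity.

```scala
import scala.reflect.ClassTag

// Hypothetical sketch, not the PR's implementation.
// Keys are kept in a flat array; collisions are resolved by linear probing.
class SketchOpenHashSet[@specialized(Int, Long) T: ClassTag](initialCapacity: Int = 64) {
  require(initialCapacity > 0 && (initialCapacity & (initialCapacity - 1)) == 0,
    "capacity must be a power of two")

  private var capacity = initialCapacity
  private var mask = capacity - 1
  private var data = new Array[T](capacity)            // primitive array for Int/Long
  private var occupied = new Array[Boolean](capacity)  // which slots hold a key
  private var count = 0

  def size: Int = count

  def contains(k: T): Boolean = {
    var pos = k.hashCode & mask
    while (occupied(pos)) {
      if (data(pos) == k) return true
      pos = (pos + 1) & mask
    }
    false
  }

  def add(k: T): Unit = {
    // Grow once the table is more than 70% full (assumed load factor).
    if (insert(k) && count > capacity * 0.7) grow()
  }

  // Returns true if k was newly inserted, false if it was already present.
  private def insert(k: T): Boolean = {
    var pos = k.hashCode & mask
    while (occupied(pos)) {
      if (data(pos) == k) return false
      pos = (pos + 1) & mask
    }
    data(pos) = k
    occupied(pos) = true
    count += 1
    true
  }

  // Double the capacity and re-insert every existing key.
  private def grow(): Unit = {
    val oldData = data
    val oldOccupied = occupied
    capacity *= 2
    mask = capacity - 1
    data = new Array[T](capacity)
    occupied = new Array[Boolean](capacity)
    count = 0
    var i = 0
    while (i < oldData.length) {
      if (oldOccupied(i)) insert(oldData(i))
      i += 1
    }
  }
}
```

For example, `val s = new SketchOpenHashSet[Int](4)` followed by `s.add(7)` makes `s.contains(7)` true. Adding the same key again leaves `s.size` unchanged.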
      
      Building on top of OpenHashSet, this PR adds two hash table implementations:
      1. OpenHashMap: for nullable keys, with optional specialization for primitive values
      2. PrimitiveKeyOpenHashMap: for non-nullable primitive keys, with optional specialization for primitive values
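
As a rough illustration of the primitive-key variant, the sketch below keeps keys and values in parallel arrays and exposes an update-in-place operation. Everything here is an assumption made for the example: the class name `SketchPrimitiveKeyMap`, the `changeValue(k, defaultValue, mergeValue)` signature, and the linear probing with a 0.7 load factor are not taken from this PR. For self-containment it also does its own probing rather than layering on a separate set class.

```scala
import scala.reflect.ClassTag

// Hypothetical sketch, not the PR's implementation.
// Keys and values sit in parallel arrays so Int/Long entries stay unboxed.
class SketchPrimitiveKeyMap[@specialized(Int, Long) K: ClassTag,
                            @specialized(Int, Long) V: ClassTag](initialCapacity: Int = 64) {
  require(initialCapacity > 0 && (initialCapacity & (initialCapacity - 1)) == 0,
    "capacity must be a power of two")

  private var capacity = initialCapacity
  private var mask = capacity - 1
  private var keys = new Array[K](capacity)
  private var values = new Array[V](capacity)
  private var occupied = new Array[Boolean](capacity)
  private var count = 0

  def size: Int = count

  // Linear probe: the slot that holds k, or the first empty slot if k is absent.
  private def slotFor(k: K): Int = {
    var pos = k.hashCode & mask
    while (occupied(pos) && keys(pos) != k) pos = (pos + 1) & mask
    pos
  }

  def get(k: K): Option[V] = {
    val pos = slotFor(k)
    if (occupied(pos)) Some(values(pos)) else None
  }

  // Insert defaultValue if k is absent, otherwise replace the stored value
  // with mergeValue(old). Returns the value now stored for k.
  def changeValue(k: K, defaultValue: => V, mergeValue: V => V): V = {
    val pos = slotFor(k)
    if (occupied(pos)) {
      values(pos) = mergeValue(values(pos))
      values(pos)
    } else {
      val v = defaultValue
      keys(pos) = k
      values(pos) = v
      occupied(pos) = true
      count += 1
      if (count > capacity * 0.7) grow()  // assumed load factor
      v
    }
  }

  // Double the capacity and move every entry to its new slot.
  private def grow(): Unit = {
    val oldKeys = keys
    val oldValues = values
    val oldOccupied = occupied
    capacity *= 2
    mask = capacity - 1
    keys = new Array[K](capacity)
    values = new Array[V](capacity)
    occupied = new Array[Boolean](capacity)
    var i = 0
    while (i < oldKeys.length) {
      if (oldOccupied(i)) {
        val pos = slotFor(oldKeys(i))
        keys(pos) = oldKeys(i)
        values(pos) = oldValues(i)
        occupied(pos) = true
      }
      i += 1
    }
  }
}
```

A counting aggregation would then call `m.changeValue(key, 1, _ + 1)` once per record, which touches a single slot per update.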
      
      I tested the update speed of these two implementations using the changeValue function (which is what Aggregator and cogroup would use). Runtime relative to AppendOnlyMap for inserting 10 million items, listed as key type to value type (lower is faster):
      
      Int to Int: ~30%
      java.lang.Integer to java.lang.Integer: ~100%
      Int to java.lang.Integer: ~50%
      java.lang.Integer to Int: ~85%
    • Code review feedback. · eb5f8a3f
      Reynold Xin authored
    • Fixed a bug that used twice the amount of memory for the primitive arrays due to a Scala compiler bug. · 1e9543b5
      Reynold Xin authored
      Also addressed Matei's code review comment.
    • Merge branch 'master' into hash1 · da6bb0ae
      Reynold Xin authored
  3. Nov 02, 2013
  4. Nov 01, 2013
  5. Oct 30, 2013
  6. Oct 29, 2013
  7. Oct 28, 2013
  8. Oct 27, 2013
  9. Oct 26, 2013