Skip to content
Snippets Groups Projects
  1. Jul 11, 2016
    • Reynold Xin's avatar
      [SPARK-16477] Bump master version to 2.1.0-SNAPSHOT · ffcb6e05
      Reynold Xin authored
      ## What changes were proposed in this pull request?
      After SPARK-16476 (committed earlier today as #14128), we can finally bump the version number.
      
      ## How was this patch tested?
      N/A
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #14130 from rxin/SPARK-16477.
      ffcb6e05
  2. Jul 06, 2016
  3. Jun 03, 2016
    • Davies Liu's avatar
      [SPARK-15391] [SQL] manage the temporary memory of timsort · 3074f575
      Davies Liu authored
      ## What changes were proposed in this pull request?
      
      Currently, the memory for temporary buffer used by TimSort is always allocated as on-heap without bookkeeping, it could cause OOM both in on-heap and off-heap mode.
      
      This PR will try to manage that by preallocate it together with the pointer array, same with RadixSort. It both works for on-heap and off-heap mode.
      
      This PR also change the loadFactor of BytesToBytesMap to 0.5 (it was 0.70), it enables use to radix sort also makes sure that we have enough memory for timsort.
      
      ## How was this patch tested?
      
      Existing tests.
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #13318 from davies/fix_timsort.
      3074f575
  4. May 29, 2016
    • Sean Owen's avatar
      [MINOR] Resolve a number of miscellaneous build warnings · ce1572d1
      Sean Owen authored
      ## What changes were proposed in this pull request?
      
      This change resolves a number of build warnings that have accumulated, before 2.x. It does not address a large number of deprecation warnings, especially related to the Accumulator API. That will happen separately.
      
      ## How was this patch tested?
      
      Jenkins
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #13377 from srowen/BuildWarnings.
      ce1572d1
  5. May 17, 2016
  6. Apr 28, 2016
  7. Apr 26, 2016
    • Azeem Jiva's avatar
      [SPARK-14756][CORE] Use parseLong instead of valueOf · de6e6334
      Azeem Jiva authored
      ## What changes were proposed in this pull request?
      
      Use Long.parseLong which returns a primative.
      Use a series of appends() reduces the creation of an extra StringBuilder type
      
      ## How was this patch tested?
      
      Unit tests
      
      Author: Azeem Jiva <azeemj@gmail.com>
      
      Closes #12520 from javawithjiva/minor.
      de6e6334
  8. Apr 01, 2016
    • Josh Rosen's avatar
      [SPARK-13992] Add support for off-heap caching · e41acb75
      Josh Rosen authored
      This patch adds support for caching blocks in the executor processes using direct / off-heap memory.
      
      ## User-facing changes
      
      **Updated semantics of `OFF_HEAP` storage level**: In Spark 1.x, the `OFF_HEAP` storage level indicated that an RDD should be cached in Tachyon. Spark 2.x removed the external block store API that Tachyon caching was based on (see #10752 / SPARK-12667), so `OFF_HEAP` became an alias for `MEMORY_ONLY_SER`. As of this patch, `OFF_HEAP` means "serialized and cached in off-heap memory or on disk". Via the `StorageLevel` constructor, `useOffHeap` can be set if `serialized == true` and can be used to construct custom storage levels which support replication.
      
      **Storage UI reporting**: the storage UI will now report whether in-memory blocks are stored on- or off-heap.
      
      **Only supported by UnifiedMemoryManager**: for simplicity, this feature is only supported when the default UnifiedMemoryManager is used; applications which use the legacy memory manager (`spark.memory.useLegacyMode=true`) are not currently able to allocate off-heap storage memory, so using off-heap caching will fail with an error when legacy memory management is enabled. Given that we plan to eventually remove the legacy memory manager, this is not a significant restriction.
      
      **Memory management policies:** the policies for dividing available memory between execution and storage are the same for both on- and off-heap memory. For off-heap memory, the total amount of memory available for use by Spark is controlled by `spark.memory.offHeap.size`, which is an absolute size. Off-heap storage memory obeys `spark.memory.storageFraction` in order to control the amount of unevictable storage memory. For example, if `spark.memory.offHeap.size` is 1 gigabyte and Spark uses the default `storageFraction` of 0.5, then up to 500 megabytes of off-heap cached blocks will be protected from eviction due to execution memory pressure. If necessary, we can split `spark.memory.storageFraction` into separate on- and off-heap configurations, but this doesn't seem necessary now and can be done later without any breaking changes.
      
      **Use of off-heap memory does not imply use of off-heap execution (or vice-versa)**: for now, the settings controlling the use of off-heap execution memory (`spark.memory.offHeap.enabled`) and off-heap caching are completely independent, so Spark SQL can be configured to use off-heap memory for execution while continuing to cache blocks on-heap. If desired, we can change this in a followup patch so that `spark.memory.offHeap.enabled` affect the default storage level for cached SQL tables.
      
      ## Internal changes
      
      - Rename `ByteArrayChunkOutputStream` to `ChunkedByteBufferOutputStream`
        - It now returns a `ChunkedByteBuffer` instead of an array of byte arrays.
        - Its constructor now accept an `allocator` function which is called to allocate `ByteBuffer`s. This allows us to control whether it allocates regular ByteBuffers or off-heap DirectByteBuffers.
        - Because block serialization is now performed during the unroll process, a `ChunkedByteBufferOutputStream` which is configured with a `DirectByteBuffer` allocator will use off-heap memory for both unroll and storage memory.
      - The `MemoryStore`'s MemoryEntries now tracks whether blocks are stored on- or off-heap.
        - `evictBlocksToFreeSpace()` now accepts a `MemoryMode` parameter so that we don't try to evict off-heap blocks in response to on-heap memory pressure (or vice-versa).
      - Make sure that off-heap buffers are properly de-allocated during MemoryStore eviction.
      - The JVM limits the total size of allocated direct byte buffers using the `-XX:MaxDirectMemorySize` flag and the default tends to be fairly low (< 512 megabytes in some JVMs). To work around this limitation, this patch adds a custom DirectByteBuffer allocator which ignores this memory limit.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #11805 from JoshRosen/off-heap-caching.
      e41acb75
  9. Mar 29, 2016
  10. Mar 21, 2016
    • Dongjoon Hyun's avatar
      [SPARK-14011][CORE][SQL] Enable `LineLength` Java checkstyle rule · 20fd2541
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      [Spark Coding Style Guide](https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide) has 100-character limit on lines, but it's disabled for Java since 11/09/15. This PR enables **LineLength** checkstyle again. To help that, this also introduces **RedundantImport** and **RedundantModifier**, too. The following is the diff on `checkstyle.xml`.
      
      ```xml
      -        <!-- TODO: 11/09/15 disabled - the lengths are currently > 100 in many places -->
      -        <!--
               <module name="LineLength">
                   <property name="max" value="100"/>
                   <property name="ignorePattern" value="^package.*|^import.*|a href|href|http://|https://|ftp://"/>
               </module>
      -        -->
               <module name="NoLineWrap"/>
               <module name="EmptyBlock">
                   <property name="option" value="TEXT"/>
       -167,5 +164,7
               </module>
               <module name="CommentsIndentation"/>
               <module name="UnusedImports"/>
      +        <module name="RedundantImport"/>
      +        <module name="RedundantModifier"/>
      ```
      
      ## How was this patch tested?
      
      Currently, `lint-java` is disabled in Jenkins. It needs a manual test.
      After passing the Jenkins tests, `dev/lint-java` should passes locally.
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #11831 from dongjoon-hyun/SPARK-14011.
      20fd2541
  11. Mar 16, 2016
    • Sean Owen's avatar
      [SPARK-13823][SPARK-13397][SPARK-13395][CORE] More warnings, StandardCharset follow up · 3b461d9e
      Sean Owen authored
      ## What changes were proposed in this pull request?
      
      Follow up to https://github.com/apache/spark/pull/11657
      
      - Also update `String.getBytes("UTF-8")` to use `StandardCharsets.UTF_8`
      - And fix one last new Coverity warning that turned up (use of unguarded `wait()` replaced by simpler/more robust `java.util.concurrent` classes in tests)
      - And while we're here cleaning up Coverity warnings, just fix about 15 more build warnings
      
      ## How was this patch tested?
      
      Jenkins tests
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #11725 from srowen/SPARK-13823.2.
      3b461d9e
  12. Mar 13, 2016
    • Sean Owen's avatar
      [SPARK-13823][CORE][STREAMING][SQL] Always specify Charset in String <->... · 18408528
      Sean Owen authored
      [SPARK-13823][CORE][STREAMING][SQL] Always specify Charset in String <-> byte[] conversions (and remaining Coverity items)
      
      ## What changes were proposed in this pull request?
      
      - Fixes calls to `new String(byte[])` or `String.getBytes()` that rely on platform default encoding, to use UTF-8
      - Same for `InputStreamReader` and `OutputStreamWriter` constructors
      - Standardizes on UTF-8 everywhere
      - Standardizes specifying the encoding with `StandardCharsets.UTF-8`, not the Guava constant or "UTF-8" (which means handling `UnuspportedEncodingException`)
      - (also addresses the other remaining Coverity scan issues, which are pretty trivial; these are separated into commit https://github.com/srowen/spark/commit/1deecd8d9ca986d8adb1a42d315890ce5349d29c )
      
      ## How was this patch tested?
      
      Jenkins tests
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #11657 from srowen/SPARK-13823.
      18408528
  13. Mar 09, 2016
    • Dongjoon Hyun's avatar
      [SPARK-13692][CORE][SQL] Fix trivial Coverity/Checkstyle defects · f3201aee
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      This issue fixes the following potential bugs and Java coding style detected by Coverity and Checkstyle.
      
      - Implement both null and type checking in equals functions.
      - Fix wrong type casting logic in SimpleJavaBean2.equals.
      - Add `implement Cloneable` to `UTF8String` and `SortedIterator`.
      - Remove dereferencing before null check in `AbstractBytesToBytesMapSuite`.
      - Fix coding style: Add '{}' to single `for` statement in mllib examples.
      - Remove unused imports in `ColumnarBatch` and `JavaKinesisStreamSuite`.
      - Remove unused fields in `ChunkFetchIntegrationSuite`.
      - Add `stop()` to prevent resource leak.
      
      Please note that the last two checkstyle errors exist on newly added commits after [SPARK-13583](https://issues.apache.org/jira/browse/SPARK-13583).
      
      ## How was this patch tested?
      
      manual via `./dev/lint-java` and Coverity site.
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #11530 from dongjoon-hyun/SPARK-13692.
      f3201aee
  14. Mar 03, 2016
    • Sean Owen's avatar
      [SPARK-13423][WIP][CORE][SQL][STREAMING] Static analysis fixes for 2.x · e97fc7f1
      Sean Owen authored
      ## What changes were proposed in this pull request?
      
      Make some cross-cutting code improvements according to static analysis. These are individually up for discussion since they exist in separate commits that can be reverted. The changes are broadly:
      
      - Inner class should be static
      - Mismatched hashCode/equals
      - Overflow in compareTo
      - Unchecked warnings
      - Misuse of assert, vs junit.assert
      - get(a) + getOrElse(b) -> getOrElse(a,b)
      - Array/String .size -> .length (occasionally, -> .isEmpty / .nonEmpty) to avoid implicit conversions
      - Dead code
      - tailrec
      - exists(_ == ) -> contains find + nonEmpty -> exists filter + size -> count
      - reduce(_+_) -> sum map + flatten -> map
      
      The most controversial may be .size -> .length simply because of its size. It is intended to avoid implicits that might be expensive in some places.
      
      ## How was the this patch tested?
      
      Existing Jenkins unit tests.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #11292 from srowen/SPARK-13423.
      e97fc7f1
  15. Mar 01, 2016
    • Reynold Xin's avatar
      [SPARK-13548][BUILD] Move tags and unsafe modules into common · b0ee7d43
      Reynold Xin authored
      ## What changes were proposed in this pull request?
      This patch moves tags and unsafe modules into common directory to remove 2 top level non-user-facing directories.
      
      ## How was this patch tested?
      Jenkins should suffice.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #11426 from rxin/SPARK-13548.
      b0ee7d43
Loading