-
- Downloads
[SPARK-13992] Add support for off-heap caching
This patch adds support for caching blocks in the executor processes using direct / off-heap memory. ## User-facing changes **Updated semantics of `OFF_HEAP` storage level**: In Spark 1.x, the `OFF_HEAP` storage level indicated that an RDD should be cached in Tachyon. Spark 2.x removed the external block store API that Tachyon caching was based on (see #10752 / SPARK-12667), so `OFF_HEAP` became an alias for `MEMORY_ONLY_SER`. As of this patch, `OFF_HEAP` means "serialized and cached in off-heap memory or on disk". Via the `StorageLevel` constructor, `useOffHeap` can be set if `serialized == true` and can be used to construct custom storage levels which support replication. **Storage UI reporting**: the storage UI will now report whether in-memory blocks are stored on- or off-heap. **Only supported by UnifiedMemoryManager**: for simplicity, this feature is only supported when the default UnifiedMemoryManager is used; applications which use the legacy memory manager (`spark.memory.useLegacyMode=true`) are not currently able to allocate off-heap storage memory, so using off-heap caching will fail with an error when legacy memory management is enabled. Given that we plan to eventually remove the legacy memory manager, this is not a significant restriction. **Memory management policies:** the policies for dividing available memory between execution and storage are the same for both on- and off-heap memory. For off-heap memory, the total amount of memory available for use by Spark is controlled by `spark.memory.offHeap.size`, which is an absolute size. Off-heap storage memory obeys `spark.memory.storageFraction` in order to control the amount of unevictable storage memory. For example, if `spark.memory.offHeap.size` is 1 gigabyte and Spark uses the default `storageFraction` of 0.5, then up to 500 megabytes of off-heap cached blocks will be protected from eviction due to execution memory pressure. If necessary, we can split `spark.memory.storageFraction` into separate on- and off-heap configurations, but this doesn't seem necessary now and can be done later without any breaking changes. **Use of off-heap memory does not imply use of off-heap execution (or vice-versa)**: for now, the settings controlling the use of off-heap execution memory (`spark.memory.offHeap.enabled`) and off-heap caching are completely independent, so Spark SQL can be configured to use off-heap memory for execution while continuing to cache blocks on-heap. If desired, we can change this in a followup patch so that `spark.memory.offHeap.enabled` affect the default storage level for cached SQL tables. ## Internal changes - Rename `ByteArrayChunkOutputStream` to `ChunkedByteBufferOutputStream` - It now returns a `ChunkedByteBuffer` instead of an array of byte arrays. - Its constructor now accept an `allocator` function which is called to allocate `ByteBuffer`s. This allows us to control whether it allocates regular ByteBuffers or off-heap DirectByteBuffers. - Because block serialization is now performed during the unroll process, a `ChunkedByteBufferOutputStream` which is configured with a `DirectByteBuffer` allocator will use off-heap memory for both unroll and storage memory. - The `MemoryStore`'s MemoryEntries now tracks whether blocks are stored on- or off-heap. - `evictBlocksToFreeSpace()` now accepts a `MemoryMode` parameter so that we don't try to evict off-heap blocks in response to on-heap memory pressure (or vice-versa). - Make sure that off-heap buffers are properly de-allocated during MemoryStore eviction. - The JVM limits the total size of allocated direct byte buffers using the `-XX:MaxDirectMemorySize` flag and the default tends to be fairly low (< 512 megabytes in some JVMs). To work around this limitation, this patch adds a custom DirectByteBuffer allocator which ignores this memory limit. Author: Josh Rosen <joshrosen@databricks.com> Closes #11805 from JoshRosen/off-heap-caching.
Showing
- common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java 32 additions, 0 deletions...nsafe/src/main/java/org/apache/spark/unsafe/Platform.java
- core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala 4 additions, 4 deletions...n/scala/org/apache/spark/broadcast/TorrentBroadcast.scala
- core/src/main/scala/org/apache/spark/memory/StorageMemoryPool.scala 10 additions, 12 deletions...ain/scala/org/apache/spark/memory/StorageMemoryPool.scala
- core/src/main/scala/org/apache/spark/scheduler/Task.scala 3 additions, 2 deletionscore/src/main/scala/org/apache/spark/scheduler/Task.scala
- core/src/main/scala/org/apache/spark/serializer/SerializerManager.scala 4 additions, 12 deletions...scala/org/apache/spark/serializer/SerializerManager.scala
- core/src/main/scala/org/apache/spark/storage/BlockManager.scala 47 additions, 23 deletions...rc/main/scala/org/apache/spark/storage/BlockManager.scala
- core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala 1 addition, 1 deletion...org/apache/spark/storage/BlockManagerMasterEndpoint.scala
- core/src/main/scala/org/apache/spark/storage/StorageLevel.scala 9 additions, 12 deletions...rc/main/scala/org/apache/spark/storage/StorageLevel.scala
- core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala 109 additions, 44 deletions...n/scala/org/apache/spark/storage/memory/MemoryStore.scala
- core/src/main/scala/org/apache/spark/util/io/ChunkedByteBuffer.scala 10 additions, 4 deletions...in/scala/org/apache/spark/util/io/ChunkedByteBuffer.scala
- core/src/main/scala/org/apache/spark/util/io/ChunkedByteBufferOutputStream.scala 26 additions, 12 deletions.../apache/spark/util/io/ChunkedByteBufferOutputStream.scala
- core/src/test/scala/org/apache/spark/DistributedSuite.scala 2 additions, 2 deletionscore/src/test/scala/org/apache/spark/DistributedSuite.scala
- core/src/test/scala/org/apache/spark/io/ChunkedByteBufferSuite.scala 1 addition, 1 deletion...st/scala/org/apache/spark/io/ChunkedByteBufferSuite.scala
- core/src/test/scala/org/apache/spark/memory/MemoryManagerSuite.scala 2 additions, 1 deletion...st/scala/org/apache/spark/memory/MemoryManagerSuite.scala
- core/src/test/scala/org/apache/spark/storage/BlockManagerReplicationSuite.scala 19 additions, 5 deletions...g/apache/spark/storage/BlockManagerReplicationSuite.scala
- core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala 25 additions, 7 deletions...st/scala/org/apache/spark/storage/BlockManagerSuite.scala
- core/src/test/scala/org/apache/spark/storage/MemoryStoreSuite.scala 11 additions, 11 deletions...est/scala/org/apache/spark/storage/MemoryStoreSuite.scala
- core/src/test/scala/org/apache/spark/util/io/ChunkedByteBufferOutputStreamSuite.scala 26 additions, 21 deletions...he/spark/util/io/ChunkedByteBufferOutputStreamSuite.scala
- streaming/src/main/scala/org/apache/spark/streaming/rdd/WriteAheadLogBackedBlockRDD.scala 2 additions, 1 deletion...che/spark/streaming/rdd/WriteAheadLogBackedBlockRDD.scala
- streaming/src/test/scala/org/apache/spark/streaming/ReceivedBlockHandlerSuite.scala 2 additions, 1 deletion...rg/apache/spark/streaming/ReceivedBlockHandlerSuite.scala
Loading
Please register or sign in to comment