[SPARK-3277] Fix external spilling with LZ4 assertion error
**Summary of the changes** The bulk of this PR consists of tests and documentation; the actual fix is really just adding 1 line of code (see `BlockObjectWriter.scala`). We currently do not run the `External*` test suites with different compression codecs, which would have caught the bug reported in [SPARK-3277](https://issues.apache.org/jira/browse/SPARK-3277). This PR extends the existing code to test spilling using all compression codecs known to Spark, including `LZ4`.

**The bug itself** In `DiskBlockObjectWriter`, we only report the shuffle bytes written before we close the streams. With `LZ4`, all the bytes written reported by our metrics were 0 because `flush()` was not taking effect for some reason. In general, compression codecs may write additional bytes to the file after we call `close()`, and so we must also capture those bytes in our shuffle write metrics.

Thanks mridulm and pwendell for help with debugging.

Author: Andrew Or <andrewor14@gmail.com>
Author: Patrick Wendell <pwendell@gmail.com>

Closes #2187 from andrewor14/fix-lz4-spilling and squashes the following commits:

1b54bdc [Andrew Or] Speed up tests by not compressing everything
1c4624e [Andrew Or] Merge branch 'master' of github.com:apache/spark into fix-lz4-spilling
6b2e7d1 [Andrew Or] Fix compilation error
92e251b [Patrick Wendell] Better documentation for BlockObjectWriter.
a1ad536 [Andrew Or] Fix tests
089593f [Andrew Or] Actually fix SPARK-3277 (tests still fail)
4bbcf68 [Andrew Or] Update tests to actually test all compression codecs
b264a84 [Andrew Or] ExternalAppendOnlyMapSuite code style fixes (minor)
1bfa743 [Andrew Or] Add more information to assert for better debugging
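The effect described under "The bug itself" can be reproduced outside Spark. The following is a minimal sketch, not the code from this PR: it assumes the JDK's `GZIPOutputStream` as a stand-in codec (LZ4 needs an external library), and the object and method names are hypothetical. It measures the size of the underlying sink once after `flush()` and once after `close()`.

```scala
import java.io.ByteArrayOutputStream
import java.util.zip.GZIPOutputStream

// Hypothetical illustration; GZIP is used here only as a stand-in for LZ4.
object CloseThenMeasure {
  /** Returns (bytes in the sink after flush(), bytes in the sink after close()). */
  def bytesBeforeAndAfterClose(payload: Array[Byte]): (Long, Long) = {
    val sink = new ByteArrayOutputStream()      // stands in for the spill file
    val compressed = new GZIPOutputStream(sink) // codec stream wrapping the sink
    compressed.write(payload)
    compressed.flush()
    val beforeClose = sink.size().toLong        // what a "measure before close" metric sees
    compressed.close()                          // codec emits its remaining data and trailer
    val afterClose = sink.size().toLong         // the real number of bytes written
    (beforeClose, afterClose)
  }

  def main(args: Array[String]): Unit = {
    val payload = Array.fill[Byte](1024)(42.toByte)
    val (before, after) = bytesBeforeAndAfterClose(payload)
    println(s"bytes after flush(): $before, bytes after close(): $after")
  }
}
```

The count taken after `close()` is larger than the one taken after `flush()`, which is why a write metric should be captured only once the streams are closed.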
Changed files:
- core/src/main/scala/org/apache/spark/io/CompressionCodec.scala (1 addition, 0 deletions)
- core/src/main/scala/org/apache/spark/storage/BlockObjectWriter.scala (29 additions, 8 deletions)
- core/src/main/scala/org/apache/spark/util/collection/ExternalAppendOnlyMap.scala (6 additions, 1 deletion)
- core/src/test/scala/org/apache/spark/scheduler/ReplayListenerSuite.scala (1 addition, 4 deletions)
- core/src/test/scala/org/apache/spark/util/collection/ExternalAppendOnlyMapSuite.scala (107 additions, 83 deletions)