Skip to content
Snippets Groups Projects
Commit 7786f9cc authored by Burak Yavuz's avatar Burak Yavuz Committed by Tathagata Das
Browse files

[SPARK-11419][STREAMING] Parallel recovery for FileBasedWriteAheadLog + minor recovery tweaks

The support for closing WriteAheadLog files after writes was just merged in. Closing every file after a write is a very expensive operation as it creates many small files on S3. It's not necessary to enable it on HDFS anyway.

However, when you have many small files on S3, recovery takes very long. In addition, files start stacking up pretty quickly, and deletes may not be able to keep up, therefore deletes can also be parallelized.

This PR adds support for the two parallelization steps mentioned above, in addition to a couple more failures I encountered during recovery.

Author: Burak Yavuz <brkyvz@gmail.com>

Closes #9373 from brkyvz/par-recovery.
parent 0f1d00a9
No related branches found
No related tags found
No related merge requests found
Loading
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment