Commit b454d440 authored by Shubham Chopra, committed by Wenchen Fan

[SPARK-15354][CORE] Topology aware block replication strategies

## What changes were proposed in this pull request?

This adds implementations of pluggable block replication strategies for different resource managers that mirror the 3-replica placement used by HDFS: the first replica is placed on an executor, the second on another executor within the same rack, and the third on an executor in a different rack.
The implementation provides two pluggable classes: one runs in the driver and supplies topology information for every host at cluster start, and the other prioritizes a list of peer BlockManagerIds for replication.

The prioritization itself can be thought of as an optimization problem: find a minimal set of peers that satisfies certain objectives, and replicate to those peers first. The objectives can express richer constraints over and above the HDFS-like 3-replica strategy.
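
As a rough sketch of the two extension points (the class names, host names, and rack table below are invented for illustration; only the interfaces match those exercised by this patch):

```scala
import scala.collection.mutable
import scala.util.Random

import org.apache.spark.SparkConf
import org.apache.spark.storage.{BlockId, BlockManagerId, BlockReplicationPolicy, TopologyMapper}

// Driver-side plugin: resolves a host name to its topology (for example a rack) at cluster start.
class StaticRackTopologyMapper(conf: SparkConf) extends TopologyMapper(conf) {
  // A made-up static table; a real mapper might read a file or query the resource manager.
  private val rackByHost = Map("host-1" -> "/rack-1", "host-2" -> "/rack-2")
  override def getTopologyForHost(hostname: String): Option[String] = rackByHost.get(hostname)
}

// Block-manager-side plugin: orders candidate peers; a lower index means a higher priority.
class PreferOtherRackPolicy extends BlockReplicationPolicy {
  override def prioritize(
      blockManagerId: BlockManagerId,
      peers: Seq[BlockManagerId],
      peersReplicatedTo: mutable.HashSet[BlockManagerId],
      blockId: BlockId,
      numReplicas: Int): List[BlockManagerId] = {
    val random = new Random(blockId.hashCode)
    // Toy objective: skip peers we already replicated to and our own host, then prefer
    // out-of-rack peers over in-rack peers, shuffling within each group.
    val candidates = peers.filterNot(p => peersReplicatedTo.contains(p) || p.host == blockManagerId.host)
    val (sameRack, otherRack) = candidates.partition(_.topologyInfo == blockManagerId.topologyInfo)
    (random.shuffle(otherRack) ++ random.shuffle(sameRack)).take(numReplicas).toList
  }
}
```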
## How was this patch tested?

This patch was tested with unit tests for storage, along with new unit tests to verify prioritization behaviour.

Author: Shubham Chopra <schopra31@bloomberg.net>

Closes #13932 from shubhamchopra/PrioritizerStrategy.
parent edc87d76
@@ -49,7 +49,6 @@ import org.apache.spark.unsafe.Platform
import org.apache.spark.util._
import org.apache.spark.util.io.ChunkedByteBuffer
/* Class for returning a fetched block and associated metrics. */
private[spark] class BlockResult(
val data: Iterator[Any],
@@ -1258,7 +1257,6 @@ private[spark] class BlockManager(
replication = 1)
val numPeersToReplicateTo = level.replication - 1
val startTime = System.nanoTime
var peersReplicatedTo = mutable.HashSet.empty ++ existingReplicas
@@ -1313,7 +1311,6 @@
numPeersToReplicateTo - peersReplicatedTo.size)
}
}
logDebug(s"Replicating $blockId of ${data.size} bytes to " +
s"${peersReplicatedTo.size} peer(s) took ${(System.nanoTime - startTime) / 1e6} ms")
if (peersReplicatedTo.size < numPeersToReplicateTo) {
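For orientation, here is a heavily simplified sketch of how a replication loop can consume a prioritized peer list and re-prioritize the remaining candidates after a failure. The method shape and the uploadTo callback are hypothetical; this is not the actual BlockManager.replicate code:

```scala
import scala.collection.mutable

import org.apache.spark.storage.{BlockId, BlockManagerId, BlockReplicationPolicy}

// Try peers in priority order until enough replicas exist or no candidates remain.
def replicateSketch(
    self: BlockManagerId,
    allPeers: Seq[BlockManagerId],
    policy: BlockReplicationPolicy,
    blockId: BlockId,
    numPeersToReplicateTo: Int)(uploadTo: BlockManagerId => Boolean): Set[BlockManagerId] = {
  val peersReplicatedTo = mutable.HashSet.empty[BlockManagerId]
  val peersFailedToReplicateTo = mutable.HashSet.empty[BlockManagerId]
  var candidates = policy.prioritize(self, allPeers, peersReplicatedTo, blockId, numPeersToReplicateTo)
  while (peersReplicatedTo.size < numPeersToReplicateTo && candidates.nonEmpty) {
    val peer = candidates.head
    if (uploadTo(peer)) {
      peersReplicatedTo += peer
      candidates = candidates.tail
    } else {
      peersFailedToReplicateTo += peer
      // On failure, ask the policy again with the peers we have not yet tried,
      // requesting only the replicas that are still missing.
      val remaining = allPeers.filterNot { p =>
        peersReplicatedTo.contains(p) || peersFailedToReplicateTo.contains(p)
      }
      candidates = policy.prioritize(self, remaining, peersReplicatedTo, blockId,
        numPeersToReplicateTo - peersReplicatedTo.size)
    }
  }
  peersReplicatedTo.toSet
}
```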
@@ -53,6 +53,46 @@ trait BlockReplicationPolicy {
numReplicas: Int): List[BlockManagerId]
}
object BlockReplicationUtils {
// scalastyle:off line.size.limit
/**
* Uses Robert Floyd's sampling algorithm to find a random sample of m unique indices without
* materializing or shuffling all n candidates, while minimizing space usage. For a proof of
* correctness, see <a href="http://math.stackexchange.com/questions/178690/whats-the-proof-of-correctness-for-robert-floyds-algorithm-for-selecting-a-sin">
* here</a>.
*
* @param n total number of indices
* @param m number of samples needed
* @param r random number generator
* @return list of m random unique indices
*/
// scalastyle:on line.size.limit
private def getSampleIds(n: Int, m: Int, r: Random): List[Int] = {
val indices = (n - m + 1 to n).foldLeft(mutable.LinkedHashSet.empty[Int]) {case (set, i) =>
val t = r.nextInt(i) + 1
if (set.contains(t)) set + i else set + t
}
indices.map(_ - 1).toList
}
/**
* Get a random sample of size m from elems.
*
* @param elems collection of elements to sample from
* @param m number of samples needed
* @param r random number generator
* @tparam T element type
* @return a random list of size m. If there are fewer than m elements in elems, we just
* return all of elems in a random order
*/
def getRandomSample[T](elems: Seq[T], m: Int, r: Random): List[T] = {
if (elems.size > m) {
getSampleIds(elems.size, m, r).map(elems(_))
} else {
r.shuffle(elems).toList
}
}
}
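As a usage illustration of the sampling utility above (a standalone sketch, assuming BlockReplicationUtils is visible from the calling package):

```scala
import scala.util.Random

import org.apache.spark.storage.BlockReplicationUtils

val rng = new Random(42L)
// Draw 5 distinct indices out of 100 without shuffling all 100 candidates.
val sample = BlockReplicationUtils.getRandomSample((0 until 100).toList, 5, rng)
assert(sample.size == 5 && sample.toSet.size == 5)
assert(sample.forall(i => i >= 0 && i < 100))
// With fewer elements than requested, the helper simply returns all of them, shuffled.
val small = BlockReplicationUtils.getRandomSample(List("a", "b", "c"), 5, rng)
assert(small.toSet == Set("a", "b", "c"))
```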
@DeveloperApi
class RandomBlockReplicationPolicy
extends BlockReplicationPolicy
@@ -67,6 +107,7 @@ class RandomBlockReplicationPolicy
* @param peersReplicatedTo Set of peers already replicated to
* @param blockId BlockId of the block being replicated. This can be used as a source of
* randomness if needed.
* @param numReplicas Number of peers we need to replicate to
* @return A prioritized list of peers. The lower the index of a peer, the higher its priority.
*/
override def prioritize(
@@ -78,7 +119,7 @@ class RandomBlockReplicationPolicy
val random = new Random(blockId.hashCode)
logDebug(s"Input peers : ${peers.mkString(", ")}")
val prioritizedPeers = if (peers.size > numReplicas) {
BlockReplicationUtils.getRandomSample(peers, numReplicas, random)
} else {
if (peers.size < numReplicas) {
logWarning(s"Expecting ${numReplicas} replicas with only ${peers.size} peer/s.")
@@ -88,26 +129,96 @@ class RandomBlockReplicationPolicy
logDebug(s"Prioritized peers : ${prioritizedPeers.mkString(", ")}")
prioritizedPeers
}
}
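For context, a minimal usage sketch of the random policy (peer names invented). Note that prioritize is deterministic for a given block, because the random generator is seeded with blockId.hashCode:

```scala
import scala.collection.mutable

import org.apache.spark.storage.{BlockManagerId, RandomBlockReplicationPolicy, TestBlockId}

val policy = new RandomBlockReplicationPolicy
val self = BlockManagerId("exec-0", "host-0", 7000, None)
val peers = (1 to 10).map(i => BlockManagerId(s"exec-$i", s"host-$i", 7000 + i, None))

val first = policy.prioritize(self, peers, mutable.HashSet.empty, TestBlockId("b1"), 3)
val second = policy.prioritize(self, peers, mutable.HashSet.empty, TestBlockId("b1"), 3)
assert(first.size == 3 && first.toSet.size == 3) // distinct peers, at most numReplicas of them
assert(first == second)                          // same block id, same seed, same ordering
```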
@DeveloperApi
class BasicBlockReplicationPolicy
extends BlockReplicationPolicy
with Logging {
/**
* Method to prioritize a set of candidate peers of a block manager. This implementation
* replicates the behavior of block replication in HDFS: for a given number of replicas needed,
* we choose a peer within the same rack, then one outside the rack, and then the remaining block
* managers at random, in that order, until the required number of replicas is met.
* This works best with a total replication factor of 3, like HDFS.
*
* @param blockManagerId Id of the current BlockManager for self identification
* @param peers A list of peers of a BlockManager
* @param peersReplicatedTo Set of peers already replicated to
* @param blockId BlockId of the block being replicated. This can be used as a source of
* randomness if needed.
* @param numReplicas Number of peers we need to replicate to
* @return A prioritized list of peers. The lower the index of a peer, the higher its priority.
*/
override def prioritize(
blockManagerId: BlockManagerId,
peers: Seq[BlockManagerId],
peersReplicatedTo: mutable.HashSet[BlockManagerId],
blockId: BlockId,
numReplicas: Int): List[BlockManagerId] = {
logDebug(s"Input peers : $peers")
logDebug(s"BlockManagerId : $blockManagerId")
val random = new Random(blockId.hashCode)
// If the block manager has no topology info, the best we can do is pick peers at random.
// If it does, we look at what has already been replicated to (peersReplicatedTo) and,
// based on numReplicas, choose what is still needed.
if (blockManagerId.topologyInfo.isEmpty || numReplicas == 0) {
// no topology info for the block. The best we can do is randomly choose peers
BlockReplicationUtils.getRandomSample(peers, numReplicas, random)
} else {
// we have topology information, we see what is left to be done from peersReplicatedTo
val doneWithinRack = peersReplicatedTo.exists(_.topologyInfo == blockManagerId.topologyInfo)
val doneOutsideRack = peersReplicatedTo.exists { p =>
p.topologyInfo.isDefined && p.topologyInfo != blockManagerId.topologyInfo
}
if (doneOutsideRack && doneWithinRack) {
// we are done, we just return a random sample
BlockReplicationUtils.getRandomSample(peers, numReplicas, random)
} else {
// we separate peers within and outside rack
val (inRackPeers, outOfRackPeers) = peers
.filter(_.host != blockManagerId.host)
.partition(_.topologyInfo == blockManagerId.topologyInfo)
val peerWithinRack = if (doneWithinRack) {
// we are done with in-rack replication, so we don't need any more in-rack peers
Seq.empty
} else {
if (inRackPeers.isEmpty) {
Seq.empty
} else {
Seq(inRackPeers(random.nextInt(inRackPeers.size)))
}
}
val peerOutsideRack = if (doneOutsideRack || numReplicas - peerWithinRack.size <= 0) {
Seq.empty
} else {
if (outOfRackPeers.isEmpty) {
Seq.empty
} else {
Seq(outOfRackPeers(random.nextInt(outOfRackPeers.size)))
}
}
val priorityPeers = peerWithinRack ++ peerOutsideRack
val numRemainingPeers = numReplicas - priorityPeers.size
val remainingPeers = if (numRemainingPeers > 0) {
val rPeers = peers.filter(p => !priorityPeers.contains(p))
BlockReplicationUtils.getRandomSample(rPeers, numRemainingPeers, random)
} else {
Seq.empty
}
(priorityPeers ++ remainingPeers).toList
}
}
}
}
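To make the intended placement concrete, a small sketch with invented hosts and racks. With nothing replicated yet and numReplicas = 3, the first pick is an in-rack peer, the second an off-rack peer, and the remaining slot is filled from a random sample of the rest:

```scala
import scala.collection.mutable

import org.apache.spark.storage.{BasicBlockReplicationPolicy, BlockManagerId, TestBlockId}

val policy = new BasicBlockReplicationPolicy
val self = BlockManagerId("exec-0", "host-0", 7000, Some("/rack-1"))
val peers = Seq(
  BlockManagerId("exec-1", "host-1", 7001, Some("/rack-1")), // same rack as self
  BlockManagerId("exec-2", "host-2", 7002, Some("/rack-2")), // different rack
  BlockManagerId("exec-3", "host-3", 7003, Some("/rack-2"))  // different rack
)

val prioritized = policy.prioritize(self, peers, mutable.HashSet.empty, TestBlockId("b1"), 3)
assert(prioritized.head.topologyInfo == self.topologyInfo)   // in-rack peer first
assert(prioritized(1).topologyInfo != self.topologyInfo)     // off-rack peer second
assert(prioritized.toSet.size == 3)                          // all picks are distinct
```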
@@ -28,6 +28,7 @@ import org.scalatest.concurrent.Eventually._
import org.apache.spark._
import org.apache.spark.broadcast.BroadcastManager
import org.apache.spark.internal.Logging
import org.apache.spark.memory.UnifiedMemoryManager
import org.apache.spark.network.BlockTransferService
import org.apache.spark.network.netty.NettyBlockTransferService
@@ -36,6 +37,7 @@ import org.apache.spark.scheduler.LiveListenerBus
import org.apache.spark.serializer.{KryoSerializer, SerializerManager}
import org.apache.spark.shuffle.sort.SortShuffleManager
import org.apache.spark.storage.StorageLevel._
import org.apache.spark.util.Utils
trait BlockManagerReplicationBehavior extends SparkFunSuite
with Matchers
@@ -43,6 +45,7 @@ trait BlockManagerReplicationBehavior extends SparkFunSuite
with LocalSparkContext {
val conf: SparkConf
protected var rpcEnv: RpcEnv = null
protected var master: BlockManagerMaster = null
protected lazy val securityMgr = new SecurityManager(conf)
@@ -55,7 +58,6 @@ trait BlockManagerReplicationBehavior extends SparkFunSuite
protected val allStores = new ArrayBuffer[BlockManager]
// Reuse a serializer across tests to avoid creating a new thread-local buffer on each test
protected lazy val serializer = new KryoSerializer(conf)
// Implicitly convert strings to BlockIds for test clarity.
@@ -471,7 +473,7 @@ class BlockManagerProactiveReplicationSuite extends BlockManagerReplicationBehav
conf.set("spark.storage.replication.proactive", "true")
conf.set("spark.storage.exceptionOnPinLeak", "true")
(2 to 5).foreach { i =>
test(s"proactive block replication - $i replicas - ${i - 1} block manager deletions") {
testProactiveReplication(i)
}
@@ -524,3 +526,30 @@ class BlockManagerProactiveReplicationSuite extends BlockManagerReplicationBehav
}
}
}
class DummyTopologyMapper(conf: SparkConf) extends TopologyMapper(conf) with Logging {
// number of racks to test with
val numRacks = 3
/**
* Gets the topology information given the host name
*
* @param hostname Hostname
* @return random topology
*/
override def getTopologyForHost(hostname: String): Option[String] = {
Some(s"/Rack-${Utils.random.nextInt(numRacks)}")
}
}
class BlockManagerBasicStrategyReplicationSuite extends BlockManagerReplicationBehavior {
val conf: SparkConf = new SparkConf(false).set("spark.app.id", "test")
conf.set("spark.kryoserializer.buffer", "1m")
conf.set(
"spark.storage.replication.policy",
classOf[BasicBlockReplicationPolicy].getName)
conf.set(
"spark.storage.replication.topologyMapper",
classOf[DummyTopologyMapper].getName)
}
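Outside the test suite, the same two settings select the policy and the topology mapper for an application. A minimal sketch; com.example.RackTopologyMapper stands in for a hypothetical user-provided TopologyMapper subclass:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.storage.BasicBlockReplicationPolicy

val conf = new SparkConf()
  .setAppName("topology-aware-replication-demo")
  // Prioritize replica placement HDFS-style: one in-rack peer, one off-rack peer, rest random.
  .set("spark.storage.replication.policy", classOf[BasicBlockReplicationPolicy].getName)
  // Driver-side mapper that resolves each host to its rack (hypothetical user class).
  .set("spark.storage.replication.topologyMapper", "com.example.RackTopologyMapper")
// Persisting with a replicated storage level, e.g. StorageLevel.MEMORY_AND_DISK_2, then places
// the extra replica according to the configured policy.
```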
@@ -18,34 +18,34 @@
package org.apache.spark.storage
import scala.collection.mutable
import scala.util.Random
import org.scalatest.{BeforeAndAfter, Matchers}
import org.apache.spark.{LocalSparkContext, SparkFunSuite}
class RandomBlockReplicationPolicyBehavior extends SparkFunSuite
with Matchers
with BeforeAndAfter
with LocalSparkContext {
// Implicitly convert strings to BlockIds for test clarity.
protected implicit def StringToBlockId(value: String): BlockId = new TestBlockId(value)
val replicationPolicy: BlockReplicationPolicy = new RandomBlockReplicationPolicy
val blockId = "test-block"
/**
* Test if we get the required number of peers when using random sampling from
* BlockReplicationPolicy
*/
test("block replication - random block replication policy") {
val numBlockManagers = 10
val storeSize = 1000
val blockManagers = generateBlockManagerIds(numBlockManagers, Seq("/Rack-1"))
val candidateBlockManager = BlockManagerId("test-store", "localhost", 1000, None)
(1 to 10).foreach { numReplicas =>
logDebug(s"Num replicas : $numReplicas")
val randomPeers = replicationPolicy.prioritize(
candidateBlockManager,
@@ -68,7 +68,60 @@ class BlockReplicationPolicySuite extends SparkFunSuite
logDebug(s"Random peers : ${secondPass.mkString(", ")}")
assert(secondPass.toSet.size === numReplicas)
}
}
protected def generateBlockManagerIds(count: Int, racks: Seq[String]): Seq[BlockManagerId] = {
(1 to count).map{i =>
BlockManagerId(s"Exec-$i", s"Host-$i", 10000 + i, Some(racks(Random.nextInt(racks.size))))
}
}
}
class TopologyAwareBlockReplicationPolicyBehavior extends RandomBlockReplicationPolicyBehavior {
override val replicationPolicy = new BasicBlockReplicationPolicy
test("All peers in the same rack") {
val racks = Seq("/default-rack")
val numBlockManager = 10
(1 to 10).foreach {numReplicas =>
val peers = generateBlockManagerIds(numBlockManager, racks)
val blockManager = BlockManagerId("Driver", "Host-driver", 10001, Some(racks.head))
val prioritizedPeers = replicationPolicy.prioritize(
blockManager,
peers,
mutable.HashSet.empty,
blockId,
numReplicas
)
assert(prioritizedPeers.toSet.size == numReplicas)
assert(prioritizedPeers.forall(p => p.host != blockManager.host))
}
}
test("Peers in 2 racks") {
val racks = Seq("/Rack-1", "/Rack-2")
(1 to 10).foreach {numReplicas =>
val peers = generateBlockManagerIds(10, racks)
val blockManager = BlockManagerId("Driver", "Host-driver", 9001, Some(racks.head))
val prioritizedPeers = replicationPolicy.prioritize(
blockManager,
peers,
mutable.HashSet.empty,
blockId,
numReplicas
)
assert(prioritizedPeers.toSet.size == numReplicas)
val priorityPeers = prioritizedPeers.take(2)
assert(priorityPeers.forall(p => p.host != blockManager.host))
if(numReplicas > 1) {
// both these conditions should be satisfied when numReplicas > 1
assert(priorityPeers.exists(p => p.topologyInfo == blockManager.topologyInfo))
assert(priorityPeers.exists(p => p.topologyInfo != blockManager.topologyInfo))
}
}
}
}