[SPARK-15354][CORE] Topology aware block replication strategies
## What changes were proposed in this pull request?

Implementations of strategies for resilient block replication for different resource managers. They mirror the 3-replica strategy used by HDFS: the first replica is on an executor, the second replica is within the same rack as that executor, and the third replica is on a different rack.

The implementation provides two pluggable classes. The first runs in the driver and provides topology information for every host at cluster start. The second prioritizes a list of peer BlockManagerIds.

The prioritization itself can be thought of as an optimization problem: find a minimal set of peers that satisfy certain objectives, and replicate to these peers first. The objectives can be used to express richer constraints over and above the HDFS-like 3-replica strategy.

## How was this patch tested?

This patch was tested with the existing unit tests for storage, along with new unit tests that verify the prioritization behaviour.

Author: Shubham Chopra <schopra31@bloomberg.net>

Closes #13932 from shubhamchopra/PrioritizerStrategy.
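The prioritization described above can be illustrated with a minimal, self-contained sketch. It does not reproduce the code in this patch: `Peer`, `RackAwarePrioritizer`, and the `prioritize` signature are hypothetical stand-ins (with `Peer` playing the role of a BlockManagerId that carries rack information), and the selection logic is only one plausible way to put a same-rack peer first and an off-rack peer second.

```scala
import scala.util.Random

// Hypothetical stand-in for a BlockManagerId: a host plus optional rack info
// (the rack is what a driver-side topology lookup would supply per host).
final case class Peer(host: String, rack: Option[String])

object RackAwarePrioritizer {
  /**
   * Order candidate peers so that, when rack info is known, the first peer
   * shares the rack of `self` and the second sits on a different rack.
   * Remaining peers follow in random order. `numPeers` is the number of
   * peers wanted (2 for an HDFS-like 3-replica layout, since the first
   * replica already lives on `self`).
   */
  def prioritize(
      self: Peer,
      peers: Seq[Peer],
      numPeers: Int,
      rng: Random = new Random()): Seq[Peer] = {
    // Never replicate to the host that already holds the block.
    val candidates = rng.shuffle(peers.filter(_.host != self.host))
    self.rack match {
      case None =>
        // No topology information: fall back to a random choice.
        candidates.take(numPeers)
      case Some(rack) =>
        val (sameRack, others) = candidates.partition(_.rack.contains(rack))
        // Only peers with a known rack count as "a different rack".
        val offRack = others.filter(_.rack.isDefined)
        val chosen = sameRack.headOption.toSeq ++ offRack.headOption.toSeq
        val rest = candidates.filterNot(p => chosen.contains(p))
        (chosen ++ rest).take(numPeers)
    }
  }
}
```

For example, with `self` on `/rack1` and peers spread over `/rack1` and `/rack2`, calling `RackAwarePrioritizer.prioritize(self, peers, 2)` returns one `/rack1` peer followed by one `/rack2` peer. Richer objectives, such as spreading replicas across more than two racks, would replace the two-step selection with a more general constraint search.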
Showing 4 changed files:
- core/src/main/scala/org/apache/spark/storage/BlockManager.scala (0 additions, 3 deletions)
- core/src/main/scala/org/apache/spark/storage/BlockReplicationPolicy.scala (128 additions, 17 deletions)
- core/src/test/scala/org/apache/spark/storage/BlockManagerReplicationSuite.scala (31 additions, 2 deletions)
- core/src/test/scala/org/apache/spark/storage/BlockReplicationPolicySuite.scala (63 additions, 10 deletions)