-
- Downloads
[SPARK-1991] Support custom storage levels for vertices and edges
This PR adds support for specifying custom storage levels for the vertices and edges of a graph. This enables GraphX to handle graphs larger than memory size by specifying MEMORY_AND_DISK and then repartitioning the graph to use many small partitions, each of which does fit in memory. Spark will then automatically load partitions from disk as needed. The user specifies the desired vertex and edge storage levels when building the graph by passing them to the graph constructor. These are then stored in the `targetStorageLevel` attribute of the VertexRDD and EdgeRDD respectively. Whenever GraphX needs to cache a VertexRDD or EdgeRDD (because it plans to use it more than once, for example), it uses the specified target storage level. Also, when the user calls `Graph#cache()`, the vertices and edges are persisted using their target storage levels. In order to facilitate propagating the target storage levels across VertexRDD and EdgeRDD operations, we remove raw calls to the constructors and instead introduce the `withPartitionsRDD` and `withTargetStorageLevel` methods. I tested this change by running PageRank and triangle count on a severely memory-constrained cluster (1 executor with 300 MB of memory, and a 1 GB graph). Before this PR, these algorithms used to fail with OutOfMemoryErrors. With this PR, and using the DISK_ONLY storage level, they succeed. Author: Ankur Dave <ankurdave@gmail.com> Closes #946 from ankurdave/SPARK-1991 and squashes the following commits: ce17d95 [Ankur Dave] Move pickStorageLevel to StorageLevel.fromString ccaf06f [Ankur Dave] Shadow members in withXYZ() methods rather than using underscores c34abc0 [Ankur Dave] Exclude all of GraphX from compatibility checks vs. 1.0.0 c5ca068 [Ankur Dave] Revert "Exclude all of GraphX from binary compatibility checks" 34bcefb [Ankur Dave] Exclude all of GraphX from binary compatibility checks 6fdd137 [Ankur Dave] [SPARK-1991] Support custom storage levels for vertices and edges
Showing
- core/src/main/scala/org/apache/spark/storage/StorageLevel.scala 21 additions, 0 deletions...rc/main/scala/org/apache/spark/storage/StorageLevel.scala
- graphx/src/main/scala/org/apache/spark/graphx/EdgeRDD.scala 64 additions, 3 deletionsgraphx/src/main/scala/org/apache/spark/graphx/EdgeRDD.scala
- graphx/src/main/scala/org/apache/spark/graphx/Graph.scala 24 additions, 10 deletionsgraphx/src/main/scala/org/apache/spark/graphx/Graph.scala
- graphx/src/main/scala/org/apache/spark/graphx/GraphLoader.scala 9 additions, 3 deletions.../src/main/scala/org/apache/spark/graphx/GraphLoader.scala
- graphx/src/main/scala/org/apache/spark/graphx/VertexRDD.scala 39 additions, 10 deletions...hx/src/main/scala/org/apache/spark/graphx/VertexRDD.scala
- graphx/src/main/scala/org/apache/spark/graphx/impl/GraphImpl.scala 29 additions, 26 deletions...c/main/scala/org/apache/spark/graphx/impl/GraphImpl.scala
- graphx/src/main/scala/org/apache/spark/graphx/impl/ReplicatedVertexView.scala 3 additions, 3 deletions...a/org/apache/spark/graphx/impl/ReplicatedVertexView.scala
- graphx/src/main/scala/org/apache/spark/graphx/lib/Analytics.scala 40 additions, 42 deletions...rc/main/scala/org/apache/spark/graphx/lib/Analytics.scala
Loading
Please register or sign in to comment