[SPARK-20109][MLLIB] Rewrote toBlockMatrix method on IndexedRowMatrix
## What changes were proposed in this pull request?

- ~~I added the method `toBlockMatrixDense` to the IndexedRowMatrix class. The current implementation of `toBlockMatrix` is insufficient for users with relatively dense IndexedRowMatrix objects, since it assumes sparsity.~~

EDIT: I ended up deciding that there should be just a single `toBlockMatrix` method, which creates a BlockMatrix whose blocks may be dense or sparse depending on the sparsity of the rows. This method will work better for any current use case of `toBlockMatrix`, and unlike the old method it does not go through `CoordinateMatrix`.

## How was this patch tested?

~~I used the same tests already written for `toBlockMatrix()` to test this method. I also added a new additional unit test for an edge case that was not adequately tested by current test suite.~~

I ran the original `IndexedRowMatrix` tests and wrote additional tests covering edge cases that the original tests did not exercise.

Author: John Compitello <johnc@broadinstitute.org>

Closes #17459 from johnc1231/johnc-fix-ir-to-block.
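The idea of building blocks straight from indexed rows, without an intermediate `CoordinateMatrix`, and then choosing a dense or sparse representation per block can be sketched in plain Scala. This is a minimal, Spark-free illustration, not the code merged in this pull request: `BlockSketch`, `DenseRow`, `SparseRow`, `DenseBlock`, `SparseBlock`, `toBlocks` and the density cutoff `DensityThreshold = 0.1` are all hypothetical names and values chosen for the example. Local collections stand in for the RDD and `groupBy` stands in for a distributed shuffle.

```scala
// Hypothetical, Spark-free sketch: build blocks directly from indexed rows
// and store each block densely or sparsely depending on how full it is.
object BlockSketch {
  // A row is either dense (every value) or sparse (column indices + values).
  sealed trait Row
  final case class DenseRow(values: Array[Double]) extends Row
  final case class SparseRow(indices: Array[Int], values: Array[Double]) extends Row

  // A block is stored densely (column-major) or as (row, col, value) triples.
  sealed trait Block { def numRows: Int; def numCols: Int }
  final case class DenseBlock(numRows: Int, numCols: Int, values: Array[Double]) extends Block
  final case class SparseBlock(numRows: Int, numCols: Int, entries: Seq[(Int, Int, Double)]) extends Block

  // ASSUMPTION: blocks whose stored fraction is at least this value stay dense.
  val DensityThreshold = 0.1

  def toBlocks(rows: Seq[(Long, Row)], numRows: Long, numCols: Int,
               rowsPerBlock: Int, colsPerBlock: Int): Map[(Int, Int), Block] = {
    require(rowsPerBlock > 0 && colsPerBlock > 0, "block sizes must be positive")

    // Emit ((blockRow, blockCol), (rowInBlock, colInBlock, value)) for each stored value.
    val cells = rows.flatMap { case (index, row) =>
      val blockRow = (index / rowsPerBlock).toInt
      val rowInBlock = (index % rowsPerBlock).toInt
      val pairs: Seq[(Int, Double)] = row match {
        case DenseRow(vs)      => vs.indices.map(j => (j, vs(j)))
        case SparseRow(is, vs) => is.toSeq.zip(vs.toSeq)
      }
      pairs.map { case (j, v) =>
        ((blockRow, j / colsPerBlock), (rowInBlock, j % colsPerBlock, v))
      }
    }

    cells.groupBy(_._1).map { case (key @ (bi, bj), group) =>
      // The last block row/column is smaller when the sizes do not divide evenly.
      val r = math.min(rowsPerBlock.toLong, numRows - bi.toLong * rowsPerBlock).toInt
      val c = math.min(colsPerBlock, numCols - bj * colsPerBlock)
      val entries = group.map(_._2)
      val block: Block =
        if (entries.size.toDouble / (r.toLong * c) >= DensityThreshold) {
          val arr = new Array[Double](r * c)
          entries.foreach { case (i, j, v) => arr(j * r + i) = v } // column-major layout
          DenseBlock(r, c, arr)
        } else SparseBlock(r, c, entries)
      key -> block
    }
  }
}
```

In this sketch a dense row is sliced by column block, while a sparse row contributes only its stored entries, so neither path materializes one object per matrix entry the way a coordinate (i, j, value) representation would. Blocks that receive no values are simply absent, and the final block row and column are trimmed to the true matrix size, which is the kind of edge case the extra tests mentioned above are meant to cover.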
- `mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/CoordinateMatrix.scala`: 7 additions, 0 deletions
- `mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/IndexedRowMatrix.scala`: 66 additions, 4 deletions
- `mllib/src/test/scala/org/apache/spark/mllib/linalg/distributed/IndexedRowMatrixSuite.scala`: 84 additions, 7 deletions