[SPARK-22119][ML] Add cosine distance to KMeans
## What changes were proposed in this pull request?

Currently, KMeans assumes that Euclidean distance is the only possible distance measure. This PR adds cosine distance support to the KMeans algorithm.

## How was this patch tested?

Existing and added unit tests.

Author: Marco Gaido <marcogaido91@gmail.com>
Author: Marco Gaido <mgaido@hortonworks.com>

Closes #19340 from mgaido91/SPARK-22119.
Changed files:
- mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala (19 additions, 3 deletions)
- mllib/src/main/scala/org/apache/spark/mllib/clustering/BisectingKMeans.scala (6 additions, 5 deletions)
- mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala (174 additions, 42 deletions)
- mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeansModel.scala (65 additions, 9 deletions)
- mllib/src/main/scala/org/apache/spark/mllib/clustering/LocalKMeans.scala (7 additions, 3 deletions)
- mllib/src/test/scala/org/apache/spark/ml/clustering/KMeansSuite.scala (40 additions, 2 deletions)
- mllib/src/test/scala/org/apache/spark/mllib/clustering/KMeansSuite.scala (4 additions, 2 deletions)
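
For readers unfamiliar with the measure this patch adds, the following is a minimal, standalone sketch of cosine distance and of assigning a point to its closest center under that distance. It is not the code from this patch: the object name `CosineDistanceSketch` and the helpers `cosineDistance` and `closestCenter` are hypothetical, and plain arrays stand in for Spark vectors. It assumes the usual definition, one minus the cosine similarity, which is undefined for zero vectors.

```scala
// Illustrative sketch only; names here are hypothetical and not taken from the patch.
object CosineDistanceSketch {

  /** Cosine distance: 1 - (a . b) / (||a|| * ||b||). Undefined for zero vectors. */
  def cosineDistance(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "vectors must have the same dimension")
    val dot = a.indices.map(i => a(i) * b(i)).sum
    val normA = math.sqrt(a.map(x => x * x).sum)
    val normB = math.sqrt(b.map(x => x * x).sum)
    require(normA > 0 && normB > 0, "cosine distance is undefined for zero vectors")
    1.0 - dot / (normA * normB)
  }

  /** Index of the center closest to `point` under cosine distance. */
  def closestCenter(point: Array[Double], centers: Seq[Array[Double]]): Int =
    centers.indices.minBy(i => cosineDistance(point, centers(i)))

  def main(args: Array[String]): Unit = {
    val centers = Seq(Array(1.0, 0.0), Array(0.0, 1.0))
    // (10, 1) points almost along the first axis, so the first center is chosen.
    println(closestCenter(Array(10.0, 1.0), centers)) // prints 0
  }
}
```

Because cosine distance depends only on direction, scaling a point by a positive constant does not change which center it is assigned to, unlike with Euclidean distance.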