layout: global
title: Clustering
displayTitle: Clustering

This page describes clustering algorithms in MLlib. The guide for clustering in the RDD-based API also has relevant information about these algorithms.

K-means

k-means is one of the most commonly used clustering algorithms: it partitions the data points into a predefined number of clusters. The MLlib implementation includes a parallelized variant of the k-means++ method called kmeans||.
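To make the idea concrete, here is a minimal single-machine sketch of the basic k-means iteration (assign each point to its nearest center, then move each center to the mean of its members). It is not the MLlib implementation: the function name `kmeans`, its parameters, and the plain random seeding are assumptions chosen for illustration, whereas MLlib uses the parallel seeding scheme mentioned above.

```python
import math
import random


def kmeans(points, k, max_iter=20, seed=0):
    """Basic k-means on a list of equal-length tuples.

    Hypothetical helper for illustration only; returns (centers, labels).
    """
    rng = random.Random(seed)
    # Plain random seeding for brevity (an assumption of this sketch);
    # it is NOT the k-means++ style seeding used by MLlib.
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                mean = tuple(sum(c) / len(members) for c in zip(*members))
                new_centers.append(mean)
            else:
                # Keep the old center if a cluster ends up empty.
                new_centers.append(centers[j])
        if new_centers == centers:
            break  # converged: centers did not move
        centers = new_centers
    return centers, labels


if __name__ == "__main__":
    data = [(0, 0), (1, 1), (9, 8), (8, 9)]
    centers, labels = kmeans(data, k=2)
    print(sorted(centers), labels)
```

On this toy data the two nearby pairs end up in separate clusters regardless of which points are drawn as initial centers; on real data the result depends on the seeding, which is why better initialization schemes exist.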

KMeans is implemented as an Estimator and generates a KMeansModel as the base model.

Input Columns

| Param name | Type(s) | Default | Description |
|---|---|---|---|
| featuresCol | Vector | "features" | Feature vector |

Output Columns