# What is k-Means Cluster Analysis?

Cluster analysis is a method for automatically grouping data into a smaller number of subsets, or clusters, so that the records in each cluster are as statistically similar to one another as possible, based on the attributes being compared.

**"**In statistics and data mining, *k*-means clustering is a method of cluster analysis which aims to partition *n* observations into *k* clusters in which each observation belongs to the cluster with the nearest mean.

Given a set of observations (**x**_{1}, **x**_{2}, …, **x**_{n}), where each observation is a *d*-dimensional real vector, *k*-means clustering aims to partition the *n* observations into *k* sets (*k* ≤ *n*) **S** = {*S*_{1}, *S*_{2}, …, *S*_{k}} so as to minimize the within-cluster sum of squares:

$$\underset{\mathbf{S}}{\operatorname{arg\,min}} \sum_{i=1}^{k} \sum_{\mathbf{x}_j \in S_i} \left\| \mathbf{x}_j - \boldsymbol{\mu}_i \right\|^2$$

where **μ**_{i} is the mean of points in *S*_{i}.**"** -Wikipedia
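To make the quantity being minimized concrete, here is a minimal Python sketch of the within-cluster sum of squares. It is only an illustration, not the VBA macro from this post; the function name `wcss` and the sample points are my own assumptions.

```python
from collections import defaultdict


def wcss(points, labels):
    """Within-cluster sum of squares for points grouped by cluster label.

    `wcss` is a hypothetical helper name used for illustration only.
    """
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)

    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        # mu_i: the component-wise mean of the points in this cluster
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        # Add each member's squared Euclidean distance to mu_i
        total += sum(
            sum((p[d] - centroid[d]) ** 2 for d in range(dim)) for p in members
        )
    return total


# Made-up sample data: two tight pairs of points
points = [(1.0, 1.0), (1.0, 3.0), (8.0, 8.0), (10.0, 8.0)]
good = wcss(points, [0, 0, 1, 1])  # 4.0: nearby points share a cluster
bad = wcss(points, [0, 1, 0, 1])   # 102.0: distant points share a cluster
```

The partition that keeps nearby points together scores far lower, which is exactly what *k*-means is looking for.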

*Figure: Scatter chart with coloured clusters and their centroids displayed in red.*

k-Means cluster analysis achieves this by partitioning the data into the required number of clusters, grouping records so that the Euclidean distance between each record and its cluster's centroid (the point whose coordinates are the averages of the points in the cluster) is as small as possible.
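One common way to carry out this partitioning is to alternate between assigning each record to its nearest centroid and recomputing the centroids, an approach usually called Lloyd's algorithm. The Python sketch below illustrates that idea under my own assumptions: random initial centroids drawn from the data, a fixed iteration cap, and hypothetical names such as `k_means`. It is not a translation of the VBA macro.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Component-wise average of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, max_iterations=100, seed=0):
    """Illustrative k-means; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Assumption: start from k distinct records chosen at random
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each record joins its nearest centroid
        new_labels = [
            min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = mean_point(members)
    return labels, centroids


# Made-up sample data: two well-separated groups of three records
data = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
labels, centroids = k_means(data, k=2)
```

On this sample the first three records end up in one cluster and the last three in the other. Which cluster is numbered 0 depends on the random starting centroids.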

The following is a macro I wrote in VBA for Microsoft Excel that performs k-means cluster analysis on the selected table.
