k-Means Cluster Analysis in Microsoft Excel

What is k-Means Cluster Analysis?

Cluster analysis is a method for automatically grouping data into a smaller number of subsets or clusters so that the records grouped are most statistically similar to each other based on the attributes of the data compared.

In statistics and data miningk-means clustering is a method of cluster analysis which aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean.

Given a set of observations (x1x2, …, xn), where each observation is a d-dimensional real vector, k-means clustering aims to partition the n observations into k sets (k ≤ nS = {S1S2, …, Sk} so as to minimize the within-cluster sum of squares.

\underset{\mathbf{S}} {\operatorname{arg\,min}} \sum_{i=1}^{k} \sum_{\mathbf x_j \in S_i} \left\| \mathbf x_j - \boldsymbol\mu_i \right\|^2

where μi is the mean of points in Si.Wikipedia 

Scatter Chart Clusters and Centroids

Scatter chart with coloured clusters and their centroids displayed in red

k-Means cluster analysis achieves this by partitioning the data into the required number of clusters by grouping records so that the euclidean distance between the record’s dimensions and the clusters centroid (point with the average dimensions of the points in the cluster) are as small as possible.

The following is a macro I wrote in VBA for Microsoft Excel that performs k-Means Cluster Analysis on the table selected.

Continue reading