
R Programming/Clustering

From Wikibooks, open books for an open world

Basic clustering


K-Means Clustering


K-means clustering is provided by the kmeans() function in the stats package, which is loaded by default.

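First create some data, here 100 observations (rows) on 10 variables (columns) drawn from a standard normal distribution:

> dat <- matrix(rnorm(1000), nrow=100, ncol=10)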

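To apply kmeans(), you need to specify the number of clusters. Each row of the matrix is treated as one observation. Because both the data and the starting centers are random, the counts you get will differ from those shown here: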

> cl <- kmeans(dat, 3) # here 3 is the number of clusters
> table(cl$cluster)
 1  2  3 
38 44 18
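
The following is a minimal sketch of a reproducible k-means run and of the main components of the returned object. The seed value 42 and the choice nstart = 25 are assumptions made for illustration, not recommendations from the original text.

```r
# Fix the random seed so that the data and the starting centers are reproducible
# (42 is an arbitrary value chosen for this sketch).
set.seed(42)
dat <- matrix(rnorm(1000), nrow = 100, ncol = 10)  # 100 observations, 10 variables

# nstart = 25 restarts the algorithm from 25 random sets of centers
# and keeps the solution with the lowest total within-cluster sum of squares.
cl <- kmeans(dat, centers = 3, nstart = 25)

dim(cl$centers)     # one row per cluster, one column per variable
length(cl$cluster)  # one cluster label per row of dat
cl$size             # number of observations in each cluster
cl$tot.withinss     # total within-cluster sum of squares
```

Using several random starts is a common precaution, since a single run of kmeans() can stop at a local optimum that depends on the initial centers.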

Hierarchical Clustering


The basic hierarchical clustering function is hclust(), which works on a dissimilarity structure as produced by the dist() function:

> hc <- hclust(dist(dat)) # data matrix from the example above
> plot(hc)
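
Both dist() and hclust() take a method argument, and the defaults are Euclidean distance and complete linkage. The sketch below spells these defaults out and compares them with average linkage; the seed and the variable names (d, hc_complete, hc_average) are assumptions made for this illustration.

```r
set.seed(42)  # arbitrary seed, used only to make the sketch reproducible
dat <- matrix(rnorm(1000), nrow = 100, ncol = 10)

# Pairwise Euclidean distances between the 100 rows (the default method)
d <- dist(dat, method = "euclidean")

# Complete linkage is the default; average linkage is one alternative
hc_complete <- hclust(d, method = "complete")
hc_average  <- hclust(d, method = "average")

length(d)                   # n * (n - 1) / 2 pairwise distances
length(hc_complete$height)  # n - 1 merges, one height per merge
range(hc_average$height)    # lowest and highest merge heights
```

Either tree can be passed to plot() in the same way as hc above. The linkage method changes the merge heights, so a cut height chosen for one tree does not carry over to another.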

The resulting tree can be cut using the cutree() function.

Cutting it at a given height:

> cl <- cutree(hc, h=5.1)
> table(cl)
cl
 1  2  3  4  5 
23 33 29  4 11

Cutting it to obtain a given number of clusters:

> cl <- cutree(hc, k=5)
> table(cl)
cl
 1  2  3  4  5 
23 33 29  4 11
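
The two ways of cutting are linked through the merge heights stored in the tree: with n observations, any height strictly between the (n-k)th and (n-k+1)th merge height yields k clusters. The sketch below illustrates this on freshly simulated data; the seed and the names h5, by_k, by_h and several are assumptions made for the example.

```r
set.seed(42)  # arbitrary seed, used only to make the sketch reproducible
dat <- matrix(rnorm(1000), nrow = 100, ncol = 10)
hc <- hclust(dist(dat))

n <- nrow(dat)
k <- 5

# A height halfway between the two merges that separate k clusters from k - 1
h5 <- mean(hc$height[c(n - k, n - k + 1)])

by_k <- cutree(hc, k = k)   # ask for 5 clusters directly
by_h <- cutree(hc, h = h5)  # cut at the corresponding height

all(by_k == by_h)           # both cuts assign the same labels

# cutree() also accepts a vector of k and then returns one column per value
several <- cutree(hc, k = 2:4)
dim(several)
```

In practice the height is usually read off the dendrogram, while k is convenient when the number of groups is fixed in advance.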

Available alternatives

