R Programming/Clustering
Basic clustering
K-Means Clustering
K-means clustering is available through the kmeans() function in the stats package, which R loads by default.
First create some data, here a matrix of 100 observations (rows) on 10 variables (columns):
> dat <- matrix(rnorm(1000), nrow=100, ncol=10)
To apply kmeans(), you need to specify the number of clusters as the second argument, centers:
> cl <- kmeans(dat, 3) # here 3 is the number of clusters
> table(cl$cluster)
 1  2  3
38 44 18
Your counts will differ, since the data are random.
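Because kmeans() picks its starting centres at random, repeated runs can also return different clusters. The sketch below makes a run reproducible and more stable: set.seed() fixes the random numbers, and the nstart argument keeps the best of several random starts. The seed value 42 and nstart = 25 are arbitrary choices for this illustration.

```r
# Reproducible example data: 100 observations (rows) of 10 variables
set.seed(42)
dat <- matrix(rnorm(1000), nrow = 100, ncol = 10)

# Run k-means from 25 random starts and keep the solution with the
# lowest total within-cluster sum of squares
cl <- kmeans(dat, centers = 3, nstart = 25)

cl$size          # number of observations in each cluster
dim(cl$centers)  # one row of centre coordinates per cluster: 3 x 10
cl$tot.withinss  # total within-cluster sum of squares
```

The cluster numbers themselves are only labels: two runs that find the same groups may number them differently.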
Hierarchical Clustering
The basic hierarchical clustering function is hclust(), which works on a dissimilarity structure such as the one produced by the dist() function:
> hc <- hclust(dist(dat)) # data matrix from the example above
> plot(hc)
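By default, dist() computes Euclidean distances and hclust() uses complete linkage; both defaults can be changed through their method arguments. The sketch below, on freshly simulated data, builds an average-linkage tree on Manhattan distances and inspects the main components of the result.

```r
set.seed(1)
dat <- matrix(rnorm(1000), nrow = 100, ncol = 10)

# Manhattan distances between rows, then average-linkage clustering
d  <- dist(dat, method = "manhattan")
hc <- hclust(d, method = "average")

# For n observations the tree records n - 1 merges
dim(hc$merge)      # 99 x 2: which observations/clusters were joined at each step
length(hc$height)  # 99: the height at which each merge happened
length(hc$order)   # 100: leaf order used when plotting the dendrogram
```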
The resulting tree can be cut into groups with the cutree() function.
Cutting it at a given height:
> cl <- cutree(hc, h=5.1)
> table(cl)
cl
 1  2  3  4  5
23 33 29  4 11
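Cutting at height h yields one more cluster than the number of merges that happened above h, which is why a cut height maps to a specific number of clusters. The short check below illustrates this on freshly simulated data; using the median merge height as the cut point is an arbitrary choice.

```r
set.seed(1)
dat <- matrix(rnorm(1000), nrow = 100, ncol = 10)
hc  <- hclust(dist(dat))  # complete linkage, Euclidean distance

h  <- median(hc$height)   # an arbitrary cut height for illustration
cl <- cutree(hc, h = h)

n_clusters <- length(unique(cl))
n_clusters == 1 + sum(hc$height > h)  # TRUE
```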
Cutting it to obtain a given number of clusters:
> cl <- cutree(hc, k=5)
> table(cl)
cl
 1  2  3  4  5
23 33 29  4 11
Here the cut at h=5.1 happens to produce five clusters, so both cuts give the same grouping.
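cutree() also accepts a vector for k and then returns a matrix with one column of cluster memberships per requested number of clusters, which makes it easy to compare several cuts of the same tree. A sketch on freshly simulated data:

```r
set.seed(1)
dat <- matrix(rnorm(1000), nrow = 100, ncol = 10)
hc  <- hclust(dist(dat))

# One column per value of k; the column names are "2", "3", "4", "5"
memb <- cutree(hc, k = 2:5)
dim(memb)  # 100 rows (observations) x 4 columns (values of k)

# Number of distinct clusters in each column
apply(memb, 2, function(x) length(unique(x)))
```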
Available alternatives
- See the packages class, amap and cluster
- See the R bioinformatic page on clustering
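As one example from these alternatives, the cluster package (a recommended package included in most R installations) provides pam(), a k-medoids method that uses actual observations as cluster centres and is less sensitive to outliers than k-means. A minimal sketch, assuming the cluster package is installed:

```r
library(cluster)

set.seed(1)
dat <- matrix(rnorm(1000), nrow = 100, ncol = 10)

# Partitioning Around Medoids with 3 clusters
pm <- pam(dat, k = 3)

table(pm$clustering)  # cluster sizes
pm$medoids            # the 3 observations chosen as cluster centres

# Average silhouette width (between -1 and 1): higher means better-separated clusters
pm$silinfo$avg.width
```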