Exercise 8.1

Consider the USArrests data. We will now perform hierarchical clustering on the states.

data("USArrests", package = "datasets")  # "package" argument optional here
set.seed(2)
  1. Using hierarchical clustering with complete linkage and Euclidean distance, cluster the states.
# dist() uses Euclidean distance by default
hc.complete <- hclust(dist(USArrests), method = "complete")
plot(hc.complete)
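
To make the distance used above concrete, the following minimal sketch recomputes one entry of the distance matrix by hand. The state pair and the names `d`, `x`, `y`, `manual`, and `from.dist` are illustrative choices, not part of the exercise.

```r
data("USArrests", package = "datasets")

# Pairwise Euclidean distances between all states (the default method)
d <- dist(USArrests)

# Euclidean distance between two states, computed directly from the
# four variables: sqrt of the sum of squared differences
x <- unlist(USArrests["Alabama", ])
y <- unlist(USArrests["Alaska", ])
manual <- sqrt(sum((x - y)^2))

# The same entry, looked up in the distance matrix
from.dist <- as.matrix(d)["Alabama", "Alaska"]

# Compare the two values up to floating-point tolerance
isTRUE(all.equal(manual, from.dist))
```

Complete linkage then defines the distance between two clusters as the largest of these pairwise distances across their members.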

  2. Cut the dendrogram at a height that results in three distinct clusters. Which states belong to which clusters?
cutree(hc.complete, 3)
##        Alabama         Alaska        Arizona       Arkansas     California 
##              1              1              1              2              1 
##       Colorado    Connecticut       Delaware        Florida        Georgia 
##              2              3              1              1              2 
##         Hawaii          Idaho       Illinois        Indiana           Iowa 
##              3              3              1              3              3 
##         Kansas       Kentucky      Louisiana          Maine       Maryland 
##              3              3              1              3              1 
##  Massachusetts       Michigan      Minnesota    Mississippi       Missouri 
##              2              1              3              1              2 
##        Montana       Nebraska         Nevada  New Hampshire     New Jersey 
##              3              3              1              3              2 
##     New Mexico       New York North Carolina   North Dakota           Ohio 
##              1              1              1              3              3 
##       Oklahoma         Oregon   Pennsylvania   Rhode Island South Carolina 
##              2              2              3              2              1 
##   South Dakota      Tennessee          Texas           Utah        Vermont 
##              3              2              2              3              3 
##       Virginia     Washington  West Virginia      Wisconsin        Wyoming 
##              2              2              3              3              2
table(cutree(hc.complete, 3))
## 
##  1  2  3 
## 16 14 20
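
The question asks for a cut height, whereas `cutree(hc.complete, 3)` cuts by number of clusters. One way to recover a suitable height, sketched below, is to take any value between the last two merge heights that still leave three clusters. The midpoint used here is an arbitrary illustrative choice, and the names `heights`, `h.cut`, and `clusters.h` are assumptions introduced for this example.

```r
data("USArrests", package = "datasets")
hc.complete <- hclust(dist(USArrests), method = "complete")

# hclust stores the n - 1 merge heights in increasing order.
# After n - 3 merges three clusters remain, so any height between
# merge n - 3 and merge n - 2 produces three clusters.
n <- nrow(USArrests)
heights <- hc.complete$height
h.cut <- mean(heights[c(n - 3, n - 2)])  # midpoint: an illustrative choice

# Cut by height instead of by number of clusters
clusters.h <- cutree(hc.complete, h = h.cut)

# Check that cutting by height agrees with cutting by count
identical(clusters.h, cutree(hc.complete, k = 3))

# List the states belonging to each cluster
split(names(clusters.h), clusters.h)
```

The `split()` call answers the second half of the question directly, returning one character vector of state names per cluster.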
  3. Hierarchically cluster the states using complete linkage and Euclidean distance, after scaling the variables to have standard deviation one.
# scale() centers each variable and divides it by its standard deviation
dsc <- scale(USArrests)
hc.s.complete <- hclust(dist(dsc), method = "complete")
plot(hc.s.complete)
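
As a quick check on the scaling step, the sketch below compares the standard deviations of the variables before and after `scale()`. On the original scales Assault has by far the largest spread, so it dominates the unscaled Euclidean distances. The names `sd.raw` and `sd.scaled` are illustrative.

```r
data("USArrests", package = "datasets")

# Standard deviations on the original scales; the variables are measured
# in different units, so their spreads differ widely
sd.raw <- apply(USArrests, 2, sd)
sd.raw

# After scaling, every variable should have standard deviation one
dsc <- scale(USArrests)
sd.scaled <- apply(dsc, 2, sd)
sd.scaled
```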