Hi Stack Overflow community,
I am running kmeans (stats package) and Kmeans (amap package) on the iris dataset. In both cases, I use the same algorithm (Lloyd-Forgy), the same distance (Euclidean), the same number of random starts (50), the same maximum number of iterations (1000), and the same range of k values (from 2 to 15). I also use the same seed in both cases (4358).
I do not understand why, under these conditions, I get different WSS (within-groups sum of squares) curves. In particular, the "elbow" is much less pronounced with the stats package than with the amap package.
Could you help me understand why? Thank you very much!
Here is the code:
# data load and scaling
newiris <- iris
newiris$Species <- NULL
newiris <- scale(newiris)

# using kmeans (stats)
wss1 <- (nrow(newiris)-1)*sum(apply(newiris,2,var))
for (i in 2:15) {
  set.seed(4358)
  wss1[i] <- sum(kmeans(newiris, centers=i, iter.max=1000, nstart=50, algorithm="Lloyd")$withinss)
}

# using Kmeans (amap)
library(amap)
wss2 <- (nrow(newiris)-1)*sum(apply(newiris,2,var))
for (i in 2:15) {
  set.seed(4358)
  wss2[i] <- sum(Kmeans(newiris, centers=i, iter.max=1000, nstart=50, method="euclidean")$withinss)
}

# plots
plot(1:15, wss1, type="b", xlab="Number of Clusters", ylab="Within groups sum of squares", main="kmeans (stats package)")
plot(1:15, wss2, type="b", xlab="Number of Clusters", ylab="Within groups sum of squares", main="Kmeans (amap package)")
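One way to narrow this down might be to recompute the within-cluster sum of squares directly from the cluster labels, instead of relying on the withinss value each package reports. The sketch below does this for kmeans (stats) with k = 3. The helper wss_from_labels is a hypothetical name introduced here for illustration and is not part of either package.

```r
# Hypothetical helper (not from stats or amap): recompute the total
# within-cluster sum of squared Euclidean distances from the labels alone.
wss_from_labels <- function(x, cluster) {
  sum(sapply(unique(cluster), function(k) {
    g <- x[cluster == k, , drop = FALSE]   # rows assigned to cluster k
    sum(sweep(g, 2, colMeans(g))^2)        # squared distances to the cluster mean
  }))
}

x <- scale(iris[, 1:4])

set.seed(4358)
fit <- kmeans(x, centers = 3, iter.max = 1000, nstart = 50, algorithm = "Lloyd")

manual <- wss_from_labels(x, fit$cluster)

# Compare the hand-computed value with what kmeans reports
print(c(manual = manual, reported = sum(fit$withinss)))
```

The same helper could be applied to the labels returned by Kmeans (amap), assuming its result also carries a cluster component. If the hand-computed value matches one package's reported withinss but not the other's, the difference would lie in how withinss is computed rather than in the clustering itself.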
EDIT: I emailed the author of the amap package and will post the answer if and when I receive one. https://cran.r-project.org/web/packages/amap/index.html