How can I predict which cluster new data belongs to after clustering training data?

I am new to R, and I have already trained the model with hclust:

 model <- hclust(distances, method = "ward.D")  # "ward" is the pre-R-3.1.0 name of "ward.D"

And the result looks good:

[Image: plot of the hclust result]

Now I have new data records, and I want to predict which cluster each of them belongs to. How can I do this?

+5
5 answers

As the name implies, clustering is not meant to "classify" new data; that is the core idea of classification.

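Some clustering algorithms (centroid-based ones such as k-means and k-medians) can "label" a new instance using the fitted model. Hierarchical clustering is not one of them: it does not partition the input space, it only "connects" the objects it was given during clustering, so a new point cannot be assigned using the fitted model alone.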
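To illustrate how a centroid-based method can label a new point, here is a minimal R sketch, assuming k-means on the four numeric iris columns and a hypothetical random split into 100 training rows and 50 "new" rows. The helper `nearest_centre()` is hypothetical (it is not part of base R); it assigns each new row to the closest fitted centre by squared Euclidean distance.

```r
set.seed(1)
x <- as.matrix(iris[, 1:4])
idx   <- sample(nrow(x), 100)   # hypothetical split: 100 training rows
train <- x[idx, ]
test  <- x[-idx, ]              # the remaining 50 rows play the "new" data

km <- kmeans(train, centers = 3, nstart = 10)

# Hypothetical helper: index of the nearest centre for every row of newdata
nearest_centre <- function(newdata, centers) {
  d2 <- vapply(seq_len(nrow(centers)),
               function(j) colSums((t(newdata) - centers[j, ])^2),
               numeric(nrow(newdata)))
  d2 <- matrix(d2, nrow = nrow(newdata))  # keep a matrix even for one row
  max.col(-d2, ties.method = "first")     # column with the smallest distance
}

pred <- nearest_centre(test, km$centers)
table(pred)
```

hclust has no fitted centres to compare against, which is why the answers below train a separate classifier on its labels instead.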

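The only "solution" for using hclust to "classify" new data is to build another classifier on top of the labels that hclust produced. For example, you can train knn (even with k = 1) on the data labelled by hclust and use it to assign labels to new points.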
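A minimal sketch of that idea, assuming the iris measurements as data, a hypothetical split of 110 rows to cluster and 40 "new" rows, Ward linkage, and a cut at k = 3. The same (here unscaled) features are used for both hclust and knn, so both steps measure distance the same way.

```r
library(class)  # knn()

set.seed(42)
x <- as.matrix(iris[, 1:4])     # raw (unscaled) features for both steps
idx   <- sample(nrow(x), 110)   # hypothetical split: 110 rows to cluster
train <- x[idx, ]
new   <- x[-idx, ]              # 40 rows treated as new records

# 1. Cluster only the training rows and cut the tree into 3 groups
hc     <- hclust(dist(train), method = "ward.D2")
labels <- cutree(hc, k = 3)

# 2. Use the hclust labels as classes for a 1-nearest-neighbour classifier
pred <- knn(train = train, test = new, cl = factor(labels), k = 1)
table(pred)
```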

+7

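Similarly, you can train a classifier such as LDA on the cluster labels and use it to assign new observations to clusters.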
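A sketch of the LDA variant, assuming `MASS::lda()` as the classifier and a hypothetical random split of iris into 110 rows to cluster and 40 new rows; the cluster labels from `cutree()` act as the class labels.

```r
library(MASS)  # lda()

set.seed(7)
x <- as.matrix(iris[, 1:4])
idx   <- sample(nrow(x), 110)   # hypothetical split
train <- x[idx, ]
new   <- x[-idx, ]

# Cluster the training rows, then treat the cluster ids as classes
labels <- cutree(hclust(dist(train), method = "ward.D2"), k = 3)
fit    <- lda(train, grouping = factor(labels))

# predict() returns a list; $class holds the predicted cluster per new row
pred <- predict(fit, new)$class
table(pred)
```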

+1

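One possible workflow:

  1. In R, use hclust to obtain cluster labels for the training data.
  2. Train a supervised model that maps the features to those labels.
  3. Apply the same preprocessing to the new data and use the model to predict its clusters.
  4. Evaluate the model with metrics such as KS, AUC, etc.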

PCA PC1 .

  1. , .
  2. , .

R , PCA , hclust. (Mayank 2016) , , . , .

Mayank. 2016. "Hclust() R ". . hclust() R .

0

As mentioned above, you can use a classifier such as class::knn to determine which cluster a new observation belongs to.

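KNN (k-nearest neighbours) assigns each new observation the most common label among its k closest training observations. With k = 1, a new observation simply receives the cluster of its single nearest training observation.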

Here is an example with the iris dataset:

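library(scorecard)   # split_df()
library(factoextra)  # fviz_dend()
library(class)       # knn()
library(ggplot2)     # theme_minimal(), ggplot()

# 75% / 25% train/test split of iris
df_iris <- split_df(iris, ratio = 0.75, seed = 123)
# Euclidean distances on the standardised training features (column 5 is Species)
d_iris <- dist(scale(df_iris$train[,-5]))

# Ward clustering, drawn as a dendrogram cut into 3 groups
hc_iris <- hclust(d_iris, method = "ward.D2")
fviz_dend(hc_iris, k = 3, cex = 0.5, k_colors = c("#00AFBB","#E7B800","#FC4E07"),
          color_labels_by_k = TRUE, ggtheme = theme_minimal())
# Cluster label of every training row
groups <- cutree(hc_iris, k = 3)
table(groups)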

[Image: dendrogram of the training data, cut into k = 3 clusters]

# 1-NN: each test row gets the cluster label of its nearest training row
knnClust <- knn(train = df_iris$train[,-5], test = df_iris$test[,-5], k = 1, cl = groups)
knnClust
 [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 3 2 3 3 3 2 2 2 2 2 3 3 2 2 3 2 2 2 2 2 2 2 2 2
Levels: 1 2 3
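In the example above, hclust is run on `scale()`d training features while `knn()` receives the unscaled columns, so the two steps measure distance on different scales. Below is a sketch of a consistent version, assuming a plain `sample()` split instead of `scorecard::split_df()` (so the rows and output will differ from those shown): the new rows are scaled with the training set's centre and standard deviation, never rescaled on their own.

```r
library(class)  # knn()

set.seed(123)
idx   <- sample(nrow(iris), 112)          # hypothetical 112 / 38 split
train <- as.matrix(iris[idx, 1:4])
test  <- as.matrix(iris[-idx, 1:4])

# Standardise the training data and keep its centre and spread
train_s <- scale(train)
ctr <- attr(train_s, "scaled:center")
sds <- attr(train_s, "scaled:scale")

# Reuse the training centre/scale for the new rows
test_s <- scale(test, center = ctr, scale = sds)

# Cluster and classify in the same (scaled) space
groups   <- cutree(hclust(dist(train_s), method = "ward.D2"), k = 3)
knnClust <- knn(train = train_s, test = test_s, cl = factor(groups), k = 1)
table(knnClust)
```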

# p1 <- fviz_cluster(list(data = df_iris$train[,-5], cluster = groups), stand = F) + xlim(-11.2,-4.8) + ylim(-3,3) + ggtitle("train")
# p2 <- fviz_cluster(list(data = df_iris$test[,-5], cluster = knnClust),stand = F) + xlim(-11.2,-4.8) + ylim(-3,3) + ggtitle("test")
# gridExtra::grid.arrange(p1,p2,nrow = 2)

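# Fit PCA on the training rows only, then project the test rows onto the same axes
# (running prcomp() separately on the test set would give different, incomparable axes)
pca_fit <- prcomp(df_iris$train[,-5], scale. = TRUE)
pca1 <- data.frame(pca_fit$x[,1:2], cluster = as.factor(groups), factor = "train")
pca2 <- data.frame(predict(pca_fit, newdata = as.matrix(df_iris$test[,-5]))[,1:2], cluster = as.factor(knnClust), factor = "test")
pca <- rbind(pca1, pca2)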

ggplot(pca, aes(x = PC1, y = PC2, color = cluster, size = 1, alpha = factor)) +
  geom_point(shape = 19) + theme_bw()

[Image: scatter plot of PC1 vs PC2 for the train and test rows, coloured by cluster]

0

hclust, , ?

class::knn only looks at the k nearest neighbours and only supports Euclidean distance.

There is no need to run the classifier.
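If a distance other than Euclidean is needed, a nearest-neighbour assignment is short enough to write by hand. The helper `nn1_predict()` below is hypothetical; it is a 1-nearest-neighbour sketch that accepts any `method` supported by `stats::dist()` (Manhattan is used here as an assumption). It computes all pairwise distances, so it is only suitable for small data.

```r
# Hypothetical helper: 1-nearest-neighbour labels with any dist() metric
nn1_predict <- function(train, labels, newdata, method = "manhattan") {
  n_train <- nrow(train)
  # Distances between all stacked rows; keep only the new-vs-train block
  d <- as.matrix(dist(rbind(train, newdata), method = method))
  block <- d[-seq_len(n_train), seq_len(n_train), drop = FALSE]
  # For each new row, take the label of the closest training row
  labels[apply(block, 1, which.min)]
}

set.seed(99)
x   <- as.matrix(iris[, 1:4])
idx <- sample(nrow(x), 110)   # hypothetical split
# Average linkage, since Ward's criterion assumes Euclidean distances
labels <- cutree(hclust(dist(x[idx, ], method = "manhattan"), method = "average"), k = 3)
pred   <- nn1_predict(x[idx, ], labels, x[-idx, ], method = "manhattan")
table(pred)
```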

-3
