NbClust Package Error

I am trying to run the NbClust package on my data (100 rows x 130 columns) to determine how many clusters I should choose, but I keep getting this error when I apply it to the complete data set:

    > nc <- NbClust(mydata, distance="euclidean", min.nc=2, max.nc=99, method="ward", index="duda")
    [1] "There are only 100 nonmissing observations out of a possible 100 observations."
    Error in NbClust(mydata, distance = "euclidean", min.nc = 2, max.nc = 99,  :
      The TSS matrix is indefinite. There must be too many missing values. The index cannot be calculated.

When I apply the method to a 100 x 80 matrix, it produces the expected output (a 100 x 100 matrix also gave me an error message, but a different one). Obviously, though, I want to apply this method to the entire data set. FYI: creating a distance matrix and clustering with the Ward method was no problem. Both the distance matrix and the dendrogram were created ...
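For comparison, here is a minimal sketch on simulated data (a hypothetical stand-in for `mydata`, not the asker's data) of why the distance matrix and the Ward dendrogram work: `dist()` and `hclust()` only use pairwise distances between the 100 rows, so having more columns than rows causes no trouble there. In current R the Ward criterion is spelled `"ward.D"` or `"ward.D2"` in `hclust()`; `"ward.D2"` below is just one choice.

```r
set.seed(42)
# Hypothetical stand-in for the asker's data: 100 rows x 130 columns
mydata <- matrix(rnorm(100 * 130), nrow = 100, ncol = 130)

# Pairwise Euclidean distances between the 100 rows (observations)
d <- dist(mydata, method = "euclidean")

# Ward clustering on the distance matrix; no 130 x 130 cross-product is formed
hc <- hclust(d, method = "ward.D2")

# Cut the dendrogram into, for example, 3 groups
groups <- cutree(hc, k = 3)
table(groups)
```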

3 answers

I am fairly sure I have found the reason for this error message, and it is essentially related to the data. I looked through the source code of the NbClust package and found that the error comes from the opening part of the function:

    NbClust <- function(data, diss = "NULL", distance = "euclidean", min.nc = 2,
                        max.nc = 15, method = "ward", index = "all",
                        alphaBeale = 0.1) {
      x <- 0
      min_nc <- min.nc
      max_nc <- max.nc
      jeu1 <- as.matrix(data)
      numberObsBefore <- dim(jeu1)[1]
      jeu <- na.omit(jeu1)  # returns the object with incomplete cases removed
      nn <- numberObsAfter <- dim(jeu)[1]
      pp <- dim(jeu)[2]
      TT <- t(jeu) %*% jeu  # p x p cross-product ("TSS") matrix
      sizeEigenTT <- length(eigen(TT)$value)
      eigenValues <- eigen(TT / (nn - 1))$value
      # any negative eigenvalue aborts the whole call
      for (i in 1:sizeEigenTT) {
        if (eigenValues[i] < 0) {
          print(paste("There are only", numberObsAfter,
                      "nonmissing observations out of a possible",
                      numberObsBefore, "observations."))
          stop("The TSS matrix is indefinite. There must be too many missing values. The index cannot be calculated.")
        }
      }
      # ... (rest of the function omitted)
    }

So in my case, the matrix produces negative eigenvalues. I double-checked this, and indeed: for leading submatrices up to about size 100 the eigenvalues remain positive, and beyond that they start to become negative. So this is a mathematical property of my matrix: it is not positive definite. This matters for quite a few reasons; a really good explanation of the causes and possible remedies is given at http://www2.gsu.edu/~mkteer/npdmatri.html. I am now analysing my data to find out what causes this. So the code is fine: if you get this error message, you probably need to go back to your data.
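The observation above can be reproduced with a small sketch on simulated data (not the asker's data) that mirrors the eigenvalue check in the NbClust excerpt. With n rows and p columns, `t(X) %*% X` is p x p but has rank at most n, so when p > n at least p - n of its eigenvalues are zero in exact arithmetic; floating-point round-off can leave some of them slightly negative, which is enough to trip the `eigenValues[i] < 0` test. That would be consistent with the 100 x 80 matrix working and the 100 x 130 matrix failing. The helper `eigen_tss` and the relative tolerance `1e-8` are illustrative assumptions.

```r
set.seed(1)
# Simulated stand-in data: more columns (130) than rows (100)
X <- matrix(rnorm(100 * 130), nrow = 100, ncol = 130)

# Hypothetical helper: eigenvalues of the scaled cross-product, as in NbClust
eigen_tss <- function(m) {
  TT <- t(m) %*% m
  eigen(TT / (nrow(m) - 1), symmetric = TRUE, only.values = TRUE)$values
}

ev130 <- eigen_tss(X)          # 130 x 130 matrix, but rank at most 100
ev80  <- eigen_tss(X[, 1:80])  # first 80 columns only

qr(X)$rank                     # rank is capped by the number of rows
tol <- 1e-8 * max(ev130)       # illustrative relative tolerance
sum(abs(ev130) < tol)          # count of numerically "zero" eigenvalues
min(ev130)                     # may come out as a tiny negative number
min(ev80)                      # clearly positive: 80 columns < 100 rows
```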

I would caution against transposing your data: NbClust would then multiply the transpose of your transposed data (i.e. the original data) by your transposed data. And the original times its transpose is NOT the same as the transpose times the original!
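A small sketch on simulated data (not the asker's data) of the point above: NbClust forms `t(jeu) %*% jeu` from whatever it is given, so passing `t(X)` makes it work with the 100 x 100 matrix `X %*% t(X)` instead of the 130 x 130 matrix `t(X) %*% X`, and the rows it clusters become the 130 variables rather than the 100 observations.

```r
set.seed(2)
# Simulated stand-in: 100 observations (rows) x 130 variables (columns)
X <- matrix(rnorm(100 * 130), nrow = 100, ncol = 130)
Xt <- t(X)  # transposed: 130 rows x 100 columns

# What NbClust computes as t(jeu) %*% jeu for each input
tss_original   <- t(X)  %*% X   # 130 x 130 (variables x variables)
tss_transposed <- t(Xt) %*% Xt  # equals X %*% t(X): 100 x 100

dim(tss_original)
dim(tss_transposed)
all.equal(tss_transposed, X %*% t(X))

# After transposing, the "observations" being clustered are the 130 variables
nrow(Xt)
```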


I don't know what is going on inside the function, but you can apply different indices and distance measures in a loop (to use this code, replace "base_multi_sinna" with the name of your own data):

    library(NbClust)

    lista.methods <- c("kl", "ch", "hartigan", "mcclain", "gamma", "gplus", "tau",
                       "dunn", "sdindex", "sdbw", "cindex", "silhouette", "ball",
                       "ptbiserial", "gap", "frey")
    # first column stores the index name; the others are distance measures
    lista.distance <- c("metodo", "euclidean", "maximum", "manhattan", "canberra")

    tabla <- as.data.frame(matrix(ncol = length(lista.distance),
                                  nrow = length(lista.methods)))
    names(tabla) <- lista.distance

    for (j in 2:length(lista.distance)) {
      for (i in 1:length(lista.methods)) {
        nb <- NbClust(base_multi_sinna, distance = lista.distance[j],
                      min.nc = 2, max.nc = 10, method = "complete",
                      index = lista.methods[i])
        tabla[i, j] <- nb$Best.nc[1]     # suggested number of clusters
        tabla[i, 1] <- lista.methods[i]  # label the row with the index name
      }
    }
    tabla
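If any single combination fails (for example with the TSS error discussed in this thread), the whole loop stops. A minimal variation, assuming the CRAN NbClust package is installed, wraps each call in `tryCatch()` so that a failing combination is recorded as `NA` instead. `best_nc_or_na` is a hypothetical helper, and the simulated data and the short index and distance lists are illustrative only.

```r
library(NbClust)

# Hypothetical helper (not part of NbClust): suggested number of clusters for
# one distance/index pair, or NA if NbClust stops with an error
best_nc_or_na <- function(data, distance, index) {
  tryCatch(
    unname(NbClust(data, distance = distance, min.nc = 2, max.nc = 10,
                   method = "complete", index = index)$Best.nc[1]),
    error = function(e) NA_real_
  )
}

set.seed(3)
# Simulated stand-in for base_multi_sinna: 60 rows x 4 columns
base_multi_sinna <- matrix(rnorm(60 * 4), nrow = 60, ncol = 4)

indices   <- c("kl", "ch", "silhouette")
distances <- c("euclidean", "manhattan")

# One row per index, one column per distance measure
tabla <- sapply(distances, function(dst)
  sapply(indices, function(idx) best_nc_or_na(base_multi_sinna, dst, idx)))
tabla
```

Compared with filling a pre-allocated data frame, `sapply()` builds the matrix and its row and column names directly, and the `NA` entries show at a glance which combinations could not be computed.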

I had the same problem when working with a matrix that has more columns than rows. This can affect other R functions as well, such as princomp when you try to do a PCA (in that case you should use prcomp instead).

My workaround in this case is simply to use the transposed matrix:

    NbClust(t(mydata), distance="euclidean", min.nc=2, max.nc=99, method="ward", index="duda")
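A minimal sketch on simulated data (not the answerer's data) of the princomp/prcomp remark: `princomp()` refuses input with fewer rows than columns, while `prcomp()`, which is based on a singular value decomposition of the centred data, still runs.

```r
set.seed(4)
# Simulated stand-in: 100 rows x 130 columns (more variables than observations)
X <- matrix(rnorm(100 * 130), nrow = 100, ncol = 130)

# princomp() works from the covariance matrix and stops when rows < columns
pc_princomp <- tryCatch(princomp(X), error = function(e) e)
inherits(pc_princomp, "error")  # TRUE
conditionMessage(pc_princomp)

# prcomp() uses an SVD of the centred data and runs without complaint
pc_prcomp <- prcomp(X)
nrow(pc_prcomp$rotation)        # one loading per original column
```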
