How to perform clustering without deleting rows where NA is present in R

I have data that contain some NA values. What I want to do is perform clustering without deleting the rows where an NA is present.

I understand that the Gower distance measure in daisy allows for this situation. But why is my code below not working? I welcome alternatives other than daisy.
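As a rough illustration of why Gower dissimilarity tolerates missing values: for numeric variables, each variable contributes a range-scaled absolute difference, and a variable that is NA in either row is left out of the average for that pair. The sketch below uses a hypothetical toy data frame `x` and a hypothetical helper `gower_pair`, and compares the hand computation with `daisy`. It assumes all columns are numeric.

```r
library(cluster)  # provides daisy()

# Hypothetical toy data: column b has a missing value in row 2
x <- data.frame(a = c(1, 4, 7),
                b = c(10, NA, 30),
                c = c(0.5, 1.5, 2.5))

d <- daisy(x, metric = "gower")

# Range of each column, ignoring missing values
rng <- sapply(x, function(col) diff(range(col, na.rm = TRUE)))

# Hand-computed Gower dissimilarity between rows i and j (numeric columns only):
# average of range-scaled absolute differences over the columns where
# both rows are observed
gower_pair <- function(i, j) {
  diffs <- abs(unlist(x[i, ]) - unlist(x[j, ])) / rng
  mean(diffs, na.rm = TRUE)
}

gower_pair(1, 2)      # 0.5, computed from columns a and c only
as.matrix(d)[1, 2]    # value reported by daisy for the same pair
```

Rows 1 and 2 are still comparable even though `b` is missing in row 2, because the pair is scored only on `a` and `c`.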

    # plot heat map with dendogram together.
    library("gplots")
    library("cluster")

    # Arbitrarily assigning NA to some elements
    mtcars[2,2] <- "NA"
    mtcars[6,7] <- "NA"

    mydata <- mtcars

    hclustfunc <- function(x) hclust(x, method="complete")

    # Initially I wanted to use this but it didn't take NA
    #distfunc <- function(x) dist(x,method="euclidean")

    # Try using daisy GOWER function
    # which suppose to work with NA value
    distfunc <- function(x) daisy(x,metric="gower")

    d <- distfunc(mydata)
    fit <- hclustfunc(d)

    # Perform clustering heatmap
    heatmap.2(as.matrix(mydata),dendrogram="row",trace="none", margin=c(8,9), hclust=hclustfunc,distfun=distfunc);

I got an error message:

    Error in which(is.na) : argument to 'which' is not logical
    Calls: distfunc.g -> daisy
    In addition: Warning messages:
    1: In data.matrix(x) : NAs introduced by coercion
    2: In data.matrix(x) : NAs introduced by coercion
    3: In daisy(x, metric = "gower") :
      binary variable(s) 8, 9 treated as interval scaled
    Execution halted

At the end of the day, I would like to perform hierarchical clustering on data that contain NA values.

Update

Conversion using as.numeric works with the above example. But why did this code fail when reading from a text file?

    library("gplots")
    library("cluster")

    # This time read from file
    mtcars <- read.table("http://dpaste.com/1496666/plain/",na.strings="NA",sep="\t")

    # Following suggestion convert to numeric
    mydata <- apply( mtcars, 2, as.numeric )

    hclustfunc <- function(x) hclust(x, method="complete")
    #distfunc <- function(x) dist(x,method="euclidean")

    # Try using daisy GOWER function
    distfunc <- function(x) daisy(x,metric="gower")

    d <- distfunc(mydata)
    fit <- hclustfunc(d)

    heatmap.2(as.matrix(mydata),dendrogram="row",trace="none", margin=c(8,9), hclust=hclustfunc,distfun=distfunc);

The error I am getting is this:

    Warning messages:
    1: In min(x) : no non-missing arguments to min; returning Inf
    2: In max(x) : no non-missing arguments to max; returning -Inf
    3: In min(x) : no non-missing arguments to min; returning Inf
    4: In max(x) : no non-missing arguments to max; returning -Inf
    Error in hclust(x, method = "complete") :
      NA/NaN/Inf in foreign function call (arg 11)
    Calls: hclustfunc -> hclust
    Execution halted

r cluster-analysis bioconductor
2 answers

The error is related to the presence of non-numeric variables in the data (numbers encoded as strings). You can convert them to numbers:

    mydata <- apply( mtcars, 2, as.numeric )
    d <- distfunc(mydata)
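To make the cause concrete, here is a minimal sketch (the names `bad` and `good` are hypothetical) showing that assigning the string `"NA"` turns the whole column into character, whereas assigning the missing-value constant `NA` keeps it numeric. With real `NA` values, `daisy` with the Gower metric and `hclust` should run without any conversion step. Depending on the installed version of cluster, `daisy` may still warn that the binary columns are treated as interval scaled.

```r
library(cluster)  # provides daisy()

bad <- mtcars
bad[2, 2] <- "NA"      # a string: the cyl column is coerced to character
class(bad$cyl)

good <- mtcars
good[2, 2] <- NA       # the missing-value constant: cyl stays numeric
good[6, 7] <- NA
class(good$cyl)

# Gower dissimilarity skips missing entries pair by pair
d <- daisy(good, metric = "gower")
fit <- hclust(d, method = "complete")
```

The resulting `fit` can be plotted with `plot(fit)`, or the same distance and clustering functions can be passed to `heatmap.2` as in the question.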

Using as.numeric may help in this case, but I think the original question exposes a bug in the daisy function. In particular, it contains the following code:

    if (any(ina <- is.na(type3)))
        stop(gettextf("invalid type %s for column numbers %s",
                      type2[ina], pColl(which(is.na))))

The intended error message is never printed because which(is.na) is incorrect: is.na here is the function itself, not a logical vector. It should be which(ina).
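The failure can be reproduced outside daisy with a small sketch. The vector `type3` below is a hypothetical stand-in for the internal variable of the same name, not the values daisy actually computes.

```r
# Hypothetical stand-in for daisy's internal type3 vector
type3 <- c(1L, NA, 3L)
ina <- is.na(type3)

# Passing the function is.na itself to which() raises an error,
# which masks the message that stop() was meant to print
res <- tryCatch(which(is.na), error = function(e) conditionMessage(e))
res

# Passing the logical vector gives the offending positions
which(ina)   # 2
```

Here `res` holds the text of the error raised by `which`, which is the message seen in the question rather than the "invalid type" message.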

I should now find out where and how to report this bug.
