I am trying to get the Pearson correlation coefficient between every pair of rows in a data frame. Some values are missing (NA), and this seems to cause a problem that I don't run into when I call cor() on two vectors with missing values. Here is the correct result for two vectors:
```
x <- c(NA, 4.5, NA, 4, NA, 1)
y <- c(2.5, 3.5, 3, 3.5, 3, 2.5)
cor(x, y, use = "complete.obs")
[1] 0.9912407
```
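A small sketch of what `use = "complete.obs"` does with just two vectors: it keeps only the positions where both `x` and `y` are non-missing (films 2, 4 and 6 here), so the result matches calling cor() on the manually filtered pairs. `complete.cases()` is base R; the names `ok`, `r_manual` and `r_use` are only illustrative.

```r
x <- c(NA, 4.5, NA, 4, NA, 1)
y <- c(2.5, 3.5, 3, 3.5, 3, 2.5)

# Positions where neither vector is NA (2, 4 and 6 here)
ok <- complete.cases(x, y)

# Correlation on the filtered pairs vs. letting cor() drop them
r_manual <- cor(x[ok], y[ok])
r_use    <- cor(x, y, use = "complete.obs")

all.equal(r_manual, r_use)  # TRUE
```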
And here is the result when the same values are rows of a data frame:
```
cor(t(critics1), use = "complete.obs")
   y  a  b  c  d  e  x
y  1 NA NA NA NA NA NA
a NA  1  1  1 -1  1 -1
b NA  1  1  1 -1  1 -1
c NA  1  1  1 -1  1 -1
d NA -1 -1 -1  1 -1  1
e NA  1  1  1 -1  1 -1
x NA -1 -1 -1  1 -1  1
Warning message:
In cor(t(critics1), use = "complete.obs") : the standard deviation is zero
```
Why doesn't the `use` parameter have the same effect here? This is what the `critics1` data frame looks like:
```
  film1 film2 film3 film4 film5 film6
y   2.5   3.5   3.0   3.5   3.0   2.5
a   3.0   3.5   1.5   5.0   3.0   3.5
b   2.5   3.0    NA   3.5   4.0    NA
c    NA   3.5   3.0   4.0   4.5   2.5
d   3.0   4.0   2.0   3.0   3.0   2.0
e   3.0   4.0    NA   5.0   3.0   3.5
x    NA   4.5    NA   4.0    NA   1.0
```
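To make this reproducible, here is a sketch that rebuilds `critics1` from the printed table (values copied by hand, so they are an assumption about the real data) and checks which films survive `use = "complete.obs"`. As I read `?cor`, `"complete.obs"` does listwise deletion on the whole matrix: any film that is NA for any critic is dropped before anything is computed, which here leaves only film2 and film4. With two points every correlation is +1 or -1, and `y` is 3.5 in both films, which would explain the zero-SD warning. `"pairwise.complete.obs"` instead uses, for each pair of critics, only the films both of them rated. The names `m`, `kept` and `r_pair` are illustrative.

```r
critics1 <- data.frame(
  film1 = c(2.5, 3.0, 2.5,  NA, 3.0, 3.0,  NA),
  film2 = c(3.5, 3.5, 3.0, 3.5, 4.0, 4.0, 4.5),
  film3 = c(3.0, 1.5,  NA, 3.0, 2.0,  NA,  NA),
  film4 = c(3.5, 5.0, 3.5, 4.0, 3.0, 5.0, 4.0),
  film5 = c(3.0, 3.0, 4.0, 4.5, 3.0, 3.0,  NA),
  film6 = c(2.5, 3.5,  NA, 2.5, 2.0, 3.5, 1.0),
  row.names = c("y", "a", "b", "c", "d", "e", "x")
)

# Films become rows (observations), critics become columns (variables)
m <- t(critics1)

# Listwise deletion: keep only films with no NA for any critic
kept <- rownames(m)[complete.cases(m)]
kept       # "film2" "film4"
m[kept, ]  # column y is 3.5 in both rows, so its standard deviation is zero

# Pairwise deletion: each pair uses the films both critics rated
r_pair <- cor(m, use = "pairwise.complete.obs")
r_pair["x", "y"]  # 0.9912407, same as the two-vector result
```

One caveat with the pairwise option: different pairs rest on different subsets of films (x and b share only film2 and film4, for example), so the resulting matrix is not guaranteed to be positive semi-definite.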