Correlation between multiple data frame variables

Question

I have a data.frame of 10 variables in R. Let's call them var1, var2, ..., var10.

I want to find the correlation of one of them, var1, with each of var2, var3, ..., var10.

How can we do this?

The function cor can find the correlation between two variables at a time. Using this, I had to write a separate cor call for each analysis.

Tags: r, correlation

Milind kumar Jul 24 '16 at 5:09

2 answers

Simon jackson · Answer 1 · 2016-07-25T02:33:28+0000

My corrr package, which helps explore correlations, has a simple solution for this. I will use the mtcars dataset as an example and suppose we want to focus on the correlation of mpg with all the other variables.

    install.packages("corrr")  # though keep an eye out for a new version coming soon
    library(corrr)
    mtcars %>% correlate() %>% focus(mpg)
    #>    rowname        mpg
    #>      <chr>      <dbl>
    #> 1      cyl -0.8521620
    #> 2     disp -0.8475514
    #> 3       hp -0.7761684
    #> 4     drat  0.6811719
    #> 5       wt -0.8676594
    #> 6     qsec  0.4186840
    #> 7       vs  0.6640389
    #> 8       am  0.5998324
    #> 9     gear  0.4802848
    #> 10    carb -0.5509251

Here, correlate() creates a correlation data frame, and focus() lets you focus on the correlations of certain variables with all the others.
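For comparison, here is a minimal base-R sketch that produces the same one-versus-rest correlations without corrr. It is not part of the corrr approach above; the names target, others and mpg_cor are made up for illustration.

```r
# Base-R sketch: correlation of one variable (mpg) with every other column.
# mtcars ships with R, so no extra packages are needed.
target <- "mpg"                             # illustrative name, not from the answer
others <- setdiff(names(mtcars), target)    # every column except the target

# cor() accepts a data frame for x and a vector for y, and returns a
# one-column matrix with one row per column of x.
mpg_cor <- cor(mtcars[, others], mtcars[[target]])
colnames(mpg_cor) <- target

print(round(mpg_cor, 4))
```

With the data frame from the question, the equivalent call would presumably be cor(df[, -1], df$var1), assuming var1 is the first column and all columns are numeric.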

FYI, focus() works similarly to select() from the dplyr package, except that it modifies rows as well as columns. So if you are familiar with select(), you should find focus() easy to use. For instance:

    mtcars %>% correlate() %>% focus(mpg:drat)
    #>   rowname        mpg        cyl       disp         hp        drat
    #>     <chr>      <dbl>      <dbl>      <dbl>      <dbl>       <dbl>
    #> 1      wt -0.8676594  0.7824958  0.8879799  0.6587479 -0.71244065
    #> 2    qsec  0.4186840 -0.5912421 -0.4336979 -0.7082234  0.09120476
    #> 3      vs  0.6640389 -0.8108118 -0.7104159 -0.7230967  0.44027846
    #> 4      am  0.5998324 -0.5226070 -0.5912270 -0.2432043  0.71271113
    #> 5    gear  0.4802848 -0.4926866 -0.5555692 -0.1257043  0.69961013
    #> 6    carb -0.5509251  0.5269883  0.3949769  0.7498125 -0.09078980

thisisrg · Answer 2 · 2016-07-25T15:58:53+0000

Another way would be to use the Hmisc and corrplot packages to get the correlations between all pairs, their significance, and a pretty plot:

    # Your data frame (5 variables instead of 10)
    df <- data.frame(a = 1:100,
                     b = rpois(100, 0.2),
                     c = rpois(100, 0.4),
                     d = rpois(100, 0.8),
                     e = 2 * (1:100))

    # Setup
    library(Hmisc)
    library(corrplot)

    df <- scale(df)  # normalize the data frame; this also converts df to a matrix

    # Compute Pearson (or Spearman) correlations with rcorr from the Hmisc package.
    # I like rcorr because it gives separate access to the correlations, the number
    # of observations and the p-values. ?rcorr is worth a read.
    corr <- rcorr(df)

    corr_r <- as.matrix(corr[[1]])  # access the correlation matrix
    corr_r[, 1]                     # correlations of "a" (= var1) with the rest, if you want
    pval <- as.matrix(corr[[3]])    # get the p-values

    # Plot all pairs
    corrplot(corr_r, method = "circle", type = "lower", diag = FALSE,
             tl.col = "black", tl.cex = 1, tl.offset = 0.1, tl.srt = 45)

    # Plot pairs with the significance cutoff applied via "p.mat" and "sig.level"
    corrplot(corr_r, p.mat = pval, sig.level = 0.05, insig = "blank",
             method = "circle", type = "lower", diag = FALSE,
             tl.col = "black", tl.cex = 1, tl.offset = 0.1, tl.srt = 45)
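If only one variable against the rest is of interest, the correlations and p-values can also be obtained with cor.test() from base R. The sketch below is a swapped-in alternative to rcorr, not the method used in this answer. It uses mtcars rather than the generated df so that it is reproducible, and the names target, others and one_vs_rest are illustrative.

```r
# Sketch: Pearson r and p-value for one variable (mpg) against every other
# column, using stats::cor.test() instead of Hmisc::rcorr().
target <- "mpg"                             # illustrative name
others <- setdiff(names(mtcars), target)    # every column except the target

# sapply() returns a 2 x 10 matrix (rows "r" and "p"); t() turns it into
# one row per variable.
one_vs_rest <- t(sapply(others, function(v) {
  ct <- cor.test(mtcars[[target]], mtcars[[v]], method = "pearson")
  c(r = unname(ct$estimate), p = ct$p.value)
}))

print(signif(one_vs_rest, 3))
```

Passing method = "spearman" to cor.test() gives rank correlations instead. With tied values R warns that exact p-values cannot be computed.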
