Plot or graph for a correlation matrix

I tried to plot a correlation matrix with the lattice library, using three colors to represent the correlation coefficients.

    library(lattice)
    levelplot(cor)

I get the following chart:

Plot of correlation matrix

The plot covers only a subset of my data. When I use the entire dataset (400 x 400), the plot becomes unclear: the coloring does not display correctly and the cells show up as points. Is it possible to get the same kind of plot, drawn as tiles, for a large matrix?

I tried the pheatmap function, but I do not want my values to be clustered; I just want the high and low values clearly represented as tiles.

+6
2 answers

@Lucas gives good advice here, since corrplot is quite useful for visualizing correlation matrices. However, it does not address the original problem of plotting a large correlation matrix. In fact, corrplot will also struggle to display a correlation matrix this large. A simple solution is to reduce the number of variables: I would suggest looking at the relationships among a subset of variables that you know are important for your problem. Trying to understand the correlation structure of this many variables will be a daunting task (even if you can visualize it)!
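The subsetting idea above can be sketched as follows. This is only an illustration on simulated data: the data frame `dat` and the variable names in `keep` are hypothetical stand-ins for your own data and your own choice of important variables.

```r
library(lattice)

# Simulated stand-in for the real data: 100 observations of 400 variables.
# as.data.frame() names the columns V1 ... V400.
set.seed(1)
dat <- as.data.frame(matrix(rnorm(100 * 400), nrow = 100))

# Hypothetical selection of variables of interest; replace with your own.
keep <- c("V1", "V5", "V20", "V100", "V250", "V400")

# Full 400 x 400 correlation matrix, then the 6 x 6 block for the subset.
cor_full <- cor(dat, use = "complete.obs")
cor_sub <- cor_full[keep, keep]

# Draw the subset as tiles; print() is needed when the script is sourced.
print(levelplot(cor_sub, at = seq(-1, 1, length.out = 21),
                xlab = "", ylab = ""))
```

Indexing the full matrix by name keeps the row and column order identical, so the diagonal of the subset is still the diagonal of ones.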

0

If you want to make a correlation plot, use the corrplot library, as it is very flexible and can draw correlation diagrams in a variety of styles.

    library(corrplot)

    # create data with some correlation structure
    jnk = runif(1000)
    jnk = (jnk * 100) + c(1:500, 500:1)
    jnk = matrix(jnk, nrow = 100, ncol = 10)
    jnk = as.data.frame(jnk)
    names(jnk) = c("var1", "var2", "var3", "var4", "var5",
                   "var6", "var7", "var8", "var9", "var10")

    # create correlation matrix
    cor_jnk = cor(jnk, use = "complete.obs")

    # plot cor matrix
    corrplot(cor_jnk, order = "AOE", method = "circle", tl.pos = "lt",
             type = "upper", tl.col = "black", tl.cex = 0.6, tl.srt = 45,
             addCoef.col = "black", addCoefasPercent = TRUE,
             p.mat = 1 - abs(cor_jnk), sig.level = 0.50, insig = "blank")

[image]

The above code only colors correlations whose absolute value exceeds 0.5, but you can easily change this threshold. There are also many ways to customize the appearance of the plot (changing the color gradient, displaying the correlation values, displaying the full matrix instead of only half of it, etc.). The order argument is especially useful: order="AOE" arranges the variables by the angular order of the eigenvectors of the correlation matrix (a PCA-based ordering), so that variables with similar correlation patterns are placed next to each other.

For example, for squares (similar to your original plot), just change the method to "square":

[image]

EDIT: @Carson, you can still use this method for reasonably large correlation matrices, for example the 100-variable matrix below. That said, I do not see the use of a graphical representation of a correlation matrix with so many variables unless you subset it, because it will be very difficult to interpret.

[image]
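As a rough sketch of the 100-variable case, the snippet below draws a simulated correlation matrix as plain colored tiles. The data and the names `big` and `cor_big` are made up for illustration and are not the code behind the plot above; method="color", order="original" and tl.pos="n" are existing corrplot options that fill each cell, keep the variables in their given order (no reordering or grouping), and drop the text labels, which become unreadable at this size.

```r
library(corrplot)

# Simulated data: 200 observations of 100 variables (illustrative only).
set.seed(42)
big <- matrix(rnorm(200 * 100), nrow = 200)
colnames(big) <- paste0("var", 1:100)

# 100 x 100 correlation matrix.
cor_big <- cor(big)

# Filled tiles, original variable order, no axis labels.
corrplot(cor_big, method = "color", order = "original", tl.pos = "n")
```

If you would rather stay with pheatmap, calling it with cluster_rows = FALSE and cluster_cols = FALSE should also turn off the reordering, though I have not compared the two on a 400 x 400 matrix.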

+12