Scatterplot matrix with logarithmic axes in R

I am trying to create a scatterplot matrix from my dataset so that in the resulting matrix:

  • I have two different groupings, based on
    • Quarter of the year (shown as the point colour)
    • Type of day (the point shape indicates whether it is a weekend day or a regular day between Monday and Friday).
  • Logarithmic x and y axes.
  • The values on the axis tick labels are not logarithmic, i.e. the axes should show the original values (integers from 0 to 350), not their log10.
  • The upper panels show the correlation values for each quarter.

So far I have tried the following functions:

  • pairs()
  • ggpairs() [from the GGally package]
  • scatterplotMatrix()
  • splom()

But I could not get decent results with any of these functions; every time, one or more of my requirements is missing.

  • With pairs() I can create a scatterplot matrix, but the log = "xy" parameter somehow removes the variable names from the diagonal of the resulting matrix.
  • ggpairs() does not support logarithmic scales directly, but I wrote a function that goes through the diagonal and the lower triangle of the scatterplot matrix, based on this answer. Although the logarithmic scaling works on the lower triangle, it messes up the variable labels and tick values.

The function is created and used as follows:

    ggpairs_logarithmize <- function(a) { # parameter a is a ggpairs sp-matrix
        max_limit <- sqrt(length(a$plots))
        for (row in 1:max_limit) { # index 1 is used to go through the diagonal also
            for (col in 1:row) { # diagonal and lower triangle only
                subsp <- getPlot(a, row, col)
                subspnew <- subsp + scale_y_log10() + scale_x_log10()
                subspnew$type <- 'logcontinous'
                subspnew$subType <- 'logpoints'
                a <- putPlot(a, subspnew, row, col)
            }
        }
        return(a)
    }

    scatplot <- ggpairs(...)
    scatplot_log10 <- ggpairs_logarithmize(scatplot)
    scatplot_log10

  • scatterplotMatrix() does not seem to support two groupings. I was able to do this separately for the quarter and for the day type, but I need both groupings in the same plot.
  • splom() somehow also shows the axis tick values as logarithms, whereas they should be shown as is (integers between 0 and 350).

Is there a simple way to create a scatterplot matrix with logarithmic axes that meets these requirements?

EDIT (13.7.2012): Code and data were requested. Here are some code snippets for creating a demo dataset:

Declare the required functions:

    # Put log10 scales on the diagonal and lower-triangle panels of a ggpairs matrix
    logarithmize <- function(a) {
        max_limit <- sqrt(length(a$plots))
        for (j in 1:max_limit) {
            for (i in j:max_limit) {
                subsp <- getPlot(a, i, j)
                subspnew <- subsp + scale_y_log10() + scale_x_log10()
                subspnew$type <- 'logcontinous'
                subspnew$subType <- 'logpoints'
                a <- putPlot(a, subspnew, i, j)
            }
        }
        return(a)
    }

    # Fill column targetcol with "Q1".."Q4" based on the dates in column datecol
    add_quarters <- function(a, datecol, targetcol) {
        for (i in 1:nrow(a)) {
            month <- 1 + as.POSIXlt(as.Date(a[i, datecol]))$mon
            if (month <= 3) {
                a[i, targetcol] <- "Q1"
            } else if (month <= 6 && month > 3) {
                a[i, targetcol] <- "Q2"
            } else if (month <= 9 && month > 6) {
                a[i, targetcol] <- "Q3"
            } else if (month > 9) {
                a[i, targetcol] <- "Q4"
            }
        }
        return(a)
    }
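As an aside, the quarter labels can probably be derived without a row-by-row loop. The following is only a minimal sketch, assuming the date column is (or can be converted to) a Date; the sample dates are made up for illustration and are not part of the demo dataset.

```r
# Hypothetical sample dates, one per quarter (not from the demo dataset)
dates <- as.Date(c("2010-01-15", "2010-04-01", "2010-09-30", "2010-12-31"))

# Base R's quarters() is vectorized and labels each date "Q1" to "Q4"
q <- quarters(dates)
print(q)
```

With the demo data this would amount to something like fruitsales$Quarter <- quarters(fruitsales$Date), once fruitsales has been created.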

Create dataset:

    days <- seq.Date(as.Date("2010-01-01"), as.Date("2012-06-06"), "day")
    bananas <- sample(1:350, length(days), replace = TRUE)
    apples <- sample(1:350, length(days), replace = TRUE)
    oranges <- sample(1:350, length(days), replace = TRUE)
    weekdays <- c("Monday", "Tuesday", "Wednesday", "Thursday",
                  "Friday", "Saturday", "Sunday")
    # stringsAsFactors = TRUE keeps Dayofweek a factor (the default changed in R 4.0),
    # which the levels() calls below rely on
    fruitsales <- data.frame(Date = days,
                             Dayofweek = rep(weekdays, length.out = length(days)),
                             Bananas = bananas, Apples = apples, Oranges = oranges,
                             stringsAsFactors = TRUE)
    fruitsales[5:6, "Quarter"] <- NA   # adds an all-NA Quarter column (column 6)
    fruitsales[6:7, "Daytype"] <- NA   # adds an all-NA Daytype column (column 7)
    fruitsales$Daytype <- fruitsales$Dayofweek
    levels(fruitsales$Daytype)  # Confirm the day type levels before assigning new levels
    # Levels are alphabetical: Friday, Monday, Saturday, Sunday, Thursday, Tuesday, Wednesday
    levels(fruitsales$Daytype) <- c("Casual", "Casual", "Weekend", "Weekend",
                                    "Casual", "Casual", "Casual")
    fruitsales <- add_quarters(fruitsales, 1, 6)

Execute (NOTE: Windows/Mac users, replace x11() with whatever graphics device you have):

    # install.packages("GGally")
    require(GGally)
    x11(); ggpairs(fruitsales, columns = 3:5, colour = "Quarter", shape = "Daytype")
    x11(); logarithmize(ggpairs(fruitsales, columns = 3:5, colour = "Quarter", shape = "Daytype"))
1 answer

The problem with pairs() stems from its use of user coordinates in a log coordinate system. Specifically, when adding the labels on the diagonal, pairs() sets

 par(usr = c(0, 1, 0, 1)) 

However, if you have requested a log coordinate system via log = "xy", what is needed instead is

 par(usr = c(0, 1, 0, 1), xlog = FALSE, ylog = FALSE) 

See this post on R-help.
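To make the point concrete, here is a minimal sketch (not taken from the R-help post) showing that a log-log plot leaves the device in log mode, and that resetting xlog and ylog together with usr switches it back for placing text. The null PDF device is an assumption so that the snippet runs without opening a window.

```r
pdf(NULL)                        # assumption: null device, nothing is written to disk
plot(1:350, 1:350, log = "xy")   # log-log plot, as pairs(..., log = "xy") sets up per panel

# After a log-log plot both flags are TRUE
log_flags <- c(par("xlog"), par("ylog"))

# Switch to a linear unit square before drawing a label
par(usr = c(0, 1, 0, 1), xlog = FALSE, ylog = FALSE)
text(0.5, 0.5, "label")
linear_flags <- c(par("xlog"), par("ylog"))

dev.off()
print(log_flags)
print(linear_flags)
```

Setting only usr would leave xlog and ylog switched on, so the coordinates passed to text() would still be interpreted on a log scale.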

Resetting xlog and ylog in the panels that draw text therefore suggests the following solution (using the data from the question):

    ## adapted from panel.cor in ?pairs
    panel.cor <- function(x, y, digits=2, cex.cor, quarter, ...)
    {
        usr <- par("usr"); on.exit(par(usr))
        ## linear unit square, with the log flags switched off
        par(usr = c(0, 1, 0, 1), xlog = FALSE, ylog = FALSE)
        ## one correlation per quarter
        r <- rev(tapply(seq_along(quarter), quarter, function(id) cor(x[id], y[id])))
        txt <- format(c(0.123456789, r), digits=digits)[-1]
        txt <- paste(names(txt), txt)
        if(missing(cex.cor)) cex.cor <- 0.8/strwidth(txt)
        text(0.5, c(0.2, 0.4, 0.6, 0.8), txt)
    }

    pairs(fruitsales[,3:5], log = "xy",
          diag.panel = function(x, ...) par(xlog = FALSE, ylog = FALSE),
          label.pos = 0.5,
          col = unclass(factor(fruitsales[,6])),
          pch = unclass(fruitsales[,7]),
          upper.panel = panel.cor, quarter = factor(fruitsales[,6]))

The result is the following graph:

[Figure: pairs plot on a log coordinate system]
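The per-quarter correlation step inside panel.cor can be tried on its own. The sketch below uses invented toy data (two groups of three points each, not the fruit sales data) to show the tapply() idiom of splitting row indices by group, and the trick of formatting alongside a dummy value so that all labels get the same number of digits.

```r
# Toy data, invented for illustration
x <- c(1, 2, 3, 4, 5, 6)
y <- c(2, 4, 6, 3, 2, 1)
quarter <- factor(c("Q1", "Q1", "Q1", "Q2", "Q2", "Q2"))

# Split the row indices by group, then correlate x and y within each group
r <- tapply(seq_along(quarter), quarter, function(id) cor(x[id], y[id]))

# Format together with a dummy value to force a common number of digits,
# then drop the dummy again with [-1]; the group names are kept
txt <- format(c(0.123456789, r), digits = 2)[-1]
labels <- paste(names(txt), txt)
print(labels)
```

Passing the grouping factor through an extra argument (quarter in the answer's code) works because pairs() forwards additional arguments to the panel functions.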
