Save yaxis legends as a separate grob?

Question

I have a very large scatter plot of two categorical variables, where each dot is a "hit". I want to add histograms at the top and side of the plot to summarize the hits, as shown on the following website: http://blog.mckuhn.de/2009/09/learning-ggplot2-2d-plot-with.html

I can arrange the graphs in a 2x2 grid, but I ran into a problem: my primary scatter plot has very long axis labels (important for the project), and in the 2x2 grid the top histogram extends to the full width, so it no longer lines up with the x-axis of the scatter plot.

My thought was to create a 3x3 grid, where I use the leftmost column for the names. However, this requires saving the y-axis text as a "grob". In the blog above, this is achieved for the legend as follows:

    p <- qplot(data = mtcars, mpg, hp, geom = "point", colour = cyl)
    legend <- p + opts(keep = "legend_box")

This lets you place the "legend" in the 2x2 grid layout. If I could use the same logic to make a separate grob for the y-axis labels, everything would be fine. I have tried at least the following:

    legend <- p + opts(keep = "Yaxis")
    legend <- p + opts(keep = "axis_text_y")
    legend <- p + opts(keep = "axis_text")
    ..... and many others

Is it possible to make a grob from anything besides the legend box? If so, please let me know. If not, I will accept any suggestion on how to arrange the three plots so that they stay aligned and keep the y-axis labels.

thanks

Image showing how the labels affect vertical alignment and why I want to capture the y-axis text

r ggplot2

zach Oct 7 '11 at 13:56

1 answer

Dinre · Accepted Answer · 2013-03-26T16:42:22+0000

This question has sat long enough that it is time to document an answer for posterity.

The short answer is that highly customized data visualizations cannot be made with the function wrappers from the "lattice" and "ggplot2" packages. The purpose of a function wrapper is to take some decisions out of your hands, so you will always be limited to the decisions originally provided by the function's coder. I highly recommend that everyone learn the "lattice" or "ggplot2" packages, but these packages are more useful for exploring data than for being creative with data visualization.

This answer is for those who want to create a customized visual. The following process can take half a day, but that is far less time than it would take to hack "lattice" or "ggplot2" into the form you need. This is not a criticism of either package; it is just a byproduct of their purpose. When you need a creative visual for a publication or a client, 4 or 5 hours of your day is nothing compared to the payoff.

The work of creating a custom visual is pretty simple with the "grid" package, but that doesn't mean the math behind it is always simple. Most of the work in this example is actually math, not graphics.

Foreword: Before you work with the base "grid" package for your visuals, you need to know a few things. First, "grid" works with the idea of viewports. A viewport is a plotting space that lets you reference positions within that space while ignoring the rest of the graphic. This is important because it lets you draw without having to scale your work to a fraction of the whole page. It is very similar to the layout options in the base plotting functions, except that viewports can overlap, be rotated, and be made transparent.
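To make the viewport idea concrete, here is a minimal sketch that is not part of the original answer. The viewport name `right_half`, its scales, and the null PDF device are assumptions chosen purely for illustration.

```r
library(grid)

# Assumption: a null PDF device, so the sketch runs without opening a
# window or writing a file. Any graphics device would do.
pdf(file = NULL)
grid.newpage()

# Width of the whole page, measured before any viewport is pushed.
page_width <- convertWidth(unit(1, "npc"), "inches", valueOnly = TRUE)

# A viewport covering the right half of the page, with its own data scales.
pushViewport(viewport(
    x = 0.5, y = 0, width = 0.5, height = 1,
    just = c("left", "bottom"),
    xscale = c(0, 10), yscale = c(0, 100),
    name = "right_half"))

# Inside the viewport, 1 npc now means "the full width of this viewport".
vp_width <- convertWidth(unit(1, "npc"), "inches", valueOnly = TRUE)

# Drawing calls refer only to this space: the border hugs the right half
# of the page, and the point lands at the middle of the viewport's scales.
grid.rect(gp = gpar(col = "gray40"))
grid.points(x = 5, y = 50, default.units = "native", pch = 19)

# Pop the viewport to return to the whole page.
popViewport()
invisible(dev.off())
```

Comparing `vp_width` with `page_width` shows that positions and sizes are resolved relative to whichever viewport is current, which is what lets each panel be drawn as if it were the whole page.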

Units are another thing to know. Each viewport supports many units that can be used to specify positions and sizes. You can see the whole list in the "grid" documentation, but there are only a few that I use very often: npc, native, strwidth, and lines. Npc units start at (0,0) in the lower left corner and go to (1,1) in the upper right corner. Native units use the viewport's xscale and yscale to create what is essentially a plotting space for data. Strwidth units tell you how wide a specific string of text will be once printed on the graphic. Lines units tell you how tall a line of text will be once printed on the graphic. Since several types of units are available, you should always either define numbers explicitly with the "unit" function or set the "default.units" argument in your drawing functions.
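The four unit types mentioned above can be tried out in a few lines. This is only an illustrative sketch: the scales, the sample strings, and the null PDF device are assumptions, not values from the answer's script.

```r
library(grid)

# Assumption: a null PDF device so nothing is written to disk.
pdf(file = NULL)
grid.newpage()

# A viewport whose native x scale runs from 0 to 200.
pushViewport(viewport(xscale = c(0, 200), yscale = c(0, 50)))

# native -> npc: 100 sits halfway along an xscale of c(0, 200).
native_as_npc <- convertX(unit(100, "native"), "npc", valueOnly = TRUE)

# strwidth: the printed width of a specific string.
short_in <- convertWidth(unit(1, "strwidth", "abc"), "inches", valueOnly = TRUE)
long_in  <- convertWidth(unit(1, "strwidth", "abcabcabc"), "inches", valueOnly = TRUE)

# lines: the height of one line of text at the current font size.
line_in <- convertHeight(unit(1, "lines"), "inches", valueOnly = TRUE)

# Units of different types can be added together; grid resolves the sum
# when it is drawn or converted.
margin    <- unit(1, "strwidth", "abcabcabc") + 2 * unit(1, "lines")
margin_in <- convertWidth(margin, "inches", valueOnly = TRUE)

# default.units tells a drawing function how to read bare numbers.
grid.text("label", x = 100, y = 25, default.units = "native")

popViewport()
invisible(dev.off())
```

Being able to mix unit types in one expression is what makes margins such as "the widest label plus two paddings" easy to write.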

Finally, you can specify a justification for the location of every object. This is HUGE. It means that you can give the location of a shape and then say how you want that shape justified horizontally and vertically around that location (center, left, right, bottom, top). You can line everything up this way by referencing the locations of other objects.

This is what we are making. It is not a perfect graphic, since I have to guess what the OP wants, but it is enough to get to the perfect graphic.
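As a small illustration of justification (a sketch under assumed values, not code from the answer), the two boxes below share one anchor point and differ only in their `just` setting. The names `anchor`, `box_a`, and `box_b` are hypothetical.

```r
library(grid)

# Assumption: a null PDF device so nothing is written to disk.
pdf(file = NULL)
grid.newpage()

# One anchor location, used for every object below.
anchor <- unit(0.5, "npc")

# Same location and size, different justification.
box_a <- rectGrob(x = anchor, y = anchor,
                  width = unit(0.2, "npc"), height = unit(0.2, "npc"),
                  just = c("left", "bottom"))   # anchor is the lower-left corner
box_b <- rectGrob(x = anchor, y = anchor,
                  width = unit(0.2, "npc"), height = unit(0.2, "npc"),
                  just = c("right", "top"))     # anchor is the upper-right corner
grid.draw(box_a)
grid.draw(box_b)

# A label that ends one "W" width to the left of the anchor.
grid.text("label",
          x = anchor - unit(1, "strwidth", "W"), y = anchor,
          just = c("right", "center"))

# Query where the edges actually ended up, in npc.
left_a  <- convertX(grobX(box_a, "west"), "npc", valueOnly = TRUE)
right_b <- convertX(grobX(box_b, "east"), "npc", valueOnly = TRUE)
left_b  <- convertX(grobX(box_b, "west"), "npc", valueOnly = TRUE)

invisible(dev.off())
```

The left edge of `box_a` and the right edge of `box_b` both land on the anchor, so changing only `just` moves a shape around a fixed reference point.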

Step 1: Load some libraries to work with. When you want to make highly customized visuals, use the "grid" package. It is the base set of functions that wrappers such as "lattice" and "ggplot2" call. When you want to work with dates, use the "lubridate" package because IT MAKES YOUR LIFE BETTER. One last personal preference: when I am going to do any kind of data work, I like to use the "plyr" package. It lets me quickly shape my data into aggregated forms.

    library(grid)
    library(lubridate)
    library(plyr)

Test data generation: This is not necessary if you already have data, but for this example I am creating a sample dataset. You can play with it by changing the user settings for the data generation. The script is flexible and adapts to the generated data. Feel free to add more websites and play with the lambda values.

    set.seed(1)

    #############################################
    # User settings for the data generation.    #
    #############################################

    # Set number of hours to generate data for.
    time_Periods <- 100

    # Set starting datetime in m/d/yyyy hh:mm format.
    start_Datetime <- "2/24/2013 00:00"

    # Specify a list of websites along with a
    # Poisson lambda to represent the average
    # number of hits in a given time period.
    df_Websites <- read.table(text="
    url lambda
    http://www.asitenoonereallyvisits.com 1
    http://www.asitesomepeoplevisit.com 10
    http://www.asitesomemorepeoplevisit.com 20
    http://www.asiteevenmorepeoplevisit.com 40
    http://www.asiteeveryonevisits.com 80
    ", header=TRUE, sep=" ")

    #############################################
    # Generate the data.                        #
    #############################################

    # Initialize lists to hold hit data and
    # website names.
    hits <- list()
    websites <- list()

    # For each time period and for each website,
    # flip a coin to see if any visitors come. If
    # visitors come, use a Poisson distribution to
    # see how many come.
    # Also initialize the list of website names.
    for (i in 1:nrow(df_Websites)){
        hits[[i]] <- rbinom(time_Periods, 1, 0.5) * rpois(time_Periods, df_Websites$lambda[i])
        websites[[i]] <- rep(df_Websites$url[i], time_Periods)
    }

    # Initialize list of time periods.
    datetimes <- mdy_hm(start_Datetime) + hours(1:time_Periods)

    # Tie the data into a data frame and erase rows with no hits.
    # This is what the real data is more likely to look like
    # after import and cleaning.
    df_Hits <- data.frame(datetime=rep(datetimes, nrow(df_Websites)), hits=unlist(hits), website=unlist(websites))
    df_Hits <- df_Hits[df_Hits$hits > 0,]

    # Clean up data-generation variables.
    rm(list=ls()[ls()!="df_Hits"])

Step 2: Now we need to decide how we want our graphic to work. It is useful to separate things like sizes and colors into their own section of your code, so that you can make changes quickly. Here I have chosen some basic settings that should produce a decent graphic. You will notice that some of the size settings use the "unit" function. This is one of the amazing things about "grid": you can use various units to describe space on the graphic. For example, unit(1, "lines") is the height of one line of text. This makes laying out a graphic much easier.

    #############################################
    # User settings for the graphic.            #
    #############################################

    # Specify the window width and height and
    # pixels per inch.
    device_Width <- 12
    device_Height <- 4.5
    pixels_Per_Inch <- 100

    # Specify the bin width (in hours) of the
    # upper histogram.
    bin_Width <- 2

    # Specify a padding size for separating text
    # from other plot elements.
    padding <- unit(1, "strwidth", "W")

    # Specify the bin cut-off values for the hit
    # counts and the corresponding colors. The
    # cutoff should be the maximum value to be
    # contained in the bin.
    bin_Settings <- read.table(text="
    cutoff color
    10 'darkblue'
    20 'deepskyblue'
    40 'purple'
    80 'magenta'
    160 'red'
    ", header=TRUE, sep=" ")

    # Specify the size of the histogram plots
    # in 'grid' units. Override only if necessary.
    # histogram_Size <- unit(6, "lines")
    histogram_Size <- unit(nrow(bin_Settings) + 1, "lines")

    # Set the background color for distinguishing
    # between rows of data.
    row_Background <- "gray90"

    # Set the color for the date lines.
    date_Color <- "gray40"

    # Set the color for marker lines on histograms.
    marker_Color <- "gray80"

    # Set the fontsize for labels.
    label_Size <- 10

Step 3: It is time to make the graphic. I have limited space for explanation in an SO answer, so I will summarize and then leave the code comments to explain the details. In a nutshell, I calculate how big everything will be and then draw the plots one at a time. For each plot, I first format my data so that I can specify the viewport correctly. Then I draw the labels that need to sit behind the data, and then I draw the data. At the end, I "pop" the viewport to finalize it.
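Before the full script, the core of the sizing logic can be sketched on its own. This is a simplified illustration under assumed values (two hypothetical labels, a fixed bottom margin, a null PDF device), not the answer's code: the left margin is measured from the widest label, and the main plot and the top histogram then reuse the same x position and width, which is what keeps them aligned.

```r
library(grid)

# Assumption: a null PDF device so nothing is written to disk.
pdf(file = NULL, width = 8, height = 4)
grid.newpage()

# Hypothetical y-axis labels of very different lengths.
labels    <- c("http://www.short.example",
               "http://www.a-much-longer-label.example")
padding   <- unit(1, "strwidth", "W")
hist_size <- unit(4, "lines")

# Width of the widest label, as a fraction of the page width.
label_npc <- max(vapply(labels, function(s) {
    convertWidth(unit(1, "strwidth", s), "npc", valueOnly = TRUE)
}, numeric(1)))

# Margins, then whatever space is left over for the main plot.
x_margin <- unit(label_npc, "npc") + 2 * padding
y_margin <- unit(2, "lines")
main_w   <- unit(1, "npc") - hist_size - x_margin
main_h   <- unit(1, "npc") - hist_size - y_margin

# The three horizontal pieces should add up to the full page width.
total_npc <- convertWidth(x_margin + main_w + hist_size, "npc", valueOnly = TRUE)

# Main plot: labels are right-justified just outside its left edge.
pushViewport(viewport(x = x_margin, y = y_margin,
                      width = main_w, height = main_h,
                      just = c("left", "bottom"),
                      yscale = c(0, length(labels) + 1)))
grid.rect()
grid.text(labels, x = unit(0, "npc") - padding,
          y = unit(seq_along(labels), "native"),
          just = c("right", "center"))
main_vp_in <- convertWidth(unit(1, "npc"), "inches", valueOnly = TRUE)
popViewport()

# Top histogram: same x and width as the main plot, stacked above it.
pushViewport(viewport(x = x_margin, y = y_margin + main_h,
                      width = main_w, height = hist_size,
                      just = c("left", "bottom")))
grid.rect()
hist_vp_in <- convertWidth(unit(1, "npc"), "inches", valueOnly = TRUE)
popViewport()

invisible(dev.off())
```

Because both viewports are built from the same `x_margin` and `main_w`, long labels shrink the main plot and the histogram together instead of pushing them out of alignment.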

    #############################################
    # Make the graphic.                         #
    #############################################

    # Make sure bin cutoffs are in increasing order.
    # This way, we can make assumptions later.
    bin_Settings <- bin_Settings[order(bin_Settings$cutoff),]

    # Initialize plot window.
    # Make sure you always specify the pixels per
    # inch, so you have an appropriately scaled
    # graphic for output.
    # Note: windows() is only available on Windows;
    # on other platforms, dev.new(width=..., height=...)
    # is one alternative.
    windows(
        width=device_Width,
        height=device_Height,
        xpinch=pixels_Per_Inch,
        ypinch=pixels_Per_Inch)
    grid.newpage()

    # Push an initial viewport, so we can set the
    # font size to use in calculating label widths.
    pushViewport(viewport(gp=gpar(fontsize=label_Size)))

    # Find the list of websites in the data.
    unique_Urls <- as.character(unique(df_Hits$website))

    # Calculate the width of the website
    # urls once printed on the screen.
    # valueOnly=TRUE returns plain numbers, so max()
    # below does not depend on how units are stored.
    label_Width <- list()
    for (i in 1:length(unique_Urls)){
        label_Width[[i]] <- convertWidth(unit(1, "strwidth", unique_Urls[i]), "npc", valueOnly=TRUE)
    }
    # Use the maximum url width plus two padding.
    x_Label_Margin <- unit(max(unlist(label_Width)), "npc") + padding * 2

    # Calculate a height for the date labels plus two padding.
    y_Label_Margin <- unit(1, "strwidth", "99/99/9999") + padding * 2

    # Calculate size of main plot after making
    # room for histogram and label margins.
    main_Width <- unit(1, "npc") - histogram_Size - x_Label_Margin
    main_Height <- unit(1, "npc") - histogram_Size - y_Label_Margin

    # Calculate x values, using the minimum datetime
    # as zero, and counting the hours between each
    # datetime and the minimum.
    x_Values <- as.integer((df_Hits$datetime - min(df_Hits$datetime)))/60^2

    # Initialize main plotting area
    pushViewport(viewport(
        x=x_Label_Margin,
        y=y_Label_Margin,
        width=main_Width,
        height=main_Height,
        xscale=c(-1, max(x_Values) + 1),
        yscale=c(0, length(unique_Urls) + 1),
        just=c("left", "bottom"),
        gp=gpar(fontsize=label_Size)))

    # Put grey background behind every other website
    # to make data easier to read, and write urls as
    # y-labels.
    for (i in 1:length(unique_Urls)){
        if (i%%2==0){
            grid.rect(
                x=unit(-1, "npc"),
                y=i,
                width=unit(2, "npc"),
                height=1,
                default.units="native",
                just=c("left", "center"),
                gp=gpar(col=row_Background, fill=row_Background))
        }
        grid.text(
            unique_Urls[i],
            x=unit(0, "npc") - padding,
            y=i,
            default.units="native",
            just=c("right", "center"))
    }

    # Find the hour offset of the minimum date value.
    time_Offset <- as.integer(format(min(df_Hits$datetime), "%H"))

    # Find the dates in the data.
    x_Labels <- unique(format(df_Hits$datetime, "%m/%d/%Y"))

    # Find where the days begin in the data.
    midnight_Locations <- (0:max(x_Values))[(0:max(x_Values)+time_Offset)%%24==0]

    # Write the appropriate date labels on the x-axis
    # where the days begin.
    grid.text(
        x_Labels,
        x=midnight_Locations,
        y=unit(0, "npc") - padding,
        default.units="native",
        just=c("right", "center"),
        rot=90)

    # Draw lines to vertically mark when days begin.
    grid.polyline(
        x=c(midnight_Locations, midnight_Locations),
        y=unit(c(rep(0, length(midnight_Locations)), rep(1, length(midnight_Locations))), "npc"),
        default.units="native",
        id=rep(midnight_Locations, 2),
        gp=gpar(lty=2, col=date_Color))

    # Initialize bin assignment variable.
    bin_Assignment <- 1

    # Calculate which bin each hit value belongs in.
    for (i in 1:nrow(bin_Settings)){
        bin_Assignment <- bin_Assignment + ifelse(df_Hits$hits>bin_Settings$cutoff[i], 1, 0)
    }

    # Draw points, coloring according to the bin settings.
    grid.points(
        x=x_Values,
        y=match(df_Hits$website, unique_Urls),
        pch=19,
        size=unit(1, "native"),
        gp=gpar(col=as.character(bin_Settings$color[bin_Assignment]), alpha=0.5))

    # Finalize the main plotting area.
    popViewport()

    # Create the bins for the upper histogram.
    bins <- ddply(
        data.frame(df_Hits, bin_Assignment, mid=floor(x_Values/bin_Width)*bin_Width+bin_Width/2),
        .(bin_Assignment, mid),
        summarize,
        freq=length(hits))

    # Initialize upper histogram area
    pushViewport(viewport(
        x=x_Label_Margin,
        y=y_Label_Margin + main_Height,
        width=main_Width,
        height=histogram_Size,
        xscale=c(-1, max(x_Values) + 1),
        yscale=c(0, max(bins$freq) * 1.05),
        just=c("left", "bottom"),
        gp=gpar(fontsize=label_Size)))

    # Calculate where to put four value markers.
    marker_Interval <- floor(max(bins$freq)/4)
    digits <- nchar(marker_Interval)
    marker_Interval <- round(marker_Interval, -digits+1)

    # Draw horizontal lines to mark values.
    grid.polyline(
        x=unit(c(rep(0,4), rep(1,4)), "npc"),
        y=c(1:4 * marker_Interval, 1:4 * marker_Interval),
        default.units="native",
        id=rep(1:4, 2),
        gp=gpar(lty=2, col=marker_Color))

    # Write value labels for each marker.
    grid.text(
        1:4 * marker_Interval,
        x=unit(0, "npc") - padding,
        y=1:4 * marker_Interval,
        default.units="native",
        just=c("right", "center"))

    # Finalize upper histogram area, so we
    # can turn it back on but with clipping.
    popViewport()

    # Initialize upper histogram area again,
    # but with clipping turned on.
    pushViewport(viewport(
        x=x_Label_Margin,
        y=y_Label_Margin + main_Height,
        width=main_Width,
        height=histogram_Size,
        xscale=c(-1, max(x_Values) + 1),
        yscale=c(0, max(bins$freq) * 1.05),
        just=c("left", "bottom"),
        gp=gpar(fontsize=label_Size),
        clip="on"))

    # Draw bars for each bin.
    for (i in 1:nrow(bin_Settings)){
        active_Bin <- bins[bins$bin_Assignment==i,]
        if (nrow(active_Bin)>0){
            for (j in 1:nrow(active_Bin)){
                grid.rect(
                    x=active_Bin$mid[j],
                    y=0,
                    width=bin_Width,
                    height=active_Bin$freq[j],
                    default.units="native",
                    just=c("center","bottom"),
                    gp=gpar(col=as.character(bin_Settings$color[i]), fill=as.character(bin_Settings$color[i]), alpha=1/nrow(bin_Settings)))
            }
        }
    }

    # Draw x-axis.
    grid.lines(x=unit(c(0, 1), "npc"), y=0, default.units="native")

    # Finalize upper histogram area.
    popViewport()

    # Calculate the frequencies for each website and bin.
    freq_Data <- ddply(
        data.frame(df_Hits, bin_Assignment),
        .(website, bin_Assignment),
        summarize,
        freq=length(hits))

    # Create the line data for the side histogram.
    line_Data <- matrix(0, nrow=length(unique_Urls)+2, ncol=nrow(bin_Settings))
    for (i in 1:nrow(freq_Data)){
        line_Data[match(freq_Data$website[i], unique_Urls)+1,freq_Data$bin_Assignment[i]] <- freq_Data$freq[i]
    }

    # Initialize side histogram area
    pushViewport(viewport(
        x=x_Label_Margin + main_Width,
        y=y_Label_Margin,
        width=histogram_Size,
        height=main_Height,
        xscale=c(0, max(line_Data) * 1.05),
        yscale=c(0, length(unique_Urls) + 1),
        just=c("left", "bottom"),
        gp=gpar(fontsize=label_Size)))

    # Calculate where to put four value markers.
    marker_Interval <- floor(max(line_Data)/4)
    digits <- nchar(marker_Interval)
    marker_Interval <- round(marker_Interval, -digits+1)

    # Draw vertical lines to mark values.
    grid.polyline(
        x=c(1:4 * marker_Interval, 1:4 * marker_Interval),
        y=unit(c(rep(0,4), rep(1,4)), "npc"),
        default.units="native",
        id=rep(1:4, 2),
        gp=gpar(lty=2, col=marker_Color))

    # Write value labels for each marker.
    grid.text(
        1:4 * marker_Interval,
        x=1:4 * marker_Interval,
        y=unit(0, "npc") - padding,
        default.units="native",
        just=c("center", "top"))

    # Draw lines for each bin setting.
    grid.polyline(
        x=array(line_Data),
        y=rep(0:(length(unique_Urls)+1), nrow(bin_Settings)),
        default.units="native",
        id=array(t(matrix(1:nrow(bin_Settings), nrow=nrow(bin_Settings), ncol=length(unique_Urls)+2))),
        gp=gpar(col=as.character(bin_Settings$color)))

    # Draw vertical line for the y-axis.
    grid.lines(x=0, y=c(0, length(unique_Urls)+1), default.units="native")

    # Finalize side histogram area.
    popViewport()

    # Draw legend.
    # Draw box behind legend headers.
    grid.rect(
        x=0,
        y=1,
        width=unit(1, "strwidth", names(bin_Settings)[1]) + unit(1, "strwidth", names(bin_Settings)[2]) + 3 * padding,
        height=unit(1, "lines"),
        default.units="npc",
        just=c("left","top"),
        gp=gpar(col=row_Background, fill=row_Background))

    # Draw legend headers from bin_Settings variable.
    grid.text(
        names(bin_Settings)[1],
        x=padding,
        y=1,
        default.units="npc",
        just=c("left","top"))
    grid.text(
        names(bin_Settings)[2],
        x=unit(1, "strwidth", names(bin_Settings)[1]) + 2 * padding,
        y=1,
        default.units="npc",
        just=c("left","top"))

    # For each row in the bin_Settings variable,
    # write the cutoff values and the color associated.
    # Write the color name in the color it specifies.
    for (i in 1:nrow(bin_Settings)){
        grid.text(
            bin_Settings$cutoff[i],
            x=unit(1, "strwidth", names(bin_Settings)[1]) + padding,
            y=unit(1, "npc") - i * unit(1, "lines"),
            default.units="npc",
            just=c("right","top"))
        grid.text(
            bin_Settings$color[i],
            x=unit(1, "strwidth", names(bin_Settings)[1]) + 2 * padding,
            y=unit(1, "npc") - i * unit(1, "lines"),
            default.units="npc",
            just=c("left","top"),
            gp=gpar(col=as.character(bin_Settings$color[i])))
    }
