How to deal with ggplot2 and overlapping labels on a discrete axis

Question


ggplot2 does not seem to have a built-in way of dealing with overplotting of text on scatterplots. However, I have a different situation where the overlapping labels are the ones on a discrete axis, and I wonder whether anyone has a better solution than what I have been doing.

Code example:

    library(ggplot2)

    # some example data
    test.data = data.frame(
      text = c("A full commitment what I'm thinking of",
               "History quickly crashing through your veins",
               "And I take A deep breath and I get real high",
               "And again, the Internet is not something that you just dump something on. It not a big truck."),
      mean = c(3.5, 3, 5, 4),
      CI.lower = c(3, 2.5, 4.5, 3.5),
      CI.upper = c(4, 3.5, 5.5, 4.5)
    )

    # plot (the axis labels default to the values of text)
    ggplot(test.data, aes(x = text, y = mean)) +
      geom_point(stat = "identity") +
      geom_errorbar(aes(ymax = CI.upper, ymin = CI.lower), width = .1) +
      scale_x_discrete(name = "")


So, we see that the x axis labels are on top of each other. Two solutions spring to mind: 1) abbreviating the labels, and 2) adding newlines to the labels. In many cases (1) will do, but in some cases it is not possible. So I wrote a function that inserts a newline (\n) after every nth character of each string, to keep the labels from overlapping:

    library(ggplot2)

    # Inserts newlines into strings every N interval
    new_lines_adder = function(test.string, interval) {
      # length of str
      string.length = nchar(test.string)
      # split by N char intervals
      split.starts = seq(1, string.length, interval)
      split.ends = c(split.starts[-1] - 1, nchar(test.string))
      # split it
      test.string = substring(test.string, split.starts, split.ends)
      # put it back together with newlines
      test.string = paste0(test.string, collapse = "\n")
      return(test.string)
    }

    # a user-level wrapper that also works on character vectors, data.frames, matrices and factors
    add_newlines = function(x, interval) {
      if (inherits(x, c("data.frame", "matrix", "factor"))) {
        x = as.vector(x)
      }
      if (length(x) == 1) {
        return(new_lines_adder(x, interval))
      } else {
        t = sapply(x, FUN = new_lines_adder, interval = interval) # apply splitter to each
        names(t) = NULL # remove names
        return(t)
      }
    }

    # plot again; passing a function keeps each label matched to its own break
    ggplot(test.data, aes(x = text, y = mean)) +
      geom_point(stat = "identity") +
      geom_errorbar(aes(ymax = CI.upper, ymin = CI.lower), width = .1) +
      scale_x_discrete(labels = function(x) add_newlines(x, 20), name = "")

And the result:

Then one can spend some time playing with the interval size to avoid too much whitespace between the labels.

If the number of labels varies, this solution is not so good, because the optimal interval size changes. In addition, since the normal font is not monospaced, the text of the labels also affects their width, so one has to take extra care in choosing a good interval (this can be avoided by using a monospaced font, but those are extra wide). Finally, the new_lines_adder() function is stupid in that it will split words in two in silly ways that a human would not. For example, in the plot above it split "breath" into "br\neath". One could rewrite it to avoid this problem.
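One possible way to avoid mid-word breaks is base R's strwrap(), which only breaks at whitespace. The following is a minimal sketch, not a full solution; wrap_label is a hypothetical helper name that does not appear elsewhere in this thread, and it still counts characters rather than measuring rendered width.

```r
# Hypothetical helper: wrap each label at whitespace using base R's strwrap()
wrap_label = function(x, width = 20) {
  x = as.character(x)
  # strwrap() returns the wrapped lines of one string; rejoin them with newlines
  vapply(x, function(s) paste(strwrap(s, width = width), collapse = "\n"),
         character(1), USE.NAMES = FALSE)
}

wrap_label("And I take A deep breath and I get real high", 20)
```

Such a helper could be passed to scale_x_discrete() as labels = function(x) wrap_label(x, 20).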

One can also reduce the font size, but this is a trade-off with readability, and often shrinking the font should not be necessary.
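For reference, the font size of the x axis labels can be set through theme(). This is a small sketch on toy data; the data frame d and its labels are made up for illustration, and the size of 8 is an arbitrary choice.

```r
library(ggplot2)

# Toy data with hypothetical labels, only to show where the font size is set
d = data.frame(text = c("a fairly long label", "another long label"),
               mean = c(1, 2))

p = ggplot(d, aes(x = text, y = mean)) +
  geom_point() +
  theme(axis.text.x = element_text(size = 8)) # size is given in points

p
```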

What is the best way to handle this kind of label overplotting?

+8

r plot ggplot2 axis-labels

Deleet Jun 2 '15 at 2:04


2 answers

I tried to build another version of new_lines_adder:

    new_lines_adder = function(test.string, interval) {
      # split at spaces
      string.split = strsplit(test.string, " ")[[1]]
      # get length of snippets, add one for space
      lens <- nchar(string.split) + 1
      # now the trick: split the text into lines with
      # length of roughly interval + 1 (including the spaces)
      lines <- cumsum(lens) %/% (interval + 1)
      # construct the lines
      test.lines <- tapply(string.split, lines, function(line)
        paste0(paste(line, collapse = " "), "\n"), simplify = TRUE)
      # put everything into a single string
      result <- paste(test.lines, collapse = "")
      return(result)
    }

It breaks lines only at spaces and keeps each line to roughly the number of characters given by interval. A line can run slightly over, because a word is assigned to a line by the position where it ends. With this version, your plot looks as follows:
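To see how the grouping works, here is the arithmetic for one of the example labels with interval = 20. This is only a sketch that re-runs the steps of the function above on a single string. It also suggests that a line can come out slightly longer than interval, since each word is placed by where it ends rather than where it starts.

```r
s = "And I take A deep breath and I get real high"
interval = 20

words = strsplit(s, " ")[[1]]
lens = nchar(words) + 1          # word lengths plus one space each
ends = cumsum(lens)              # running position where each word ends
group = ends %/% (interval + 1)  # line index assigned to each word

data.frame(words, lens, ends, group)

# number of characters on each resulting line
line.lengths = tapply(words, group, function(w) nchar(paste(w, collapse = " ")))
line.lengths
```

Here the second line, "breath and I get real", has 21 characters even though interval is 20.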

I would not claim that this is the best way. It still ignores that not all characters have the same width. Maybe something better could be achieved using strwidth.

By the way: you can simplify add_newlines as follows:

    add_newlines = function(x, interval) {
      # make sure, x is a character array
      x = as.character(x)
      # apply splitter to each
      t = sapply(x, FUN = new_lines_adder, interval = interval, USE.NAMES = FALSE)
      return(t)
    }

At the beginning, as.character makes sure that you have a character vector. It also does no harm if you already have one, so there is no need for the if clause.

The second if clause is not required either: sapply works fine if x contains only one element. And you can suppress the names by setting USE.NAMES=FALSE, so you do not need to remove the names in an extra line.
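A quick illustration of both points, using toupper as a stand-in for new_lines_adder:

```r
x = c("one", "two")

# by default, sapply() over a character vector uses the values as names
named = sapply(x, toupper)
names(named)

# USE.NAMES = FALSE returns a plain, unnamed vector
unnamed = sapply(x, toupper, USE.NAMES = FALSE)
names(unnamed)

# a single element works too, so no special case is needed
single = sapply("one", toupper, USE.NAMES = FALSE)
single
```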

+4

Stibu Jun 2 '15 at 19:17


Deleet · Accepted Answer · 2015-06-05T08:46:42+0000

Based on @Stibu's answer and comments, this solution takes the number of groups into account and uses the smart splitting developed by Stibu, while adding a fix for words separated by a forward slash.

Functions:

    library(stringr) # for str_replace_all()

    # Inserts newlines into strings every N interval
    new_lines_adder = function(x, interval) {
      # add spaces after /
      x = str_replace_all(x, "/", "/ ")
      # split at spaces
      x.split = strsplit(x, " ")[[1]]
      # get length of snippets, add one for space
      lens <- nchar(x.split) + 1
      # now the trick: split the text into lines with
      # length of roughly interval + 1 (including the spaces)
      lines <- cumsum(lens) %/% (interval + 1)
      # construct the lines
      x.lines <- tapply(x.split, lines, function(line)
        paste0(paste(line, collapse = " "), "\n"), simplify = TRUE)
      # put everything into a single string
      result <- paste(x.lines, collapse = "")
      # remove spaces we added after /
      result = str_replace_all(result, "/ ", "/")
      return(result)
    }

    # wrapper for the above, meant for users
    add_newlines = function(x, total.length = 85) {
      # make sure, x is a character array
      x = as.character(x)
      # determine number of groups
      groups = length(x)
      # apply splitter to each
      t = sapply(x, FUN = new_lines_adder, interval = round(total.length / groups), USE.NAMES = FALSE)
      return(t)
    }

For the default value of total.length I tried a few values, and 85 was the one that gave a decent result for the example data. Any higher and "veins" in label 2 moves up a line and gets too close to the third label.

Here's what it looks like:

However, it would be better to use a real measure of the total width of the text rather than the number of characters, since relying on this proxy usually means that the labels waste a lot of space. Perhaps one could rewrite new_lines_adder() with some strwidth-based code to solve the problem of unequal character widths.
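As a rough starting point, a width-based splitter might look like the sketch below. It is only an illustration under several assumptions: wrap_by_width and max.width are hypothetical names, the width is measured in inches with the font of the current base graphics device (which may not match the font ggplot2 uses for axis text), and strwidth() opens a graphics device if none is open.

```r
# Hypothetical helper: greedy word wrap that measures rendered width with
# strwidth() instead of counting characters. max.width is in inches.
wrap_by_width = function(x, max.width = 1.5, ...) {
  words = strsplit(as.character(x), " ", fixed = TRUE)[[1]]
  if (length(words) == 0) return("")
  lines = words[1]
  for (w in words[-1]) {
    candidate = paste(lines[length(lines)], w)
    if (strwidth(candidate, units = "inches", ...) <= max.width) {
      lines[length(lines)] = candidate   # word still fits on the current line
    } else {
      lines = c(lines, w)                # start a new line
    }
  }
  paste(lines, collapse = "\n")
}

wrap_by_width("And I take A deep breath and I get real high", max.width = 1.5)
```

The function handles one string at a time, so for a vector of labels it would be applied with sapply(), as in add_newlines() above. A single word wider than max.width is left on a line of its own.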

I'm leaving this question unanswered in case someone finds a way to do that.

I have added the two functions to my personal GitHub package, so anyone who wants to use them can get them from there.
