Formatting ggplot2 axis labels with commas (and K? MM?) if I already have a y-scale

I am trying to format Cost and Revenue data (both in thousands) and Impressions data (in millions) in the y-axis labels of a ggplot graph.

My plot runs from 31 days ago to yesterday and uses the min and max values over that period for the ylim(c(min,max)) option. Showing just the Cost example:

    library(ggplot2)
    library(TTR)
    set.seed(1984)

    # make series
    start <- as.Date('2016-01-01')
    end <- Sys.Date()
    days <- as.numeric(end - start)

    # make cost and moving averages
    cost <- rnorm(days, mean = 45400, sd = 11640)
    date <- seq.Date(from = start, to = end - 1, by = 'day')
    cost_7 <- SMA(cost, 7)
    cost_30 <- SMA(cost, 30)

    df <- data.frame(Date = date, Cost = cost, Cost_7 = cost_7, Cost_30 = cost_30)

    # set parameters for window
    left <- end - 31
    right <- end - 1

    # plot series
    ggplot(df, aes(x = Date, y = Cost))+
      geom_line(lwd = 0.5) +
      geom_line(aes(y = Cost_7), col = 'red', linetype = 3, lwd = 1) +
      geom_line(aes(y = Cost_30), col = 'blue', linetype = 5, lwd = 0.75) +
      xlim(c(left, right)) +
      ylim(c(min(df$Cost[df$Date > left]), max(df$Cost[df$Date > left]))) +
      xlab("")

[Figure: ggplot output]

I would like to a) represent thousands and millions on the y-axis with commas, and b) have those numbers abbreviated, with "K" for thousands or "MM" for millions. I realize b) may be a tall order, but for now a) cannot be accomplished with

ggplot(...) + ... + ylim(c(min, max)) + scale_y_continuous(labels = comma)

Since the following error occurs:

    ## Scale for 'y' is already present. Adding another scale for 'y', which
    ## will replace the existing scale.

I have tried putting the scale_y_continuous(labels = comma) section after the geom_line() layer (which throws the error above) and at the end of all the ggplot layers, which overrides my limits from the ylim call and then throws the error above anyway.

Any ideas?

2 answers

For the comma formatting, you need to load the scales library to use label=comma. The "error" you mentioned is actually just a warning, raised because you used both ylim and scale_y_continuous. The second call overrides the first. Instead, you can set the limits and specify comma-separated labels in a single call to scale_y_continuous:

    library(scales)

    ggplot(df, aes(x = Date, y = Cost))+
      geom_line(lwd = 0.5) +
      geom_line(aes(y = Cost_7), col = 'red', linetype = 3, lwd = 1) +
      geom_line(aes(y = Cost_30), col = 'blue', linetype = 5, lwd = 0.75) +
      xlim(c(left, right)) +
      xlab("") +
      scale_y_continuous(label=comma,
                         limits=c(min(df$Cost[df$Date > left]),
                                  max(df$Cost[df$Date > left])))

Another option would be to melt your data to long format before plotting, which reduces the amount of code needed and streamlines the aesthetic mappings:

    library(reshape2)

    ggplot(melt(df, id.var="Date"),
           aes(x = Date, y = value, color=variable, linetype=variable))+
      geom_line() +
      xlim(c(left, right)) +
      labs(x="", y="Cost") +
      scale_y_continuous(label=comma,
                         limits=c(min(df$Cost[df$Date > left]),
                                  max(df$Cost[df$Date > left])))
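If loading scales is not an option, note that the labels argument of scale_y_continuous also accepts any function that takes the break values and returns strings. Below is a minimal base-R sketch of such a function; comma_labels is a made-up name for illustration and is not part of ggplot2 or scales.

```r
# Hypothetical helper (not part of ggplot2 or scales): turns numeric
# break values into strings with a comma as the thousands separator.
comma_labels <- function(x) {
  format(x, big.mark = ",", trim = TRUE, scientific = FALSE)
}

comma_labels(c(45400, 1234567))
# "45,400"    "1,234,567"
```

It would then be passed as scale_y_continuous(label=comma_labels, limits=...) in place of label=comma.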

In either case, to put the y values in terms of thousands or millions, you can divide the y values by 1,000 or 1,000,000. I've used dollar_format() below, but I think you will also need to divide by the appropriate power of ten if you use unit_format (per @joran's suggestion). For example:

    div=1000

    ggplot(melt(df, id.var="Date"),
           aes(x = Date, y = value/div, color=variable, linetype=variable))+
      geom_line() +
      xlim(c(left, right)) +
      labs(x="", y="Cost (Thousands)") +
      scale_y_continuous(label=dollar_format(),
                         limits=c(min(df$Cost[df$Date > left]),
                                  max(df$Cost[df$Date > left]))/div)

Use scale_color_manual and scale_linetype_manual to set custom colors and line types, if necessary.
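As a side note, the limits expression c(min(...), max(...)) is repeated in every call above. A minimal sketch of computing it once with base R's range() follows; the toy dates and cost vectors and the cut-off date are invented purely for illustration and stand in for df$Date, df$Cost and left.

```r
# Toy stand-ins for df$Date and df$Cost (invented values, for illustration only)
dates <- as.Date("2016-01-01") + 0:5
cost  <- c(10, 50, 20, 40, 30, 60)
left  <- as.Date("2016-01-03")

# range() returns c(min, max) of the values inside the window in one call
y_limits <- range(cost[dates > left])
y_limits
# 30 60
```

With the real data this would be y_limits <- range(df$Cost[df$Date > left]), which can then be passed as limits=y_limits (or limits=y_limits/div).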



Here is a possible solution for part b).

This blog post proposes a solution in the form of a function.

    # format_si() returns a labelling function, so it is called with
    # parentheses when passed to the scale: label=format_si()
    format_si <- function(...) {
      function(x) {
        limits <- c(1e-24, 1e-21, 1e-18, 1e-15, 1e-12,
                    1e-9,  1e-6,  1e-3,  1e0,   1e3,
                    1e6,   1e9,   1e12,  1e15,  1e18,
                    1e21,  1e24)
        prefix <- c("y",   "z",   "a",   "f",   "p",
                    "n",   "µ",   "m",   " ",   "k",
                    "M",   "G",   "T",   "P",   "E",
                    "Z",   "Y")

        # Vector with array indices according to position in intervals
        i <- findInterval(abs(x), limits)

        # Set prefix to " " for very small values < 1e-24
        i <- ifelse(i==0, which(limits == 1e0), i)

        paste(format(round(x/limits[i], 1),
                     trim=TRUE, scientific=FALSE, ...),
              prefix[i])
      }
    }

    ggplot(df, aes(x = Date, y = Cost))+
      geom_line(lwd = 0.5) +
      geom_line(aes(y = Cost_7), col = 'red', linetype = 3, lwd = 1) +
      geom_line(aes(y = Cost_30), col = 'blue', linetype = 5, lwd = 0.75) +
      xlim(c(left, right)) +
      xlab("") +
      scale_y_continuous(label=format_si(),
                         limits=c(min(df$Cost[df$Date > left]),
                                  max(df$Cost[df$Date > left])))

Needless to say, prefix can be adapted as you like. Here is what the result looks like (the dates are in French because R is set to FR on my computer).
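For the specific "K" and "MM" suffixes asked about in the question, a smaller stand-alone variant is also possible. The sketch below is only an illustration: label_k_mm is a made-up name, and the choice of no decimals for thousands and one decimal for millions is an assumption, not something the question specifies.

```r
# Hypothetical labelling function: "K" for thousands, "MM" for millions.
# Values below 1,000 are printed as plain whole numbers.
label_k_mm <- function(x) {
  ifelse(
    abs(x) >= 1e6,
    sprintf("%.1fMM", x / 1e6),
    ifelse(abs(x) >= 1e3, sprintf("%.0fK", x / 1e3), sprintf("%.0f", x))
  )
}

label_k_mm(c(950, 45400, 2500000))
# "950"   "45K"   "2.5MM"
```

Because label_k_mm is itself the labelling function, it would be passed without parentheses, e.g. scale_y_continuous(label=label_k_mm, limits=...), unlike format_si() above.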
