Draw lines between different elements in a stacked bar chart

I am trying to draw lines between two separate bars (in the same plot) in ggplot2 to show that two segments of the second bar are a subset of the first bar.

I tried both geom_line and geom_segment. However, with both I ran into the same problem: assigning a separate start and end point for each geom (I need two lines) in the same plot as the five-row data set.

Sample chart code without lines:

 library(data.table)
 library(ggplot2)

 Example <- data.table(X_Axis = c('Count', 'Count', 'Dollars', 'Dollars', 'Dollars'),
                       Stack_Group = c('Purely A', 'A & B', 'Purely A Dollars', 'B Mixed Dollars', 'A Mixed dollars'),
                       Value = c(10, 3, 120000, 100000, 50000))
 Example[, Percent := Value/sum(Value), by = X_Axis]

 ggplot(Example, aes(x = X_Axis, y = Percent, fill = factor(Stack_Group))) +
   geom_bar(stat = 'identity', width = 0.5) +
   scale_y_continuous(labels = scales::percent)

The goal for the final plot: [image not available]

Tags: r, ggplot2

3 answers

Instead of hard coding the start and end positions of the segments, you can grab this data from the plot object. Here's an alternative in which you specify the names of the x categories and of the bar segments between which the lines should be drawn.

Assign the graph to a variable:

 p <- ggplot() +
   geom_bar(data = Example, aes(x = X_Axis, y = Percent, fill = Stack_Group),
            stat = 'identity', width = 0.5)

Grab the data from the plot object (ggplot_build) and convert it to a data.table (setDT):

 d <- ggplot_build(p)$data[[1]]
 setDT(d)

In the plot data, the variables 'x' and 'group' are not given by name but as integers. Since ggplot orders categorical (character) variables alphabetically, we can match the numbers to their names by their rank within each x:

 d[ , r := rank(group), by = x]
 Example[ , x := .GRP, by = X_Axis]
 Example[ , r := rank(Stack_Group), by = x]
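To make the join key (x, r) concrete on the data side, here is a small base-R sketch. It is an illustration only, under the assumption that a plain data.frame with match() and ave() is an acceptable stand-in for the data.table lines above; it does not call ggplot_build(), so the premise that ggplot numbers `group` in the same alphabetical order within each x is taken from the answer, not checked here.

```r
# Base-R stand-in (assumption: data.frame instead of data.table) for the
# (x, r) keys: x = position of the X_Axis category on a discrete axis,
# r = alphabetical rank of Stack_Group within that category.
example <- data.frame(
  X_Axis = c("Count", "Count", "Dollars", "Dollars", "Dollars"),
  Stack_Group = c("Purely A", "A & B", "Purely A Dollars",
                  "B Mixed Dollars", "A Mixed dollars")
)

# Discrete axis positions follow the sorted category names
example$x <- match(example$X_Axis, sort(unique(example$X_Axis)))

# Rank each stack element within its own bar
example$r <- ave(seq_len(nrow(example)), example$x,
                 FUN = function(i) rank(example$Stack_Group[i]))

print(example)
```

Each row then has a unique (x, r) pair, which is exactly what the join in the next step relies on.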

Join to add the names "X_Axis" and "Stack_Group" from the original data to the plot data:

 d <- d[Example[ , .(X_Axis, Stack_Group, x, r)], on = .(x, r)] 

Specify the names of the x categories and of the bar segments between which the lines should be drawn:

 x_start_nm <- "Count"
 x_end_nm <- "Dollars"

 e_start <- "A & B"
 e_upper <- "A Mixed dollars"
 e_lower <- "B Mixed Dollars"

Select the relevant parts of the plot data to create the start and end coordinates of the lines:

 d2 <- data.table(x_start = rep(d[X_Axis == x_start_nm & Stack_Group == e_start, xmax], 2),
                  y_start = d[X_Axis == x_start_nm & Stack_Group == e_start, c(ymax, ymin)],
                  x_end = rep(d[X_Axis == x_end_nm & Stack_Group == e_upper, xmin], 2),
                  y_end = c(d[X_Axis == x_end_nm & Stack_Group == e_upper, ymax],
                            d[X_Axis == x_end_nm & Stack_Group == e_lower, ymin]))

Add line segments to the original chart:

 p + geom_segment(data = d2, aes(x = x_start, xend = x_end, y = y_start, yend = y_end)) 



You can do it like this:

 library(data.table)
 library(ggplot2)

 Example <- data.table(X_Axis = c('Count', 'Count', 'Dollars', 'Dollars', 'Dollars'),
                       Stack_Group = c('Purely A', 'A & B', 'Purely A Dollars', 'B Mixed Dollars', 'A Mixed dollars'),
                       Value = c(10, 3, 120000, 100000, 50000))
 Example[, Percent := Value/sum(Value), by = X_Axis]

 ggplot(Example) +
   geom_segment(data = data.frame(x = c("Count", "Count"), xend = c("Dollars", "Dollars"),
                                  y = c(1, 0.94), yend = c(1, 0.27)),
                aes(x = x, y = y, xend = xend, yend = yend)) +
   geom_bar(aes(x = X_Axis, y = Percent, fill = factor(Stack_Group)),
            stat = 'identity', width = .5) +
   scale_y_continuous(labels = scales::percent)

Which gives: [image not available]

NB: Because the x axis is categorical, a segment starts at the centre of the category (the tick position), not at the edge of the bar. This is why I draw geom_segment first and geom_bar afterwards, so that the bars are drawn on top of the segments and hide the part of each segment that lies inside a bar.
Here the y values were set by hand. However, using the bar width and a little trigonometry (the slope of the segment), you can calculate the offsets needed to get the desired look.
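The offset calculation mentioned in the NB can be sketched as follows. The helper `extend_to_centres()` is hypothetical (not part of the answer): it takes the heights where a line should touch the facing bar edges and extends that straight line to the category centres x = 1 and x = 2, which is where geom_segment actually starts and ends on a discrete axis. It assumes ggplot2 2.2.0+ stacking, where "Purely A" and "Purely A Dollars" are the bottom segments of their bars.

```r
# Hypothetical helper: extend the line through the two bar-edge points
# to the category centres x = 1 and x = 2.
extend_to_centres <- function(y_left_edge, y_right_edge, width = 0.5) {
  # Bars of this width are centred on x = 1 and x = 2, so their facing
  # edges are at x = 1 + width/2 and x = 2 - width/2, i.e. 1 - width apart.
  slope <- (y_right_edge - y_left_edge) / (1 - width)
  c(y    = y_left_edge - slope * width / 2,   # height at x = 1
    yend = y_right_edge + slope * width / 2)  # height at x = 2
}

# Lower line: top of "Purely A" (10 of 13) to top of "Purely A Dollars"
# (120000 of 270000); upper line: both bars end at 100 %.
lower <- extend_to_centres(10 / 13, 120000 / 270000, width = 0.5)
upper <- extend_to_centres(1, 1, width = 0.5)

print(round(lower, 2))
print(upper)
```

For the lower line this gives roughly y = 0.93 and yend = 0.28, close to the 0.94 and 0.27 chosen by eye; these values could then be used as the `y` and `yend` columns of the segment data.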


Here is another flexible and simple approach that somewhat resembles @Henrik's answer, but works solely with the user's own data: there is no need to extract data from the ggplot_build() object.

Data preparation

Code:

 library(data.table)
 library(forcats)

 Example <- data.table(
   X_Axis = fct_inorder(c("Count", "Count", "Dollars", "Dollars", "Dollars")),
   Stack_Group = fct_rev(fct_inorder(c("Purely A", "A & B", "Purely A Dollars", "B Mixed Dollars", "A Mixed dollars"))),
   Value = c(10, 3, 120000, 100000, 50000),
   Grp2 = fct_inorder(c("Purely", "Mixed", "Purely", "Mixed", "Mixed"))
 )
 Example[, Percent := Value/sum(Value), by = X_Axis]
 Example[order(Grp2, -Stack_Group), Cumulated := cumsum(Percent), by = X_Axis]

Prepared data:

 Example
 #    X_Axis      Stack_Group  Value   Grp2   Percent Cumulated
 #1:   Count         Purely A     10 Purely 0.7692308 0.7692308
 #2:   Count            A & B      3  Mixed 0.2307692 1.0000000
 #3: Dollars Purely A Dollars 120000 Purely 0.4444444 0.4444444
 #4: Dollars  B Mixed Dollars 100000  Mixed 0.3703704 0.8148148
 #5: Dollars  A Mixed dollars  50000  Mixed 0.1851852 1.0000000

Drawing

Code:

 library(ggplot2)

 w = 0.4  # width of bars

 ggplot(Example, aes(x = X_Axis, y = Percent, fill = Stack_Group)) +
   geom_col(width = w) +
   geom_line(aes(x = (1 - w) * as.numeric(X_Axis) + 1.5 * w, y = Top, group = Grp2),
             data = Example[, .(Top = max(Cumulated)), by = .(X_Axis, Grp2)],
             inherit.aes = FALSE) +
   scale_y_continuous(labels = scales::percent)

Diagram: [image not available]

Description

  • ggplot coerces character variables to factors, and the factor levels control the order in which elements are drawn. By default, factor levels are sorted alphabetically. Here, however, we need to control the plot order explicitly. Therefore, we create factors with a specific level order using Hadley's convenient forcats package.

  • The order of the levels in Stack_Group is reversed to match the stacking order of ggplot2 (version 2.2.0+), which stacks values in the reverse order of the grouping, so the first level ends up on top (see ?position_stack).

  • The data contain two kinds of grouping:

    • One is given by X_Axis, which distinguishes "Count" from "Dollars".
    • The other is hidden in the Stack_Group names and in the way the OP wants to draw the line segments. Here we define it explicitly as a new variable, Grp2, which distinguishes the "Purely" segments at the bottom of each bar from the "Mixed" segments at the top of each bar. This avoids hard coding the start and end points of the line segments, which makes the solution more flexible.
  • Cumulative percentages are calculated for each bar. They are needed later for drawing line segments.

  • The bar width is stored in the variable w and passed to the width parameter of geom_col().

  • Introduced with version 2.2.0 of ggplot2 , geom_col() is a shortcut for geom_bar(stat = "identity") .

  • Since there are only two bars to connect, the line segments can be drawn with geom_line().

    • On the x axis, the line segments run from x = 1 + w/2 to x = 2 - w/2. This relies on ggplot placing each factor level at its integer code: "Count" is mapped to x = 1 and "Dollars" to x = 2 (this is why the factor levels were defined explicitly). The expression (1 - w) * as.numeric(X_Axis) + 1.5 * w maps 1 to 1 + w/2 and 2 to 2 - w/2.
    • The y value for each bar is Top, the maximum cumulative percentage within each Grp2, computed with Example[, .(Top = max(Cumulated)), by = .(X_Axis, Grp2)]. This lets you change the names and the order of the data items within each Grp2.
    • The parameter inherit.aes = FALSE is needed to keep ggplot from expecting the fill aesthetic: the summarised data passed to geom_line() has no Stack_Group column.
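The factor-ordering bullets and the x-coordinate formula can be checked in isolation with a small base-R sketch. It assumes (rather than uses) forcats: factor(v, levels = unique(v)) is used as a stand-in for fct_inorder(v), and reversing those levels as a stand-in for fct_rev(fct_inorder(v)). The helper name `edge_x()` is hypothetical and only valid for exactly two categories.

```r
# Base-R stand-ins for the forcats calls (assumption, not the answer's code)
stack_group <- c("Purely A", "A & B", "Purely A Dollars",
                 "B Mixed Dollars", "A Mixed dollars")

alphabetical <- factor(stack_group)                                     # default: sorted levels
in_order     <- factor(stack_group, levels = unique(stack_group))       # order of appearance
reversed     <- factor(stack_group, levels = rev(unique(stack_group)))  # reversed for stacking

print(levels(alphabetical))
print(levels(reversed))

# Line ends: bars of width w are centred on 1 and 2, and
# (1 - w) * k + 1.5 * w maps k = 1 to 1 + w/2 and k = 2 to 2 - w/2.
edge_x <- function(k, w) (1 - w) * k + 1.5 * w

for (w in c(0.2, 0.4)) {
  cat(sprintf("w = %.1f: line runs from x = %.2f to x = %.2f\n",
              w, edge_x(1, w), edge_x(2, w)))
}
```

With w = 0.4 the lines run from x = 1.20 to x = 1.80, i.e. exactly between the facing edges of the two bars.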

Enhancement

If desired, Grp2 can easily be distinguished by using a different line type:

 w = 0.2  # width of bars

 ggplot(Example, aes(x = X_Axis, y = Percent, fill = Stack_Group)) +
   geom_col(width = w) +
   geom_line(aes(x = (1 - w) * as.numeric(X_Axis) + 1.5 * w, y = Top, group = Grp2,
                 linetype = fct_rev(Grp2)),
             data = Example[, .(Top = max(Cumulated)), by = .(X_Axis, Grp2)],
             inherit.aes = FALSE) +
   scale_y_continuous(labels = scales::percent) +
   labs(linetype = "Purely vs Mixed")


The legend now shows the levels of Grp2. The legend title was conveniently renamed using labs(). The order of the levels of Grp2 has been reversed so that the 100% line is solid and the legend lists the levels in the same order as they appear in the plot ("Purely" below, "Mixed" above).

Note that the width parameter w has also been changed for demonstration purposes.

