What is the width argument in position_dodge?

The documentation does not explain what exactly this width argument is:

  • Whose width does it specify?
  • What is the unit?
  • What is the default value?

The default value is width = NULL, but trial and error shows that width = 0.9 seems to produce the default effect (see the PS). However, I could not find where such a default value is set in the ggplot2 source code. Thus,

  1. Could you explain how the default dodging is implemented in the ggplot2 code?

The spirit of the question is to allow ggplot2 users to find appropriate width values without trial and error. PS:

    ggplot(data = df) +
      geom_bar(aes(x, y, fill = factor(group)),
               position = position_dodge(), stat = "identity")

    ggplot(data = df) +
      geom_bar(aes(x, y, fill = factor(group)),
               position = position_dodge(0.9), stat = "identity")
+23
r ggplot2
Jan 20 '16 at 1:04
1 answer

First, I will give short answers to your three main questions. Then I walk through a few examples to illustrate the answers in more detail.

  • Whose width does it specify?
    The width of the elements to be dodged.

  • What is the unit?
    The actual or the virtual width, in data units, of the elements to be dodged.

  • What is the default value?
    If you do not set the dodging width explicitly but rely on the default, position_dodge(width = NULL) (or just position = "dodge"), the dodging width used is the actual width, in data units, of the element to be dodged.

I find your fourth question too broad for SO. Please refer to the code of collide and dodge and, if needed, ask a new, more specific question.




Depending on the dodging width of an element (together with its original horizontal position and the number of elements that are stacked), new center positions (x) of each element and new widths (xmin, xmax) are calculated. The elements are shifted horizontally just far enough not to overlap with adjacent elements. Obviously, wide elements need to be shifted more than narrow elements to avoid overlap.
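The shifting described above can be sketched in a few lines of base R. This is only an illustration of the arithmetic, not ggplot2's actual code: the function name dodge_sketch and its arguments are made up for this example, and it assumes that n elements share the same original x and that each one gets an equal slice of the dodging width.

```r
# Illustrative sketch only: dodge_sketch() is a hypothetical helper,
# not a ggplot2 function.
#   x             original horizontal position shared by the elements
#   n             number of elements to be dodged at that position
#   dodge_width   width used for the dodging calculation (data units)
#   element_width actual width of the undodged element (data units)
dodge_sketch <- function(x, n, dodge_width, element_width = dodge_width) {
  idx <- seq_len(n)
  # centre each element in its own 1/n slice of the dodging width
  new_x <- x + dodge_width * ((idx - 0.5) / n - 0.5)
  # assumption: each element keeps 1/n of its original width
  half <- element_width / n / 2
  data.frame(x = new_x, xmin = new_x - half, xmax = new_x + half)
}

# two elements at x = 1, dodged with a width of 0.9
dodge_sketch(x = 1, n = 2, dodge_width = 0.9)
```

With a larger dodge_width the centres move further apart, which is the "wide elements need to be shifted more" part of the description.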

To get a better feeling for dodging in general and the use of the width argument in particular, I show some examples. We start with a simple dodged bar plot with default dodging; we can use either position = "dodge" or the more explicit position = position_dodge(width = NULL):

    library(ggplot2)

    # some toy data
    df <- data.frame(x = 1, y = 1, grp = c("A", "B"))

    p <- ggplot(data = df, aes(x = x, y = y, fill = grp)) +
      theme_minimal()

    p + geom_bar(stat = "identity", position = "dodge")
    # which is the same as:
    # position = position_dodge(width = NULL)


So (1) whose width is it in position_dodge, and (2) what is the unit?

In ?position_dodge we can read:

width: Dodging width, when different to the width of the individual elements

Thus, if we use the default width, i.e. NULL, the dodging calculations are based on the width of the individual elements.

So a trivial answer to your first question, "Whose width does it specify?", would be: the width of the individual elements.

But of course we then wonder: what is "the width of the individual elements"? Let's start with the bars. From ?geom_bar:

width: Bar width. By default, set to 90% of the resolution of the data

A new question arises: what is resolution? Let's check ?ggplot2::resolution:

The resolution is the smallest non-zero distance between adjacent values. If there is only one unique value [as in our example], then the resolution is defined to be one.

Let's try:

    resolution(df$x)
    # [1] 1

Thus, the default width in this example is 0.9 * 1 = 0.9
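To make the rule concrete, here is a minimal base R stand-in that follows only the wording of ?resolution quoted in this answer. The name approx_resolution is hypothetical, and the real ggplot2::resolution() has additional arguments and special cases that this sketch ignores.

```r
# Illustrative sketch only: approx_resolution() is a hypothetical stand-in
# based on the description in ?resolution, not ggplot2's implementation.
approx_resolution <- function(x) {
  # integer vectors are assumed to be discrete, so the resolution is 1
  if (is.integer(x)) return(1)
  ux <- sort(unique(x))
  # a single unique value: the resolution is defined to be one
  if (length(ux) == 1) return(1)
  # otherwise: smallest distance between adjacent unique values
  min(diff(ux))
}

df <- data.frame(x = 1, y = 1, grp = c("A", "B"))

# default bar width = 90% of the resolution of the data
0.9 * approx_resolution(df$x)

# made-up x values with unequal spacing, for comparison
0.9 * approx_resolution(c(2, 4, 7))
```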

We can verify this by looking at the data that ggplot uses to render the bars on the plot, using ggplot_build. We create a plot object with a stacked bar plot, with bars of default width.

 p2 <- p + geom_bar(stat = "identity", position = "stack") 

The relevant slot in the object is $data, which is a list with one element for each layer in the plot, in the same order as they appear in the code. In this example we have only one layer, i.e. geom_bar, so let's look at the first slot:

    ggplot_build(p2)$data[[1]]
    #      fill x y label PANEL group ymin ymax xmin xmax colour size linetype alpha
    # 1 #F8766D 1 1     A     1     1    0    1 0.55 1.45     NA  0.5        1    NA
    # 2 #00BFC4 1 2     B     1     2    1    2 0.55 1.45     NA  0.5        1    NA

Each row contains the data needed to draw a single bar. As you can see, the width of the bars is 0.9 (xmax - xmin = 0.9). Thus, the width of the stacked bars, which is used when calculating the new dodged positions and widths, is 0.9.




In the previous example, we used the default bar width together with the default dodging width. Now let's make the bars slightly wider than the default width above (0.9). We use the width argument in geom_bar to explicitly set the width of the (stacked) bars, e.g. to 1. We try the same dodging width as above (position_dodge(width = 0.9)). Thus, although we have set the actual bar width to 1, the dodging calculations are made as if the bars were 0.9 wide. Let's see what happens:

    p + geom_bar(stat = "identity", width = 1,
                 position = position_dodge(width = 0.9), alpha = 0.8)


The bars overlap because ggplot shifts the bars horizontally as if they had a (stacked) width of 0.9 (set in position_dodge), whereas in fact the bars have a width of 1 (set in geom_bar).
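The size of that overlap can be estimated with a little arithmetic. The snippet below is a rough base R sketch, not ggplot2 code; it assumes that the centres are spread over the dodging width while each dodged bar is drawn with half of the actual bar width.

```r
# Illustrative arithmetic only; the variable names are made up for this sketch.
n           <- 2     # number of bars at x = 1
bar_width   <- 1     # as set in geom_bar(width = 1)
dodge_width <- 0.9   # as set in position_dodge(width = 0.9)

# centres are placed according to the dodging width ...
centres <- 1 + dodge_width * ((seq_len(n) - 0.5) / n - 0.5)

# ... but each bar is drawn with its share of the actual bar width
xmin <- centres - bar_width / n / 2
xmax <- centres + bar_width / n / 2

# a positive value means the right edge of bar A lies beyond the left edge of bar B
overlap <- xmax[1] - xmin[2]
overlap
```

Setting dodge_width equal to bar_width in this sketch makes the two edges meet, which corresponds to the default dodging behaviour.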

If we use the default dodging values, the bars are shifted horizontally exactly according to the bar width we set:

    p + geom_bar(stat = "identity", width = 1, position = "dodge", alpha = 0.8)
    # or: position = position_dodge(width = NULL)



Next we try to add some text to our plot using geom_text. We start with the default dodging width (i.e. position_dodge(width = NULL)), so the dodging is based on the default element size.

    p <- ggplot(data = df, aes(x = x, y = y, fill = grp, label = grp)) +
      theme_minimal()

    p2 <- p +
      geom_bar(stat = "identity", position = position_dodge(width = NULL)) +
      geom_text(size = 10, position = position_dodge(width = NULL))
    # or position = "dodge"

    p2
    # Warning message:
    # Width not defined. Set with `position_dodge(width = ?)`


The dodging of the text does not work. And what about the warning, "Width not defined"? Slightly cryptic. We need to consult the Details section of ?geom_text:

Note that the "width" and "height" of a text element are 0, so stacking and dodging text will not work by default, [...] Obviously, labels do have height and width, but they are physical units, not data units.

So, for geom_text, the width of the individual elements is zero. This is also the first "official ggplot reference" for your second question: the unit of width is data units.

Let's look at the data used to render the text elements on the plot:

    ggplot_build(p2)$data[[2]]
    #      fill x y label PANEL group xmin xmax ymax colour size angle hjust vjust alpha family fontface lineheight
    # 1 #F8766D 1 1     A     1     1    1    1    1  black   10     0   0.5   0.5    NA               1        1.2
    # 2 #00BFC4 1 1     B     1     2    1    1    1  black   10     0   0.5   0.5    NA               1        1.2

Indeed, xmin == xmax; thus, the width of the text element in data units is zero.

How do we achieve correct dodging of a text element with zero width? From the examples in ?geom_text:

ggplot2 doesn't know you want to give the labels the same virtual width as the bars [...] So tell it:

For dodge to use the same width for the geom_text elements as for the geom_bar elements when calculating the new positions, we need to set the "virtual dodging width in data units" of the text element to the same width as the bars. We use the width argument of position_dodge to set the virtual width of the text element to 0.9 (i.e. the bar width in the example above):

    p2 <- p +
      geom_bar(stat = "identity", position = position_dodge(width = NULL)) +
      geom_text(position = position_dodge(width = 0.9), size = 10)

Check the data used to render geom_text :

    ggplot_build(p2)$data[[2]]
    #      fill     x y label PANEL group xmin xmax ymax colour size angle hjust vjust alpha family fontface lineheight
    # 1 #F8766D 0.775 1     A     1     1 0.55 1.00    1  black   10     0   0.5   0.5    NA               1        1.2
    # 2 #00BFC4 1.225 1     B     1     2 1.00 1.45    1  black   10     0   0.5   0.5    NA               1        1.2

Now the text elements have a width in data units: together they span xmin = 0.55 to xmax = 1.45, i.e. 0.9, the same width as the bars, with each label taking half of it. Thus, the dodging calculations are now made as if the text elements had a width, here 0.9. Render the plot:

 p2 

The text is dodged correctly!




Like text, points (geom_point) and error bars (e.g. geom_errorbar) have zero width in data units. Thus, if you need to dodge such elements, you need to specify a relevant virtual width on which the dodging calculations are based. See e.g. the Examples section of ?geom_errorbar:

If you want to dodge bars and errorbars, you need to manually specify the dodge width [...] Because the bars and errorbars have different widths we need to specify how wide the objects we are dodging are.
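A minimal sketch of that advice, using the same toy layout as above. The data frame df_err and its lower and upper columns are made up for illustration; layer_data() is used to compare the dodged x positions of the two layers.

```r
library(ggplot2)

# made-up data for illustration only
df_err <- data.frame(
  x     = 1,
  y     = c(1, 2),
  grp   = c("A", "B"),
  lower = c(0.8, 1.7),
  upper = c(1.2, 2.3)
)

plt <- ggplot(df_err, aes(x = x, y = y, fill = grp)) +
  # bars: default width, dodged by their own width
  geom_bar(stat = "identity", position = "dodge") +
  # error bars: drawn 0.25 wide, but dodged as if they were 0.9 wide
  geom_errorbar(aes(ymin = lower, ymax = upper),
                width = 0.25,
                position = position_dodge(width = 0.9))

# dodged centre positions of the bars and of the error bars
bar_x <- layer_data(plt, 1)$x
err_x <- layer_data(plt, 2)$x
bar_x
err_x
```

If the two vectors agree, the error bars sit on the centres of their bars.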




Here is an example with several x values on a continuous scale:

    df <- data.frame(x = rep(c(10, 20, 50), each = 2),
                     y = 1,
                     grp = c("A", "B"))

Let's say we want to create a dodged bar plot with some text above each bar. First, just check the bar plot on its own, using the default dodging width:

    p <- ggplot(data = df, aes(x = x, y = y, fill = grp, label = grp)) +
      theme_minimal()

    p + geom_bar(stat = "identity", position = position_dodge(width = NULL))
    # or position = "dodge"

It works as expected. Then add the text. We try to set the virtual width of the text element to the same value as the width of the bars in the example above, i.e. we "guess" that the bars still have a width of 0.9 and that we need to dodge the text elements as if they had a width of 0.9 as well:

    p + geom_bar(stat = "identity", position = "dodge") +
      geom_text(position = position_dodge(width = 0.9), size = 10)


Obviously, the dodging calculation for the bars is now based on a width other than 0.9, and setting the virtual width of the text element to 0.9 was a bad guess. So what is the bar width here? Again, the bar width is "[b]y default, set to 90% of the resolution of the data". Check the resolution:

    resolution(df$x)
    # [1] 10

Thus, the width of the (default stacked) bars, on which their new, dodged positions are calculated, is now 0.9 * 10 = 9. To dodge the bars and their corresponding text "hand in hand", we need to set the virtual width of the text elements to 9 as well:

    p + geom_bar(stat = "identity", position = "dodge") +
      geom_text(position = position_dodge(width = 9), size = 10)

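To see why 0.9 was a bad guess and 9 works, compare how far the centres are moved in each case. This is a base R sketch of the arithmetic only; centre_offsets is a hypothetical helper, not a ggplot2 function, and it assumes two elements per x position.

```r
# Illustrative arithmetic only: centre_offsets() is a hypothetical helper.
# It returns how far each of n elements is shifted from the original x.
centre_offsets <- function(n, dodge_width) {
  dodge_width * ((seq_len(n) - 0.5) / n - 0.5)
}

x <- c(10, 20, 50)

# default bar width: 90% of the smallest gap between adjacent x values
bar_dodge_width <- 0.9 * min(diff(sort(unique(x))))

bars       <- centre_offsets(2, bar_dodge_width)  # shift of the two bars
labels_bad <- centre_offsets(2, 0.9)              # labels with the bad guess
labels_ok  <- centre_offsets(2, 9)                # labels with the matching width

rbind(bars, labels_bad, labels_ok)
```

With the bad guess the labels are shifted only a tenth as far as the bars, so they stay close to the original x value instead of following their bars.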




In our last example, we have a categorical x axis, just a "factor version" of the x values from above.

    df <- data.frame(x = factor(rep(c(10, 20, 50), each = 2)),
                     y = 1,
                     grp = c("A", "B"))

In R, factors are internally a set of integer codes with a "levels" attribute. And from ?resolution:

If x is an integer vector, then it is assumed to represent a discrete variable, and the resolution is 1.

By now, we know that when the resolution is 1, the default bar width is 0.9. Thus, on a categorical x axis, the default width for geom_bar is 0.9, and we need to set the dodging width for geom_text accordingly:
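The integer codes behind a factor can be inspected directly in base R. The snippet below is only meant to illustrate why the spacing, and hence the default bar width, no longer depends on the original values 10, 20 and 50.

```r
x <- factor(rep(c(10, 20, 50), each = 2))

levels(x)                # the original values, kept as labels

codes <- as.integer(x)   # the underlying integer codes
codes

# adjacent codes are one apart, whatever the original values were
spacing <- min(diff(sort(unique(codes))))

# 90% of that spacing is the default bar width discussed above
0.9 * spacing
```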

    ggplot(data = df, aes(x = x, y = y, fill = grp, label = grp)) +
      theme_minimal() +
      geom_bar(stat = "identity", position = "dodge") +
      # or: position = position_dodge(width = NULL)
      # or: position = position_dodge(width = 0.9)
      geom_text(position = position_dodge(width = 0.9), size = 10)


+38
Jan 30 '16 at 13:40