In tidyr, what criteria does the `gather` function use to map a wide data frame to a long one?

Question


I am trying to figure out the arguments for gather in the tidyr package.

I looked at the documentation and the syntax looks like this:

gather(data, key, value, ..., na.rm = FALSE, convert = FALSE)

The help files have:

    stocks <- data.frame(
      time = as.Date('2009-01-01') + 0:9,
      X = rnorm(10, 0, 1),
      Y = rnorm(10, 0, 2),
      Z = rnorm(10, 0, 4)
    )
    gather(stocks, stock, price, -time)

I am interested in the last line:
gather(stocks, stock, price, -time)

Here `stocks` is obviously the data we want to reshape, and that's fine.

I can read that `stock` and `price` are the names for the key-value pair, but how does this function decide which columns to use to create that key-value pair? The original data frame looks like this:

          time           X          Y          Z
    2009-01-01  1.10177950 -1.1926213 -7.4149618
    2009-01-02  0.75578151 -4.3705737 -0.3117843
    2009-01-03 -0.23823356 -1.3497319  3.8742654
    2009-01-04  0.98744470 -4.2381224  0.7397038
    2009-01-05  0.74139013 -2.5303960 -5.5197743

I see no indication that we should use any combination of X, Y, or Z. When I use this function, it feels like I just choose the names I want for the columns in my long-format data frame and pray that gather works its magic. Come to think of it, I feel the same way when I use melt.

Does gather take the column type into account? How does it map wide to long?

EDIT: Great answer below, and a great discussion as well. For those who want to learn more about the philosophy and use of the tidyr package, be sure to read this paper, although the vignette does not explain the syntax.


r dataframe tidyr reshape2

Matt O'Brien Jan 25 '15 at 5:46


1 answer

A5C1D2H2I1M1N2O1R2T1 · Accepted Answer · 2015-01-25T06:07:20+0000

In "tidyr", you specify the measure variables for gather in the `...` argument. This is conceptually a little different from melt, where many examples (even many answers here on SO) show the use of the id.vars argument, on the assumption that anything not specified as an ID is a measure.

The `...` argument can also take a column name prefixed with a minus sign, as in the example you've shown (`-time`). That basically says "gather all columns except this one." Another shorthand approach in gather is to specify a range of columns with a colon, such as gather(stocks, stock, price, X:Z).
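To make the selection rules concrete, here is a small sketch that gathers the same columns three ways: by exclusion, by range, and by listing them. It assumes tidyr is installed and still exports `gather`; the `set.seed` call and the object names `by_exclusion`, `by_range`, and `by_name` are only there for illustration.

```r
library(tidyr)

set.seed(1)  # illustrative only, so the random data is reproducible
stocks <- data.frame(
  time = as.Date("2009-01-01") + 0:9,
  X = rnorm(10, 0, 1),
  Y = rnorm(10, 0, 2),
  Z = rnorm(10, 0, 4)
)

# Gather every column except `time`
by_exclusion <- gather(stocks, stock, price, -time)

# Gather the contiguous range of columns from X to Z
by_range <- gather(stocks, stock, price, X:Z)

# Gather the columns listed explicitly
by_name <- gather(stocks, stock, price, X, Y, Z)

# All three selections resolve to the same measure columns
isTRUE(all.equal(by_exclusion, by_range))
isTRUE(all.equal(by_range, by_name))

# 10 rows x 3 gathered columns gives 30 rows in the long format
dim(by_exclusion)
names(by_exclusion)
```

In each call, `stock` and `price` are just the names of the new key and value columns; the columns that actually get stacked are determined entirely by what is passed in `...`.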

You can see how gather relates to melt by looking at the function's code. Here are the first few lines, which show the gathered columns being passed to melt as measure.vars:

    > tidyr:::gather_.data.frame
    function (data, key_col, value_col, gather_cols, na.rm = FALSE,
        convert = FALSE)
    {
        data2 <- reshape2::melt(data, measure.vars = gather_cols,
            variable.name = key_col, value.name = value_col, na.rm = na.rm)
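As a rough check of that correspondence, the sketch below runs the gather call from the question next to a hand-written melt call using the same argument mapping. It assumes both tidyr and reshape2 are installed; the names `long_gather` and `long_melt` are illustrative. Later tidyr releases may implement gather differently, so only the key labels and values are compared here, not the column classes.

```r
library(tidyr)
library(reshape2)

set.seed(1)  # illustrative only, so the random data is reproducible
stocks <- data.frame(
  time = as.Date("2009-01-01") + 0:9,
  X = rnorm(10, 0, 1),
  Y = rnorm(10, 0, 2),
  Z = rnorm(10, 0, 4)
)

# tidyr: name the key and value columns, then select the measure columns
long_gather <- gather(stocks, stock, price, -time)

# reshape2: the same reshape written out with melt's argument names;
# columns not listed in measure.vars (here, `time`) are treated as id variables
long_melt <- melt(
  stocks,
  measure.vars  = c("X", "Y", "Z"),
  variable.name = "stock",
  value.name    = "price"
)

# Compare key labels and values row by row
identical(as.character(long_gather$stock), as.character(long_melt$stock))
isTRUE(all.equal(long_gather$price, long_melt$price))
```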
