I am trying to figure out the arguments for gather in the tidyr package.
I looked at the documentation and the syntax looks like this:
gather(data, key, value, ..., na.rm = FALSE, convert = FALSE)
The help files have:
stocks <- data.frame(
  time = as.Date('2009-01-01') + 0:9,
  X = rnorm(10, 0, 1),
  Y = rnorm(10, 0, 2),
  Z = rnorm(10, 0, 4)
)
gather(stocks, stock, price, -time)
I am interested in the last line:
gather(stocks, stock, price, -time)
Here stocks is obviously the data we want to reshape, and that's fine.
I can see that stock and price are the names for the key and value columns, but how does the function decide which columns to gather into this key-value pair? The original data frame looks like this:
time                 X          Y          Z
2009-01-01  1.10177950 -1.1926213 -7.4149618
2009-01-02  0.75578151 -4.3705737 -0.3117843
2009-01-03 -0.23823356 -1.3497319  3.8742654
2009-01-04  0.98744470 -4.2381224  0.7397038
2009-01-05  0.74139013 -2.5303960 -5.5197743
I see no indication in the call that we want to use any combination of X, Y or Z. When I use this function, I feel like I am just choosing names for the columns in my long-format data frame and praying that gather magically works. Come to think of it, I feel the same way when I use melt.
Does gather take the column type into account? How does it map from wide to long?
EDIT: Great answers and great discussion below. For those who want to learn more about the philosophy and use of the tidyr package, be sure to read this paper, although the vignette does not explain the syntax.
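For readers with the same question, here is a minimal sketch of how the column selection appears to work, based on the documented signature above. The `...` slot is where the columns to gather are specified, so `-time` reads as "every column except time". The names long_a, long_b and long_c are made up for this illustration, and set.seed is only there to make the random draws repeatable.

```r
library(tidyr)

set.seed(1)  # assumption: fixed seed, only so the example is reproducible
stocks <- data.frame(
  time = as.Date('2009-01-01') + 0:9,
  X = rnorm(10, 0, 1),
  Y = rnorm(10, 0, 2),
  Z = rnorm(10, 0, 4)
)

# `-time` in the `...` slot: gather every column except time
long_a <- gather(stocks, stock, price, -time)

# The same selection written positively, naming the gathered columns
long_b <- gather(stocks, stock, price, X, Y, Z)

# The same selection again, as a range of adjacent columns
long_c <- gather(stocks, stock, price, X:Z)

# stock holds the old column names (the key), price holds their values
head(long_a)
```

So stock and price are only the names of the two new columns. Which columns get stacked is decided by the selection that follows them, not by the column types.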