R: how to do more complex calculations from the combn of a dataset?

Question


Right now I have a combn of the columns of the built-in iris dataset. So far, I have been shown how to extract the lm() coefficient for each pair of variables.

    myPairs <- combn(names(iris[1:4]), 2)
    formula <- apply(myPairs, MARGIN=2, FUN=paste, collapse="~")
    model <- lapply(formula, function(x) lm(formula=x, data=iris)$coefficients[2])
    model

However, I would like to go a few steps further and use the coefficient from lm() in further calculations. I would like to do something like this:

    coefficient <- lm(formula=x, data=iris)$coefficients[2]
    Spread <- myPairs[1] - coefficient*myPairs[2]
    library(tseries)
    adf.test(Spread)

The procedure itself is simple enough, but I could not find a way to do this for each combn in the dataset. (As a heads-up, adf.test is not really appropriate for data like this; I am only using the iris dataset for demonstration.) I wonder if it would be better to write a loop for such a procedure?

+6

loops r

Luke zhang Jun 15 '16 at 17:09


3 answers

It looks like you will want to write your own function and call it on each column of myPairs with apply:

    yourfun <- function(pair){
      fm <- paste(pair, collapse='~')
      coef <- lm(formula=fm, data=iris)$coefficients[2]
      Spread <- iris[,pair[1]] - coef*iris[,pair[2]]
      return(Spread)
    }

Then you can call this function:

 model <- apply(myPairs, 2, yourfun)

I think this is the cleanest way. I don’t know exactly what you want to compute, so I used Spread as the example. Note that you will get warning messages if the pairs include the Species column, because it is a factor; with myPairs built from iris[1:4], as in the question, every column is numeric.
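To see what that apply call returns, here is a minimal sketch in base R. The name spreads and the "y~x" column labels are illustrative assumptions, not part of the answer; the helper is repeated so the snippet runs on its own.

```r
# Rebuild the pairs and the helper so this snippet is self-contained.
myPairs <- combn(names(iris[1:4]), 2)

yourfun <- function(pair) {
  fm <- as.formula(paste(pair, collapse = "~"))
  slope <- coef(lm(fm, data = iris))[[2]]
  # Spread: response minus slope times predictor, one value per row of iris.
  iris[, pair[1]] - slope * iris[, pair[2]]
}

# apply() over columns returns a matrix with one column per pair.
spreads <- apply(myPairs, 2, yourfun)

# Label each column with the pair it came from (assumed naming scheme).
colnames(spreads) <- apply(myPairs, 2, paste, collapse = "~")

dim(spreads)
head(spreads, 3)
```

Each column can then be passed to whatever test you need, for example with apply(spreads, 2, someTest), where someTest is a placeholder for your own function.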

+2

jkt Jun 15 '16 at 17:33


A few tips: I would avoid giving your objects the same names as built-in functions (model and formula in your initial version come to mind).

Additionally, you can simplify the paste call you are making; see below.

Finally, a more general point: do not feel that everything has to be done with *apply. Sometimes short, terse code is actually harder to understand, and remember that the *apply functions offer at best a limited speed gain over a simple for loop. (This was not always the case in R, but it is at this point.)

    # Get pairs
    myPairs <- combn(x = names(x = iris[1:4]), m = 2)

    # Just directly use paste() here
    myFormulas <- paste(myPairs[1,], myPairs[2,], sep = "~")

    # Store the models themselves into a list
    # This lets you go back to the models later if you need something else
    myModels <- lapply(X = myFormulas, FUN = lm, data = iris)

    # If you use sapply() and this simple function, you get back a named vector
    # This seems like it could be useful to what you want to do
    myCoeffs <- sapply(X = myModels, FUN = function (x) {return(x$coefficients[2])})

    # Now, you can do this using vectorized operations.
    # myCoeffs is in the same order as the columns of myPairs, so scale column by
    # column with sweep(): indexing myCoeffs by name hits duplicated names, and a
    # plain data.frame * vector product recycles down the rows instead.
    iris[myPairs[1,]] - sweep(iris[myPairs[2,]], 2, myCoeffs, "*")

If I understand correctly, I believe the above will work. Note that the column names of the output will not be meaningful as they stand; you will need to replace them with something of your own design (perhaps the myFormulas values).
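On the point about plain for loops, the same calculation can be sketched as an explicit loop. This is only one possible layout: the names spreadList, fit and slope are assumptions made for illustration, and the list is keyed by the formula strings as suggested above.

```r
myPairs <- combn(names(iris[1:4]), 2)
myFormulas <- paste(myPairs[1, ], myPairs[2, ], sep = "~")

# Pre-allocate a list with one slot per pair, named by its formula string.
spreadList <- vector("list", ncol(myPairs))
names(spreadList) <- myFormulas

for (i in seq_len(ncol(myPairs))) {
  y <- myPairs[1, i]  # response column name
  x <- myPairs[2, i]  # predictor column name
  fit <- lm(as.formula(myFormulas[i]), data = iris)
  slope <- coef(fit)[[2]]
  spreadList[[i]] <- iris[[y]] - slope * iris[[x]]
}

str(spreadList, give.attr = FALSE)
```

A loop like this is longer than the *apply versions, but each step is visible and easy to extend, for example by running a test on each spread inside the loop body.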

+1

Tarhman Jun 15 '16 at 17:50


user20650 · Accepted Answer · 2016-06-15T18:44:12+0000

You can do it all within combn.

If you just want to run a regression on every combination and extract the second coefficient, you could do

    fun <- function(x) coef(lm(paste(x, collapse="~"), data=iris))[2]
    combn(names(iris[1:4]), 2, fun)

Then you can expand the function to calculate the spread

    library(tseries)  # provides adf.test

    fun <- function(x) {
      est <- coef(lm(paste(x, collapse="~"), data=iris))[2]
      spread <- iris[,x[1]] - est*iris[,x[2]]
      adf.test(spread)
    }

    out <- combn(names(iris[1:4]), 2, fun, simplify=FALSE)

    out[[1]]
    #   Augmented Dickey-Fuller Test
    # data:  spread
    # Dickey-Fuller = -3.879, Lag order = 5, p-value = 0.01707
    # alternative hypothesis: stationary

Compare the result with running the first pair manually

    est <- coef(lm(Sepal.Length ~ Sepal.Width, data=iris))[2]
    spread <- iris[,"Sepal.Length"] - est*iris[,"Sepal.Width"]
    adf.test(spread)
    #   Augmented Dickey-Fuller Test
    # data:  spread
    # Dickey-Fuller = -3.879, Lag order = 5, p-value = 0.01707
    # alternative hypothesis: stationary
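Since out is a list of test objects, a natural follow-up is to collect one statistic and p-value per pair into a table. The sketch below is hedged in one important way: to stay runnable with base R alone it swaps in shapiro.test as a stand-in for adf.test (both return objects of class "htest" with statistic and p.value components). The names pairLabels and results are assumptions for illustration.

```r
# Labels for each pair, built in the same order combn() visits them.
pairLabels <- combn(names(iris[1:4]), 2, paste, collapse = "~")

fun <- function(x) {
  est <- coef(lm(as.formula(paste(x, collapse = "~")), data = iris))[2]
  spread <- iris[, x[1]] - est * iris[, x[2]]
  # Stand-in test so the sketch needs no extra packages;
  # replace with tseries::adf.test(spread) for the test used in the answer.
  shapiro.test(spread)
}

out <- combn(names(iris[1:4]), 2, fun, simplify = FALSE)

# Pull the test statistic and p-value out of each "htest" object.
results <- data.frame(
  pair      = pairLabels,
  statistic = vapply(out, function(h) unname(h$statistic), numeric(1)),
  p.value   = vapply(out, function(h) h$p.value, numeric(1))
)

results
```

Using vapply rather than sapply makes the expected return type explicit, so a malformed test result fails early instead of silently changing the shape of the output.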
