Extracting information from a conditional formula

I would like to write an R function that takes a formula as its first argument, similar to lm(), glm() and friends. In this case, the function takes a data frame and writes a file in SVMLight format, which has this general form:

    <line> .=. <target> <feature>:<value> <feature>:<value> ... <feature>:<value> # <info>
    <target> .=. +1 | -1 | 0 | <float>
    <feature> .=. <integer> | "qid"
    <value> .=. <float>
    <info> .=. <string>

For example, the following data frame:

      result qid     f1     f2     f3     f4   f5     f6     f7     f8
    1     -1   1 0.0000 0.1253 0.0000 0.1017 0.00 0.0000 0.0000 0.9999
    2     -1   1 0.0098 0.0000 0.0000 0.0000 0.00 0.0316 0.0000 0.3661
    3      1   1 0.0000 0.0000 0.1941 0.0000 0.00 0.0000 0.0509 0.0000
    4     -1   2 0.0000 0.2863 0.0948 0.0000 0.34 0.0000 0.7428 0.0608
    5      1   2 0.0000 0.0000 0.0000 0.4347 0.00 0.0000 0.9539 0.0000
    6      1   2 0.0000 0.7282 0.9087 0.0000 0.00 0.0000 0.0000 0.0355

would be written out as follows:

    -1 qid:1 2:0.1253 4:0.1017 8:0.9999
    -1 qid:1 1:0.0098 6:0.0316 8:0.3661
    1 qid:1 3:0.1941 7:0.0509
    -1 qid:2 2:0.2863 3:0.0948 5:0.3400 7:0.7428 8:0.0608
    1 qid:2 4:0.4347 7:0.9539
    1 qid:2 2:0.7282 3:0.9087 8:0.0355

The function that I would like to write will be called something like this:

 write.svmlight(result ~ f1+f2+f3+f4+f5+f6+f7+f8 | qid, data=mydata, file="out.txt") 

Or even

 write.svmlight(result ~ . | qid, data=mydata, file="out.txt") 

But I cannot figure out how to use model.matrix() and/or model.frame() to work out which columns the function should write. Are these the right things to look at?

Any help is much appreciated!

2 answers

Partial answer. You can index the formula object to get the formula syntax tree:

    > f <- a ~ b + c | d
    > f[[1]]
    `~`
    > f[[2]]
    a
    > f[[3]]
    b + c | d
    > f[[3]][[1]]
    `|`
    > f[[3]][[2]]
    b + c
    > f[[3]][[3]]
    d

Now you just need code that walks this tree.

UPDATE: Here is an example of a function that walks the tree.

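    walker <- function(formu) {
      if (!inherits(formu, "formula")) stop("Want formula")
      lhs   <- formu[[2]]   # response (left-hand side)
      formu <- formu[[3]]   # right-hand side: vars | condition
      # The top-level operator on the right-hand side must be `|`.
      if (!is.call(formu) || !identical(formu[[1]], as.name("|")))
        stop("Want conditional part")
      condi <- formu[[3]]   # conditioning term
      # Recursively flatten a tree of binary operators into its leaves.
      flattener <- function(f) {
        if (length(f) < 3) return(f)
        c(Recall(f[[2]]), Recall(f[[3]]))
      }
      vars <- flattener(formu[[2]])
      list(lhs = lhs, condi = condi, vars = vars)
    }
    walker(y ~ a + b | c)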

Also see the documentation for terms.formula and terms.object. Reading the source of functions that accept conditional formulas may help too, for example lmer in the lme4 package.
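To connect the tree-indexing idea back to the question, here is a minimal sketch of what write.svmlight could look like. It is not a tested implementation: it assumes the formula always has the shape target ~ features | qid, that the features are plain column names joined by + (or a lone .), that feature numbers are simply positions in that feature list, and that values are written with four decimals as in the example output. The returned character vector is a convenience added for illustration.

```r
# Hypothetical sketch of write.svmlight(); assumptions are listed above.
write.svmlight <- function(formula, data, file) {
  stopifnot(inherits(formula, "formula"), length(formula) == 3)
  rhs <- formula[[3]]
  if (!is.call(rhs) || !identical(rhs[[1]], as.name("|")))
    stop("expected a formula like y ~ x1 + x2 | qid")

  target <- all.vars(formula[[2]])   # left of `~`
  qid    <- all.vars(rhs[[3]])       # right of `|`
  feats  <- all.vars(rhs[[2]])       # between `~` and `|`
  # A lone '.' means: every column that is not the target or the qid.
  if (identical(feats, "."))
    feats <- setdiff(names(data), c(target, qid))

  X <- as.matrix(data[, feats, drop = FALSE])
  lines <- vapply(seq_len(nrow(data)), function(i) {
    x  <- X[i, ]
    nz <- which(x != 0)              # SVMLight omits zero-valued features
    paste(c(data[[target]][i],
            paste0("qid:", data[[qid]][i]),
            if (length(nz))
              paste0(nz, ":", formatC(x[nz], format = "f", digits = 4))),
          collapse = " ")
  }, character(1))

  writeLines(lines, file)
  invisible(lines)
}

# Small made-up data frame, only to exercise the function.
mydata <- data.frame(result = c(-1, 1), qid = c(1, 2),
                     f1 = c(0, 0.0098), f2 = c(0.1253, 0), f3 = c(0, 0.34))
out <- write.svmlight(result ~ . | qid, data = mydata, file = tempfile())
```

Formulas with transformations, interactions, or things like . - f3 would need terms.formula (with its data argument) rather than all.vars, which is why the documentation pointers above are still worth following.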


I used

    formu.names <- all.vars(formu)
    Y.name <- formu.names[1]
    X.name <- formu.names[2]
    block.name <- formu.names[3]

in the code I wrote for post-hoc analysis of Friedman's test:

http://www.r-statistics.com/2010/02/post-hoc-analysis-for-friedmans-test-r-code/

But it will only work for formulas of the form Y ~ X | block.
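The reason for that limitation is that all.vars() just lists the variable names in order of appearance and discards the operators, so fixed positions only line up when there is exactly one predictor. A small illustration follows; split_vars is a hypothetical helper, and it assumes the response is a single variable and the conditioning variable comes last:

```r
# all.vars() keeps only the names, in order of first appearance.
v1 <- all.vars(Y ~ X | block)        # "Y" "X" "block"
v2 <- all.vars(y ~ f1 + f2 | qid)    # "y" "f1" "f2" "qid": position 3 is not the block

# Hypothetical helper: first name is the response, last is the block,
# everything in between is treated as a predictor.
split_vars <- function(formu) {
  v <- all.vars(formu)
  list(y = v[1], x = v[-c(1, length(v))], block = v[length(v)])
}

parts <- split_vars(y ~ f1 + f2 | qid)
```

This still cannot tell which operator separated the names, so it is no substitute for inspecting the formula tree when the structure matters.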

I hope others will come up with a better answer.
