R: check for pairs of rows in a data frame

I have data containing information on parameters such as these:

```
> chData
    myIdx strike_price       date     exdate cp_flag strike_price    return
1 8355342       605000 1996-04-02 1996-05-18       P       605000  0.002340
2 8355433       605000 1996-04-02 1996-05-18       C       605000  0.002340
3 8356541       605000 1996-04-09 1996-05-18       P       605000 -0.003182
4 8356629       605000 1996-04-09 1996-05-18       C       605000 -0.003182
5 8358033       605000 1996-04-16 1996-05-18       P       605000  0.003907
6 8358119       605000 1996-04-16 1996-05-18       C       605000  0.003907
7 8359391       605000 1996-04-23 1996-05-18       P       605000  0.005695
```

where cp_flag indicates whether the option is a call (C) or a put (P). How can I make sure that each date has both a call and a put, and discard the rows for dates that do not? I can do this with a for loop, but is there a smarter way?
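For comparison, here is a minimal sketch of the kind of for loop the question alludes to. It rebuilds part of the sample data as an assumption: only myIdx, date, cp_flag and return are kept (exdate and the duplicated strike_price column are dropped), dates are stored as character strings because the question does not show their class, and the names keep and paired are illustrative.

```r
# Assumed subset of the question's sample data; dates kept as character strings.
chData <- data.frame(
  myIdx   = c(8355342, 8355433, 8356541, 8356629, 8358033, 8358119, 8359391),
  date    = c("1996-04-02", "1996-04-02", "1996-04-09", "1996-04-09",
              "1996-04-16", "1996-04-16", "1996-04-23"),
  cp_flag = c("P", "C", "P", "C", "P", "C", "P"),
  return  = c(0.002340, 0.002340, -0.003182, -0.003182,
              0.003907, 0.003907, 0.005695),
  stringsAsFactors = FALSE
)

# Loop over the distinct dates and flag every row of a date that has
# at least one put and at least one call.
keep <- logical(nrow(chData))
for (d in unique(chData$date)) {
  rows <- chData$date == d
  keep[rows] <- all(c("P", "C") %in% chData$cp_flag[rows])
}
paired <- chData[keep, ]
paired
```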

4 answers

Get the dates that have a P and the dates that have a C, and use intersect to find the dates that have both.

```r
keep_dates <- with(x, intersect(date[cp_flag == 'P'], date[cp_flag == 'C']))
# "1996-04-02" "1996-04-09" "1996-04-16"
```

Keep only the rows whose date is in keep_dates.

```r
x[x$date %in% keep_dates, ]
#   myIdx strike_price       date     exdate cp_flag strike_price.1
# 8355342       605000 1996-04-02 1996-05-18       P         605000
# 8355433       605000 1996-04-02 1996-05-18       C         605000
# 8356541       605000 1996-04-09 1996-05-18       P         605000
# 8356629       605000 1996-04-09 1996-05-18       C         605000
# 8358033       605000 1996-04-16 1996-05-18       P         605000
# 8358119       605000 1996-04-16 1996-05-18       C         605000
```
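A self-contained run of this approach, assuming a subset of the question's columns (myIdx, date, cp_flag, return) with dates stored as character strings; the name paired is illustrative.

```r
# Assumed subset of the question's sample data; dates kept as character strings.
x <- data.frame(
  myIdx   = c(8355342, 8355433, 8356541, 8356629, 8358033, 8358119, 8359391),
  date    = c("1996-04-02", "1996-04-02", "1996-04-09", "1996-04-09",
              "1996-04-16", "1996-04-16", "1996-04-23"),
  cp_flag = c("P", "C", "P", "C", "P", "C", "P"),
  return  = c(0.002340, 0.002340, -0.003182, -0.003182,
              0.003907, 0.003907, 0.005695),
  stringsAsFactors = FALSE
)

# Dates seen with a put, intersected with dates seen with a call.
keep_dates <- with(x, intersect(date[cp_flag == "P"], date[cp_flag == "C"]))

# %in% gives one TRUE/FALSE per row, so every row of a qualifying date is kept.
paired <- x[x$date %in% keep_dates, ]
paired
```

Because the subset is row-wise, a date with, say, two puts and one call would keep all three rows.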

Using the plyr package:

```r
> library(plyr)
> ddply(chData, "date", function(x) if (all(c("P", "C") %in% x$cp_flag)) x)
    myIdx strike_price       date     exdate cp_flag strike_price.1    return
1 8355342       605000 1996-04-02 1996-05-18       P         605000  0.002340
2 8355433       605000 1996-04-02 1996-05-18       C         605000  0.002340
3 8356541       605000 1996-04-09 1996-05-18       P         605000 -0.003182
4 8356629       605000 1996-04-09 1996-05-18       C         605000 -0.003182
5 8358033       605000 1996-04-16 1996-05-18       P         605000  0.003907
6 8358119       605000 1996-04-16 1996-05-18       C         605000  0.003907
```
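If you would rather avoid the plyr dependency, the same per-date test can be sketched in base R with ave(), which is a different technique from the ddply() call above. This assumes a subset of the question's columns with dates as character strings; has_both and paired are illustrative names.

```r
# Assumed subset of the question's sample data; dates kept as character strings.
x <- data.frame(
  myIdx   = c(8355342, 8355433, 8356541, 8356629, 8358033, 8358119, 8359391),
  date    = c("1996-04-02", "1996-04-02", "1996-04-09", "1996-04-09",
              "1996-04-16", "1996-04-16", "1996-04-23"),
  cp_flag = c("P", "C", "P", "C", "P", "C", "P"),
  return  = c(0.002340, 0.002340, -0.003182, -0.003182,
              0.003907, 0.003907, 0.005695),
  stringsAsFactors = FALSE
)

# ave() passes the row indices of each date group to FUN and writes the
# single TRUE/FALSE back to every row of that group. The input is an
# integer vector, so the result holds 1/0 and is converted back to logical.
has_both <- ave(seq_len(nrow(x)), x$date,
                FUN = function(i) all(c("P", "C") %in% x$cp_flag[i]))
paired <- x[as.logical(has_both), ]
paired
```

This keeps the original row order and row names of x.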

Here is a reshape solution:

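```r
library(reshape)

# Add a dummy value column for cast() to spread
df$value <- 1
# One row per date, one column per cp_flag value (NA where that flag is missing).
# myIdx is left out of the formula because it is unique to each row; keeping it
# would stop the P and C rows for a date from lining up.
check <- cast(df, date + exdate + strike_price + return ~ cp_flag)
# Take stock of what just happened
summary(check)
# Use only complete cases. If you have NAs elsewhere, this will knock out those obs too
check <- check[complete.cases(check), ]
# Back to long form
df.clean <- melt(check, id = 1:4)
```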
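A base R sketch of the same wide-by-cp_flag idea, using table() instead of cast() (a swapped-in technique, not the reshape API). It cross-tabulates dates against flags and keeps the dates whose P and C counts are both non-zero. It assumes both flags occur somewhere in the data, and uses a subset of the question's columns with dates as character strings; counts and paired are illustrative names.

```r
# Assumed subset of the question's sample data; dates kept as character strings.
x <- data.frame(
  myIdx   = c(8355342, 8355433, 8356541, 8356629, 8358033, 8358119, 8359391),
  date    = c("1996-04-02", "1996-04-02", "1996-04-09", "1996-04-09",
              "1996-04-16", "1996-04-16", "1996-04-23"),
  cp_flag = c("P", "C", "P", "C", "P", "C", "P"),
  return  = c(0.002340, 0.002340, -0.003182, -0.003182,
              0.003907, 0.003907, 0.005695),
  stringsAsFactors = FALSE
)

# One row per date, one column per flag ("C", "P"), cells are row counts.
counts <- table(x$date, x$cp_flag)

# Dates with at least one put and one call. Indexing by "P" and "C"
# fails if a flag never occurs in the data at all.
keep_dates <- rownames(counts)[counts[, "P"] > 0 & counts[, "C"] > 0]
paired <- x[x$date %in% keep_dates, ]
paired
```

Because only date and cp_flag are inspected, NAs in other columns do not remove rows, which sidesteps the complete.cases() caveat in the answer.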

Here is one way, using split and lapply:

```r
> tmp <- lapply(split(d, list(d$date)), function(x) if (all(c('P', 'C') %in% x[, 5])) x)
> do.call(rbind, tmp)
               myIdx strike_price       date     exdate cp_flag strike_price    return
1996-05-18.1 8355342       605000 1996-04-02 1996-05-18       P       605000  0.002340
1996-05-18.2 8355433       605000 1996-04-02 1996-05-18       C       605000  0.002340
1996-05-18.3 8356541       605000 1996-04-09 1996-05-18       P       605000 -0.003182
1996-05-18.4 8356629       605000 1996-04-09 1996-05-18       C       605000 -0.003182
1996-05-18.5 8358033       605000 1996-04-16 1996-05-18       P       605000  0.003907
1996-05-18.6 8358119       605000 1996-04-16 1996-05-18       C       605000  0.003907
1996-05-18.7 8359391       605000 1996-04-23 1996-05-18       P       605000  0.005695
```

Edit: this is the full version of what my earlier answer implied. I tend to think in base R functions rather than plyr or reshape, but those answers look good too.
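A self-contained run of the split/lapply approach on the question's sample data. It assumes a subset of the columns with dates as character strings, so cp_flag is selected by name rather than by position 5; pieces and paired are illustrative names. Splitting by date drops the unpaired 1996-04-23 put; the row names in the printed output above (1996-05-18.1 and so on) suggest that output was produced by splitting on exdate instead, which would explain why that row still appears there.

```r
# Assumed subset of the question's sample data; dates kept as character strings.
d <- data.frame(
  myIdx   = c(8355342, 8355433, 8356541, 8356629, 8358033, 8358119, 8359391),
  date    = c("1996-04-02", "1996-04-02", "1996-04-09", "1996-04-09",
              "1996-04-16", "1996-04-16", "1996-04-23"),
  cp_flag = c("P", "C", "P", "C", "P", "C", "P"),
  return  = c(0.002340, 0.002340, -0.003182, -0.003182,
              0.003907, 0.003907, 0.005695),
  stringsAsFactors = FALSE
)

# One data frame per date; `if` without `else` returns NULL for dates
# that lack either a put or a call.
pieces <- lapply(split(d, d$date),
                 function(x) if (all(c("P", "C") %in% x$cp_flag)) x)

# rbind() on data frames skips the NULL elements.
paired <- do.call(rbind, pieces)
rownames(paired) <- NULL
paired
```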

