Find matching groups in an R data frame and build an iterable of tuples

I have an R data frame with two columns: column x is categorical and column y is continuous. Here is an example:

    library(dplyr)

    x <- c(1,1,1,1,1,1,1,2,2,2,2,2,3,3,4,4,4,4,4,4,4,4,4,4)
    y <- runif(length(x), 0, 1)
    df <- data.frame(x, y)
    # number of rows for each value of x
    df_sum <- df %>% group_by(x) %>% summarise(count = n())

Think of each categorical value as the identifier of a series of a particular type, and of y as the values in that series. In the end, I want to compare a selected subset of all possible pairs of series using a function my_func().

First, I need to identify the "good" tuples and create an iterable to use in the second part of the task.

To find the "good" tuples, I need to compare the number of rows for each categorical value of x in df_sum. I want to find all combinations of categorical values of x for which the ratio of the numbers of observations is between 0.9 and 1.5.

For example, x = 1 has 7 rows and x = 2 has 5 rows, and the ratio 7/5 = 1.4 falls into this range. Therefore, I want to save the tuple (1,2).
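As a quick check of that arithmetic, the ratio for this pair can be computed directly from the counts. This is only a sketch in base R; the names `counts` and `ratio` are introduced here for illustration and are not part of the original code.

```r
x <- c(1,1,1,1,1,1,1,2,2,2,2,2,3,3,4,4,4,4,4,4,4,4,4,4)

counts <- table(x)                      # number of rows per value of x
ratio <- counts[["1"]] / counts[["2"]]  # 7 / 5
ratio >= 0.9 && ratio <= 1.5            # TRUE, so (1,2) qualifies
```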

The function is symmetric: my_func(s1,s2) = my_func(s2,s1).

Therefore, I do not need to save (2,1) if I already have (1,2). Once I have all the good tuples, I want to loop over them, run my_func(s1, s2) on each, and save (s1, s2, my_func(s1,s2)) in a data frame.

If good_tuples were a Python-like list [(1,2),...], I would write pseudocode like this:

    for tuple in good_tuples:
        s1 <- df[df$x==tuple[0],'y']
        s2 <- df[df$x==tuple[1],'y']
        my_func(s1, s2)

Ideally, I could run the loop in parallel with something like mapply.

1 answer

You can try this solution:

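    library(reshape)  # provides melt(); reshape2::melt names the columns Var1/Var2 instead

    # ratio of counts for every ordered pair: entry [i, j] is count[i] / count[j]
    z <- melt(tcrossprod(df_sum$count, 1/df_sum$count))
    #   X1 X2     value
    # 1  1  1 1.0000000
    # 2  2  1 0.7142857
    # 3  3  1 0.2857143
    # 4  4  1 1.4285714

    # keep the pairs whose ratio is in (1.0, 1.5]
    pairs <- subset(z[1:2], z$value > 1.0 & z$value <= 1.5)
    #   X1 X2
    # 4  4  1
    # 5  1  2

    mapply(sum, pairs$X1, pairs$X2)  # for example, calculate sum
    # [1] 5 3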
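For comparison, here is a minimal base R sketch of the whole task as the question describes it: list each unordered pair once, keep those in the 0.9 to 1.5 range, and collect the results in a data frame. Several things are assumptions rather than part of the original: `my_func` is a placeholder, the names `cand`, `in_range`, `good` and `result` are made up for this example, and a pair is treated as "good" if the ratio in either direction falls in the range.

```r
set.seed(1)  # only so the example is reproducible
x <- c(1,1,1,1,1,1,1,2,2,2,2,2,3,3,4,4,4,4,4,4,4,4,4,4)
y <- runif(length(x), 0, 1)
df <- data.frame(x, y)

# Placeholder for the real function; assumed symmetric in its arguments.
my_func <- function(s1, s2) abs(mean(s1) - mean(s2))

counts <- table(df$x)              # rows per value of x
ids <- as.numeric(names(counts))
cand <- t(combn(ids, 2))           # each unordered pair appears once
n1 <- as.vector(counts[as.character(cand[, 1])])
n2 <- as.vector(counts[as.character(cand[, 2])])
r <- n1 / n2

# Assumption: a pair is "good" if the ratio in either direction is in [0.9, 1.5].
in_range <- function(v) v >= 0.9 & v <= 1.5
good <- cand[in_range(r) | in_range(1 / r), , drop = FALSE]

# Apply my_func to the y values of each good pair and collect the results.
value <- mapply(function(a, b) my_func(df$y[df$x == a], df$y[df$x == b]),
                good[, 1], good[, 2])
result <- data.frame(s1 = good[, 1], s2 = good[, 2], value = value)
```

With the example data this keeps the pairs (1,2) and (1,4). Note that `mapply` itself runs sequentially; on Unix-like systems, `parallel::mcmapply` accepts the same main arguments and could probably be swapped in to evaluate the pairs in parallel.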

Source: https://habr.com/ru/post/1212066/

