I have 10 data frames with two columns each, named a, b, c, d, e, f, g, h, i and j.
The first column in each data frame is called s (sequences), and the second is called p (the p-value for each sequence). Column s contains the same sequences in all 10 data frames; the only real difference between them is the p-values. Below is a short excerpt of data frame a, which has 600,000 rows:
```
s      p
gtcg   0.06
gtcgg  0.05
gggaa  0.07
cttg   0.05
```
I want to rank each data frame by p-value: the smallest p should get rank 1, and equal p-values should get the same rank. Each resulting data frame should be in this format:
```
s      p_rank_a
gtcg   2
gtcgg  1
gggaa  3
cttg   1
```
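Note that the p_rank_a example uses dense ranking: tied p-values share a rank and the next distinct value gets the next integer (gggaa is 3, not 4). Base R's `rank()` has no `ties.method` that does this; its default, "average", would give the two tied rows 1.5 each, and "min" would give gggaa 4. A minimal base-R sketch, assuming `p` is numeric with no missing values; `dense_rank_p()` and `rank_frame()` are hypothetical helper names, not existing functions (dplyr ships a `dense_rank()` with the same behaviour if you already use that package):

```r
# Hypothetical helper: dense rank. The smallest p gets 1, equal p-values share
# a rank, and the next distinct p-value gets the next integer (no gaps).
dense_rank_p <- function(p) match(p, sort(unique(p)))

# Hypothetical helper: build the two-column s / p_rank_<name> data frame
rank_frame <- function(df, name) {
  out <- data.frame(s = df$s, dense_rank_p(df$p))
  names(out)[2] <- paste0("p_rank_", name)
  out
}

# Excerpt of data frame a from the question
a <- data.frame(s = c("gtcg", "gtcgg", "gggaa", "cttg"),
                p = c(0.06, 0.05, 0.07, 0.05))

ranked_a <- rank_frame(a, "a")
print(ranked_a)
```

On the excerpt of a, this yields the ranks 2, 1, 3, 1 shown in the table above. `match()` against the sorted unique values is what produces the gap-free numbering.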
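For a single data frame, I did this:

```r
r <- rank(a$p)                     # rank the p-values of data frame a
data.frame(s = a$s, p_rank_a = r)  # pair each sequence with its rank
```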
but I am not very familiar with loops, and I don't know how to do this automatically for all 10 data frames. In the end, I would like a final result with column s and, in the next column, the sum of the ranks from all 10 data frames for each sequence. So basically this:
```
s      ranksum_P_a-j
gtcg   34
gtcgg  5
gggaa  5009093
cttg   499
```
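One way to do this for all ten data frames without writing the loop by hand is to collect them in a list and let `vapply()` iterate over it. This is a sketch under stated assumptions, not a definitive solution: `rank_sum()` and `dense_rank_p()` are hypothetical helpers, ranks are dense (ties share a rank, no gaps, matching the p_rank_a example), rows are matched by sequence so the frames need not be in the same order, and data frame `b` below is made up purely to show the arithmetic.

```r
# Hypothetical helper: dense rank (smallest p = 1, ties share a rank, no gaps)
dense_rank_p <- function(p) match(p, sort(unique(p)))

# Hypothetical helper: sum each sequence's dense p-value rank over a list of
# data frames. Ranks are aligned to the sequence order of the first frame via
# match(), so the frames may list their sequences in any order.
rank_sum <- function(frames, col = "ranksum_P_a-j") {
  ref_s <- frames[[1]]$s
  ranks <- vapply(frames, function(df) {
    dense_rank_p(df$p)[match(ref_s, df$s)]
  }, integer(length(ref_s)))          # one column per data frame
  out <- data.frame(s = ref_s, rowSums(ranks))
  names(out)[2] <- col                # keep the hyphen in the column name
  out
}

# Excerpt of data frame a from the question
a <- data.frame(s = c("gtcg", "gtcgg", "gggaa", "cttg"),
                p = c(0.06, 0.05, 0.07, 0.05))
# Made-up p-values for illustration only, in a different row order
b <- data.frame(s = c("cttg", "gggaa", "gtcg", "gtcgg"),
                p = c(0.01, 0.02, 0.03, 0.01))

result <- rank_sum(list(a = a, b = b))
print(result)

# With the real data, assuming objects a..j exist in the workspace:
# result <- rank_sum(mget(letters[1:10]))
```

For this toy pair the rank sums come out as 5, 2, 5, 2. Because `match()` uses hashing, aligning 600,000 sequences ten times should be cheap compared with chaining ten `merge()` calls, though that is worth checking on the real data.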
Please help and thanks!