I am trying to count the number of unique drugs on this list.
my_drugs=c('a', 'b', 'd', 'h', 'q')
I have the following dictionary that gives me synonyms for drugs, but it is not organized so that each row describes a distinct drug (the same drug can show up in more than one row):
dictionary <- read.table(header=TRUE, text="
drug names
a b;c;d;x
x b;c;q
r h;g;f
l m;n
")
Thus, in this case there are 2 unique drugs on the list (because a, directly or indirectly, has synonyms b, d, q). Synonyms of synonyms are considered synonyms.
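To make the "synonyms of synonyms" rule concrete, here is a minimal sketch in base R. It treats every dictionary row as a group of equivalent names and merges groups that share a name (a simple union-find). The data frame is the toy dictionary retyped by hand, assuming one drug per row with synonyms separated by ";". The names `groups`, `parent`, and `find_root` are only illustrative and do not come from any package.

```r
# Toy data from the question (assumed layout: one drug per row,
# synonyms separated by ";").
my_drugs <- c("a", "b", "d", "h", "q")
dictionary <- data.frame(
  drug  = c("a", "x", "r", "l"),
  names = c("b;c;d;x", "b;c;q", "h;g;f", "m;n"),
  stringsAsFactors = FALSE
)

# Each row becomes one group: the drug followed by its synonyms.
groups <- Map(c, dictionary$drug, strsplit(dictionary$names, ";", fixed = TRUE))

# parent[name] points towards the representative of the name's group.
# Names that only occur in my_drugs start out as their own group.
all_names <- unique(c(unlist(groups, use.names = FALSE), my_drugs))
parent <- setNames(all_names, all_names)

# Follow parent links until reaching a name that is its own parent.
find_root <- function(x) {
  while (parent[[x]] != x) x <- parent[[x]]
  x
}

# Merge every member of a row into the group of the row's first name.
for (g in groups) {
  root <- find_root(g[1])
  for (member in g[-1]) {
    parent[[find_root(member)]] <- root
  }
}

# Drugs sharing a root are the same drug.
roots <- vapply(my_drugs, find_root, character(1))
n_unique <- length(unique(roots))
```

For the toy data, `roots` maps a, b, d and q to the same representative and h to a different one, so `n_unique` is 2.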
My attempt was to first build a dictionary that has only unique drugs on the left side. To do this, I would cycle through dictionary$drug, grep for each drug in dictionary$drug and dictionary$names, take the union of the matching synonym lists, replace dictionary$names with that union, and then delete the other matching lines from the dictionary.
bigdf=dictionary
small_df=data.frame("drug"=NA,"names"=NA)
for(i in 1:nrow(bigdf)){
  search_term=sprintf("*%s*",bigdf$drug[i])
  index=grep(search_term,bigdf$names)
  list=bigdf$names[index]
  list=Reduce(union,list)
  list=paste(list, collapse=";")
  if(!list==""){
    new_row=data.frame("drug"=bigdf$drug[index][1],"names"=list)
    small_df=rbind(small_df,new_row)
    #small_df
    bigdf=bigdf[-index,]
    #dim(bigdf)
  } else{
    new_row=data.frame("drug"=bigdf$drug[index][1],"names"="alreadycounted")
    small_df=rbind(small_df,new_row)
  }
}
It didn’t work (some drugs were missing from small_df), and even if it had, I am not sure how I would use the new dictionary to count the number of unique drugs on my list.
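On the second point, assuming a correctly collapsed dictionary did exist, a rough sketch of how it could be used for counting is below. The `collapsed` data frame is hypothetical and typed by hand (one row per distinct drug, with every known name for it in `names`). It is not the output of my code above, and `lookup` and `canonical` are just illustrative names.

```r
# Hypothetical collapsed dictionary: one row per distinct drug, listing
# every name for that drug (including the drug itself).
collapsed <- data.frame(
  drug  = c("a", "r", "l"),
  names = c("a;b;c;d;x;q", "r;h;g;f", "l;m;n"),
  stringsAsFactors = FALSE
)
my_drugs <- c("a", "b", "d", "h", "q")

# Build a lookup vector: name -> canonical drug.
name_list <- strsplit(collapsed$names, ";", fixed = TRUE)
lookup <- setNames(rep(collapsed$drug, lengths(name_list)),
                   unlist(name_list))

# Translate my_drugs; names absent from the dictionary stand for themselves.
canonical <- unname(lookup[my_drugs])
unknown <- is.na(canonical)
canonical[unknown] <- my_drugs[unknown]

n_unique_drugs <- length(unique(canonical))
```

With this hand-made table, a, b, d and q all translate to "a" and h translates to "r", so `n_unique_drugs` is 2. The hard part remains building `collapsed` correctly in the first place.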
How can I count the number of unique drugs in my_drugs?
Thanks for the help, and let me know if this requires further clarification.
Dataset size: 200 items in my_drugs, 2000 lines in the dictionary, each drug has 10-12 synonyms.