Select rows where a column contains a string like "hsa.." (partial string match)

I have a 371 MB text file containing microRNA data. I would like to pick out only those lines that contain information about human microRNA.

I read in the file using read.table. Normally, I would do this with sqldf, if it had a "like" syntax (select * from <> where miRNA like 'hsa%'). Unfortunately, sqldf does not seem to support this syntax.

How can I do this in R? I looked through Stack Overflow and could not find an example of partial string matching. I even installed the stringr package, but it does not seem to have exactly what I need.

What I would like to do is something like this, selecting all rows where miRNA matches hsa-*:

selectedRows <- conservedData[, conservedData$miRNA %like% "hsa-"] 

which of course is the wrong syntax.

Can someone please help me with this? Thanks so much for reading.

Asda

+56
string r match
Oct 24 '12 at 6:25
3 answers

I notice that you mention the %like% function in your current approach. I don't know if that is a reference to %like% from "data.table", but if it is, you can definitely use it as follows.

Note that the object does not have to be a data.table (but also remember that the subsetting approaches for data.frame and data.table are not identical):

    library(data.table)
    mtcars[rownames(mtcars) %like% "Merc", ]
    iris[iris$Species %like% "osa", ]

If that is what you had, then perhaps you just mixed up the row and column positions when subsetting the data.




If you do not want to load a package, you can try using grep() to find the rows that match your string. Here is an example with the mtcars dataset, where we match all the rows whose row names include "Merc":

    mtcars[grep("Merc", rownames(mtcars)), ]
    #              mpg cyl  disp  hp drat   wt qsec vs am gear carb
    # Merc 240D   24.4   4 146.7  62 3.69 3.19 20.0  1  0    4    2
    # Merc 230    22.8   4 140.8  95 3.92 3.15 22.9  1  0    4    2
    # Merc 280    19.2   6 167.6 123 3.92 3.44 18.3  1  0    4    4
    # Merc 280C   17.8   6 167.6 123 3.92 3.44 18.9  1  0    4    4
    # Merc 450SE  16.4   8 275.8 180 3.07 4.07 17.4  0  0    3    3
    # Merc 450SL  17.3   8 275.8 180 3.07 3.73 17.6  0  0    3    3
    # Merc 450SLC 15.2   8 275.8 180 3.07 3.78 18.0  0  0    3    3

And another example using the iris dataset that looks for the osa string:

    irisSubset <- iris[grep("osa", iris$Species), ]
    head(irisSubset)
    #   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
    # 1          5.1         3.5          1.4         0.2  setosa
    # 2          4.9         3.0          1.4         0.2  setosa
    # 3          4.7         3.2          1.3         0.2  setosa
    # 4          4.6         3.1          1.5         0.2  setosa
    # 5          5.0         3.6          1.4         0.2  setosa
    # 6          5.4         3.9          1.7         0.4  setosa

For your problem try:

 selectedRows <- conservedData[grep("hsa-", conservedData$miRNA), ] 
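
Note that the pattern "hsa-" matches anywhere in the string. If only identifiers that begin with "hsa-" are wanted, a minimal sketch along these lines may help. The data frame below is a made-up stand-in for conservedData (the real column layout is not shown in the question), and startsWith() assumes R 3.3.0 or later and a character column, so wrap the column in as.character() if read.table produced a factor.

```r
# Hypothetical stand-in for the asker's data; values are invented for illustration.
conservedData <- data.frame(
  miRNA = c("hsa-miR-21", "mmu-miR-21", "hsa-let-7a", "xx-hsa-like"),
  score = c(0.9, 0.8, 0.7, 0.1),
  stringsAsFactors = FALSE
)

# grepl() returns a logical vector; "^" anchors the match to the start of the string,
# so "xx-hsa-like" is not selected.
humanRows <- conservedData[grepl("^hsa-", conservedData$miRNA), ]

# startsWith() performs the same prefix test without a regular expression.
humanRows2 <- conservedData[startsWith(conservedData$miRNA, "hsa-"), ]
```

Both calls keep the two rows whose identifier begins with "hsa-", whereas the unanchored grep("hsa-", ...) would also keep the "xx-hsa-like" row.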
+89
Oct 24

Try str_detect() from the stringr package, which detects the presence or absence of a pattern in a string.

Here is an approach that also includes the %>% pipe and filter() from the dplyr package:

    library(stringr)
    library(dplyr)

    CO2 %>%
      filter(str_detect(Treatment, "non"))
    #   Plant   Type  Treatment conc uptake
    # 1   Qn1 Quebec nonchilled   95   16.0
    # 2   Qn1 Quebec nonchilled  175   30.4
    # 3   Qn1 Quebec nonchilled  250   34.8
    # 4   Qn1 Quebec nonchilled  350   37.2
    # 5   Qn1 Quebec nonchilled  500   35.3
    # ...

This filters the sample CO2 dataset (which ships with R) for rows where the Treatment variable contains the substring "non". You can adjust whether str_detect finds fixed matches or uses a regular expression; see the documentation for the stringr package.
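
As a rough illustration of the fixed-versus-regex choice, the sketch below applies both pattern modifiers to a few invented identifiers (the ids vector is hypothetical and only there for demonstration).

```r
library(stringr)

# Made-up identifiers for illustration only.
ids <- c("hsa-miR-21", "HSA-let-7a", "mmu-miR-21", "xx.hsa-1")

# fixed(): treat the pattern as a literal string, matched anywhere, case-sensitive.
literalHits <- str_detect(ids, fixed("hsa-"))

# regex(): a regular expression, here anchored to the start and case-insensitive.
prefixHits <- str_detect(ids, regex("^hsa-", ignore_case = TRUE))

ids[literalHits]
ids[prefixHits]
```

The literal match picks up "xx.hsa-1" because the substring occurs in the middle, while the anchored, case-insensitive regex picks up "HSA-let-7a" instead.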

+32
Jul 07 '15 at 15:57

LIKE should work in sqlite:

    require(sqldf)
    df <- data.frame(name = c('bob', 'robert', 'peter'), id = c(1, 2, 3))
    sqldf("select * from df where name LIKE '%er%'")
    #     name id
    # 1 robert  2
    # 2  peter  3
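
Applied to the question, the same idea might look like the sketch below. The data frame is an invented stand-in for conservedData. With the default SQLite backend, LIKE is case-insensitive for ASCII characters, so 'hsa-%' would also match identifiers starting with "HSA-".

```r
library(sqldf)

# Hypothetical stand-in for the asker's data; values are invented for illustration.
conservedData <- data.frame(
  miRNA = c("hsa-miR-21", "mmu-miR-21", "hsa-let-7a"),
  score = c(0.9, 0.8, 0.7),
  stringsAsFactors = FALSE
)

# '%' is the SQL wildcard; 'hsa-%' keeps rows whose miRNA begins with "hsa-".
humanRows <- sqldf("select * from conservedData where miRNA like 'hsa-%'")
```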
+17
Oct 24


