dplyr package: How can I query a large data frame using the SQL syntax "%xyz%"?

Question


dplyr is the only package that can handle my 843k-row data.frame and query it quickly. I can filter it fine using numeric and equality criteria, but I also need to implement a substring (pattern) search.

I need something like this sqldf query:

library(sqldf)
head(iris)
sqldf("select * from iris where lower(Species) like '%nica%'")

In the dplyr help, I could not find how to do this. Something like:

filter(iris,Species like '%something%')

The leading and trailing % are very important. Also note that the data frame has 800k+ rows, so traditional R functions may be slow. It has to be a dplyr solution.

+2

r dplyr

userJT Jul 22 '14 at 16:13


2 answers

Another option, using %like% from data.table together with tolower():

library(dplyr)       # provides filter() and %>%
library(data.table)  # provides %like%
iris %>% filter(tolower(Species) %like% 'nica')

+2

userJT 05 . '15 12:48

nrussell · Accepted Answer · 2014-07-22T16:28:54+0000

How about this -

library(dplyr)
data(iris)
filter(iris, grepl("nica",Species))
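
To mirror the lower(Species) like '%nica%' from the question more closely, here is a minimal sketch, assuming the pattern should be matched case-insensitively and, in the second variant, treated as a literal substring rather than a regular expression. grepl() and tolower() are base R; the names res_regex and res_fixed are only illustrative.

```r
library(dplyr)

# Case-insensitive regex match, roughly: lower(Species) LIKE '%nica%'
res_regex <- filter(iris, grepl("nica", Species, ignore.case = TRUE))

# Literal (non-regex) substring match on the lower-cased values;
# fixed = TRUE means characters such as "." or "(" are not treated as regex
res_fixed <- filter(iris, grepl("nica", tolower(Species), fixed = TRUE))

# Both variants keep only rows whose species name contains "nica"
unique(as.character(res_regex$Species))
```

The fixed = TRUE form is the safer choice when the search string comes from user input, because nothing in it is interpreted as regex syntax.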

EDIT: Another option is the %like% operator from data.table:

library(dplyr)
data(iris)
##
Iris <- iris[
  rep(seq_len(nrow(iris)),each=5000),
  ]
dim(Iris)
# [1] 750000      5
##
library(microbenchmark)
library(data.table)
##
Dt <- data.table(Iris)
setkeyv(Dt,cols="Species")
##
foo <- function(){
  subI <- filter(Iris, grepl("nica",Species))
}
##
foo2 <- function(){
  subI <- Dt[Species %like% "nica"]
}
##
foo3 <- function(){
  subI <- filter(Iris, Species %like% "nica")
}
Res <- microbenchmark(
  foo(),foo2(),foo3(),
  times=100L)
##
Res
# Unit: milliseconds
#    expr       min        lq    median        uq      max neval
#   foo() 114.31080 122.12303 131.15523 136.33254 214.0405   100
#  foo2()  23.00508  30.33685  39.77843  41.49121 129.9125   100
#  foo3()  18.84933  22.47958  29.39228  35.96649 114.4389   100
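
Species is a factor with only three levels, so one way to keep pattern matching cheap on a column like this is to test the pattern against the distinct levels instead of all 750,000 values. The sketch below shows that idea with dplyr and base R only. It is an illustration under that assumption, not a benchmarked result, and hits is a hypothetical name.

```r
library(dplyr)

# Rebuild the enlarged data set used in the benchmark above
Iris <- iris[rep(seq_len(nrow(iris)), each = 5000), ]

# Test the pattern against the distinct factor levels only
hits <- grep("nica", levels(Iris$Species), value = TRUE)

# Then keep the rows whose Species is one of the matching levels
subI <- filter(Iris, Species %in% hits)
dim(subI)
```

This only pays off when the column is a factor or has few distinct values. For a character column of mostly unique strings, running grepl() over the whole vector remains the straightforward route.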
