Use dplyr to filter columns containing characters

Question

I have a large data frame that I would like to process with the excellent dplyr package (Wickham), which I recently discovered. I would like to filter out (drop) the columns that contain characters. Is this possible?

For example, in the flights data set from the nycflights13 package, how can I filter out the columns of class character?

 library(nycflights13)
 data(flights)
 str(flights)
 Classes 'tbl_df', 'tbl' and 'data.frame': 336776 obs. of 16 variables:
  $ year     : int 2013 2013 2013 2013 2013 2013 2013 2013 2013 2013 ...
  $ month    : int 1 1 1 1 1 1 1 1 1 1 ...
  $ day      : int 1 1 1 1 1 1 1 1 1 1 ...
  $ dep_time : int 517 533 542 544 554 554 555 557 557 558 ...
  $ dep_delay: num 2 4 2 -1 -6 -4 -5 -3 -3 -2 ...
  $ arr_time : int 830 850 923 1004 812 740 913 709 838 753 ...
  $ arr_delay: num 11 20 33 -18 -25 12 19 -14 -8 8 ...
  $ carrier  : chr "UA" "UA" "AA" "B6" ...
  $ tailnum  : chr "N14228" "N24211" "N619AA" "N804JB" ...
  $ flight   : int 1545 1714 1141 725 461 1696 507 5708 79 301 ...
  $ origin   : chr "EWR" "LGA" "JFK" "JFK" ...
  $ dest     : chr "IAH" "IAH" "MIA" "BQN" ...
  $ air_time : num 227 227 160 183 116 150 158 53 140 138 ...
  $ distance : num 1400 1416 1089 1576 762 ...
  $ hour     : num 5 5 5 5 5 5 5 5 5 5 ...
  $ minute   : num 17 33 42 44 54 54 55 57 57 58 ...

Any ideas?

+9

r dplyr

jonas Dec 04 '14 at 8:32

5 answers

akrun · Answer 1 · 2014-12-04T08:48:05+0000

You can try summarise_each from dplyr:

 library(dplyr)
 indx <- which(unlist(summarise_each(flights, funs(class)) != 'character'))
 flights %>% select(indx)

Tim · Answer 2 · 2014-12-04T09:35:17+0000

You do not need dplyr for this; base R is enough:

 flights[, !sapply(flights, is.character)]
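One detail worth checking with the base R form: flights is a tibble, which never collapses to a vector, but a plain data.frame does when only one column survives the subset. The sketch below uses a small made-up data frame (`df` and its columns are hypothetical, not part of the question) and adds `drop = FALSE` to guard against that.

```r
# Hypothetical toy data frame standing in for flights
df <- data.frame(
  id      = 1:3,
  carrier = c("UA", "AA", "B6"),
  origin  = c("EWR", "LGA", "JFK"),
  stringsAsFactors = FALSE
)

# TRUE for every column that is not of type character
keep <- !vapply(df, is.character, logical(1))

# Without drop = FALSE the single remaining column becomes a bare vector
as_vector <- df[, keep]

# With drop = FALSE the result stays a data frame
non_char <- df[, keep, drop = FALSE]

is.data.frame(as_vector)  # FALSE
is.data.frame(non_char)   # TRUE
```

`vapply()` is used here instead of `sapply()` only because it guarantees a logical vector; the subsetting logic is otherwise the same as in the one-liner above.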

jbaums · Answer 3 · 2014-12-04T08:39:32+0000

I don't think there is a dplyr shortcut for this, but you can get what you need:

 flights %>% select(which(sapply(flights, class) != 'character'))
 # Source: local data frame [336,776 x 12]
 #
 #    year month day dep_time dep_delay arr_time arr_delay flight air_time distance hour minute
 # 1  2013     1   1      517         2      830        11   1545      227     1400    5     17
 # 2  2013     1   1      533         4      850        20   1714      227     1416    5     33
 # 3  2013     1   1      542         2      923        33   1141      160     1089    5     42
 # 4  2013     1   1      544        -1     1004       -18    725      183     1576    5     44
 # 5  2013     1   1      554        -6      812       -25    461      116      762    5     54
 # 6  2013     1   1      554        -4      740        12   1696      150      719    5     54
 # 7  2013     1   1      555        -5      913        19    507      158     1065    5     55
 # 8  2013     1   1      557        -3      709       -14   5708       53      229    5     57
 # 9  2013     1   1      557        -3      838        -8     79      140      944    5     57
 # 10 2013     1   1      558        -2      753         8    301      138      733    5     58
 # ..  ...   ... ...      ...       ...      ...       ...    ...      ...      ...  ...    ...
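A caveat when comparing `class()` with 'character': some columns carry more than one class (a date-time column is both "POSIXct" and "POSIXt"), and `sapply(df, class)` then returns a list rather than a character vector. The sketch below, on a made-up one-row data frame (`df` is hypothetical), shows this and uses `is.character()` as a predicate that returns exactly one TRUE/FALSE per column.

```r
# Hypothetical toy data with a date-time column, which has two classes
df <- data.frame(
  when    = as.POSIXct("2013-01-01 05:17:00", tz = "UTC"),
  carrier = "UA",
  delay   = 2,
  stringsAsFactors = FALSE
)

# The class vectors have different lengths, so sapply() cannot simplify
classes <- sapply(df, class)
is.list(classes)  # TRUE

# is.character() gives a single logical per column regardless of class length
keep <- !sapply(df, is.character)
names(df)[keep]   # "when" "delay"
```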

Rich scriven · Answer 4 · 2014-12-04T09:22:45+0000

I do not have the flights data, but this approach worked on some other data I experimented with:

 do(flights, Filter(Negate(is.character), .))

Of course, there is always base R, which seems a little simpler for this task:

 Filter(Negate(is.character), flights)

sbha · Answer 5 · 2019-08-14T13:59:35+0000

Here is a dplyr / tidyverse option using select_if() (shown on the starwars example data that ships with dplyr):

 starwars %>%
   select_if(~!is.character(.)) %>%
   head(2)
 # A tibble: 2 x 6
   height  mass birth_year films     vehicles  starships
    <int> <dbl>      <dbl> <list>    <list>    <list>
 1    172    77         19 <chr [5]> <chr [2]> <chr [2]>
 2    167    75        112 <chr [6]> <chr [0]> <chr [0]>
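In dplyr 1.0.0 and later, select_if() is superseded by the `where()` selection helper used inside `select()`. A minimal sketch of the same idea, assuming a recent dplyr is installed and using a made-up tibble (`df` and its columns are hypothetical):

```r
library(dplyr)

# Hypothetical toy tibble standing in for flights or starwars
df <- tibble(
  id      = 1:3,
  carrier = c("UA", "AA", "B6"),
  delay   = c(2.5, -1, 4)
)

# where() applies the predicate to each column; ! negates the selection
non_char <- df %>% select(!where(is.character))

names(non_char)  # "id" "delay"
```

The predicate can be swapped for any function that returns a single TRUE or FALSE per column, for example `where(is.numeric)` to keep only numeric columns.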
