dplyr arrange() function: sort by missing values

Question

I am trying to work through Hadley Wickham's R for Data Science and got stuck on the following question: "How could you use arrange() to sort all missing values to the start? (Hint: use is.na())" I am using the flights dataset included in the nycflights13 package. Given that arrange() sorts all unknown values to the bottom of the data frame, I am not sure how to do the opposite across the missing values of all variables. I realize that this question can be answered with base R code, but I am specifically interested in how this would be done using dplyr and a call to the arrange() and is.na() functions. Thanks.

+5

sorting r dplyr na

T. gross Jun 11 '16 at 6:11

4 answers

The following arranges the rows in descending order by their number of NAs:

    flights %>% arrange(desc(rowSums(is.na(.))))
    # A tibble: 336,776 × 19
        year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time
       <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>
    1   2013     1     2       NA           1545        NA       NA           1910
    2   2013     1     2       NA           1601        NA       NA           1735
    3   2013     1     3       NA            857        NA       NA           1209
    4   2013     1     3       NA            645        NA       NA            952
    5   2013     1     4       NA            845        NA       NA           1015
    6   2013     1     4       NA           1830        NA       NA           2044
    7   2013     1     5       NA            840        NA       NA           1001
    8   2013     1     7       NA            820        NA       NA            958
    9   2013     1     8       NA           1645        NA       NA           1838
    10  2013     1     9       NA            755        NA       NA           1012
    # ... with 336,766 more rows, and 11 more variables: arr_delay <dbl>, carrier <chr>,
    #   flight <int>, tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>,
    #   distance <dbl>, hour <dbl>, minute <dbl>, time_hour <dttm>
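To see why this works without loading nycflights13, here is a minimal sketch on a made-up tibble (df, id, x and y are illustrative names, not columns of flights). It assumes dplyr is installed and uses the magrittr pipe, since the `.` placeholder is what lets rowSums() see the whole data frame.

```r
library(dplyr)

# Hypothetical toy data: rows 1 to 4 contain 0, 2, 1 and 0 missing values.
df <- tibble(
  id = 1:4,
  x  = c(1, NA, 3, 4),
  y  = c("a", NA, NA, "d")
)

# is.na(df) is a logical matrix; rowSums() counts the TRUEs in each row.
na_per_row <- unname(rowSums(is.na(df)))

# Same pattern as the answer above: rows with the most NAs come first.
sorted <- df %>% arrange(desc(rowSums(is.na(.))))

# NA counts per row after sorting; they should be non-increasing.
sorted_counts <- unname(rowSums(is.na(sorted)))
print(sorted)
```

Note that this ranks rows by how many values are missing overall, which is slightly different from sorting on each column's missingness in turn.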

+1

Frederick solt Oct 9 '16 at 17:03

Try the simplest approach, which follows the hint from the question directly:

 arrange(flights, desc(is.na(dep_time)))

Other useful shortcuts:

 arrange(flights, !is.na(dep_time))

or

 arrange(flights, -is.na(dep_time))
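All three calls rely on the same fact: is.na() returns a logical vector, and FALSE sorts before TRUE. The following is a minimal sketch on a made-up tibble (df, id and x are illustrative names, not part of flights), assuming dplyr is installed. It shows the default behaviour, the three variants, and how a second sort key can order the remaining rows.

```r
library(dplyr)

# Hypothetical toy data: x is missing in rows 2 and 4.
df <- tibble(id = 1:5, x = c(10, NA, 30, NA, 20))

# Default behaviour: arrange() always places NA last.
default_sorted <- arrange(df, x)

# Three equivalent sort keys that move missing rows to the top.
by_desc  <- arrange(df, desc(is.na(x)))  # TRUE first because of desc()
by_not   <- arrange(df, !is.na(x))       # missing rows become FALSE, which sorts first
by_minus <- arrange(df, -is.na(x))       # TRUE becomes -1, FALSE becomes 0

# Add x as a second key to sort the non-missing rows as well.
na_first_then_x <- arrange(df, desc(is.na(x)), x)
print(na_first_then_x)
```

The `!is.na(x)` form is the shortest, while `desc(is.na(x))` states the intent most explicitly; which one reads better is a matter of taste.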

+1

Arkadiusz choczaj Aug 24 '17 at 11:01

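@akrun's solution works fine. However, arrange_() and the other underscore-suffixed verbs are deprecated standard-evaluation (SE) versions of the main verbs. To avoid them, we can build the call as text and evaluate it with eval():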

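    # Names of the columns that contain at least one NA
    nmf <- names(flights)[colSums(is.na(flights)) > 0]
    # One sort key per column, e.g. "!is.na(dep_time)"
    rules <- paste0("!is.na(", nmf, ")")
    rc <- paste(rules, collapse = ",")
    # Assemble the full arrange() call as a string, then parse and evaluate it
    arce <- paste("arrange(flights,", rc, ")")
    expr <- parse(text = arce)
    ret <- eval(expr)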
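As an alternative to pasting strings and calling parse(), the sort keys can be built as unevaluated calls and spliced into arrange(). This is only a sketch, assuming dplyr and rlang are installed; the toy tibble and the names df, na_cols and keys are illustrative and not from the answer above.

```r
library(dplyr)

# Hypothetical toy data standing in for flights; x and y contain NAs, z does not.
df <- tibble(
  id = 1:4,
  x  = c(1, NA, 3, NA),
  y  = c("a", "b", NA, NA),
  z  = 1:4
)

# Columns that contain at least one NA.
na_cols <- names(df)[colSums(is.na(df)) > 0]

# Build one unevaluated call per column, e.g. !is.na(x), without string pasting.
keys <- lapply(na_cols, function(nm) rlang::expr(!is.na(!!rlang::sym(nm))))

# Splice the calls into arrange() as separate sort keys.
res <- arrange(df, !!!keys)
print(res)
```

Working with symbols rather than text avoids problems with column names that are not syntactically valid R, which a pasted string would need to quote by hand.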

0

Endle_zhenbo Sep 11 '17 at 17:02

akrun · Accepted Answer · 2016-06-11T06:23:22+0000

We can wrap is.na() in desc() to get the missing values at the beginning:

    flights %>%
      arrange(desc(is.na(dep_time)),
              desc(is.na(dep_delay)),
              desc(is.na(arr_time)),
              desc(is.na(arr_delay)),
              desc(is.na(tailnum)),
              desc(is.na(air_time)))

NA values are found only in these variables, based on:

    names(flights)[colSums(is.na(flights)) > 0]
    #[1] "dep_time"  "dep_delay" "arr_time"  "arr_delay" "tailnum"   "air_time"

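Instead of passing each variable name one at a time, we can also use the standard-evaluation (SE) version, arrange_(), which accepts the sort keys as strings (note that arrange_() is deprecated in later dplyr releases):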

    nm1 <- paste0("desc(is.na(", names(flights)[colSums(is.na(flights)) > 0], "))")
    r1 <- flights %>% arrange_(.dots = nm1)
    r1 %>% head()
    #   year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time arr_delay carrier flight tailnum
    #  <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>     <dbl>   <chr>  <int>   <chr>
    #1  2013     1     2       NA           1545        NA       NA           1910        NA      AA    133    <NA>
    #2  2013     1     2       NA           1601        NA       NA           1735        NA      UA    623    <NA>
    #3  2013     1     3       NA            857        NA       NA           1209        NA      UA    714    <NA>
    #4  2013     1     3       NA            645        NA       NA            952        NA      UA    719    <NA>
    #5  2013     1     4       NA            845        NA       NA           1015        NA      9E   3405    <NA>
    #6  2013     1     4       NA           1830        NA       NA           2044        NA      9E   3716    <NA>
    #Variables not shown: origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>, minute <dbl>,
    #  time_hour <time>.

Update

In newer versions of the tidyverse (dplyr_0.7.3, rlang_0.1.2), we can also use arrange_at, arrange_all, or arrange_if:

    nm1 <- names(flights)[colSums(is.na(flights)) > 0]
    r2 <- flights %>% arrange_at(vars(nm1), funs(desc(is.na(.))))

Or use arrange_if:

    f <- rlang::as_function(~ any(is.na(.)))
    r3 <- flights %>% arrange_if(f, funs(desc(is.na(.))))
    identical(r1, r2)
    #[1] TRUE
    identical(r1, r3)
    #[1] TRUE
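The scoped verbs (arrange_at, arrange_if) and funs() have since been superseded or deprecated in dplyr. Assuming dplyr 1.0.0 or later, the same idea can be sketched with across() and where(). The toy tibble below is made up for illustration and is not part of the original answer; the pattern should carry over to flights, but that has not been run here.

```r
library(dplyr)

# Hypothetical toy data standing in for flights; x and y contain NAs, z does not.
df <- tibble(
  id = 1:4,
  x  = c(1, NA, 3, NA),
  y  = c("a", "b", NA, NA),
  z  = 1:4
)

# where() selects every column with at least one NA;
# across() turns each selected column into a sort key with missing values first.
res <- df %>%
  arrange(across(where(~ any(is.na(.x))), ~ desc(is.na(.x))))
print(res)
```

This avoids both the hand-written list of column names and the string-based arrange_() call, at the cost of requiring a more recent dplyr.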
