dplyr arrange() function: sorting by missing values

I am working through Hadley Wickham's R for Data Science and came across the following exercise: "How could you use arrange() to sort all missing values to the start? (Hint: use is.na().)" I am using the flights dataset included in the nycflights13 package. Given that arrange() sorts all missing values to the bottom of the data frame, I'm not sure how to do the opposite for the missing values of all variables. I understand this question could be answered with base R code, but I am particularly interested in how it would be done with dplyr, calling the arrange() and is.na() functions. Thanks.
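To see the default behaviour described in the question, here is a minimal sketch on a small hypothetical data frame `df` (not part of nycflights13). It shows that arrange() keeps NA values at the end both in ascending order and when the key is wrapped in desc().

```r
library(dplyr)

# Hypothetical toy data, used only to illustrate the default behaviour
df <- data.frame(x = c(3, NA, 1, 2))

# arrange() puts NA last in ascending order...
asc <- arrange(df, x)
# ...and still puts NA last when the key is wrapped in desc()
dsc <- arrange(df, desc(x))

asc$x  # 1 2 3 NA
dsc$x  # 3 2 1 NA
```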

4 answers

We can wrap is.na() in desc() to get the missing values at the beginning:

 library(dplyr)
 library(nycflights13)

 flights %>%
   arrange(desc(is.na(dep_time)),
           desc(is.na(dep_delay)),
           desc(is.na(arr_time)),
           desc(is.na(arr_delay)),
           desc(is.na(tailnum)),
           desc(is.na(air_time)))
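A minimal sketch of why this works, on a hypothetical toy data frame `df` rather than flights: is.na() turns each key into TRUE/FALSE, desc() makes TRUE (missing) sort first, and each further key only breaks ties left by the keys before it.

```r
library(dplyr)

# Hypothetical toy data with missing values in two columns
df <- data.frame(a = c(1, NA, 3, NA), b = c(NA, 2, NA, NA))

# Rows with NA in `a` come first; among those, the row that is also
# NA in `b` leads. Rows with no NA in `a` keep their original order.
out <- arrange(df, desc(is.na(a)), desc(is.na(b)))
out
```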

Based on the following check, NA values occur only in these variables:

 names(flights)[colSums(is.na(flights)) > 0]
 #[1] "dep_time"  "dep_delay" "arr_time"  "arr_delay" "tailnum"   "air_time"
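The same column check on a small hypothetical data frame `df`, using base R only, shows what each step returns: is.na() on a data frame gives a logical matrix, and colSums() counts the TRUE values in each column.

```r
# Hypothetical toy data frame; only columns a and c contain NA
df <- data.frame(a = c(1, NA, 3), b = c("x", "y", "z"), c = c(NA, NA, 5))

# Count the missing values per column
na_counts <- colSums(is.na(df))
na_counts                  # a = 1, b = 0, c = 2

# Keep the names of the columns with at least one NA
names(df)[na_counts > 0]   # "a" "c"
```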

Instead of passing each variable name one at a time, we can also use the standard-evaluation (SE) variant arrange_():

 nm1 <- paste0("desc(is.na(", names(flights)[colSums(is.na(flights)) > 0], "))")
 r1 <- flights %>% arrange_(.dots = nm1)
 r1 %>% head()
 #   year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time arr_delay carrier flight tailnum
 #  <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>     <dbl>   <chr>  <int>   <chr>
 #1  2013     1     2       NA           1545        NA       NA           1910        NA      AA    133    <NA>
 #2  2013     1     2       NA           1601        NA       NA           1735        NA      UA    623    <NA>
 #3  2013     1     3       NA            857        NA       NA           1209        NA      UA    714    <NA>
 #4  2013     1     3       NA            645        NA       NA            952        NA      UA    719    <NA>
 #5  2013     1     4       NA            845        NA       NA           1015        NA      9E   3405    <NA>
 #6  2013     1     4       NA           1830        NA       NA           2044        NA      9E   3716    <NA>
 #Variables not shown: origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>, minute <dbl>,
 #  time_hour <time>.
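Since arrange_() is deprecated in later dplyr releases, a tidy-evaluation sketch that builds the same desc(is.na(...)) keys from column names and splices them in with `!!!` might look like the following. It assumes rlang's syms() and expr() (rlang is a dplyr dependency) and uses a hypothetical toy data frame `df` in place of flights.

```r
library(dplyr)

# Hypothetical toy data standing in for flights
df <- data.frame(a = c(1, NA, 3, NA), b = c(NA, 2, NA, NA), c = 1:4)

# Columns that contain at least one NA (here "a" and "b")
na_cols <- names(df)[colSums(is.na(df)) > 0]

# Turn each column name into a symbol, wrap it as desc(is.na(<col>)),
# then splice the list of expressions into arrange() with !!!
keys <- lapply(rlang::syms(na_cols), function(col) rlang::expr(desc(is.na(!!col))))
out <- arrange(df, !!!keys)
out$c  # original row order: 4 2 1 3
```

Building symbols instead of pasting strings also copes with column names that are not syntactic R names.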

Update

In newer versions of the tidyverse (dplyr_0.7.3, rlang_0.1.2), we can also use arrange_at(), arrange_all() or arrange_if():

 nm1 <- names(flights)[colSums(is.na(flights)) > 0]
 r2 <- flights %>% arrange_at(vars(nm1), funs(desc(is.na(.))))
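In dplyr 1.0.0 and later, arrange_at() is superseded by across() and funs() is deprecated. A sketch of the same idea, assuming dplyr >= 1.0.0 and a hypothetical toy data frame `df`, applies one lambda to every column selected with all_of():

```r
library(dplyr)  # assumes dplyr >= 1.0.0 for across() and all_of()

# Hypothetical toy data standing in for flights
df <- data.frame(a = c(1, NA, 3, NA), b = c(NA, 2, NA, NA), c = 1:4)
na_cols <- names(df)[colSums(is.na(df)) > 0]

# across() applies desc(is.na(.x)) to each selected column;
# the resulting keys are used left to right, like separate arguments
out <- arrange(df, across(all_of(na_cols), ~ desc(is.na(.x))))
out$c  # original row order: 4 2 1 3
```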

Or use arrange_if

 f <- rlang::as_function(~ any(is.na(.)))
 r3 <- flights %>% arrange_if(f, funs(desc(is.na(.))))
 identical(r1, r2)
 #[1] TRUE
 identical(r1, r3)
 #[1] TRUE
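Similarly, assuming dplyr >= 1.0.0, where() can take the place of the arrange_if() predicate. A sketch on a hypothetical toy data frame `df`, using anyNA() to pick the columns that contain missing values:

```r
library(dplyr)  # assumes dplyr >= 1.0.0 for across() and where()

# Hypothetical toy data standing in for flights
df <- data.frame(a = c(1, NA, 3, NA), b = c(NA, 2, NA, NA), c = 1:4)

# where(anyNA) selects only columns where anyNA() is TRUE (a and b);
# each selected column then contributes a desc(is.na(.x)) sort key
out <- arrange(df, across(where(anyNA), ~ desc(is.na(.x))))
out$c  # original row order: 4 2 1 3
```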

The following orders the rows in descending order by their number of NAs:

 flights %>% arrange(desc(rowSums(is.na(.))))

 # A tibble: 336,776 × 19
     year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time
    <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>
  1  2013     1     2       NA           1545        NA       NA           1910
  2  2013     1     2       NA           1601        NA       NA           1735
  3  2013     1     3       NA            857        NA       NA           1209
  4  2013     1     3       NA            645        NA       NA            952
  5  2013     1     4       NA            845        NA       NA           1015
  6  2013     1     4       NA           1830        NA       NA           2044
  7  2013     1     5       NA            840        NA       NA           1001
  8  2013     1     7       NA            820        NA       NA            958
  9  2013     1     8       NA           1645        NA       NA           1838
 10  2013     1     9       NA            755        NA       NA           1012
 # ... with 336,766 more rows, and 11 more variables: arr_delay <dbl>, carrier <chr>,
 #   flight <int>, tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>,
 #   distance <dbl>, hour <dbl>, minute <dbl>, time_hour <dttm>
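A minimal sketch on a small hypothetical data frame `df`, passing `df` directly instead of the `.` placeholder, to show how rows are ranked by their NA count:

```r
library(dplyr)

# Hypothetical toy data: row 2 has three NAs, rows 1 and 3 have one, row 4 none
df <- data.frame(a = c(1, NA, NA, 4), b = c(NA, NA, 3, 4), c = c(1, NA, 3, 4))

# rowSums(is.na(df)) counts the NAs in each row; desc() puts the rows
# with the most NAs first, and tied rows keep their original order
out <- arrange(df, desc(rowSums(is.na(df))))

# NA counts per row after sorting: 3, 1, 1, 0
rowSums(is.na(out))
```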

Try the simplest way, which the hint already points to:

 arrange(flights, desc(is.na(dep_time))) 

Other useful shortcuts:

 arrange(flights, !is.na(dep_time)) 

or

 arrange(flights, -is.na(dep_time)) 
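A minimal sketch, on a hypothetical four-row data frame `df`, checking that the three variants above give the same row order. All of them rely on how the keys sort: desc() puts TRUE first, FALSE sorts before TRUE, and unary minus turns TRUE into -1, which sorts before 0.

```r
library(dplyr)

# Hypothetical toy data: rows 2 and 4 are missing dep_time
df <- data.frame(id = 1:4, dep_time = c(517, NA, 533, NA))

r_desc <- arrange(df, desc(is.na(dep_time)))  # TRUE (missing) sorts first
r_not  <- arrange(df, !is.na(dep_time))       # FALSE (missing) sorts first
r_neg  <- arrange(df, -is.na(dep_time))       # -1 (missing) sorts before 0

r_desc$id  # 2 4 1 3
```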

@Akrun's solution works fine. However, arrange_() and the other underscore verbs are deprecated standard-evaluation (SE) versions of the main verbs. To avoid them, we can build the whole call as a string and evaluate it:

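 # Columns that contain at least one NA
 nmf <- names(flights)[colSums(is.na(flights)) > 0]
 # One "!is.na(<col>)" key per column; FALSE (missing) sorts first
 rules <- paste0("!is.na(", nmf, ")")
 rc <- paste(rules, collapse = ",")
 # Build the full call as text, then parse and evaluate it
 arce <- paste("arrange(flights,", rc, ")")
 expr <- parse(text = arce)
 ret <- eval(expr)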
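A possible middle ground, sketched under the assumption that rlang's parse_exprs() is available (rlang is a dplyr dependency), is to parse only the sort keys and splice them into arrange() with `!!!`, rather than evaluating a pasted call string. `df` is a hypothetical stand-in for flights.

```r
library(dplyr)

# Hypothetical toy data standing in for flights
df <- data.frame(a = c(1, NA, 3, NA), b = c(NA, 2, NA, NA), c = 1:4)
na_cols <- names(df)[colSums(is.na(df)) > 0]

# Parse each "!is.na(<col>)" string into an expression, then splice
# the list into arrange() so dplyr evaluates them as ordinary sort keys
keys <- rlang::parse_exprs(paste0("!is.na(", na_cols, ")"))
out <- arrange(df, !!!keys)
out$c  # original row order: 4 2 1 3
```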
