R Data validation only if a column exists

Question


I am trying to build a data validation report in R. I used the validate package to generate an overall summary, but I also need to find out which values fail validation.

What I want is a data frame of identifiers, the columns that fail their test, and the values that do not pass. However, not all columns are required, so I need to be able to run the checks without knowing whether a given column will be there.

For other data frames, where all the columns are required, I converted the data to TRUE/FALSE according to the result of each test. For instance:

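library(dplyr)
library(validate)
library(tidyr)

test_df = data.frame(id = 1:10,
                     a = 11:20,
                     b = c(21:25, 36, 27:30),
                     c = c(41, 52, 43:50))

# TRUE flags a value for the report. id is kept explicitly because
# transmute() drops every column it is not given, and the [,-1] below
# assumes the first column is id.
text_check = test_df %>% transmute(
      id = id,
      a = a > 21,
      b = b > 31,
      c = c > 51
)

# Keep only the check columns that have at least one TRUE
value_fails <- data.frame(id = test_df$id, text_check[,-1][colSums(text_check[,-1]) > 0])

value_failures_gath = gather(value_fails, column, changed, -id) %>% filter(changed == TRUE)
# Look up each flagged value by id and exact column name
# (apply() would turn the ids into padded strings, and grep() matches partial names)
value_failures_gath$Value = mapply(function(i, col) test_df[test_df$id == i, col],
              value_failures_gath$id, value_failures_gath$column)
value_failures_gath <- value_failures_gath %>% arrange(id, column)
value_failures_gath$changed <- NULL

colnames(value_failures_gath) <- c('ID', 'Field', 'Value')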

> value_failures_gath
  ID Field Value
1  2     c    52
2  6     b    36

I have a data frame holding the checks I want to run, in the style of:

second_data_check = data.frame(a = 'a>21',
                           b = 'b > 31',
                           c = 'c> 51',
                           d = 'd> 61')

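Here column d is not in test_df, so only the checks for a, b, and c should run. How can I apply only the checks whose columns actually exist in the data?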


validation r filtering mutation

user1775563 23 . '16 15:36


WhaterFalls · Accepted Answer · 2016-06-24T18:23:35+0000

How about checking whether each column exists before running its test?

text_check = data.frame(id=test_df$id)

if('a' %in% colnames(test_df)){
  text_check_temp = test_df %>% transmute(a=a>21)
  text_check <- cbind(text_check, text_check_temp)
}
if('b' %in% colnames(test_df)){
  text_check_temp = test_df %>% transmute(b=b>31)
  text_check <- cbind(text_check, text_check_temp)
}
if('c' %in% colnames(test_df)){
  text_check_temp = test_df %>% transmute(c=c>51)
  text_check <- cbind(text_check, text_check_temp)
}
if('d' %in% colnames(test_df)){
  text_check_temp = test_df %>% transmute(d=d>61)
  text_check <- cbind(text_check, text_check_temp)
}

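With test_df as defined in the question, the d block is skipped because that column does not exist, so text_check ends up with only id, a, b, and c.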
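If the list of checks grows, the same idea can be driven by the stored rules instead of one if block per column. The following is only a sketch in base R, not part of the original answer: the named character vector checks is an assumed stand-in for second_data_check, and the names present, flags, and report are made up for illustration. It also assumes that at least one rule applies to the data.

```r
# Data from the question; note that test_df has no column d.
test_df <- data.frame(id = 1:10,
                      a = 11:20,
                      b = c(21:25, 36, 27:30),
                      c = c(41, 52, 43:50))

# Assumed stand-in for second_data_check: one rule per column name, stored as text.
checks <- c(a = "a > 21", b = "b > 31", c = "c > 51", d = "d > 61")

# Keep only the rules whose column is present in the data (drops d here).
present <- checks[names(checks) %in% colnames(test_df)]

# Evaluate each remaining rule inside test_df; TRUE flags a value for the report.
flags <- lapply(present, function(rule) eval(parse(text = rule), envir = test_df))

# Build one row per flagged value: ID, Field, Value.
report <- do.call(rbind, lapply(names(flags), function(col) {
  hit <- which(flags[[col]])            # row positions where the rule is TRUE
  data.frame(ID = test_df$id[hit],
             Field = rep(col, length(hit)),
             Value = test_df[[col]][hit],
             stringsAsFactors = FALSE)
}))

# Same ordering as the report in the question.
report <- report[order(report$ID, report$Field), ]
rownames(report) <- NULL
report
```

Because the rules are parsed from text with eval(parse()), this should only be used with rules you wrote yourself, not with untrusted input.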
