I am trying to build a data validation report in R. I used the validate package to generate a general summary, but I also need to find out exactly which values fail validation.
What I want is a data frame of IDs, the columns that fail their check, and the values that fail. However, not all columns are required, so I need to be able to run the checks without knowing whether a given column will be present.
For other data frames, where every column is required, I flagged each value as TRUE/FALSE depending on whether it failed its check. For instance:
library(dplyr)
library(validate)
library(tidyr)

test_df = data.frame(id = 1:10,
                     a = 11:20,
                     b = c(21:25, 36, 27:30),
                     c = c(41, 52, 43:50))

# TRUE marks a value that fails its check
text_check = test_df %>% transmute(
  a = a > 21,
  b = b > 31,
  c = c > 51
)

# keep only the columns with at least one failure
# (transmute() already dropped id, so no column needs removing here)
value_fails <- data.frame(id = test_df$id,
                          text_check[, colSums(text_check) > 0, drop = FALSE])
value_failures_gath = gather(value_fails, column, changed, -id) %>% filter(changed == TRUE)
# look up each failing value by id and exact column name
value_failures_gath$Value = mapply(function(i, col) test_df[test_df$id == i, col],
                                   value_failures_gath$id, value_failures_gath$column)
value_failures_gath <- value_failures_gath %>% arrange(id, column)
value_failures_gath$changed <- NULL
colnames(value_failures_gath) <- c('ID', 'Field', 'Value')
> value_failures_gath
  ID Field Value
1  2     c    52
2  6     b    36
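For comparison, the same ID / Field / Value table can probably be produced more directly by reshaping to long format first. This is only a sketch: it assumes tidyr 1.0.0 or later (for `pivot_longer`), and the `limits` vector and the `long_failures` name are made up here for illustration.

```r
library(dplyr)
library(tidyr)

test_df <- data.frame(id = 1:10,
                      a = 11:20,
                      b = c(21:25, 36, 27:30),
                      c = c(41, 52, 43:50))

# Hypothetical lookup of the upper limit for each column (same thresholds as above)
limits <- c(a = 21, b = 31, c = 51)

long_failures <- test_df %>%
  # one row per (id, column, value)
  pivot_longer(-id, names_to = "Field", values_to = "Value") %>%
  # keep only the values above the limit for their own column
  filter(Value > unname(limits[Field])) %>%
  rename(ID = id) %>%
  arrange(ID, Field)

print(long_failures)
```

Because the value travels with the row through the reshape, there is no need to look it up again afterwards.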
I have a data frame that holds the checks I want to run, in this form:
second_data_check = data.frame(a = 'a>21',
                               b = 'b > 31',
                               c = 'c> 51',
                               d = 'd> 61')
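However, column d does not exist in test_df, so I cannot simply run all of these checks against it. How can I apply only the checks whose columns are present in the data, and still get the ID, Field, and Value of each failure?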
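One possible way to approach this, as a minimal sketch rather than a definitive answer: keep only the rules whose column name appears in the data, then evaluate each rule string inside the data frame. It uses base R only (`eval(parse(text = ...))`) rather than anything from the validate package, and the `rules`, `flags`, and `failures` names are introduced here for illustration. Since the rule strings are executed as code, this assumes they come from a trusted source.

```r
test_df <- data.frame(id = 1:10,
                      a = 11:20,
                      b = c(21:25, 36, 27:30),
                      c = c(41, 52, 43:50))

second_data_check <- data.frame(a = 'a>21',
                                b = 'b > 31',
                                c = 'c> 51',
                                d = 'd> 61')

# Named character vector of rules (as.character also covers factor columns)
rules <- vapply(second_data_check, as.character, character(1))

# Drop every rule whose column is missing from the data (here: d)
rules <- rules[names(rules) %in% names(test_df)]

# Evaluate each rule inside test_df; TRUE marks a failing row
flags <- lapply(rules, function(r) eval(parse(text = r), envir = test_df))

# Collect ID, Field and Value for every failing cell
failures <- do.call(rbind, lapply(names(flags), function(col) {
  hit <- which(flags[[col]])
  data.frame(ID = test_df$id[hit],
             Field = rep(col, length(hit)),
             Value = test_df[[col]][hit],
             stringsAsFactors = FALSE)
}))

failures <- failures[order(failures$ID, failures$Field), ]
rownames(failures) <- NULL
print(failures)
```

With the example data this should give the same two failures as before (ID 2 in column c, ID 6 in column b), and the rule for d is skipped without an error.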