Several R packages have been developed that let you make assertions about your data, analysis results, and so on. However, I have never seen a compiled list of useful checks.
Are there any resources that offer such checklists, either for specific kinds of analysis or in general?
For example, if you analyze survey data, you can check the plausibility of the data as follows:
- Impossible values: someone who lists their profession as physician is 6 years old.
- Implausible correlations: education level is negatively correlated with income.
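The two survey checks above could be sketched in base R roughly as follows. The data frame, its column names, and the minimum age threshold are all made up for illustration; they are assumptions, not part of any real data set.

```r
# Hypothetical survey data; column names are invented for this sketch.
survey <- data.frame(
  age        = c(34, 6, 51, 45, 29),
  profession = c("teacher", "physician", "physician", "engineer", "clerk"),
  educ_years = c(16, 1, 20, 18, 12),
  income     = c(48000, 0, 150000, 95000, 30000),
  stringsAsFactors = FALSE
)

# Impossible values: physicians younger than an assumed minimum age.
min_physician_age <- 24  # assumption; pick a threshold that fits your domain
impossible <- survey[survey$profession == "physician" &
                       survey$age < min_physician_age, ]
print(impossible)

# Implausible correlation: warn if education and income move in opposite
# directions.
r <- cor(survey$educ_years, survey$income)
if (r < 0) {
  warning("education and income are negatively correlated: r = ", round(r, 2))
}
```

Whether a failed check should stop the script, warn, or only be logged is a design choice; here the impossible rows are printed for inspection rather than dropped.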
After doing a lot of joins, you want to check the final data structure:
- Lost observations: the data set starts with N = 100,000 ... after adding variables, is N still 100,000?
- Unreasonable column values: summary of missing values, outlier detection, distribution of the most common values.
- Unreasonable relationships between columns: a sales table references a seller ID that does not exist in the sellers table.
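As a rough base R sketch of the post-join checks listed above: the `sales` and `sellers` tables and their columns are hypothetical, and the row-count check assumes a left join on a key that is unique in the lookup table.

```r
# Hypothetical tables; names and values are invented for this sketch.
sales <- data.frame(
  sale_id   = 1:5,
  seller_id = c(10, 11, 11, 12, 99),
  amount    = c(250, NA, 80, 120, 60)
)
sellers <- data.frame(
  seller_id = c(10, 11, 12),
  region    = c("north", "south", "south"),
  stringsAsFactors = FALSE
)

# The lookup key must be unique, otherwise the join can duplicate rows.
stopifnot(!anyDuplicated(sellers$seller_id))

n_before <- nrow(sales)
joined <- merge(sales, sellers, by = "seller_id", all.x = TRUE)

# Lost (or duplicated) observations: a left join should preserve N.
stopifnot(nrow(joined) == n_before)

# Unreasonable column values: count missing values per column.
na_counts <- colSums(is.na(joined))
print(na_counts)

# Unreasonable relationships: seller IDs in sales with no match in sellers.
orphans <- setdiff(sales$seller_id, sellers$seller_id)
print(orphans)
```

In this made-up example the row count survives the join, but seller 99 has no entry in `sellers`, which shows up both in `orphans` and as a missing `region`.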
After developing the forecasts, you want to check whether they make sense:
- Implausible group forecasts: you average the predicted probability of making a purchase by group and find that non-pet owners are more likely than pet owners to buy pet food.
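A group-level forecast check like the one above might look like this in base R. The predictions and the `owns_pet` and `p_buy` columns are invented for illustration, and the expected ordering of the groups is domain knowledge you would have to supply yourself.

```r
# Hypothetical predicted purchase probabilities for pet food.
preds <- data.frame(
  owns_pet = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE),
  p_buy    = c(0.80, 0.65, 0.70, 0.10, 0.05, 0.20)
)

# Mean predicted probability per group; the result is named "FALSE"/"TRUE".
group_means <- tapply(preds$p_buy, preds$owns_pet, mean)
print(group_means)

# Assumed expectation: pet owners should be the likelier buyers.
if (group_means[["FALSE"]] >= group_means[["TRUE"]]) {
  warning("non-pet owners are predicted to buy pet food at least as often as pet owners")
}
```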
And so on.
Below are some R packages that would help build such tests into R code ... if only we had a checklist of what those tests should be!
testthat
http://journal.r-project.org/archive/2011-1/RJournal_2011-1_Wickham.pdf
https://github.com/hadley/testthat
RUnit
http://cran.r-project.org/web/packages/RUnit/vignettes/RUnit.pdf
svUnit
http://cran.r-project.org/web/packages/svUnit/vignettes/svUnit.pdf
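For what it is worth, here is a minimal sketch of how one such data check could be expressed with testthat, assuming the package is installed. The `joined` data frame and `n_expected` are hypothetical stand-ins for your own data and expected row count.

```r
library(testthat)

# Hypothetical result of a join; replace with your own data.
joined <- data.frame(seller_id = c(10, 11, 12), amount = c(250, 80, 120))
n_expected <- 3L  # assumed row count before the join

test_that("joined data keeps all rows and has sane amounts", {
  expect_equal(nrow(joined), n_expected)   # no lost observations
  expect_false(anyNA(joined$amount))       # no missing amounts
  expect_true(all(joined$amount >= 0))     # no impossible values
})
```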