This question may have been answered before, but I have not found an answer.
I have a dataset consisting of numbers and missing values, in which one row contains percentages. Below is a small set of fake data with columns named AA, BB, and CC; the third row holds the percentages.
    AA    BB    CC
    234   432   78
    1980  3452  2323
    91.1  90    93.3
    34    123   45
When I read this dataset, AA and CC come in as numeric, but BB comes in as integer. Presumably a value of 90.0 was written as 90 somewhere along the way, so every value in BB looks like a whole number. If I do not specify that BB is numeric, could this cause problems with basic arithmetic?
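For reference, the type difference and the usual ways to force every column to double can be sketched as follows. This assumes the fake data above is passed inline through `read.table`'s `text` argument; the names `txt`, `df_default`, `df_numeric` and `df_converted` are hypothetical and chosen only for illustration.

```r
# Fake data from the question, one string per line (hypothetical variable name).
txt <- c("AA BB CC",
         "234 432 78",
         "1980 3452 2323",
         "91.1 90 93.3",
         "34 123 45")

# Default: read.table() guesses a type per column, so BB
# (all whole numbers) is read as integer while AA and CC are double.
df_default <- read.table(text = txt, header = TRUE)
print(sapply(df_default, class))

# colClasses = "numeric" is recycled over all columns,
# so every column, including BB, is read as double.
df_numeric <- read.table(text = txt, header = TRUE, colClasses = "numeric")
print(sapply(df_numeric, class))

# Alternative: convert after reading; assigning to df[] keeps the data.frame.
df_converted <- df_default
df_converted[] <- lapply(df_converted, as.numeric)
print(sapply(df_converted, class))
```

Using `colClasses` at read time avoids a second pass over the data, while the `lapply` form is handy when the data frame already exists.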
I believe that if dd = 1 and ee = 2 are both integers, then in C the expression dd / ee evaluates to 0, whereas in R dd / ee evaluates to 0.5.
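A minimal sketch of that difference in R: `/` always returns a double, and C-style integer division is a separate operator, `%/%`. One caveat worth hedging: for negative operands `%/%` rounds toward minus infinity, whereas C99 integer division truncates toward zero, so the two agree only for non-negative values.

```r
dd <- 1L  # the L suffix makes an integer literal
ee <- 2L

# `/` always produces a double, even when both operands are integers.
q <- dd / ee
print(q)           # 0.5
print(typeof(q))   # "double"

# %/% is integer division and keeps the integer type.
qi <- dd %/% ee
print(qi)          # 0
print(typeof(qi))  # "integer"

# %% is the matching remainder: dd == ee * (dd %/% ee) + dd %% ee
r <- dd %% ee
print(r)           # 1

# Negative operands: R floors (-1), whereas C99 would truncate to 0.
print(-1L %/% 2L)
```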
The following series of simple operations seems to suggest that in R the answers do not change whether the data are numeric or integer. Even so, I still think it would be prudent to declare all variables as numeric when reading the data. Searching with Google, I found an example or two where the data type really mattered, but not many.
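    # double ("numeric") vectors
    aa <- c(1,2,3,4,5,6,7)
    bb <- 2
    str(aa)
    str(bb)
    # integer copies of the same values
    cc <- as.integer(aa)
    dd <- as.integer(bb)
    str(cc)
    str(dd)
    # division with every combination of double and integer operands
    aa/bb
    cc/dd
    aa/dd
    cc/bb
    # squaring: double * double stays double, integer * integer stays integer
    ee <- aa * aa
    str(ee)
    sum(ee/2)
    ff <- cc * cc
    str(ff)
    sum(ff/2)
    # mixing with a non-integer constant
    gg <- 4.14
    hh <- ((aa * aa) * gg) / 2
    hh
    ii <- ((cc * cc) * gg) / 2
    ii
    jj <- (aa * aa) / gg
    jj
    kk <- (cc * cc) / gg
    kk
    jj == kk
    # the C-style example: integer 1 divided by integer 2
    mm <- as.integer(1)
    nn <- as.integer(2)
    mm/nn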
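The values in the experiment above are small (at most 49 after squaring), so integer and double results coincide. A sketch of where the storage type can matter: integer arithmetic that stays integer can overflow past `.Machine$integer.max`, and `identical()` distinguishes types even when `==` does not. The variable names `big`, `x_int`, `x_dbl`, `sq_int` and `sq_dbl` are hypothetical.

```r
big <- .Machine$integer.max   # largest representable R integer (2147483647)
print(typeof(big))            # "integer"

# integer + integer stays integer, so this overflows to NA with a warning.
x_int <- big + 1L
print(x_int)

# Adding a double (1 rather than 1L) promotes the result to double; no overflow.
x_dbl <- big + 1
print(x_dbl)

# Same effect with multiplication, e.g. squaring a large integer value:
sq_int <- as.integer(50000) * as.integer(50000)  # NA: 2.5e9 exceeds the integer range
sq_dbl <- 50000 * 50000                          # double, holds 2.5e9 exactly

# Equal values, different storage types (aa and cc as in the question).
aa <- c(1, 2, 3, 4, 5, 6, 7)
cc <- as.integer(aa)
print(all(aa == cc))     # TRUE: == compares values after promotion
print(identical(aa, cc)) # FALSE: identical() also compares the type
```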
I am hoping for reassurance that this is rarely a problem for simple arithmetic, but I suspect it can be. I keep feeling that there is a basic programming rule at work here, but I am not sure what it is. (I am familiar with the concept of double precision.)
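One candidate for that rule, as I understand R's documented arithmetic (see `?Arithmetic`): an operation on two integers stays integer for `+`, `-`, `*` and `%/%`, while `/` and `^` always return double, and any operation that involves a double promotes the result to double. A small sketch that prints the result type of each case; the names `i`, `d` and `results` are hypothetical.

```r
i <- 3L   # integer
d <- 3    # double

# Result type of common operations (names describe the operand types).
results <- c(
  "int + int"   = typeof(i + i),          # integer
  "int * int"   = typeof(i * i),          # integer
  "int - dbl"   = typeof(i - d),          # double: mixing promotes to double
  "int / int"   = typeof(i / i),          # double: `/` always returns double
  "int %/% int" = typeof(i %/% i),        # integer
  "int ^ int"   = typeof(i ^ i),          # double: `^` always returns double
  "sum(int)"    = typeof(sum(c(i, i))),   # integer
  "mean(int)"   = typeof(mean(c(i, i)))   # double
)
print(results)
```

In practice this means that the only operations where an integer column can behave differently from a double column are the ones that stay integer, and there the visible difference is overflow to `NA` rather than a change in the value.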
Thanks for any advice on what the main issues are.