The xlsx package is reading dates incorrectly. I have read all the top similar questions here and had a scout around the Internet, but I cannot find this specific behavior, where the date origin changes if there is non-date data in the column.
I have a tiny Excel spreadsheet that you can get from Dropbox:
https://www.dropbox.com/s/872q9mzb5uzukws/test.xlsx
It has three rows and two columns. The first column is a date, the second is a number. The third row contains "Grand Total" in the date column.
If I read in the first two rows with read.xlsx and say that the first column is a Date, then this works:
    read.xlsx("./test.xlsx",head=FALSE,1,colClasses=c("Date","integer"),endRow=2)
              X1 X2
    1 2014-06-29 49
    2 2014-06-30 46
These are indeed the dates in the spreadsheet. If I try to read all three rows, something goes wrong:
    read.xlsx("./test.xlsx",head=FALSE,1,colClasses=c("Date","integer"))
              X1    X2
    1 2084-06-30    49
    2 2084-07-01    46
    3       <NA> 89251
    Warning message:
    In as.POSIXlt.Date(x) : NAs introduced by coercion
If I read the first column as integers instead, I get different integers depending on whether the third row is included:
    > read.xlsx("./test.xlsx",head=FALSE,1,colClasses=c("integer","integer"),endRow=2)
         X1 X2
    1 16250 49
    2 16251 46
    > read.xlsx("./test.xlsx",head=FALSE,1,colClasses=c("integer","integer"))
         X1    X2
    1 41819    49
    2 41820    46
    3    NA 89251
The first set of integers converts correctly using as.Date(s1$X1,origin="1970-01-01") (Unix epoch), and the second set converts correctly using as.Date(s2$X1, origin="1899-12-30") (Excel epoch). If I convert the second set using 1970, I get the 2084 dates.
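To make the offset concrete, here is a small sketch that uses only the integers printed above. The data frames s1 and s2 are typed in by hand as stand-ins for the two read.xlsx results, so this runs without the spreadsheet or the xlsx package:

```r
# Hand-built stand-ins for the two read.xlsx results (values copied from the output above)
s1 <- data.frame(X1 = c(16250L, 16251L))   # read with endRow=2
s2 <- data.frame(X1 = c(41819L, 41820L))   # read without endRow (NA row left out)

d1 <- as.Date(s1$X1, origin = "1970-01-01")        # Unix epoch: correct for s1
d2 <- as.Date(s2$X1, origin = "1899-12-30")        # Excel epoch: correct for s2
d_wrong <- as.Date(s2$X1, origin = "1970-01-01")   # Unix epoch applied to Excel serials

# Number of days between the two epochs
offset <- as.integer(as.Date("1970-01-01") - as.Date("1899-12-30"))

print(d1)
print(d2)
print(d_wrong)
print(offset)
print(s2$X1 - s1$X1)
```

The difference between the two sets of integers is exactly the number of days between the two epochs, which suggests the same cells are coming back as Unix day counts in one case and as raw Excel serial numbers in the other.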
So: am I doing something wrong? Am I best off reading the column as integers and then, if there are any NAs, converting using the Excel epoch, and otherwise using the Unix epoch? Or is this a bug in the xlsx package?
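The workaround I have in mind would look roughly like this. The helper name to_date_guess is made up for illustration, and the rule "any NA means Excel epoch" is only an assumption based on the behavior I observed above, not something I have found documented for the package:

```r
# Hypothetical helper: pick the origin based on whether the integer column has NAs.
# The NA rule is an assumption drawn from the output above, not documented behavior.
to_date_guess <- function(x) {
  origin <- if (any(is.na(x))) "1899-12-30" else "1970-01-01"
  as.Date(x, origin = origin)
}

# Integers as returned when only the first two rows are read
clean <- to_date_guess(c(16250L, 16251L))

# Integers as returned when the "Grand Total" row is included
mixed <- to_date_guess(c(41819L, 41820L, NA))

print(clean)
print(mixed)
```

This gives the right dates for both of my reads, but it feels fragile, since it relies on a side effect rather than on knowing which epoch the package actually used.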
xlsx package version: 0.5.1