read.xlsx reads dates incorrectly if the column contains a non-date value

The xlsx package is reading dates in wrongly. I have read all the top similar questions here and had a scout around the Internet, but I cannot find this specific behavior, where the origin changes if there is non-date data in the column.

I have a tiny Excel spreadsheet that you can get from dropbox:

https://www.dropbox.com/s/872q9mzb5uzukws/test.xlsx

It has three rows and two columns. The first column is a date, the second is a number. The third row contains "Grand Total" in the date column.

If I read in the first two rows with read.xlsx and tell it the first column is a Date, then this works:

    read.xlsx("./test.xlsx",head=FALSE,1,colClasses=c("Date","integer"),endRow=2)
              X1 X2
    1 2014-06-29 49
    2 2014-06-30 46

These are indeed the dates in the spreadsheet. If I try to read all three rows, something goes wrong:

    read.xlsx("./test.xlsx",head=FALSE,1,colClasses=c("Date","integer"))
              X1    X2
    1 2084-06-30    49
    2 2084-07-01    46
    3       <NA> 89251
    Warning message:
    In as.POSIXlt.Date(x) : NAs introduced by coercion

If I try to read as integers, I get different integers:

    > read.xlsx("./test.xlsx",head=FALSE,1,colClasses=c("integer","integer"),endRow=2)
         X1 X2
    1 16250 49
    2 16251 46
    > read.xlsx("./test.xlsx",head=FALSE,1,colClasses=c("integer","integer"))
         X1    X2
    1 41819    49
    2 41820    46
    3    NA 89251

The first set of integers converts correctly using as.Date(s1$X1, origin="1970-01-01") (Unix epoch), and the second set converts correctly using as.Date(s2$X1, origin="1899-12-30") (Excel epoch), where s1 and s2 hold the results of the two calls above. If I convert the second set using 1970, I get the 2084 dates.

So: am I doing something wrong? Is it best to read as integers and, if there are any NAs, convert using the Excel epoch, otherwise use the Unix epoch? Or is this a bug in the xlsx package?

xlsx version: 0.5.1

3 answers

XLConnect handles this rather nicely:

    test <- readWorksheetFromFile( "~/Downloads/test.xlsx", sheet = "Sheet1", header = FALSE )
    test
                     Col1  Col2
    1 2014-06-29 00:00:00    49
    2 2014-06-30 00:00:00    46
    3         Grand Total 89251

You do have one obvious problem: the first column is of mixed type, character and POSIXct. XLConnect is able to read each cell correctly, but all the cells in a column are then coerced to the most general type, which in this case is character.

    str(test)
    'data.frame':	3 obs. of  2 variables:
     $ Col1: chr  "2014-06-29 00:00:00" "2014-06-30 00:00:00" "Grand Total"
     $ Col2: num  49 46 89251
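
If you want real dates back out of that character column, one possible approach is to parse only the cells that look like timestamps and set the "Grand Total" row aside. This is just a sketch in base R: the data frame is rebuilt by hand from the output above so it runs without the spreadsheet, and the names `parsed`, `data_rows` and `total_row` are made up for illustration.

```r
# Rebuild the data frame shown above by hand, so the sketch runs
# without XLConnect or the spreadsheet.
test <- data.frame(
  Col1 = c("2014-06-29 00:00:00", "2014-06-30 00:00:00", "Grand Total"),
  Col2 = c(49, 46, 89251),
  stringsAsFactors = FALSE
)

# Cells that do not match the timestamp format (here "Grand Total") become NA.
parsed <- as.Date(strptime(test$Col1, format = "%Y-%m-%d %H:%M:%S", tz = "UTC"))

# Keep the rows that parsed as dates; set the summary row aside.
data_rows <- test[!is.na(parsed), ]
data_rows$Col1 <- parsed[!is.na(parsed)]
total_row <- test[is.na(parsed), ]

str(data_rows)
```

Splitting on `is.na(parsed)` assumes every non-date cell is a summary row you want to drop; check that against your own sheet before relying on it.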

Dates can be read as integers and then converted to Date using the convertToDate() function (from the openxlsx package).
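
As a rough illustration, assuming the function meant here is `openxlsx::convertToDate()`, the serials from the question could be converted like this. The base R line uses the 1899-12-30 origin from the question; the openxlsx call is guarded so the sketch still runs if that package is not installed.

```r
# Excel serial numbers taken from the second read.xlsx call in the question.
serials <- c(41819, 41820)

# Base R: the question found that origin "1899-12-30" gives the right dates.
base_dates <- as.Date(serials, origin = "1899-12-30")
print(base_dates)

# Assumption: convertToDate() comes from openxlsx; only call it if available.
if (requireNamespace("openxlsx", quietly = TRUE)) {
  print(openxlsx::convertToDate(serials))
}
```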

More here


The problem you are facing is that Excel stores dates as the number of days since January 0, 1900, and that is the number R reads from the Excel file. When you convert in R, the conversion counts days from January 1, 1970. If you subtract the number of days between the two origins, it should work.
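
A small sketch of that subtraction, using the numbers from the question. It takes 1899-12-30 as the effective Excel origin, as the question does (Excel treats 1900 as a leap year, which is why the working origin is not 1899-12-31). The variable names are only for illustration.

```r
excel_origin <- as.Date("1899-12-30")
unix_origin  <- as.Date("1970-01-01")

# Number of days between the two origins.
offset_days <- as.numeric(unix_origin - excel_origin)
print(offset_days)

# Excel serials from the question, shifted onto the Unix day count.
excel_serials <- c(41819, 41820)
unix_days <- excel_serials - offset_days
print(unix_days)

# Converting the shifted values with the Unix origin gives the 2014 dates.
dates <- as.Date(unix_days, origin = "1970-01-01")
print(dates)
```

The shifted values match the integers the question got when reading only the first two rows.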
