Dates read from .xlsx are incorrect if the column contains non-date values

Question

The xlsx package reads dates incorrectly. I've read all the top similar Qs and had a scout around the Internet, but I can't find this specific behaviour, where the date origin changes if there is non-date data in the column.

I have a tiny Excel spreadsheet that you can get from Dropbox:

https://www.dropbox.com/s/872q9mzb5uzukws/test.xlsx

It has three rows and two columns. The first column is a date, the second a number. The third row contains "Grand Total" in the date column.

If I read in only the first two rows with read.xlsx and tell it that the first column is a Date, it works:

 > read.xlsx("./test.xlsx", sheetIndex = 1, header = FALSE, colClasses = c("Date", "integer"), endRow = 2)
           X1 X2
 1 2014-06-29 49
 2 2014-06-30 46

These are indeed the dates in the spreadsheet. If I try to read all three rows, something goes wrong:

 > read.xlsx("./test.xlsx", sheetIndex = 1, header = FALSE, colClasses = c("Date", "integer"))
           X1    X2
 1 2084-06-30    49
 2 2084-07-01    46
 3       <NA> 89251
 Warning message:
 In as.POSIXlt.Date(x) : NAs introduced by coercion

If I read the first column as integers instead, I get different integers depending on whether the third row is included:

 > read.xlsx("./test.xlsx", sheetIndex = 1, header = FALSE, colClasses = c("integer", "integer"), endRow = 2)
      X1 X2
 1 16250 49
 2 16251 46
 > read.xlsx("./test.xlsx", sheetIndex = 1, header = FALSE, colClasses = c("integer", "integer"))
      X1    X2
 1 41819    49
 2 41820    46
 3    NA 89251

If s1 and s2 are these two results, the first set of integers converts correctly with as.Date(s1$X1, origin="1970-01-01") (the Unix epoch), and the second set converts correctly with as.Date(s2$X1, origin="1899-12-30") (the Excel epoch). If I convert the second set using the 1970 origin, I get the 2084 dates.
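A minimal base-R sketch of the conversions described above, using the integers from the output (16250/16251 and 41819/41820). It does not call the xlsx package at all; it only shows that each kind of day count is correct with its own origin, and that Excel serials read with the Unix origin land in 2084.

```r
# Day counts as returned by read.xlsx in the two cases above
unix_days  <- c(16250, 16251)  # endRow = 2: days since 1970-01-01
excel_days <- c(41819, 41820)  # all rows: raw Excel serial numbers

# Correct conversions: each count paired with its own origin
d_unix  <- as.Date(unix_days,  origin = "1970-01-01")
d_excel <- as.Date(excel_days, origin = "1899-12-30")
print(d_unix)   # 2014-06-29, 2014-06-30
print(d_excel)  # 2014-06-29, 2014-06-30

# Wrong conversion: Excel serials interpreted as days since 1970
d_wrong <- as.Date(excel_days, origin = "1970-01-01")
print(d_wrong)  # 2084-06-30, 2084-07-01
```

The two origins are a fixed number of days apart, which is why the wrong dates are shifted by roughly 70 years rather than being random.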

So: am I doing something wrong? Is it best to read as integers and, if there are any NAs, convert using the Excel epoch, otherwise use the Unix epoch? Or is this a bug in the xlsx package?

xlsx package version: 0.5.1

date r excel r-xlsx

Spacedman Aug 6 '14 at 11:17

3 answers

Beasterfield · Answer 1 · 2014-08-06T11:39:33+0000

XLConnect handles this rather nicely:

 > library(XLConnect)
 > test <- readWorksheetFromFile("~/Downloads/test.xlsx", sheet = "Sheet1", header = FALSE)
 > test
                  Col1  Col2
 1 2014-06-29 00:00:00    49
 2 2014-06-30 00:00:00    46
 3         Grand Total 89251

The obvious problem is that the first column is of mixed type: character and POSIXct. XLConnect reads each cell correctly, but all cells in the column are then coerced to the most general type, which in this case is character.

 > str(test)
 'data.frame':	3 obs. of  2 variables:
  $ Col1: chr  "2014-06-29 00:00:00" "2014-06-30 00:00:00" "Grand Total"
  $ Col2: num  49 46 89251
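With XLConnect the character column still has to be turned back into dates. A minimal base-R sketch, assuming the values look exactly like the str() output above; the data frame is rebuilt by hand as a stand-in, and the extra Date column and the clean name are illustrative only.

```r
# Stand-in for the data frame returned by readWorksheetFromFile() above
test <- data.frame(
  Col1 = c("2014-06-29 00:00:00", "2014-06-30 00:00:00", "Grand Total"),
  Col2 = c(49, 46, 89251),
  stringsAsFactors = FALSE
)

# With an explicit format, strings that do not match (e.g. "Grand Total")
# become NA instead of raising an error; the trailing time part is ignored.
test$Date <- as.Date(test$Col1, format = "%Y-%m-%d")

# Keep only the rows whose first column parsed as a date
clean <- test[!is.na(test$Date), c("Date", "Col2")]
print(clean)
```

Passing format explicitly matters here: without it, as.Date() stops with an error on "Grand Total" instead of returning NA.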

Ali · Answer 2 · 2017-07-19T04:33:26+0000

Dates can be read as integers and then converted to Date with the convertToDate() function from the openxlsx package.
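The same idea in base R, as a minimal sketch: excel_to_date() is a hypothetical helper, not part of any package. It assumes the workbook uses Excel's default 1900 date system and that dates fall after February 1900, and it turns anything non-numeric, such as "Grand Total", into NA.

```r
# Hypothetical helper (not from any package): convert Excel 1900-system
# serial numbers to Date; non-numeric entries become NA.
excel_to_date <- function(x) {
  # as.character() first so factors and numbers are handled the same way
  serial <- suppressWarnings(as.numeric(as.character(x)))
  as.Date(serial, origin = "1899-12-30")
}

raw <- c("41819", "41820", "Grand Total")
converted <- excel_to_date(raw)
print(converted)
```

Because the column is always treated as raw Excel serials, this avoids having to guess between the Unix and Excel origins.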

Tracy · Answer 3 · 2018-02-03T15:35:28+0000

The problem you are facing is that Excel stores a date as the number of days since January 0, 1900, and it is this number that R reads from the Excel file. When you convert it in R, the conversion counts days from January 1, 1970. If you subtract the number of days between the two origins, it should work.
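A minimal base-R sketch of that subtraction, using the serials from the question. It takes 1899-12-30 as the Excel origin rather than 1899-12-31 ("January 0, 1900"), because Excel counts a nonexistent 29 February 1900, which shifts every serial from 1 March 1900 onwards by one day.

```r
# Days between the effective Excel origin and the Unix epoch
offset <- as.numeric(as.Date("1970-01-01") - as.Date("1899-12-30"))
print(offset)  # 25569

# Shift Excel serials onto the Unix epoch, then convert as usual
excel_serial <- c(41819, 41820)
unix_days <- excel_serial - offset
dates <- as.Date(unix_days, origin = "1970-01-01")
print(dates)
```

This gives the same result as passing origin = "1899-12-30" directly; the explicit offset just makes the relationship between the two epochs visible.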