Why does R read numeric data as a character?

I am trying to load a file that contains integer and floating-point data. I do not understand why R reads one of the columns as a character field.

    > df <- read.table( 'C:\\temp\\test.tab' ,
    +                   sep = '\t' , header = TRUE , stringsAsFactors = FALSE , dec="." )
    > str(df)
    'data.frame':   7 obs. of  5 variables:
     $ A: int  0 0 0 0 1 0 0
     $ B: int  1431 2097 2712 24821 27359 41165 49221
     $ C: int  0 0 0 0 0 0 0
     $ D: chr  "7" "26.950000762939453" "57.95000076293945" "21" ...
     $ E: int  1 2 3 4 5 6 7

File content:

    A  B      C  D                   E
    0  1431   0  7                   1
    0  2097   0  26.950000762939453  2
    0  2712   0  57.95000076293945   3
    0  24821  0  21                  4
    1  27359  0  57.900001525878906  5
    0  41165  0  33.95000076293945   6
    0  49221  0  28.950000762939453  7


    > R.version
                   _
    platform       x86_64-w64-mingw32
    arch           x86_64
    os             mingw32
    system         x86_64, mingw32
    status
    major          3
    minor          1.0
    year           2014
    month          04
    day            10
    svn rev        65387
    language       R
    version.string R version 3.1.0 (2014-04-10)
    nickname       Spring Dance
1 answer

This probably deserves a proper answer that we can point to, so:

The behavior of type.convert, which read.table uses to decide each column's type, changed in R 3.1.0 (and, as noted below, will mostly revert to its pre-3.1.0 behavior in R 3.1.1):

As of R 3.1.0, where converting inputs to numeric or complex would result in a loss of accuracy, they are returned as strings (for as.is = TRUE) or factors.

This caused quite a stir on the r-devel mailing list. The start of the relevant (and long) thread is here.

As Ben mentioned, one result of that discussion is that the previous default behavior is restored in the subsequent release.
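As a rough illustration of the restored default, the sketch below assumes R 3.1.1 or later, where type.convert (and read.table) accept a numerals argument. The input vector reuses values from column D of the question. What "no.loss" returns for these particular strings is not asserted here, so inspect it on your own installation.

```r
# Assumes R >= 3.1.1, where type.convert() has a `numerals` argument.
# The values are taken from column D in the question.
x <- c("7", "26.950000762939453", "57.95000076293945", "21")

# "allow.loss" is the default from 3.1.1 on: convert to double even if
# some trailing digits cannot be stored exactly.
allowed <- type.convert(x, numerals = "allow.loss", as.is = TRUE)
str(allowed)

# "no.loss" asks for the stricter, 3.1.0-style handling: values that
# would lose accuracy are kept as strings rather than converted.
strict <- type.convert(x, numerals = "no.loss", as.is = TRUE)
str(strict)
```

The same argument can be passed through read.table, for example read.table(file, numerals = "allow.loss"), if you want the choice to be explicit in the code.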

In the short term, if you know which columns will be affected, you can always use colClasses. Otherwise, I think you will have to modify your code to check the result of read.table and convert the columns yourself.
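Both workarounds might look roughly like the sketch below. It recreates the question's data in a temporary file (a stand-in for C:\temp\test.tab), and the helper to_numeric_if_possible is a hypothetical name introduced only for this example.

```r
# Recreate the question's data in a temporary tab-separated file
# (stand-in for the original C:\temp\test.tab).
tab_file <- tempfile(fileext = ".tab")
writeLines(c(
  "A\tB\tC\tD\tE",
  "0\t1431\t0\t7\t1",
  "0\t2097\t0\t26.950000762939453\t2",
  "0\t2712\t0\t57.95000076293945\t3",
  "0\t24821\t0\t21\t4",
  "1\t27359\t0\t57.900001525878906\t5",
  "0\t41165\t0\t33.95000076293945\t6",
  "0\t49221\t0\t28.950000762939453\t7"
), tab_file)

# Option 1: declare the column types up front, so read.table does not
# have to guess the type of column D.
df <- read.table(tab_file, sep = "\t", header = TRUE,
                 stringsAsFactors = FALSE,
                 colClasses = c("integer", "integer", "integer",
                                "numeric", "integer"))
str(df)

# Option 2: read with the defaults, then convert any character column
# whose non-missing values all parse as numbers.
# (to_numeric_if_possible is a hypothetical helper, not part of R.)
to_numeric_if_possible <- function(col) {
  if (!is.character(col)) return(col)
  converted <- suppressWarnings(as.numeric(col))
  # Leave the column alone if any non-missing value failed to parse.
  if (any(is.na(converted[!is.na(col)]))) col else converted
}

df2 <- read.table(tab_file, sep = "\t", header = TRUE,
                  stringsAsFactors = FALSE)
df2[] <- lapply(df2, to_numeric_if_possible)
str(df2)
```

Option 1 is the more predictable of the two because the types are fixed before reading. Option 2 is only a fallback for files whose layout is not known in advance.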
