dec argument to data.table::fread

I am using fread from data.table to load csv files. However, my csv files use "," as the decimal separator (1.23 is written as 1,23). Unlike read.csv, fread does not seem to accept dec as a parameter.

    R) args(fread)
    function (input = "test.csv", sep = "auto", sep2 = "auto", nrows = -1,
        header = "auto", na.strings = "NA", stringsAsFactors = FALSE,
        verbose = FALSE, autostart = 30)

Do you see a workaround (perhaps an R option that can be set) that would let me use fread? It is so much faster that it would save me a lot of time.

PS: colClasses is not implemented yet, so setAs cannot be used as in this post.

1 answer

Update Oct 2014: now in v1.9.5

fread now accepts dec=',' (and other non-period separators), #917. A new paragraph has been added to ?fread. If you are located in a country that uses dec=',' then it should just work. If not, you will need to read that paragraph for an extra step. In case it somehow breaks dec='.', the new feature can be turned off with options(datatable.fread.dec.experiment=FALSE).
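
A minimal sketch of the new argument, assuming an installed data.table version in which fread accepts dec. The inline sample data is made up for illustration and is not taken from the question:

```r
library(data.table)

# Inline sample: ";" separates the fields, "," is the decimal mark.
# A string containing a newline is treated by fread as the data itself.
DT <- fread("A;B\n3,14;123\n4,22;456\n", sep = ";", dec = ",")

# Inspect the column types that fread chose
str(DT)
```

With dec = "," given explicitly, column A should come back as numeric and column B as integer, without touching the session locale.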



Previous answer ...

Matt Dowle found a nice workaround using locales. My sessionInfo first:

    sessionInfo()
    R version 2.15.2 (2012-10-26)
    Platform: i386-w64-mingw32/i386 (32-bit)

    locale:
    [1] LC_COLLATE=French_France.1252  LC_CTYPE=French_France.1252
    [3] LC_MONETARY=French_France.1252 LC_NUMERIC=C
    [5] LC_TIME=C
    ...

This looks like the culprit:

    Sys.localeconv()["decimal_point"]
    decimal_point
              "."

Setting LC_NUMERIC worked on Ubuntu (Matthew) and WinXP (me):

    Sys.setlocale("LC_NUMERIC", "French_France.1252")
    [1] "French_France.1252"
    Message d'avis :
    In Sys.setlocale("LC_NUMERIC", "French_France.1252") :
      changer 'LC_NUMERIC' peut résulter en un fonctionnement étrange de R

(The French warning is R's standard notice that setting 'LC_NUMERIC' may cause R to function strangely.)

The behavior is nice and changes as follows:

    DT = fread("A,B\n3,14;123\n4,22;456\n",sep=";")
    str(DT)
    Classes 'data.table' and 'data.frame': 2 obs. of 2 variables:
     $ V1: num 3.14 4.22
     $ V2: int 123 456

Values that use "." as the decimal separator are now loaded as character strings (as they should be); before the locale change it was the other way around.

    DT = fread("A,B\n3.14;123\n4.22;456\n",sep=";")
    str(DT)
    Classes 'data.table' and 'data.frame': 2 obs. of 2 variables:
     $ V1: chr "3.14" "4.22"
     $ V2: int 123 456
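
Because R itself warns that changing LC_NUMERIC can have odd side effects, it may be safer to switch the locale only around the read and restore it afterwards. The following is a sketch of that idea, not part of data.table: the helper name with_numeric_locale is hypothetical, locale names are platform-dependent, and whether fread reacts to LC_NUMERIC at all depends on the data.table version in use.

```r
# Hypothetical helper: evaluate `expr` with LC_NUMERIC temporarily set to
# `locale`, then restore the previous setting even if `expr` fails.
with_numeric_locale <- function(locale, expr) {
  old <- Sys.getlocale("LC_NUMERIC")
  on.exit(Sys.setlocale("LC_NUMERIC", old), add = TRUE)

  # Sys.setlocale() returns "" when the requested locale is unavailable
  new <- suppressWarnings(Sys.setlocale("LC_NUMERIC", locale))
  if (identical(new, "")) {
    stop("locale not available on this system: ", locale)
  }

  # `expr` is a lazy promise, so it is only evaluated here,
  # after the locale has been switched
  expr
}

# Usage: the "C" locale exists everywhere, so this runs on any platform.
# On the Windows session shown above, the locale name would instead be
# "French_France.1252" and the expression would be the fread() call.
res <- with_numeric_locale("C", Sys.localeconv()[["decimal_point"]])
```

The on.exit() call is what keeps the rest of the session unaffected: the original LC_NUMERIC value is put back as soon as the helper returns.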