Using sep = "." in `fread` from "data.table"

Can fread from "data.table" be forced to use "." as the value for sep?

I am trying to use fread to speed up the concat.split functions in "splitstackshape". See this Gist for the general approach I am taking, and this question for why I want to make the switch.

The problem I am facing is using a dot (".") as the value for sep. Whenever I do this, I get an "unexpected character" error.

The following simplified example demonstrates the problem.

    library(data.table)
    y <- paste("192.168.1.", 1:10, sep = "")
    x1 <- tempfile()
    writeLines(y, x1)
    fread(x1, sep = ".", header = FALSE)
    # Error in fread(x1, sep = ".", header = FALSE) :
    #   Unexpected character (
    # 192) ending field 2 of line 1

The workaround in my current function is to replace the "." with another character that is hopefully not present in the source data, say "|". That seems risky to me, because I cannot predict what is in someone else's dataset. Here is the workaround in action.

    x2 <- tempfile()
    z <- gsub(".", "|", y, fixed=TRUE)
    writeLines(z, x2)
    fread(x2, sep = "|", header = FALSE)
    #      V1  V2 V3 V4
    #  1: 192 168  1  1
    #  2: 192 168  1  2
    #  3: 192 168  1  3
    #  4: 192 168  1  4
    #  5: 192 168  1  5
    #  6: 192 168  1  6
    #  7: 192 168  1  7
    #  8: 192 168  1  8
    #  9: 192 168  1  9
    # 10: 192 168  1 10
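One way to reduce the risk in this workaround is to choose the replacement from a list of candidates and check that it does not occur in the data first. The following is only a minimal sketch in base R: `pick_unused_sep` and its candidate list are hypothetical names made up for illustration, not part of "data.table" or "splitstackshape".

```r
# Hypothetical helper (not from any package): return the first candidate
# delimiter that does not occur anywhere in the character vector x.
pick_unused_sep <- function(x, candidates = c("|", "\t", ";", "~", "^")) {
  for (cand in candidates) {
    # fixed = TRUE treats the candidate as a literal string, not a regex
    if (!any(grepl(cand, x, fixed = TRUE))) {
      return(cand)
    }
  }
  stop("none of the candidate delimiters is free to use")
}

y <- paste("192.168.1.", 1:10, sep = "")
new_sep <- pick_unused_sep(y)

# Swap the dots for the unused delimiter before writing the temporary file
z <- gsub(".", new_sep, y, fixed = TRUE)
```

The resulting `z` can then be written with writeLines and read with `sep = new_sep`, exactly as in the workaround above. The check scans the whole input once per candidate, so it adds some cost on large data.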

For the purposes of this question, assume the data is balanced (each line has the same number of "sep" characters). I know that using "." as a separator is not a good idea, but I am trying to account for what other users might have in their datasets, based on other questions I have answered here on SO.

r data.table fread splitstackshape
2 answers

Now implemented in v1.9.5 on GitHub.

    > input = paste( paste("192.168.1.", 1:5, sep=""), collapse="\n")
    > cat(input,"\n")
    192.168.1.1
    192.168.1.2
    192.168.1.3
    192.168.1.4
    192.168.1.5

Setting sep='.' is ambiguous with the new dec argument (default '.'):

    > fread(input,sep=".")
    Error in fread(input, sep = ".") :
      The two arguments to fread 'dec' and 'sep' are equal ('.')

Therefore, choose something else for dec:

    > fread(input,sep=".",dec=",")
        V1  V2 V3 V4
    1: 192 168  1  1
    2: 192 168  1  2
    3: 192 168  1  3
    4: 192 168  1  4
    5: 192 168  1  5

You may receive a warning:

    > fread(input,sep=".",dec=",")
        V1  V2 V3 V4
    1: 192 168  1  1
    2: 192 168  1  2
    3: 192 168  1  3
    4: 192 168  1  4
    5: 192 168  1  5
    Warning message:
    In fread(input, sep = ".", dec = ",") :
      Run again with verbose=TRUE to inspect... Unable to change to a locale which provides the desired dec. You will need to add a valid locale name to getOption("datatable.fread.dec.locale"). See the paragraph in ?fread.

Either ignore or suppress the warning, or read the paragraph in ?fread and set the option:

    options(datatable.fread.dec.locale = "fr_FR.utf8")

This ensures that there is no ambiguity.


<this is a long comment, not an answer>

The problem seems to be related to the fields being numeric. With non-numeric text, sep = "." works:

    library(data.table)
    y <- paste("Hz.BB.GHG.", 1:10, sep = "")
    xChar <- tempfile()
    writeLines(y, xChar)
    fread(xChar, sep = ".", header = FALSE)
    #     V1 V2  V3 V4
    #  1: Hz BB GHG  1
    #  2: Hz BB GHG  2
    #  3: Hz BB GHG  3
    #  4: Hz BB GHG  4
    #  5: Hz BB GHG  5
    #  6: Hz BB GHG  6
    #  7: Hz BB GHG  7
    #  8: Hz BB GHG  8
    #  9: Hz BB GHG  9
    # 10: Hz BB GHG 10

However, trying again with the original values gives the same error, whichever colClasses is used:

    fread(x1, sep = ".", header = FALSE, colClasses="numeric", verbose=TRUE)
    fread(x1, sep = ".", header = FALSE, colClasses="character", verbose=TRUE)

    Detected eol as \n only (no \r afterwards), the UNIX and Mac standard.
    Looking for supplied sep '.' on line 10 (the last non blank line in the first 'autostart') ... found ok
    Found 4 columns
    First row with 4 fields occurs on line 1 (either column names or first row of data)
    Error in fread(x1, sep = ".", header = FALSE, colClasses = "character", :
      Unexpected character (192.) ending field 2 of line 1

This, however, works:

    read.table(x1, sep=".")
    #     V1  V2 V3 V4
    # 1  192 168  1  1
    # 2  192 168  1  2
    # 3  192 168  1  3
    # 4  192 168  1  4
    # ... <cropped>
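Another option that avoids fread is to split on a literal dot directly, which also keeps every field as text. The following is only a minimal sketch in base R, assuming balanced data as stated in the question. It is likely slower than fread on large inputs, and the column names are set by hand to mirror the outputs above.

```r
y <- paste("192.168.1.", 1:10, sep = "")

# fixed = TRUE makes "." a literal dot rather than the regex wildcard
parts <- strsplit(y, ".", fixed = TRUE)

# The data is assumed balanced, so every element must have the same length
stopifnot(length(unique(lengths(parts))) == 1L)

# Bind the pieces row-wise into a character matrix, then into a data.frame
m <- do.call(rbind, parts)
out <- as.data.frame(m, stringsAsFactors = FALSE)
names(out) <- paste0("V", seq_len(ncol(out)))
```

Here `out` has 10 rows and 4 character columns. Convert them with as.integer or similar if numeric columns are needed.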