Fread from data.table package when column names include spaces and special characters?

I have a csv file in which column names include spaces and special characters.

fread imports them with quotes. How can I change this behavior? One reason I ask is that some of my column names start with a space, and I don't know how to handle them.

Any pointers would be helpful.

Edit: example.

    > packageVersion("data.table")
    [1] '1.8.8'
    > p2p <- fread("p2p.csv", header = TRUE, stringsAsFactors=FALSE)
    > head(p2p[,list(Principal remaining)])
    Error: unexpected symbol in "head(p2p[,list(Principal remaining"
    > head(p2p[,list("Principal remaining")])
                        V1
    1: Principal remaining
    > head(p2p[,list(c("Principal remaining"))])
                        V1
    1: Principal remaining

What I expected, of course, is the kind of output I get for a column whose name has no spaces:

    > head(p2p[,list(Principal)])
       Principal
    1:      1000
    2:      1000
    3:      1000
    4:      2000
    5:      1000
    6:      4130
3 answers

It should be fairly difficult to get a leading space in a column name; it should not happen through "casual coding". On the other hand, I don't see much error checking in the fread code, so until this undesirable behavior is fixed (or the feature request is refused), you could do something like this:

 setnames(DT, make.names(colnames(DT))) 
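As a minimal sketch of that renaming step, the snippet below builds a toy data.table in place of p2p.csv; the column names and values are made up for illustration. It assumes a data.table version that provides setnames, which renames by reference, so no assignment is needed.

```r
library(data.table)

# Toy table standing in for the imported file (names and values are illustrative)
DT <- data.table(`Principal remaining` = c(1000, 2000),
                 `Interest %` = c(5, 7))

# make.names() replaces characters that are invalid in R names with "."
clean <- make.names(colnames(DT))

# setnames() changes the names in place; do not assign its result
setnames(DT, clean)

# The cleaned name can now be used unquoted in j
res <- DT[, list(Principal.remaining)]
print(res)
```

After this, the column is referred to as Principal.remaining rather than by its original spelling.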

If, on the other hand, you are bothered that colnames(DT) displays the column names with quotation marks, then just "get over it": that is how the interactive console displays any character value.

If a character value looks like " ttt" in the original file, it will keep its leading spaces when imported. For column names you can strip them with colnames(dfrm) <- sub("^\\s+", "", colnames(dfrm)) or with one of the several trim functions in various packages (for example, "gdata").
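A small sketch of the trimming idea, applied to a toy data.table with made-up names. It also shows two ways to keep a name that contains a space and still select the column: backtick quoting and a character vector with with = FALSE. Neither appears in the answer above; they are included here as an assumption about what the asker may find useful.

```r
library(data.table)

# Toy table with a leading space in one name and an inner space in another
DT <- data.table(1:2, 3:4)
setnames(DT, c(" Principal", "Principal remaining"))

# Strip leading whitespace from the names, by reference
setnames(DT, sub("^\\s+", "", colnames(DT)))

# A name containing a space can be used in j when wrapped in backticks
by_backtick <- DT[, list(`Principal remaining`)]

# Alternatively, select by a character vector
by_string <- DT[, "Principal remaining", with = FALSE]
```

Backticks leave the original names untouched, which may be preferable when the names must match the source file on export.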


A slightly modified version of BondedDust's answer, since setnames works by reference and is not used with the <- assignment operator:

    setnames(DT, make.names(colnames(DT)))

You can pass the argument check.names = TRUE to fread from data.table:

    p2p <- fread("p2p.csv", header = TRUE, stringsAsFactors = FALSE, check.names = TRUE)

It uses the make.names function in the background, as the documentation for check.names says:

 default is FALSE. If TRUE then the names of the variables in the data.table are checked to ensure that they are syntactically valid variable names. If necessary they are adjusted (by make.names) so that they are, and also to ensure that there are no duplicates. 
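A quick way to see the effect without a file on disk is sketched below. It assumes a data.table version recent enough that fread accepts both the check.names and text arguments (newer than the 1.8.8 mentioned in the question); the inline CSV is invented for illustration.

```r
library(data.table)

# Inline CSV standing in for p2p.csv (contents are illustrative)
csv <- "Principal remaining,Principal\n1000,1000\n2000,2000"

# check.names = TRUE runs the header through make.names()
p2p <- fread(text = csv, header = TRUE, check.names = TRUE)

print(names(p2p))
head(p2p[, list(Principal.remaining)])
```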
