R 2.13.1 on Mac OS X. I'm trying to import a data file that uses a period as the thousands separator, a comma as the decimal mark, and a trailing minus for negative values.
Basically, I am trying to convert from:
"A|324,80|1.324,80|35,80-"
to
      V1     V2     V3     V4
    1  A 324.80 1324.8 -35.80
Interactively, both of the following work:
    gsub("\\.","","1.324,80")
    [1] "1324,80"
    gsub("(.+)-$","-\\1", "35,80-")
    [1] "-35,80"
as well as their combination:
    gsub("\\.", "", gsub("(.+)-$","-\\1","1.324,80-"))
    [1] "-1324,80"
However, I cannot remove the thousands separator when going through read.table:
    setClass("num.with.commas")
    setAs("character", "num.with.commas",
          function(from) as.numeric(gsub("\\.", "", sub("(.+)-$","-\\1",from))) )
    mydata <- "A|324,80|1.324,80|35,80-"
    mytable <- read.table(textConnection(mydata), header=FALSE, quote="",
                          comment.char="", sep="|", dec=",", skip=0, fill=FALSE,
                          strip.white=TRUE,
                          colClasses=c("character","num.with.commas",
                                       "num.with.commas", "num.with.commas"))
    Warning messages:
    1: In asMethod(object) : NAs introduced by coercion
    2: In asMethod(object) : NAs introduced by coercion
    3: In asMethod(object) : NAs introduced by coercion
    mytable
      V1 V2 V3 V4
    1  A NA NA NA
Note that if I change "\\." to "," in the function, the result looks a bit different:
    setAs("character", "num.with.commas",
          function(from) as.numeric(gsub(",", "", sub("(.+)-$","-\\1",from))) )
    mytable <- read.table(textConnection(mydata), header=FALSE, quote="",
                          comment.char="", sep="|", dec=",", skip=0, fill=FALSE,
                          strip.white=TRUE,
                          colClasses=c("character","num.with.commas",
                                       "num.with.commas", "num.with.commas"))
    mytable
      V1    V2     V3    V4
    1  A 32480 1.3248 -3580
I think the problem is that read.table with dec="," converts the incoming "," to "." BEFORE calling as(from, "num.with.commas"), so the input string could be, for example, "1.324.80".
I want both as("1.123.80-", "num.with.commas") to return -1123.80 and as("1.100.123,80", "num.with.commas") to return 1100123.80.
How can I get my num.with.commas to replace everything except the last decimal point in the input string?
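To make the question concrete, here is a minimal sketch of "drop every period except the last one", assuming the converter really does receive period-only strings such as "1.324.80". The helper name `drop_all_but_last_period` is hypothetical, and the approach (a Perl-style lookahead that only matches a period when another period follows it) is just one possible way to express this.

```r
# Hypothetical helper for illustration only.
# Assumes the decimal mark is already a period, e.g. "1.324.80".
drop_all_but_last_period <- function(x) {
  # Move a trailing minus sign to the front: "35.80-" becomes "-35.80".
  x <- sub("^(.+)-$", "-\\1", x)
  # Remove a period only if another period occurs later in the string,
  # which leaves the last period (the decimal point) in place.
  gsub("\\.(?=.*\\.)", "", x, perl = TRUE)
}

cleaned <- drop_all_but_last_period(c("1.123.80-", "1.100.123.80", "324.80"))
print(cleaned)              # "-1123.80" "1100123.80" "324.80"
print(as.numeric(cleaned))
```

One limitation of this sketch: a value with a thousands separator but no decimal part, such as "1.324", would keep its only period and be read as 1.324.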
Update: First, I added a negative lookahead and got as() to work in the console:
    setAs("character", "num.with.commas",
          function(from) as.numeric(gsub("(?!\\.\\d\\d$)\\.", "",
                                         gsub("(.+)-$","-\\1",from), perl=TRUE)) )
    as("1.210.123.80-","num.with.commas")
    [1] -1210124
    as("10.123.80-","num.with.commas")
    [1] -10123.8
    as("10.123.80","num.with.commas")
    [1] 10123.8
However, read.table still had the same problem. Adding some print()s to my function showed that num.with.commas actually receives the comma, not a period.
So my current solution is to replace "," with "." inside num.with.commas:
    setAs("character", "num.with.commas",
          function(from) as.numeric(gsub(",","\\.",
                                         gsub("(?!\\.\\d\\d$)\\.", "",
                                              gsub("(.+)-$","-\\1",from), perl=TRUE))) )
    mytable <- read.table(textConnection(mydata), header=FALSE, quote="",
                          comment.char="", sep="|", dec=",", skip=0, fill=FALSE,
                          strip.white=TRUE,
                          colClasses=c("character","num.with.commas",
                                       "num.with.commas", "num.with.commas"))
    mytable
      V1    V2      V3    V4
    1  A 324.8 1101325 -35.8
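As a cross-check that does not depend on when read.table applies dec, the same conversion can be sketched as a plain function applied after reading every column as text. This is only an illustration under the assumption that the file always uses "." for thousands, "," for decimals and a trailing "-" for negatives; the name `parse_euro_number` is hypothetical and not part of any package.

```r
# Hypothetical helper: convert strings such as "1.324,80" or "35,80-" to numeric.
parse_euro_number <- function(x) {
  x <- sub("^(.+)-$", "-\\1", x)        # move a trailing minus to the front
  x <- gsub(".", "", x, fixed = TRUE)   # drop the thousands separators
  x <- sub(",", ".", x, fixed = TRUE)   # turn the decimal comma into a period
  as.numeric(x)
}

mydata <- "A|324,80|1.324,80|35,80-"

# Read every column as character so no numeric conversion happens while reading.
con <- textConnection(mydata)
raw <- read.table(con, header = FALSE, sep = "|", quote = "",
                  comment.char = "", strip.white = TRUE,
                  colClasses = "character")
close(con)

# Convert the three numeric columns afterwards.
raw[2:4] <- lapply(raw[2:4], parse_euro_number)
str(raw)
```

Because the periods are removed with fixed = TRUE before the comma is replaced, no lookahead is needed here; the trade-off is an extra pass over the data frame after reading.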