Read csv file in R with currency column as numeric

I am trying to read a CSV file into R that contains information about political contributions. From what I understand, columns are imported as factors by default, but I need the amount column ('CTRIB_AMT' in the dataset) to be imported as numeric so that I can apply functions that do not work on factors. The column is formatted as currency with a "$" prefix.

I used a simple read command to initially import the file:

contribs <- read.csv('path/to/file') 

And then I tried to convert CTRIB_AMT from currency to numeric:

 as.numeric(as.character(sub("$","",contribs$CTRIB_AMT, fixed=TRUE))) 

But that did not work. The functions I'm trying to use on the CTRIB_AMT column are as follows:

 vals <- sort(unique(dfr$CTRIB_AMT))
 sums <- tapply(dfr$CTRIB_AMT, dfr$CTRIB_AMT, sum)
 counts <- tapply(dfr$CTRIB_AMT, dfr$CTRIB_AMT, length)

See related question here.

Any thoughts on how to import the file initially so that the column is numeric or how to convert it after import?

+8
r
4 answers

I'm not sure how to read it in directly, but you can convert it after reading it:

 > A <- read.csv("~/Desktop/data.csv")
 > A
   id   desc price
 1  0  apple $1.00
 2  1 banana $2.25
 3  2 grapes $1.97
 > A$price <- as.numeric(sub("\\$","", A$price))
 > A
   id   desc price
 1  0  apple  1.00
 2  1 banana  2.25
 3  2 grapes  1.97
 > str(A)
 'data.frame':	3 obs. of  3 variables:
  $ id   : int  0 1 2
  $ desc : Factor w/ 3 levels "apple","banana",..: 1 2 3
  $ price: num  1 2.25 1.97

I think it might just be a missing escape in your sub. In regular expressions, $ marks the end of the line, and \$ is a literal dollar sign. In an R string you then have to escape the backslash as well, hence "\\$".
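To make the escaping point concrete, here is a minimal sketch on a made-up vector x (an assumption for illustration, not the question's data). It compares an unescaped pattern, an escaped pattern, and literal matching with fixed=TRUE:

```r
# Hypothetical sample values, not taken from the question's file
x <- c("$1.00", "$2.25", "$1.97")

# Unescaped: "$" is the end-of-string anchor, so no character is removed
unescaped <- sub("$", "", x)

# Escaped: "\\$" in an R string is the regex \$, a literal dollar sign
escaped <- sub("\\$", "", x)

# fixed = TRUE treats the pattern as a plain string, so no escape is needed
literal <- sub("$", "", x, fixed = TRUE)

# Both stripped versions convert cleanly to numeric
as.numeric(escaped)
as.numeric(literal)
```

Both of the last two forms strip the prefix. In either case the result has to be assigned back to the column for the change to persist.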

+14

Another way is to define the conversion with setAs.
It was used in two similar questions:

  • Processing a negative number in an "accounting" format in R
  • How to read a csv file that contains some comma numbers?

For your needs:

 setClass("Currency")
 setAs("character", "Currency",
       function(from) as.numeric(sub("$", "", from, fixed=TRUE)))
 contribs <- read.csv("path/to/file", colClasses=c(CTRIB_AMT="Currency"))
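
As a rough, self-contained way to try this out without a file on disk, the sketch below feeds read.csv a small inline CSV. The NAME column and both rows are made up for illustration; only CTRIB_AMT comes from the question.

```r
library(methods)  # provides setClass/setAs; usually attached already

# Virtual class used only as a label for colClasses
setClass("Currency")
setAs("character", "Currency",
      function(from) as.numeric(sub("$", "", from, fixed = TRUE)))

# Hypothetical two-row file supplied inline instead of a path
csv_text <- "NAME,CTRIB_AMT\nSmith,$25.00\nJones,$100.50\n"
contribs <- read.csv(text = csv_text,
                     colClasses = c(CTRIB_AMT = "Currency"))

str(contribs)            # CTRIB_AMT should now be a numeric column
sum(contribs$CTRIB_AMT)  # arithmetic works directly on the column
```

Columns not named in colClasses are left to read.csv's usual type detection.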
+7

Another solution to the problem, solved long ago:

 convertCurrency <- function(currency) {
   # strip the leading dollar sign, then any thousands separators
   currency1 <- sub('$', '', as.character(currency), fixed=TRUE)
   currency2 <- as.numeric(gsub('\\,', '', as.character(currency1)))
   currency2
 }
 contribs$CTRIB_AMT_NUM <- convertCurrency(contribs$CTRIB_AMT)
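
A more compact variant of the same idea, offered only as a sketch: a single gsub with a character class removes both the dollar sign and any commas in one pass. The helper name convert_currency and the sample values are hypothetical.

```r
# Hypothetical helper: strip "$" and "," in a single pass.
# Inside a bracket expression, "$" is a literal character.
convert_currency <- function(currency) {
  as.numeric(gsub("[$,]", "", as.character(currency)))
}

# Made-up amounts stored as a factor, as read.csv may have produced them
amounts <- factor(c("$1,250.00", "$75.50", "$1,000,000.00"))
converted <- convert_currency(amounts)
converted
```

The as.character() call matters when the input is a factor, since it makes sure the labels rather than the internal codes are converted.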
+4

Or use something like as.numeric(substr(as.character(contribs$CTRIB_AMT),2,20)), since we know that the values should not be longer than 20 characters.

Another thing to note is that you can remove the need to convert from a factor altogether if you set stringsAsFactors=F in your call to read.csv().
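
A small sketch of why the factor matters, using a made-up inline CSV (the NAME column and rows are assumptions). It uses substring(x, 2), which takes everything from the second character onward, in place of substr with a fixed upper bound of 20. Note that since R 4.0.0 read.csv defaults to stringsAsFactors = FALSE, so the factor case only arises on older versions or when requested explicitly.

```r
# Hypothetical inline data standing in for the real file
csv_text <- "NAME,CTRIB_AMT\nSmith,$25.00\nJones,$100.50\n"

as_factor <- read.csv(text = csv_text, stringsAsFactors = TRUE)
as_string <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# as.numeric() on a factor returns internal level codes, not the amounts
codes <- as.numeric(as_factor$CTRIB_AMT)

# With character data, drop the first character ("$") and convert directly
amounts <- as.numeric(substring(as_string$CTRIB_AMT, 2))

codes
amounts
```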

+2