Reading csv file in R with double quotes

Question

Suppose the csv file looks like this:

Type,ID,NAME,CONTENT,RESPONSE,GRADE,SOURCE
A,3,"","I have comma, ha!",I have open double quotes",A,""

The desired output should be:

 df <- data.frame(Type='A', ID=3, NAME=NA, CONTENT='I have comma, ha!',
                  RESPONSE='I have open double quotes"', GRADE='A', SOURCE=NA)
 df
   Type ID NAME           CONTENT                   RESPONSE GRADE SOURCE
 1    A  3   NA I have comma, ha! I have open double quotes"     A     NA

I tried read.csv. The data provider uses quotes to protect commas inside a string, but forgot to escape double quotes in comma-free strings, so whether or not I turn off quoting in read.csv, I don't get the desired result.
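The mismatch can be seen by counting fields with quoting turned off. This is only a minimal sketch: the two lines are typed in as a character vector rather than read from the real file, and the names `lines`, `con` and `n_fields` are made up for illustration.

```r
# The sample header and data row, typed in by hand for illustration
lines <- c('Type,ID,NAME,CONTENT,RESPONSE,GRADE,SOURCE',
           'A,3,"","I have comma, ha!",I have open double quotes",A,""')

# Count comma-separated fields per line with quote handling disabled
con <- textConnection(lines)
n_fields <- count.fields(con, sep = ",", quote = "")
close(con)

n_fields  # 7 for the header, 8 for the data row
```

With quoting off, the comma inside CONTENT splits that field in two, so the data row no longer lines up with the header.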

How can I do this in R? Other batch solutions are also welcome.

+7

r csv

Bamqf Aug 19 '15 at 19:04


3 answers

This is not valid CSV, so you have to do your own parsing. But assuming the convention is as follows, you can toggle the quote handling of scan field by field and still take advantage of most of its abilities:

If the field begins with a quote, it is quoted.
If the field does not start with a quote, it is raw.

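 next_field <- function(stream) {
   # Peek at the first character of the field, then rewind
   p <- seek(stream)
   d <- readChar(stream, 1)
   seek(stream, p)
   # Read one field, with quote handling only if it starts with a quote
   if (d == "\"")
     field <- scan(stream, "", 1, sep=",", quote="\"", blank=FALSE)
   else
     field <- scan(stream, "", 1, sep=",", quote="", blank=FALSE)
   return(field)
 }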

Assuming the above convention, this is sufficient for parsing as follows:

 s <- file("example.csv", open="rt")
 header <- readLines(s, 1)
 header <- scan(what="", text=header, sep=",")
 line <- replicate(length(header), next_field(s))
 setNames(as.data.frame(lapply(line, type.convert)), header)

   Type ID NAME           CONTENT                   RESPONSE GRADE SOURCE
 1    A  3   NA I have comma, ha! I have open double quotes"     A     NA

However, in practice, you may want to first write the fields back out, quoting each one, to another file so that you can simply use read.csv on the corrected format.
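A minimal sketch of that write-back step, under some assumptions: the fields are typed in by hand here instead of coming from the parser, `quote_field` is a hypothetical helper (not part of base R), and embedded double quotes are escaped by doubling them, which read.csv understands inside quoted fields.

```r
# Hypothetical helper: wrap a field in double quotes, doubling any
# double quotes it already contains (CSV-style escaping)
quote_field <- function(x) paste0('"', gsub('"', '""', x, fixed = TRUE), '"')

# Fields as a field-by-field parser would return them (typed in by hand here)
header <- c("Type", "ID", "NAME", "CONTENT", "RESPONSE", "GRADE", "SOURCE")
fields <- c("A", "3", "", "I have comma, ha!",
            "I have open double quotes\"", "A", "")

# Write the corrected file to a temporary location
fixed <- tempfile(fileext = ".csv")
writeLines(c(paste(quote_field(header), collapse = ","),
             paste(quote_field(fields), collapse = ",")),
           fixed)

# The corrected file can now be read with plain read.csv
df <- read.csv(fixed, stringsAsFactors = FALSE)
df$RESPONSE
```

For a real file, the same quoting would be applied to every row produced by the parser before writing.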

+2

A. Webb Aug 19 '15 at 21:28


I am not too sure about the structure of CSV files, but you said the author had escaped the comma in the text under CONTENT.

This works to read the text as is, with the " at the end:

 read.csv2("Test.csv", header = T,sep = ",", quote="")

0

Buzz lightyear Aug 19 '15 at 19:28


eddi · Accepted Answer · 2015-08-19T20:10:11+0000

fread from data.table handles this just fine:

 library(data.table)
 fread('Type,ID,NAME,CONTENT,RESPONSE,GRADE,SOURCE\nA,3,"","I have comma, ha!",I have open double quotes",A,""')
 #    Type ID NAME           CONTENT                   RESPONSE GRADE SOURCE
 # 1:    A  3      I have comma, ha! I have open double quotes"     A
