I am trying to read a CSV file in R, but the read keeps failing partway through. I think this may be due to the file's encoding, but I'm not sure.
Here is the code I ran:
read.csv('crunchbase_companies_2.csv', fileEncoding="UTF-8", quote="")
I then get this warning message:
In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, : invalid input found on input connection
R reads the data only up to the first special character and then stops, so I end up with just part of the data in R. I pasted the data I did get here: http://pastebin.com/EQLnXz2W . Note that the read stops when it reaches characters such as "Ì", so those characters do not appear in the sample data.
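One way to see exactly where such characters sit is to inspect the file as raw bytes, since that involves no decoding step that can fail. A minimal sketch: the file written here is a hypothetical stand-in (the real path would replace `path`), and the line count assumes CR line terminators, matching what `file` reports for this data.

```r
# Hypothetical stand-in file: CR-terminated lines, one containing byte 0xCC.
path <- tempfile(fileext = ".csv")
writeBin(c(charToRaw("name,city\rAcme,Paris\rFoo"), as.raw(0xCC),
           charToRaw(",Rome\rBar,Oslo\r")), path)

# Read the whole file as raw bytes; no encoding conversion happens here.
bytes <- readBin(path, what = "raw", n = file.info(path)$size)

# 1-based offsets of every byte outside 7-bit ASCII.
high <- which(as.integer(bytes) > 127L)

# Line number of each such byte = number of CR (0x0D) bytes before it, plus one.
line_of <- vapply(high, function(i) sum(bytes[seq_len(i - 1L)] == as.raw(0x0D)) + 1L,
                  integer(1))

print(data.frame(offset = high, byte = as.character(bytes[high]), line = line_of))
```

The printed `byte` column gives the hex values, which can then be compared against candidate encodings.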
I also checked the file's encoding in the terminal with the file command. It reports: Non-ISO extended-ASCII English text, with CR line terminators.
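That `file` output suggests the bytes are a single-byte encoding rather than UTF-8, which would explain why declaring `fileEncoding="UTF-8"` makes the connection stop at the first non-UTF-8 byte. A minimal sketch, assuming the file is Latin-1: it keeps the original call and changes only `fileEncoding`, run against a hypothetical stand-in file rather than `crunchbase_companies_2.csv`. "Non-ISO" likely means bytes in the 0x80 to 0x9F range are present, so `"CP1252"` or `"macintosh"` (Mac Roman, plausible given the CR line endings) may map those bytes more accurately; `iconvlist()` shows which names the platform accepts.

```r
# Hypothetical stand-in for crunchbase_companies_2.csv: CR line endings
# and one byte (0xCC, "Ì" in Latin-1) that is not valid UTF-8.
path <- tempfile(fileext = ".csv")
writeBin(c(charToRaw("name,city\rAcme,Paris\rFoo"), as.raw(0xCC),
           charToRaw(",Rome\rBar,Oslo\r")), path)

# Same call as before, but declaring Latin-1 instead of UTF-8. The connection
# re-encodes the input to the session's native encoding while reading, and
# R accepts CR, LF or CRLF as line terminators, so the CR endings are fine.
df <- read.csv(path, fileEncoding = "latin1", quote = "",
               stringsAsFactors = FALSE)

print(nrow(df))   # number of data rows read
print(df$name)
```

Every byte is valid Latin-1, so this read cannot stop early; if the guess is wrong, the symptom becomes wrong characters rather than truncation, so spot-checking a few non-ASCII values is worthwhile.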
What do I need to do to read the entire data set?