Broken CSV, how can I fix it?

I am trying to parse a CSV file. I would like to load it into a database or just parse it with JavaScript, but either way it fails because of broken syntax. The entire CSV file is here:
https://gist.github.com/1023560

If you look, it breaks where double-quoted fields themselves contain double quotes, and it also fails when importing into MySQL. The first break shows up on line 13. The value is cut off, and instead of returning the full field:

<a href="http://www.facebook.com/pages/Portland-Community-Gardens/139244076118027?v=wall" target="_blank"><img src="/shared/cfm/image.cfm?id=348340" alt="Facebook" width="100" height="31" /></a> 

It returns:

 <a href=" 

For JavaScript, I was going to just use CSVToArray() from Ben Nadel:
http://www.bennadel.com/blog/1504-Ask-Ben-Parsing-CSV-Strings-With-Javascript-Exec-Regular-Expression-Command.htm

My ultimate goal is to get the data into MySQL so that I can serve a JSON feed using PHP's json_encode().

What I have noticed may be problematic is that double quotes can appear inside HTML tags as above, but also in the text inside the tags, so "<span class="text">"Example"</span>"

The outer set of quotes delimits the CSV column, the second set is HTML attribute quotes, and the third is quotes in the text.

+4
3 answers

You can cheat and use a regex to search:

 "(.*?)"(?=,|$) 

But that is hack-ish (basically, it only accepts a quote as the closing quote when a comma or the end of the line immediately follows). The same logic applies to find-and-replace. (Again, all this assumes that a "stray" quote will never happen to follow the standard CSV rules, for example by having a comma or the start/end of a line before or after it.)
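
A minimal JavaScript sketch of how that pattern could be applied to a single line. The helper name extractQuotedFields and the sample line are made up for illustration, and it assumes every field is quoted and no field contains a quote directly followed by a comma:

```javascript
// Sketch: pull quoted fields out of one CSV line, treating a quote as the
// closing quote only when a comma or the end of the string follows it.
// extractQuotedFields is a hypothetical helper name, not from the post.
function extractQuotedFields(line) {
  const fieldPattern = /"(.*?)"(?=,|$)/g;
  // matchAll needs the g flag; match[1] is the text between the quotes.
  return Array.from(line.matchAll(fieldPattern), (match) => match[1]);
}

// Made-up sample line with unescaped quotes inside the second field.
const sampleLine = '"1","<a href="http://example.com" target="_blank">Link</a>","Portland"';
console.log(extractQuotedFields(sampleLine));
```

Unquoted fields are skipped entirely by this pattern, so it only helps if the export quotes every column.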

I assume that you do not have control over the source data and have to work with what you have?

EDIT

Although I only tried this on a small sample of your data, it seems to find the stray quotes, which you can then replace with "":

 (?<!^|"|,)"(?!"|,|$) 
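
As a rough sketch, the replacement could look like this in JavaScript (it needs an engine with lookbehind support, such as a recent Node.js). The helper name escapeStrayQuotes is hypothetical, and it assumes the input is processed one line at a time:

```javascript
// Sketch: double every "stray" quote so the line follows the usual CSV
// escaping convention. escapeStrayQuotes is a hypothetical helper name.
function escapeStrayQuotes(line) {
  // A quote counts as stray when it is not at the start of the string,
  // not next to another quote, and not adjacent to a comma or the end
  // of the string.
  const strayQuote = /(?<!^|"|,)"(?!"|,|$)/g;
  return line.replace(strayQuote, '""');
}

// The nested-quote example from the question.
const broken = '"<span class="text">"Example"</span>"';
console.log(escapeStrayQuotes(broken));
// "<span class=""text"">""Example""</span>"
```

One limitation to be aware of: two quotes in a row, such as an empty attribute alt="", are left untouched, so a CSV parser would read them as a single escaped quote.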
+2

Quotes do not matter as much as commas do. If the comma is the separator, then you cannot safely have commas in the values. If you can get the CSV saved with a different delimiter, you may get better results.
Use a character like ~ or ^ instead of a comma as the separator.
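
A small sketch of what parsing would then look like, assuming the file can be re-exported with ~ as the delimiter, that ~ never occurs inside a value, and that fields are no longer wrapped in quotes. The helper name splitDelimitedLine and the sample line are made up:

```javascript
// Sketch: split one line of a file exported with "~" as the delimiter.
// Assumes "~" never occurs inside a value and that fields are not quoted.
function splitDelimitedLine(line, delimiter = '~') {
  return line.split(delimiter);
}

// Made-up sample: commas and double quotes inside values no longer matter.
const tildeLine = '1~<a href="http://example.com">Link</a>~Portland, OR';
console.log(splitDelimitedLine(tildeLine));
```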

0

Assuming you are either on Windows or can run it in a Windows environment, check out Logparser. It is a free command-line utility that can parse many data formats, including CSV, and can output to many formats, including SQL.

0