NumPy: read a CSV file where some fields contain commas?

I am trying to read a CSV file using numpy.recfromcsv(...) where some fields contain commas. Fields with commas in them are surrounded by quotation marks, e.g. "value1, value2". NumPy splits such a quoted field into two separate fields, which breaks the parsing. The command I'm using right now is

  data = numpy.recfromcsv(dataFilename, delimiter=',', autostrip=True)

I found this question

Read CSV file with comma in fields in Python

But it does not use NumPy, which I would really like to use. So I'm hoping for at least one of the following:

  • What options for numpy.recfromcsv(...) will let me read a quoted field as a single field instead of several comma-separated fields?
  • Should I format my CSV file differently?
  • (Alternative, not ideal) Read the CSV as in the cited question, then add extra steps to create a NumPy array.
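For the first option: newer NumPy releases (1.23 and later; check your installed version) give np.loadtxt a quotechar argument, which makes the parser ignore delimiters inside a quoted item. Below is a minimal sketch, assuming a two-column file with a header row. The inline sample text and the column dtypes are illustrative assumptions standing in for dataFilename.

```python
import io
import numpy as np

# Hypothetical sample standing in for the real file (dataFilename).
text = 'name,value\n"value1, value2",10\nplain,20\n'

data = np.loadtxt(
    io.StringIO(text),
    delimiter=",",
    quotechar='"',   # commas inside "..." are treated as part of the field
    skiprows=1,      # skip the header row
    dtype=[("name", "U32"), ("value", "i8")],  # assumed column types
)

print(data["name"])
print(data["value"])
```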

Please advise.

3 answers

It turns out the easiest way to do this is to use the standard library's csv module to read the file into a list of tuples, and then use those tuples as input when building the NumPy array. I wish I could read it with NumPy directly, but that does not work.
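A minimal sketch of that approach, assuming a two-column file with a header row. The inline sample text and the column dtypes are illustrative assumptions; with a real file you would open it with open(dataFilename, newline="") instead of using io.StringIO.

```python
import csv
import io
import numpy as np

# Hypothetical sample standing in for the real file.
text = 'name,value\n"value1, value2",10\nplain,20\n'

with io.StringIO(text) as f:
    reader = csv.reader(f)  # default quotechar='"' keeps "value1, value2" as one field
    header = next(reader)
    rows = [(name, int(value)) for name, value in reader]

# Build a structured array from the tuples; the dtypes are an assumption for this example.
dtype = [(header[0], "U32"), (header[1], "i8")]
data = np.array(rows, dtype=dtype)

print(data["name"])   # ['value1, value2' 'plain']
print(data["value"])  # [10 20]
```

The structured dtype gives named columns similar to what recfromcsv returns, so data["name"] and data["value"] work the same way.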


This can be done with pandas:

 np_array = pandas.read_csv("file_with_comma_fields_quoted.csv").to_numpy()
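A slightly fuller sketch, assuming the same two-column layout as in the question. pandas.read_csv honours double-quoted fields by default. DataFrame.as_matrix() was removed in pandas 1.0; to_numpy() is its replacement, and to_records() gives a NumPy record array closer to what recfromcsv returns. The inline sample text is an illustrative assumption.

```python
import io
import pandas as pd

# Hypothetical sample standing in for file_with_comma_fields_quoted.csv.
text = 'name,value\n"value1, value2",10\nplain,20\n'

df = pd.read_csv(io.StringIO(text))   # quotechar='"' is the default
np_array = df.to_numpy()              # object array, because the columns have mixed types
records = df.to_records(index=False)  # NumPy record array with named fields

print(df["name"].tolist())  # ['value1, value2', 'plain']
```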

If you are considering the built-in Python csv reader, see the Python documentation here:

The Python csv reader defines an optional Dialect.quotechar parameter, whose default is '"'. In the CSV format, the quotechar acts as an additional field delimiter: the field delimiter (a comma in your case) can appear inside a quoted field. The quoting rules of the CSV format are explained clearly in the first section of this page.

So, with its default quotechar of ", the built-in Python csv reader should handle your problem out of the box.
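A small illustration of the quotechar behaviour described above. The sample lines are made up for the example; the second case assumes a hypothetical file that uses single quotes instead of double quotes.

```python
import csv
import io

line = '"value1, value2",10,plain\n'

# The default "excel" dialect uses '"' as quotechar, so the quoted comma is kept.
print(csv.excel.quotechar)                  # "
print(next(csv.reader(io.StringIO(line))))  # ['value1, value2', '10', 'plain']

# If a file quoted its fields with single quotes, pass quotechar explicitly.
alt = "'value1, value2',10,plain\n"
print(next(csv.reader(io.StringIO(alt), quotechar="'")))  # ['value1, value2', '10', 'plain']
```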

If you want to stick with NumPy, why not clean your CSV file first: use a regular expression to identify the fields enclosed in quotation marks and, for example, change the comma delimiter to \t. But at that point you are effectively parsing the CSV format yourself.
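A rough sketch of that preprocessing, assuming fields never contain tabs and quotes are never escaped inside a field (so it is not a full CSV parser). The regex replaces only commas that are followed by an even number of double quotes, i.e. commas outside any quoted field. The helper name to_tab_delimited and the sample lines are hypothetical.

```python
import re
import numpy as np

# A comma followed by an even number of '"' on the rest of the line is outside quotes.
COMMA_OUTSIDE_QUOTES = re.compile(r',(?=(?:[^"]*"[^"]*")*[^"]*$)')

def to_tab_delimited(lines):
    """Hypothetical helper: turn comma-delimited lines into tab-delimited ones, dropping quotes."""
    out = []
    for line in lines:
        line = COMMA_OUTSIDE_QUOTES.sub("\t", line.rstrip("\n"))
        out.append(line.replace('"', ""))
    return out

# Hypothetical sample standing in for the lines of the real file.
raw = ['name,value\n', '"value1, value2",10\n', 'plain,20\n']
cleaned = to_tab_delimited(raw)

# genfromtxt accepts a list of strings; dtype=None infers a type per column.
data = np.genfromtxt(cleaned, delimiter="\t", names=True, dtype=None, encoding="utf-8")
print(data["name"])
print(data["value"])
```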

