Most Pythonic way to read CSV values into a dict of lists

I have a CSV file with headers at the top of the data columns, like this:

    <Header1>, <Header2>, ... ,<HeaderN>
    <data11> , <data12> , ... ,<data1N>
    <data21> , <data22> , ... ,<data2N>
    ...      , ...      , ... , ...
    <dataM1> , <dataM2> , ... ,<dataMN>

(i.e. standard tabular data)

When reading this with DictReader, I use a nested loop to append each item in a row to the list stored under the corresponding key, like so:

    f = <path_to_some_csv_file.csv>
    dr = csv.DictReader(open(f))
    dict_of_lists = dr.next()
    for k in dict_of_lists.keys():
        dict_of_lists[k] = [dict_of_lists[k]]
    for line in dr:
        for k in dict_of_lists.keys():
            dict_of_lists[k].append(line[k])

The first loop wraps each value from the first row in a one-element list. The next loop iterates over every remaining line of the CSV file, each of which DictReader turns into a dict of key-value pairs. The inner loop appends each value to the list stored under the corresponding key, so I end up with the dict of lists I need. I find myself writing this quite often.

My question is: is there a more Pythonic way of doing this using built-in functions without the nested loop, or a better idiom, or an alternative way of storing this data structure so that I can get an indexable list back by querying with a key? If so, is there also a way to format the data as it is ingested, column by column, up front? (For an MWE, just copy the data above into a text file and run it through the code.) Thanks in advance!

+6
3 answers

Depending on what type of data you are storing, and if you're fine using numpy, a good way to do this might be numpy.genfromtxt:

    import numpy as np
    data = np.genfromtxt('data.csv', delimiter=',', names=True)

What this will do is create a numpy Structured Array that provides a good interface for querying data by header name (be sure to use names=True if you have a header line).

For example, given a data.csv containing:

    a,b,c
    1,2,3
    4,5,6
    7,8,9

Then you can access the elements with:

    >>> data['a']         # Column with header 'a'
    array([ 1.,  4.,  7.])
    >>> data[0]           # First row
    (1.0, 2.0, 3.0)
    >>> data['c'][2]      # Specific element
    9.0
    >>> data[['a', 'c']]  # Two columns
    array([(1.0, 3.0), (4.0, 6.0), (7.0, 9.0)],
          dtype=[('a', '<f8'), ('c', '<f8')])

genfromtxt also provides a way, as you requested, to "format the data as it is ingested, column by column, up front". From its documentation:

converters : variable, optional

A set of functions that convert column data to a value. Converters can also be used to provide a default value for missing data: converters = {3: lambda s: float(s or 0)}.

+5

If you are willing to use a third-party library, the merge_with function from Toolz turns this whole operation into a one-liner:

    import csv
    from toolz import merge_with

    dict_of_lists = merge_with(list, *csv.DictReader(open(f)))

Using only stdlib, a defaultdict makes the code less repetitive:

    from collections import defaultdict
    import csv

    f = 'test.csv'
    dict_of_lists = defaultdict(list)
    with open(f) as csv_file:
        for record in csv.DictReader(csv_file):
            for key, val in record.items():  # or iteritems in Python 2
                dict_of_lists[key].append(val)

If you need to do this often, wrap it in a function, for example transpose_csv.
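One way such a function might look is sketched below for Python 3. The name transpose_csv comes from the suggestion above; its signature (taking any iterable of lines, such as an open file) and the in-memory sample are assumptions made for illustration.

```python
import csv
import io
from collections import defaultdict


def transpose_csv(lines):
    """Read CSV rows from an iterable of lines into a dict of column lists."""
    columns = defaultdict(list)
    for record in csv.DictReader(lines):
        for key, value in record.items():
            columns[key].append(value)
    # Return a plain dict so that a misspelled header raises KeyError
    # instead of silently creating an empty list.
    return dict(columns)


# In-memory sample standing in for an open file object (hypothetical data).
sample = io.StringIO("a,b,c\n1,2,3\n4,5,6\n7,8,9\n")
result = transpose_csv(sample)
print(result["a"])  # ['1', '4', '7']
```

With a real file, something like `with open(f, newline='') as csv_file: transpose_csv(csv_file)` should work the same way. Note that the values stay as strings; any type conversion has to be done separately.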

+1

You can use dict and set comprehensions to make your intent more obvious:

    dr = csv.DictReader(f)
    data = {k: [v] for k, v in dr.next().items()}  # create the initial dict of lists (Python 2; use next(dr) in Python 3)
    for line_dict in dr:
        {data[k].append(v) for k, v in line_dict.items()}  # append to each

You can use Alex Martelli's method for flattening a list of lists in Python to flatten the iterator of iterators, which shortens the first form further:

    dr = csv.DictReader(f)
    data = {k: [v] for k, v in dr.next().items()}
    {data[k].append(v) for line_dict in dr for k, v in line_dict.items()}

In Python 2.x, consider using {}.iteritems() rather than {}.items() if your CSV file is large.


Further example:

Suppose this csv file:

    Header 1,Header 2,Header 3
    1,2,3
    4,5,6
    7,8,9

Now suppose you need a dict of lists with each value converted to a float or an int. You can do:

    def convert(s, converter):
        try:
            return converter(s)
        except Exception:
            return s  # leave the value unchanged if conversion fails

    dr = csv.DictReader(f)
    data = {k: [convert(v, float)] for k, v in dr.next().items()}
    {data[k].append(convert(v, float)) for line_dict in dr for k, v in line_dict.items()}
    print data
    # {'Header 3': [3.0, 6.0, 9.0], 'Header 2': [2.0, 5.0, 8.0], 'Header 1': [1.0, 4.0, 7.0]}
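For Python 3, the same idea could be sketched with plain loops and a per-column converter, which also touches on the question's request to format data by column while reading. This is only an illustration under stated assumptions: read_columns and its converters and default parameters are hypothetical names, not part of the csv module, and rows are assumed to have exactly as many fields as the header.

```python
import csv
import io


def convert(s, converter):
    """Apply converter to s, returning s unchanged if conversion fails."""
    try:
        return converter(s)
    except (TypeError, ValueError):
        return s


def read_columns(lines, converters=None, default=float):
    """Build a dict of lists, converting values column by column.

    `converters` maps header names to callables; columns without an
    entry use `default`. Both parameters are hypothetical, introduced
    only for this sketch.
    """
    converters = converters or {}
    reader = csv.DictReader(lines)
    data = {name: [] for name in (reader.fieldnames or [])}
    for row in reader:
        for name, value in row.items():
            data[name].append(convert(value, converters.get(name, default)))
    return data


# In-memory sample (hypothetical data); 'x' cannot be parsed as a float.
sample = io.StringIO("Header 1,Header 2,Header 3\n1,2,3\n4,5,6\n7,x,9\n")
data = read_columns(sample, converters={"Header 1": int})
print(data)
```

Here "Header 1" is read as ints, the other columns as floats, and the unparseable 'x' is kept as a string. Using an explicit loop instead of a set comprehension avoids building a throwaway set of None values just for the side effect of appending.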
-1