Pythonic way to read CSV with row and column headers

Suppose there is a CSV table with row and column headings, for example:

    , "Car", "Bike", "Boat", "Plane", "Shuttle"
    "Red", 1, 7, 3, 0, 0
    "Green", 5, 0, 0, 0, 0
    "Blue", 1, 1, 4, 0, 1

I want to get the row and column headers and the data, i.e.:

    col_headers = ["Car", "Bike", "Boat", "Plane", "Shuttle"]
    row_headers = ["Red", "Green", "Blue"]
    data = [[1, 7, 3, 0, 0],
            [5, 0, 0, 0, 0],
            [1, 1, 4, 0, 1]]

Of course I can do something like

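    import csv

    with open("path/to/file.csv", "r") as f:
        csvraw = list(csv.reader(f))

    # The first line holds the column headers; its first cell is the empty corner.
    col_headers = csvraw[0][1:]
    # Every following line starts with its row header, followed by the data cells.
    row_headers = [row[0] for row in csvraw[1:]]
    data = [row[1:] for row in csvraw[1:]]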

... but it doesn't look Pythonic enough.

Is there a cleaner way for this natural operation?

+7
5 answers

Take a look at csv.DictReader.

If fieldnames is omitted, the values in the first line of csvfile will be used as field names.

Then you can just do reader.fieldnames. This, of course, only gives the column headers. You still have to parse the row headers manually.

I think your original solution is pretty good.
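
As a rough sketch of the csv.DictReader route applied to the table from the question: the sample text is inlined through io.StringIO so the snippet runs on its own, skipinitialspace=True is assumed to be needed because of the spaces after the commas, and the names SAMPLE, corner and table are made up for illustration. Since the top-left cell is empty, the row labels end up under the empty-string field name.

```python
import csv
import io

# Inline copy of the table from the question (stands in for a real file).
SAMPLE = ''', "Car", "Bike", "Boat", "Plane", "Shuttle"
"Red", 1, 7, 3, 0, 0
"Green", 5, 0, 0, 0, 0
"Blue", 1, 1, 4, 0, 1
'''

reader = csv.DictReader(io.StringIO(SAMPLE), skipinitialspace=True)

# The first field name is the empty corner cell; the rest are column headers.
corner, *col_headers = reader.fieldnames

# Map each row label to a {column header: value} dictionary.
# int() assumes every data cell holds a whole number.
table = {}
for row in reader:
    label = row.pop(corner)
    table[label] = {name: int(value) for name, value in row.items()}

row_headers = list(table)
print(col_headers)
print(row_headers)
print(table["Blue"]["Boat"])
```

The row headers still have to be pulled out by hand, which matches the point made above: DictReader only knows about the column headers.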

+2

Now I see that what I want is easiest (and most reliable) to do with pandas.

    import pandas as pd

    df = pd.read_csv('foo.csv', index_col=0)

And if I want, it is easy to extract the headers:

    col_headers = list(df.columns)
    row_headers = list(df.index)

Otherwise, in raw Python, it seems that the method I wrote in the question is "good enough."
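
For the quoted sample from the question, a sketch along these lines should recover all three pieces. It assumes pandas is installed; the inline SAMPLE string and io.StringIO stand in for a real file, and skipinitialspace=True is an addition here so that the spaces after the commas are not treated as part of the fields.

```python
import io

import pandas as pd

# Inline copy of the table from the question (stands in for 'foo.csv').
SAMPLE = ''', "Car", "Bike", "Boat", "Plane", "Shuttle"
"Red", 1, 7, 3, 0, 0
"Green", 5, 0, 0, 0, 0
"Blue", 1, 1, 4, 0, 1
'''

# index_col=0 turns the first column into the row labels.
df = pd.read_csv(io.StringIO(SAMPLE), index_col=0, skipinitialspace=True)

col_headers = list(df.columns)
row_headers = list(df.index)
data = df.values.tolist()  # nested lists of plain Python ints

print(col_headers)
print(row_headers)
print(data)
```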

+2

I know that this solution gives you a different output format than requested, but it is very convenient. It reads each CSV line into a dictionary:

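    import csv

    # parameters_file (a path) and dialect are supplied by the caller.
    with open(parameters_file, newline="") as f:
        reader = csv.reader(f, dialect)
        keys = [key.lower() for key in next(reader)]  # header line, lower-cased
        for line in reader:
            parameter = dict(zip(keys, line))  # one dictionary per data line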
0

Without third-party libraries (and if you can live with the results being tuples, since they come from zip):

    import csv

    with open('your_csv_file') as fin:
        csvin = csv.reader(fin, skipinitialspace=True)
        col_header = next(csvin, [])[1:]
        row_header, data = zip(*((row[0], row[1:]) for row in csvin))

This gives you, for col_header, row_header and data:

    ['Car', 'Bike', 'Boat', 'Plane', 'Shuttle']
    ('Red', 'Green', 'Blue')
    (['1', '7', '3', '0', '0'], ['5', '0', '0', '0', '0'], ['1', '1', '4', '0', '1'])
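
If lists of integers are preferred over tuples of strings, a variant along these lines may help. It is only a sketch: read_labelled_csv is a name invented here, the sample is inlined via io.StringIO so the snippet is self-contained, and int() assumes every data cell is a whole number. It also sidesteps the zip(*...) unpacking, which raises ValueError when the file has no data rows.

```python
import csv
import io

# Inline copy of the table from the question (stands in for a real file).
SAMPLE = ''', "Car", "Bike", "Boat", "Plane", "Shuttle"
"Red", 1, 7, 3, 0, 0
"Green", 5, 0, 0, 0, 0
"Blue", 1, 1, 4, 0, 1
'''


def read_labelled_csv(f):
    """Return (col_headers, row_headers, data) with data as lists of ints."""
    reader = csv.reader(f, skipinitialspace=True)
    # First line: drop the empty corner cell, keep the column headers.
    col_headers = next(reader, [])[1:]
    # Blank lines come back as empty lists; skip them.
    rows = [row for row in reader if row]
    row_headers = [row[0] for row in rows]
    data = [[int(cell) for cell in row[1:]] for row in rows]
    return col_headers, row_headers, data


col_headers, row_headers, data = read_labelled_csv(io.StringIO(SAMPLE))
print(col_headers)
print(row_headers)
print(data)

# An empty input yields three empty lists instead of an exception.
empty = read_labelled_csv(io.StringIO(""))
```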
0

Agreed, pandas is the best I have found. I was interested in reading specific values from my data frame. Here is what I did:

    import pandas as pd

    d = pd.read_csv(pathToFile + "easyEx.csv")
    print(d)
    print(d.index.values)
    print(d.index.values[2])
    print(d.columns.values)
    print(d.columns.values[2])
    print(pd.DataFrame(d, index=['Blue'], columns=['Boat']) + 0.333)

And this is what it returns:

           Car  Bike  Boat  Plane  Shuttle
    Red      1     7     3      0        0
    Green    5     0     0      0        0
    Blue     1     1     4      0        1
    ['Red' 'Green' 'Blue']
    Blue
    ['Car' 'Bike' 'Boat' 'Plane' 'Shuttle']
    Boat
           Boat
    Blue  4.333

Note that I can inspect the row and column names through the "index" and "columns" attributes. Also note that I can read a specific item of the data frame by its row and column names, and that the values are still numeric, which is why I added "+0.333" in the last print.

I edited the data file: I removed the quote characters (") and the spaces after the commas in the first line. Here is the easyEx.csv file:

    Car,Bike,Boat,Plane,Shuttle
    Red, 1, 7, 3, 0, 0
    Green, 5, 0, 0, 0, 0
    Blue, 1, 1, 4, 0, 1
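
With the easyEx.csv contents shown above, a single cell can also be read by label through df.loc, which may be a more direct route than building a new DataFrame. This is a hedged sketch: it assumes pandas is installed, inlines the file as the EASY_EX string, and adds skipinitialspace=True to be safe about the spaces in the data lines. Because the header has one field fewer than the data lines, pandas uses the first column as the row labels.

```python
import io

import pandas as pd

# Inline copy of easyEx.csv (the header has no entry for the label column).
EASY_EX = """Car,Bike,Boat,Plane,Shuttle
Red, 1, 7, 3, 0, 0
Green, 5, 0, 0, 0, 0
Blue, 1, 1, 4, 0, 1
"""

df = pd.read_csv(io.StringIO(EASY_EX), skipinitialspace=True)

blue_boat = df.loc["Blue", "Boat"]   # one cell, addressed by row and column label
blue_row = df.loc["Blue"].tolist()   # a whole row as a list
boat_col = df["Boat"].tolist()       # a whole column as a list

print(blue_boat + 0.333)  # still numeric, so arithmetic works
print(blue_row)
print(boat_col)
```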

Hope this helps =)

0