Set headers using pandas.read_csv

Question

I have a CSV file that I read into a dataframe using the pandas API. I want to set my own column headers instead of using the first line of the file as the default. (I also skip some lines.) How can I achieve this?

I tried the following, but this did not work as expected:

 header_row = ['col1', 'col2', 'col3', 'col4', 'col1', 'col2']  # note the header has duplicate column values
 df = pandas.read_csv(csv_file, skiprows=[0,1,2,3,4,5], names=header_row)

This gives the following error:

 File "third_party/py/pandas/io/parsers.py", line 187, in read_csv
 File "third_party/py/pandas/io/parsers.py", line 160, in _read
 File "third_party/py/pandas/io/parsers.py", line 628, in get_chunk
 File "third_party/py/pandas/core/frame.py", line 302, in __init__
 File "third_party/py/pandas/core/frame.py", line 388, in _init_dict
 File "third_party/py/pandas/core/internals.py", line 1008, in form_blocks
 File "third_party/py/pandas/core/internals.py", line 1036, in _simple_blockify
 File "third_party/py/pandas/core/internals.py", line 1068, in _stack_dict
 IndexError: index out of bounds

Then I tried setting the columns through

 df.columns = header_row

But this error appeared, probably due to duplicate column values.

 File "engines.pyx", line 101, in pandas._engines.DictIndexEngine.get_loc (third_party/py/pandas/src/engines.c:2498)
 File "engines.pyx", line 107, in pandas._engines.DictIndexEngine.get_loc (third_party/py/pandas/src/engines.c:2447)
 Exception: ('Index values are not unique', 'occurred at index entity')

I am using pandas version 0.7.3. From the documentation:

 names: array-like
     List of column names

I am sure I am missing something simple here. Thanks for any help here.

python pandas

Manju Aug 22 '12 at 5:01

1 answer

Wouter overmeire · Accepted Answer · 2012-08-22T07:27:59+0000

Pandas 0.7.3 does not support duplicate indexes. You need at least 0.8.0. Several problems with duplicates in the index were fixed between 0.8.0 and 0.8.1, so 0.8.1 (the most recent stable release at the time of writing) is the best choice. However, even 0.8.1 does not fully solve your problem, because that version still has an issue with duplicate column names: you cannot display a dataframe that has them.
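One way to sidestep the duplicate-name problem entirely is to make the column names unique before passing them to read_csv. The following is only a minimal sketch, not something tested against 0.7.3 or 0.8.1: it assumes a reasonably recent pandas, the sample data in csv_text is made up to stand in for your csv_file, and the "_1" suffixing scheme is just one arbitrary choice.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the question's csv_file:
# six lines to discard, followed by two data rows.
csv_text = "\n".join(
    ["junk line %d" % i for i in range(6)]
    + ["1,2,3,4,5,6", "7,8,9,10,11,12"]
)

header_row = ['col1', 'col2', 'col3', 'col4', 'col1', 'col2']

# Make the names unique by suffixing repeats: col1, col1_1, col1_2, ...
# (the suffix format is an arbitrary choice for this sketch).
seen = {}
unique_names = []
for name in header_row:
    count = seen.get(name, 0)
    unique_names.append(name if count == 0 else "%s_%d" % (name, count))
    seen[name] = count + 1

# header=None tells pandas that none of the remaining lines is a header row;
# skiprows=6 drops the first six lines, like skiprows=[0,1,2,3,4,5].
df = pd.read_csv(
    io.StringIO(csv_text),
    skiprows=6,
    header=None,
    names=unique_names,
)
```

With unique names, both column selection and display of the dataframe avoid the duplicate-label code paths that fail above.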
