Pandas read_csv, not knowing if a header is present

I have an input file with known columns, say two columns, Name and Sex. Sometimes it has a header row Name,Sex, and sometimes it doesn't:

1.csv

    Name,Sex
    John,M
    Leslie,F

2.csv

    John,M
    Leslie,F

Knowing the identity of the columns in advance, is there a good way to handle both cases with the same read_csv command? Basically, I want to specify names=['Name', 'Sex'] and then specify header=0 only when there is a header. The best I can come up with is:

  • 1) Read the first line of the file before calling read_csv, and set the parameters accordingly.

  • 2) Just do df = pd.read_csv(input_file, names=['Name', 'Sex']), then check whether row zero matches the header and, if so, drop it (and then you may have to renumber the rows).

But this doesn't seem like such an unusual use case to me. Is there a built-in way to do this with read_csv that I haven't thought of?

1 answer

Using a newer feature, selection by callable:

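    import numpy as np
    import pandas as pd

    cols = ['Name', 'Sex']
    # Keep every row unless row 0 is the header, in which case mask it out.
    df = (pd.read_csv(filename, header=None, names=cols)
          [lambda x: np.ones(len(x)).astype(bool)
                     if (x.iloc[0] != cols).all()
                     else np.concatenate([[False], np.ones(len(x)-1).astype(bool)])]
    )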

Using the .query() method:

    df = (pd.read_csv(filename, header=None, names=cols)
            .query('Name != "Name" and Sex != "Sex"'))

I'm not sure if this is the most elegant way, but this should work too:

    df = pd.read_csv(filename, header=None, names=cols)
    if (df.iloc[0] == cols).all():
        df = df[1:].reset_index(drop=True)
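For completeness, option 1 from the question (peek at the first line, then pick header accordingly) can be wrapped in a small helper. This is only a sketch, not a built-in pandas feature: the name read_csv_maybe_header is made up here, and it assumes a plain comma-separated file whose header row, if present, is exactly the expected column names with no quoting.

```python
import pandas as pd


def read_csv_maybe_header(path, cols):
    """Read a CSV whose header row may or may not be present.

    Hypothetical helper (not part of pandas). Assumes a simple
    comma-delimited file whose header, if any, equals `cols`.
    """
    # Peek at the first line only; the file is re-read by pandas below.
    with open(path, newline="") as f:
        first = [field.strip() for field in f.readline().split(",")]

    # header=0 tells read_csv to discard the existing header row and use
    # `names` instead; header=None treats every row as data.
    header = 0 if first == list(cols) else None
    return pd.read_csv(path, header=header, names=cols)
```

With this, df = read_csv_maybe_header('1.csv', ['Name', 'Sex']) and the same call on 2.csv should give the same frame, and the index needs no renumbering because the header row is never loaded as data.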