Pandas read_csv dtype specify all but one column

Question


I have a CSV file. Most of its columns hold values that I want to read as strings, but I want to read one column as bool if a column with the given name exists.

Since there are many columns in the CSV file, I don’t want to specify the data type for each column explicitly, like this:

data = read_csv('sample.csv', dtype={'A': str, 'B': str, ..., 'X': bool})

Is it possible to set the string type for every column except one, and at the same time read the optional column as bool?

My current solution is the following (but it is very inefficient and slow):

    data = read_csv('sample.csv', dtype=str)  # reads all columns as strings
    if 'X' in data.columns:
        l = lambda row: True if row['X'] == 'True' else False if row['X'] == 'False' else None
        data['X'] = data.apply(l, axis=1)

UPDATE: CSV Example:

    A;B;C;X
    a1;b1;c1;True
    a2;b2;c2;False
    a3;b3;c3;True

Or the same file without the “X” column (since the column is optional):

    A;B;C
    a1;b1;c1
    a2;b2;c2
    a3;b3;c3

+5

python pandas csv dataframe

user1802693 May 29 '16 at 23:27


3 answers

Why not use bool()? bool() returns True if an argument is passed and that argument is not False, None, '' or 0.

    if 'X' in data.columns:
        # bool('False') is True, so turn the string 'False' into '' (falsy) first
        # note: missing values (NaN) are truthy and would become True here
        data['X'] = data['X'].replace('False', '').map(bool)
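One caveat with bool() is worth spelling out, since it is the reason the string 'False' has to be handled before converting: any non-empty string is truthy. A minimal sketch using only built-in Python (the `values` list is made up for illustration):

```python
# Sample inputs chosen for illustration; they are not from the question's file.
values = ['True', 'False', '', None, 0]

# bool() looks at truthiness, not at the text of a string.
converted = [bool(v) for v in values]

# An explicit lookup table is one way to parse the text instead.
text_to_bool = {'True': True, 'False': False}
parsed = [text_to_bool.get(v) for v in values]

print(converted)
print(parsed)
```

With the lookup table, anything that is not exactly 'True' or 'False' comes back as None, which matches the behaviour of the lambda in the question.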

+1

TheLazyScripter May 29 '16 at 23:38


In fact, you do not need special handling when using read_csv from pandas (verified in version 0.17). Using your example file with X:

    import pandas as pd
    df = pd.read_csv("file.csv", delimiter=";")
    print(df.dtypes)

    A    object
    B    object
    C    object
    X      bool
    dtype: object
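Default inference also applies to the other columns, so numeric-looking values in A, B or C would be read as numbers rather than strings. If those columns must stay strings, one possible approach is to read only the header first and build the dtype mapping from it. The sketch below is an assumption-laden illustration, not part of the answer above: the helper name `read_str_except_bool` is hypothetical, and the CSV text is held in memory with `io.StringIO` so that the example is self-contained.

```python
import io
import pandas as pd

# The two sample files from the question, kept in memory for the example.
CSV_WITH_X = "A;B;C;X\na1;b1;c1;True\na2;b2;c2;False\na3;b3;c3;True\n"
CSV_WITHOUT_X = "A;B;C\na1;b1;c1\na2;b2;c2\na3;b3;c3\n"

def read_str_except_bool(text, bool_col="X", sep=";"):
    """Hypothetical helper: every column as str, except bool_col if present."""
    # First pass: nrows=0 reads only the header, so this is cheap.
    header = pd.read_csv(io.StringIO(text), sep=sep, nrows=0)
    # Build the dtype mapping from the column names that actually exist.
    dtypes = {name: (bool if name == bool_col else str) for name in header.columns}
    # Second pass: read the data with the explicit mapping.
    return pd.read_csv(io.StringIO(text), sep=sep, dtype=dtypes)

df_with = read_str_except_bool(CSV_WITH_X)
df_without = read_str_except_bool(CSV_WITHOUT_X)
print(df_with.dtypes)
print(df_without.dtypes)
```

With a real file, the path would be passed to both read_csv calls instead of the StringIO object. Note that dtype=bool raises an error if the column contains missing values, so this only fits files where X is always filled in.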

+1

gbakie May 29 '16 at 23:56


jezrael · Accepted Answer · 2016-05-29T23:34:07+0000

You can first select the columns whose names contain X with boolean indexing, and then use replace:

    cols = df.columns[df.columns.str.contains('X')]
    df[cols] = df[cols].replace({'True': True, 'False': False})

Or, if you need to match only the column named exactly X:

    cols = df.columns[df.columns == 'X']
    df[cols] = df[cols].replace({'True': True, 'False': False})

Example:

    import pandas as pd

    df = pd.DataFrame({'A': ['a1', 'a2', 'a3'],
                       'B': ['b1', 'b2', 'b3'],
                       'C': ['c1', 'c2', 'c3'],
                       'X': ['True', 'False', 'True']})

    print(df)
        A   B   C      X
    0  a1  b1  c1   True
    1  a2  b2  c2  False
    2  a3  b3  c3   True

    print(df.dtypes)
    A    object
    B    object
    C    object
    X    object
    dtype: object

    cols = df.columns[df.columns.str.contains('X')]
    print(cols)
    Index(['X'], dtype='object')

    df[cols] = df[cols].replace({'True': True, 'False': False})

    print(df.dtypes)
    A    object
    B    object
    C    object
    X      bool
    dtype: object

    print(df)
        A   B   C      X
    0  a1  b1  c1   True
    1  a2  b2  c2  False
    2  a3  b3  c3   True
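To connect this back to the question, here is a minimal end-to-end sketch that reads the file with dtype=str and then converts the optional column. It swaps in Series.map for replace, which is a substitution and not the answer's own method: newer pandas versions may leave the column as object after replace, while map with a dict of bools yields a bool column when every value is matched. The helper name `read_all_str_then_bool` is hypothetical, and the CSV text is kept in memory with `io.StringIO`.

```python
import io
import pandas as pd

# Sample data from the question, with and without the optional column.
CSV_WITH_X = "A;B;C;X\na1;b1;c1;True\na2;b2;c2;False\na3;b3;c3;True\n"
CSV_WITHOUT_X = "A;B;C\na1;b1;c1\na2;b2;c2\na3;b3;c3\n"

def read_all_str_then_bool(text, bool_col="X", sep=";"):
    """Hypothetical helper: read everything as str, then convert bool_col."""
    df = pd.read_csv(io.StringIO(text), sep=sep, dtype=str)
    if bool_col in df.columns:
        # Vectorized lookup on one column; values other than 'True'/'False'
        # (including missing ones) become NaN instead of raising.
        df[bool_col] = df[bool_col].map({"True": True, "False": False})
    return df

df_with = read_all_str_then_bool(CSV_WITH_X)
df_without = read_all_str_then_bool(CSV_WITHOUT_X)
print(df_with.dtypes)
print(df_without.columns.tolist())
```

Because the conversion works on a single column rather than applying a function row by row, it avoids the slow data.apply(l, axis=1) from the question.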
