Pandas read_csv dtype specify all but one column

I have a CSV file. Most of its columns hold values that I want to read as strings, but I want to read one column as bool if a column with the given name exists.

Since there are many columns in the CSV file, I don't want to specify the data type of each column explicitly, like this:

data = read_csv('sample.csv', dtype={'A': str, 'B': str, ..., 'X': bool}) 

Is it possible to specify the string type for every column except one, and at the same time read the optional column as bool?

My current solution is the following (but it is very inefficient and slow):

    data = read_csv('sample.csv', dtype=str)  # reads all columns as strings
    if 'X' in data.columns:
        l = lambda row: True if row['X'] == 'True' else False if row['X'] == 'False' else None
        data['X'] = data.apply(l, axis=1)

UPDATE: CSV Example:

    A;B;C;X
    a1;b1;c1;True
    a2;b2;c2;False
    a3;b3;c3;True

Or the same file could come without the "X" column (since the column is optional):

    A;B;C
    a1;b1;c1
    a2;b2;c2
    a3;b3;c3
3 answers

You can first select the columns whose names contain X with boolean indexing, and then use replace:

    cols = df.columns[df.columns.str.contains('X')]
    df[cols] = df[cols].replace({'True': True, 'False': False})

Or, if you need to select only the column named exactly X:

    cols = df.columns[df.columns == 'X']
    df[cols] = df[cols].replace({'True': True, 'False': False})

Example:

    import pandas as pd

    df = pd.DataFrame({'A':['a1','a2','a3'],
                       'B':['b1','b2','b3'],
                       'C':['c1','c2','c3'],
                       'X':['True','False','True']})
    print (df)
        A   B   C      X
    0  a1  b1  c1   True
    1  a2  b2  c2  False
    2  a3  b3  c3   True

    print (df.dtypes)
    A    object
    B    object
    C    object
    X    object
    dtype: object

    cols = df.columns[df.columns.str.contains('X')]
    print (cols)
    Index(['X'], dtype='object')

    df[cols] = df[cols].replace({'True': True, 'False': False})
    print (df.dtypes)
    A    object
    B    object
    C    object
    X      bool
    dtype: object

    print (df)
        A   B   C      X
    0  a1  b1  c1   True
    1  a2  b2  c2  False
    2  a3  b3  c3   True
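When only the single column named X needs converting, a dictionary lookup with Series.map may be a simpler vectorized replacement for the row-wise apply in the question. The following is a minimal sketch on a cut-down version of the sample data, not part of the original answer. Note one difference from the question's lambda: values other than 'True' and 'False' become NaN rather than None.

```python
import pandas as pd

# Cut-down sample frame, as it would look after reading with dtype=str.
df = pd.DataFrame({'A': ['a1', 'a2', 'a3'],
                   'X': ['True', 'False', 'True']})

# The column is optional, so only convert it when it is present.
if 'X' in df.columns:
    # Series.map looks each value up in the dict; unmatched values become NaN.
    df['X'] = df['X'].map({'True': True, 'False': False})

print(df['X'].tolist())
```

Because the lookup runs once per column instead of calling a Python function per row, it should avoid most of the overhead of apply with axis=1.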

Why not use bool()? bool() evaluates to True if an argument is passed and that argument is not a falsy value such as False, None, '' or 0.

    if 'X' in data.columns:
        # bool('False') is True (non-empty string), so map 'False' to 0 first.
        # Note: missing values (NaN) are also truthy under bool().
        data['X'] = data['X'].replace('False', 0).map(bool)

In fact, you do not need special handling when using read_csv from pandas (verified in version 0.17). Using your example file with X:

    import pandas as pd

    df = pd.read_csv("file.csv", delimiter=";")
    print(df.dtypes)
    A    object
    B    object
    C    object
    X      bool
    dtype: object
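Type inference only leaves the other columns as strings when their values do not look numeric. If they must stay strings in every case, one possible approach is to read the header first and build the dtype mapping from it. This is a sketch under assumptions: the helper name read_str_except_bool and its parameters are hypothetical, and the sample data is held in memory with io.StringIO instead of a file.

```python
import io
import pandas as pd

CSV_WITH_X = "A;B;C;X\na1;b1;c1;True\na2;b2;c2;False\na3;b3;c3;True\n"
CSV_WITHOUT_X = "A;B;C\na1;b1;c1\na2;b2;c2\na3;b3;c3\n"

def read_str_except_bool(text, bool_col="X", sep=";"):
    """Hypothetical helper: read every column as str, except bool_col as bool."""
    # First pass: nrows=0 parses only the header, giving the column names.
    columns = pd.read_csv(io.StringIO(text), sep=sep, nrows=0).columns
    # Build the dtype mapping; bool_col only appears if the file has it.
    dtypes = {col: (bool if col == bool_col else str) for col in columns}
    # Second pass: parse the data with the per-column dtypes.
    return pd.read_csv(io.StringIO(text), sep=sep, dtype=dtypes)

with_x = read_str_except_bool(CSV_WITH_X)
without_x = read_str_except_bool(CSV_WITHOUT_X)
print(with_x.dtypes)
print(without_x.dtypes)
```

With a real file, the same two calls would take the file path in place of io.StringIO(text). The header-only first pass reads no data rows, so it should add little cost even for large files.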
