Reading csv in python pandas and handling bad values

I am using pandas to read a CSV file. The data are numbers, but they are stored in the CSV file as text. Some of the values are non-numeric where they are bad or missing. How can I filter out these values and convert the remaining data to integers?

I assume there is a better/faster way than looping through all the values and using isdigit() to check whether they are numeric.

Does pandas or numpy have a way to recognize bad values in the reader? If not, what is the easiest way to do this? Do I have to specify dtypes to make this work?

3 answers

pandas.read_csv has the na_values parameter:

 na_values : list-like, default None. List of additional strings to recognize as NA/NaN.

where you can identify these bad values.
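A minimal sketch of this approach; the file name, the sentinel strings, and the decision to drop rows are assumptions for illustration, not part of the question:

```python
import pandas as pd

# Strings listed in na_values are parsed as NaN instead of text.
df = pd.read_csv(
    "data.csv",                      # hypothetical file
    na_values=["N/A", "bad", "?"],   # hypothetical bad-value markers
)

# Drop (or fill) the NaN rows, then cast the remaining data to integers.
df = df.dropna()
df = df.astype(int)
```

The columns come back as float while they still contain NaN, which is why the cast to int happens only after dropna().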


You can pass a custom list of values to be considered missing via the na_values argument of pandas.read_csv. Alternatively, you can pass functions to the converters argument, as sketched below.
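A hedged sketch of the converters route; the column name "count" and the file name are made up for illustration:

```python
import pandas as pd

def to_int_or_nan(value):
    """Convert a raw cell string to int, returning NaN for bad/missing values."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return float("nan")

# The converter runs on each cell of the named column while the file is read.
df = pd.read_csv("data.csv", converters={"count": to_int_or_nan})
```

Because the converted column mixes ints and NaN, you would still drop the NaNs (e.g. df["count"].dropna().astype(int)) before treating it as an integer column.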


NumPy provides the genfromtxt() function specifically for this purpose. The first sentence of its documentation:

 Load data from a text file, with missing values handled as specified.
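A small sketch of genfromtxt with missing values; the inline data, the "bad" marker, and the filling value of -1 are assumptions chosen for illustration:

```python
import numpy as np
from io import StringIO

# StringIO stands in for the user's CSV file.
csv_text = StringIO("1,2,bad\n4,,6\n7,8,9")

# Cells matching missing_values (and empty fields, which genfromtxt treats
# as missing by default) are replaced with filling_values.
data = np.genfromtxt(
    csv_text,
    delimiter=",",
    missing_values="bad",
    filling_values=-1,
)
print(data)
```

Passing usemask=True instead of a filling value would return a masked array, which keeps the bad entries flagged rather than substituted.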

