Reading csv in python pandas and handling bad values

I am using pandas to read a CSV file. The data are numbers, but they are stored in the CSV file as text. Some of the values are non-numeric where they are bad or missing. How can I filter out these values and convert the remaining data to integers?

I assume there is a better/faster way than looping through all the values and using isdigit() to check whether they are numeric.

Does pandas or numpy have a way to recognize bad values in the reader? If not, what is the easiest way to do this? Do I have to specify dtypes to make this work?

3 answers

pandas.read_csv has the na_values parameter:

 na_values : list-like, default None. List of additional strings to recognize as NA/NaN.

where you can identify these bad values.
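A minimal sketch of this approach; the file name, the sentinel strings, and the decision to drop rows are assumptions for illustration, not part of the question:

```python
import pandas as pd

# Strings listed in na_values are parsed as NaN instead of text.
df = pd.read_csv(
    "data.csv",                      # hypothetical file
    na_values=["N/A", "bad", "?"],   # hypothetical bad-value markers
)

# Drop (or fill) the NaN rows, then cast the remaining data to integers.
df = df.dropna()
df = df.astype(int)
```

The columns come back as float while they still contain NaN, which is why the cast to int happens only after dropna().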


You can pass a custom list of values to be considered missing via the na_values argument of pandas.read_csv. Alternatively, you can pass functions to the converters argument, as sketched below.
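A hedged sketch of the converters route; the column name "count" and the file name are made up for illustration:

```python
import pandas as pd

def to_int_or_nan(value):
    """Convert a raw cell string to int, returning NaN for bad/missing values."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return float("nan")

# The converter runs on each cell of the named column while the file is read.
df = pd.read_csv("data.csv", converters={"count": to_int_or_nan})
```

Because the converted column mixes ints and NaN, you would still drop the NaNs (e.g. df["count"].dropna().astype(int)) before treating it as an integer column.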


NumPy provides the genfromtxt() function specifically for this purpose. The first sentence of its documentation:

 Load data from a text file, with missing values handled as specified.
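A small sketch of genfromtxt with missing values; the inline data, the "bad" marker, and the filling value of -1 are assumptions chosen for illustration:

```python
import numpy as np
from io import StringIO

# StringIO stands in for the user's CSV file.
csv_text = StringIO("1,2,bad\n4,,6\n7,8,9")

# Cells matching missing_values (and empty fields, which genfromtxt treats
# as missing by default) are replaced with filling_values.
data = np.genfromtxt(
    csv_text,
    delimiter=",",
    missing_values="bad",
    filling_values=-1,
)
print(data)
```

Passing usemask=True instead of a filling value would return a masked array, which keeps the bad entries flagged rather than substituted.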

