Is there a way to skip unconvertible rows when casting a pandas Series from str to float?

I have a pandas DataFrame created from a CSV file. One column of this DataFrame contains numerical data that is initially read in as strings. Most entries are numeric, but some contain various error codes that are not numeric. I do not know in advance what all the error codes will be or how many there are. So, for example, the DataFrame might look like this:

    In [1]: df
    Out[1]:
                data OtherAttr
    MyIndex
    0            1.4       aaa
    1         error1       foo
    2            2.2       bar
    3            0.8       bar
    4            xxx       bbb
    ...
    743733   BadData       ccc
    743734       7.1       foo

I want to use df.data as a float and throw out any values that do not convert properly. Is there any built-in functionality for this? Something like:

 df.data = df.data.astype(float, skipbad = True) 

(Although I know that specifically won't work, and I don't see any kwargs in astype that do what I want.)

I think I could write a function using try and then use pandas apply or map, but this seems like an inelegant solution. This should be a fairly common problem, right?
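For reference, the try-based approach mentioned above might look roughly like the sketch below. The helper name `to_float_or_nan` and the five-row sample frame are made up for illustration; they only mimic the shape of the real data.

```python
import pandas as pd


def to_float_or_nan(value):
    """Return value as a float, or NaN if it cannot be parsed (hypothetical helper)."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return float("nan")


# Small stand-in for the real DataFrame (assumed sample data).
df = pd.DataFrame(
    {
        "data": ["1.4", "error1", "2.2", "0.8", "xxx"],
        "OtherAttr": ["aaa", "foo", "bar", "bar", "bbb"],
    }
)

# Convert element by element; unparseable strings become NaN.
converted = df["data"].apply(to_float_or_nan)

# Throw out the rows whose value did not convert.
cleaned = df.assign(data=converted).dropna(subset=["data"])
```

This works, but it calls a Python function once per row, which is part of why it feels inelegant on a column with several hundred thousand entries.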

python pandas
1 answer

Use the convert_objects method, which "attempts to infer better dtype for object columns":

    In [11]: df['data'].convert_objects(convert_numeric=True)
    Out[11]:
    0    1.4
    1    NaN
    2    2.2
    3    0.8
    4    NaN
    Name: data, dtype: float64

In fact, you can apply this to the entire DataFrame:

    In [12]: df.convert_objects(convert_numeric=True)
    Out[12]:
             data OtherAttr
    MyIndex
    0         1.4       aaa
    1         NaN       foo
    2         2.2       bar
    3         0.8       bar
    4         NaN       bbb
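Note that convert_objects was later deprecated and is no longer available in current pandas releases. A minimal sketch of the same idea using `pd.to_numeric` with `errors="coerce"` follows; the sample frame is an assumed stand-in for the data in the question.

```python
import pandas as pd

# Small stand-in for the DataFrame from the question (assumed sample data).
df = pd.DataFrame(
    {
        "data": ["1.4", "error1", "2.2", "0.8", "xxx"],
        "OtherAttr": ["aaa", "foo", "bar", "bar", "bbb"],
    }
)

# errors="coerce" turns anything that cannot be parsed as a number into NaN.
df["data"] = pd.to_numeric(df["data"], errors="coerce")

# Drop the rows where the conversion failed, if they should be thrown out.
cleaned = df.dropna(subset=["data"])
```

`pd.to_numeric` works on one column at a time. Applying it with `errors="coerce"` to a text column such as OtherAttr would turn every value into NaN, so it is safer to convert only the columns that are expected to be numeric.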