Is there a way to skip non-convertible rows when converting a pandas Series from str to float?

Question


I have a pandas DataFrame created from a CSV file. One column of this DataFrame contains numerical data that is initially read in as strings. Most entries are numeric, but some contain various error codes that are not numeric. I do not know in advance what all the error codes will be or how many there are. So, for example, the DataFrame might look like this:

 [In 1]: df
 [Out 1]:
             data OtherAttr
 MyIndex
 0            1.4       aaa
 1         error1       foo
 2            2.2       bar
 3            0.8       bar
 4            xxx       bbb
 ...
 743733   BadData       ccc
 743734       7.1       foo

I want to use df.data as a float and throw out any values that are not converted properly. Is there any built-in functionality for this? Something like:

 df.data = df.data.astype(float, skipbad = True)

(Although I know that specifically won't work, and I don't see any kwargs in astype that do what I want.)

I think I could write a function using try and then use pandas apply or map, but this seems like an inelegant solution. This should be a fairly common problem, right?
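For reference, the try-based approach mentioned above might look roughly like the sketch below. The helper name to_float_or_nan and the small sample frame are hypothetical, made up only to mirror the example data.

```python
import numpy as np
import pandas as pd


def to_float_or_nan(value):
    """Hypothetical helper: return float(value), or NaN if it cannot be parsed."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return np.nan


# Small stand-in for the real frame (assumed data, mirroring the example above).
df = pd.DataFrame(
    {
        "data": ["1.4", "error1", "2.2", "0.8", "xxx"],
        "OtherAttr": ["aaa", "foo", "bar", "bar", "bbb"],
    }
)

# apply calls the helper once per element, so bad entries become NaN.
converted = df["data"].apply(to_float_or_nan)
```

This works, but it calls a Python function for every row, which is part of why it feels inelegant on a frame with hundreds of thousands of entries.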


python pandas

user2543645 Aug 21 '13 at 21:41


1 answer

Andy Hayden · Accepted Answer · 2013-08-21T21:47:57+0000

Use the convert_objects method, which attempts to infer a better dtype for object columns. With convert_numeric=True, entries that cannot be parsed as numbers become NaN:

 In [11]: df['data'].convert_objects(convert_numeric=True)
 Out[11]:
 0    1.4
 1    NaN
 2    2.2
 3    0.8
 4    NaN
 Name: data, dtype: float64

In fact, you can apply this to the entire DataFrame:

 In [12]: df.convert_objects(convert_numeric=True)
 Out[12]:
          data OtherAttr
 MyIndex
 0         1.4       aaa
 1         NaN       foo
 2         2.2       bar
 3         0.8       bar
 4         NaN       bbb
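Note that convert_objects has since been deprecated and is no longer available in current pandas releases. A minimal sketch of the same idea with pd.to_numeric, assuming a small sample frame shaped like the one in the question, could look like this:

```python
import pandas as pd

# Small stand-in for the real frame (assumed data, mirroring the question).
df = pd.DataFrame(
    {
        "data": ["1.4", "error1", "2.2", "0.8", "xxx"],
        "OtherAttr": ["aaa", "foo", "bar", "bar", "bbb"],
    }
)

# errors="coerce" turns every entry that cannot be parsed into NaN.
df["data"] = pd.to_numeric(df["data"], errors="coerce")

# To actually throw the bad rows out, drop those where "data" is NaN.
clean = df.dropna(subset=["data"])
```

Here df keeps all rows with NaN in place of the error codes, while clean contains only the rows that converted successfully.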

