Source DF:
In [176]: df
Out[176]:
0 1 2 3 Market Cap
0 1.707280 0.666952 0.638515 -0.061126 2.291747 1.71B
1 -1.017134 1.353627 0.618433 0.008279 0.148128 1.82B
2 -0.774057 -0.165566 -0.083345 0.741598 -0.139851 1.1M
3 -0.630724 0.250737 1.308556 -1.040799 1.064456 30.92M
4 2.029370 0.899612 0.261146 1.474148 -1.663970 476.74k
5 2.029370 0.899612 0.261146 1.474148 -1.663970 -1
Solution:
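import pandas as pd

# Patterns for the K/M/B suffixes plus the placeholder values '-1' and 'N/A',
# each paired with the multiplier that should replace the matching cell.
to_replace = [r'\d+\s*[Kk]', r'\d+\s*[Mm]', r'\d+\s*[Bb]', '-1', 'N/A']
value = [1000, 1000000, 1000000000, 1, 1]

mask = df.assign(
    # f: per-row multiplier derived from the suffix
    f=df['Market Cap'].replace(to_replace, value, regex=True),
    # Marketcap: numeric part only (everything except digits and dots is stripped)
    Marketcap=pd.to_numeric(df['Market Cap'].str.replace(r'[^\d\.]', '', regex=True), errors='coerce')
).eval("Marketcap * f < 35000000")

df[mask]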
Result:
In [178]: df[mask]
Out[178]:
0 1 2 3 Market Cap
2 -0.774057 -0.165566 -0.083345 0.741598 -0.139851 1.1M
3 -0.630724 0.250737 1.308556 -1.040799 1.064456 30.92M
4 2.029370 0.899612 0.261146 1.474148 -1.663970 476.74k
5 2.029370 0.899612 0.261146 1.474148 -1.663970 -1
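For comparison, here is a minimal sketch of a different way to get the same mask: split each string into a number and a suffix with `str.extract`, then look the suffix up in a dict. This is not the approach used above. `parse_market_cap` is a hypothetical helper name, and the sample strings are copied from the `Market Cap` column of the example frame. Unlike the code above, it keeps the sign of `-1` instead of stripping it, which makes no difference for this threshold.

```python
import pandas as pd

# Market Cap strings copied from the example frame above.
caps = pd.Series(["1.71B", "1.82B", "1.1M", "30.92M", "476.74k", "-1"],
                 name="Market Cap")


def parse_market_cap(s: pd.Series) -> pd.Series:
    """Hypothetical helper: turn strings such as '30.92M' into plain numbers.

    Values that do not look like a number with an optional K/M/B suffix
    (for example 'N/A') become NaN.
    """
    # Group 0: the numeric part, group 1: the optional suffix.
    parts = s.str.strip().str.extract(r"^(-?\d+(?:\.\d+)?)\s*([KkMmBb]?)$")
    number = pd.to_numeric(parts[0], errors="coerce")
    # A missing suffix means "no scaling", so treat it as multiplier 1.
    multiplier = parts[1].fillna("").str.upper().map(
        {"": 1, "K": 1e3, "M": 1e6, "B": 1e9}
    )
    return number * multiplier


values = parse_market_cap(caps)
mask = values < 35_000_000
print(caps[mask].tolist())
```

On these six values the mask selects the same four entries as the result shown above (`1.1M`, `30.92M`, `476.74k` and `-1`). Because comparisons with NaN are false, an unparseable value such as `N/A` would be left out.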
PS: if you want rows with non-numeric values (for example, N/A) to be included in the result set, change:
pd.to_numeric(df['Market Cap'].str.replace(r'[^\d\.]', '', regex=True), errors='coerce')
to
pd.to_numeric(df['Market Cap'].str.replace(r'[^\d\.]', '', regex=True), errors='coerce').fillna(0)
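A small sketch of why the fill matters, using three made-up sample strings (an assumption, not the original data). The multiplier column `f` is left out so that only the NaN handling is visible.

```python
import pandas as pd

# Made-up sample values; 'N/A' contains no digits at all.
caps = pd.Series(["30.92M", "N/A", "1.71B"])

# Strip everything except digits and dots, then convert.
# 'N/A' becomes an empty string, which to_numeric coerces to NaN.
numeric = pd.to_numeric(caps.str.replace(r"[^\d.]", "", regex=True),
                        errors="coerce")

# Any comparison with NaN is False, so the 'N/A' row is dropped by the mask.
kept_without_fill = numeric < 35_000_000

# After filling NaN with 0, the comparison is True and the row is kept.
kept_with_fill = numeric.fillna(0) < 35_000_000

print(kept_without_fill.tolist())
print(kept_with_fill.tolist())
```

Note that the fill value is the number `0`, not the string `'0'`: a string would turn the column into mixed types and break the numeric comparison.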
Maxu