I have a large time series data frame (called df), and the first 5 records look like this:

    df
                              stn  years_of_data  total_minutes  avg_daily  TOA_daily  K_daily
    date
    1900-01-14  AlberniElementary              4           5745     34.100    114.600    0.298
    1900-01-14     AlberniWeather              6           7129     29.500    114.600    0.257
    1900-01-14            Arbutus              8          11174     30.500    114.600    0.266
    1900-01-14          Arrowview              7          10080     27.600    114.600    0.241
    1900-01-14            Bayside              7           9745     33.800    114.600    0.295

Purpose:
I am trying to delete rows whose 'stn' value matches any entry in a list. In other words, I want to filter the dataset so that it no longer includes rows containing any of the strings in the following list.
Attempt

    remove_list = ['Arbutus','Bayside']
    cleaned = df[df['stn'].str.contains('remove_list')]

Return:

    Out[78]:
    stn  years_of_data  total_minutes  avg_daily  TOA_daily  K_daily
    date

Nothing!
I tried several combinations of quotes, brackets, and even a lambda function, but I'm fairly new to this, so I probably didn't use the syntax correctly.
Use isin and negate the resulting mask with ~:

    cleaned = df[~df['stn'].isin(remove_list)]

    In [7]:
    remove_list = ['Arbutus','Bayside']
    df[~df['stn'].isin(remove_list)]

    Out[7]:
                              stn  years_of_data  total_minutes  avg_daily  \
    date
    1900-01-14  AlberniElementary              4           5745       34.1
    1900-01-14     AlberniWeather              6           7129       29.5
    1900-01-14          Arrowview              7          10080       27.6

                TOA_daily  K_daily
    date
    1900-01-14      114.6    0.298
    1900-01-14      114.6    0.257
    1900-01-14      114.6    0.241

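As a minimal sketch of why the original attempt returned nothing and how isin differs, the snippet below rebuilds a small frame with two of the columns from the question (the values are copied from the sample above; the variable name literal_match is only illustrative). Quoting 'remove_list' makes str.contains search for that literal text, which no station name contains, whereas isin tests each value for exact membership in the list.

```python
import pandas as pd

# Small stand-in for the frame in the question (two columns only).
df = pd.DataFrame(
    {
        "stn": ["AlberniElementary", "AlberniWeather", "Arbutus", "Arrowview", "Bayside"],
        "years_of_data": [4, 6, 8, 7, 7],
    },
    index=pd.to_datetime(["1900-01-14"] * 5).rename("date"),
)
remove_list = ["Arbutus", "Bayside"]

# The quoted name is a literal pattern, not the list, so no row matches.
literal_match = df[df["stn"].str.contains("remove_list")]

# isin checks exact membership; ~ inverts the mask to keep the other rows.
cleaned = df[~df["stn"].isin(remove_list)]
print(cleaned)
```

Note that isin only matches whole values, so a station such as 'Arbutus2' would not be removed by this filter.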
For partial string matches, you can flag the matching rows with numpy.where:

    import numpy
    remove_list = ['ayside','rrowview']
    df['flagCol'] = numpy.where(df.stn.str.contains('|'.join(remove_list)),1,0)

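The partial-match idea can also be used to drop rows directly rather than only flagging them. The following is a hedged sketch, not the answerer's exact code: it assumes the fragments should be treated as plain text, so each one is passed through re.escape before being joined into a regular expression, and the names fragments, pattern, and mask are illustrative.

```python
import re

import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"stn": ["AlberniElementary", "AlberniWeather", "Arbutus", "Arrowview", "Bayside"]}
)
fragments = ["ayside", "rrowview"]

# Escape each fragment so characters such as '.' or '(' are matched literally,
# then join with '|' so the pattern matches any one of them.
pattern = "|".join(re.escape(fragment) for fragment in fragments)
mask = df["stn"].str.contains(pattern, regex=True)

# Flag matching rows with 1 and the rest with 0, as in the answer above.
df["flagCol"] = np.where(mask, 1, 0)

# Or drop the matching rows outright by inverting the mask.
cleaned = df[~mask]
print(cleaned)
```

If the 'stn' column can contain missing values, passing na=False to str.contains keeps the mask boolean.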