Pandas DataFrame: using query or drop with a column name that contains a space

Question


I want to use pandas to delete rows based on a column name (which contains a space) and a cell value. I tried several approaches (the drop and query methods), but they all seem to fail because of the space in the name. Is there a way to query the data using a name that has a space in it, or do I need to remove all the spaces first?

Data as a CSV file

 Date,"price","Sale Item"
 2012-06-11,1600.20,item1
 2012-06-12,1610.02,item2
 2012-06-13,1618.07,item3
 2012-06-14,1624.40,item4
 2012-06-15,1626.15,item5
 2012-06-16,1626.15,item6
 2012-06-17,1626.15,item7

Examples of Attempts

 df.drop(['Sale Item'] != 'Item1')
 df.drop('Sale Item' != 'Item1')
 df.drop("'Sale Item'] != 'Item1'")
 df.query('Sale Item' != 'Item1')
 df.query(['Sale Item'] != 'Item1')
 df.query("'Sale Item'] != 'Item1'")

The error received in most cases

 ImportError: 'numexpr' not found. Cannot use engine='numexpr' for query/eval if 'numexpr' is not installed


python pandas

iNoob Oct 05 '15 at 15:43


2 answers

As you can see from the documentation -

DataFrame.drop(labels, axis=0, level=None, inplace=False, errors='raise')
Return new object with labels in requested axis removed.

DataFrame.drop() takes the index labels of the rows to remove, not a condition. So you would most likely need something like:

 df.drop(df.loc[df['Sale Item'] != 'item1'].index)

Note that this drops the rows that satisfy the condition, so the result contains only the rows that do not satisfy it. If you want the opposite, put the ~ operator in front of the condition to negate it.

But this seems like overkill; it would be easier to just use boolean indexing to get the rows you need (as shown in the other answer).

Demo -

 In [20]: df
 Out[20]:
          Date    price Sale Item
 0  2012-06-11  1600.20     item1
 1  2012-06-12  1610.02     item2
 2  2012-06-13  1618.07     item3
 3  2012-06-14  1624.40     item4
 4  2012-06-15  1626.15     item5
 5  2012-06-16  1626.15     item6
 6  2012-06-17  1626.15     item7

 In [21]: df.drop(df.loc[df['Sale Item'] != 'item1'].index)
 Out[21]:
          Date   price Sale Item
 0  2012-06-11  1600.2     item1
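To make the note about the ~ operator concrete, here is a minimal sketch that loads the sample data from the question and drops rows both ways. The names csv_text, mask, only_item1 and without_item1 are made up for this illustration, and the mask is applied with plain bracket indexing, which is an assumption about how you would write it on a current pandas version.

```python
import io

import pandas as pd

# Sample data from the question, read from an in-memory string.
csv_text = '''Date,"price","Sale Item"
2012-06-11,1600.20,item1
2012-06-12,1610.02,item2
2012-06-13,1618.07,item3
2012-06-14,1624.40,item4
2012-06-15,1626.15,item5
2012-06-16,1626.15,item6
2012-06-17,1626.15,item7
'''
df = pd.read_csv(io.StringIO(csv_text))

# Boolean mask: True for every row whose 'Sale Item' is not 'item1'.
# Bracket access works fine with a space in the column name.
mask = df['Sale Item'] != 'item1'

# drop() removes the rows whose index labels it receives,
# so this removes the rows where the mask is True and keeps only item1.
only_item1 = df.drop(df[mask].index)

# Negating the mask with ~ drops the complementary rows instead,
# leaving every row except item1.
without_item1 = df.drop(df[~mask].index)

print(only_item1)
print(without_item1)
```

The second result is the same set of rows you get from the plain filter df[mask], which is why boolean indexing is usually the shorter route.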


Anand s kumar Oct 05 '15 at 16:09


Fabio lamana · Accepted Answer · 2015-10-05T15:52:37+0000

If I understand your problem correctly, maybe you can just apply a filter, for example:

 df = df[df['Sale Item'] != 'item1']

which returns:

          Date    price Sale Item
 1  2012-06-12  1610.02     item2
 2  2012-06-13  1618.07     item3
 3  2012-06-14  1624.40     item4
 4  2012-06-15  1626.15     item5
 5  2012-06-16  1626.15     item6
 6  2012-06-17  1626.15     item7
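As for the query part of the question: neither answer covers it, but a sketch along the following lines may work without renaming the column. It assumes pandas 0.25 or newer, where backticks can quote a column name containing a space inside the query string. Passing engine='python' should avoid the numexpr requirement behind the ImportError quoted in the question. The frame below holds only the first three rows of the sample data, and the names kept and unchanged are made up for this illustration.

```python
import pandas as pd

# First three rows of the sample data from the question.
df = pd.DataFrame({
    'Date': ['2012-06-11', '2012-06-12', '2012-06-13'],
    'price': [1600.20, 1610.02, 1618.07],
    'Sale Item': ['item1', 'item2', 'item3'],
})

# The whole condition goes inside one string; backticks quote the
# column name with a space (assumes pandas 0.25 or newer).
# engine='python' evaluates the expression without numexpr.
kept = df.query("`Sale Item` != 'item1'", engine='python')

# String comparison is case-sensitive: 'Item1' (capital I) matches
# nothing in this data, so no row is filtered out.
unchanged = df.query("`Sale Item` != 'Item1'", engine='python')

print(kept)
print(unchanged)
```

Note that the attempts in the question compare against 'Item1' while the data contains 'item1', so even a syntactically valid call would not have removed any rows.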
