Index a Python Pandas framework with multiple SQL conditions, such as where the statement

I have experience in R and new to Python Pandas. I am trying to index a DataFrame to retrieve rows that satisfy a variety of multiple logical conditions, which is very similar to the SQL where clause.

I know how to do this in R using dataframes (and with the R data.table package, which looks more like a Pandas DataFrame than an R-shaped frame).

Here is an example of code that creates a DataFrame and a description of how I would like to index it. Is there an easy way to do this?

import pandas as pd import numpy as np # generate some data mult = 10000 fruits = ['Apple', 'Banana', 'Kiwi', 'Grape', 'Orange', 'Strawberry']*mult vegetables = ['Asparagus', 'Broccoli', 'Carrot', 'Lettuce', 'Rutabaga', 'Spinach']*mult animals = ['Dog', 'Cat', 'Bird', 'Fish', 'Lion', 'Mouse']*mult xValues = np.random.normal(loc=80, scale=2, size=6*mult) yValues = np.random.normal(loc=79, scale=2, size=6*mult) data = {'Fruit': fruits, 'Vegetable': vegetables, 'Animal': animals, 'xValue': xValues, 'yValue': yValues,} df = pd.DataFrame(data) # shuffle the columns to break structure of repeating fruits, vegetables, animals np.random.shuffle(df.Fruit) np.random.shuffle(df.Vegetable) np.random.shuffle(df.Animal) df.head(30) # filter sets fruitsInclude = ['Apple', 'Banana', 'Grape'] vegetablesExclude = ['Asparagus', 'Broccoli'] # subset1: All rows and columns where: # (fruit in fruitsInclude) AND (Vegetable not in vegetablesExlude) # subset2: All rows and columns where: # (fruit in fruitsInclude) AND [(Vegetable not in vegetablesExlude) OR (Animal == 'Dog')] # subset3: All rows and specific columns where above logical conditions are true. 

All help and input are appreciated and greatly appreciated!

Thanks Randall

+8
python sql pandas indexing
source share
1 answer
 # subset1: All rows and columns where: # (fruit in fruitsInclude) AND (Vegetable not in vegetablesExlude) df.ix[df['Fruit'].isin(fruitsInclude) & ~df['Vegetable'].isin(vegetablesExclude)] # subset2: All rows and columns where: # (fruit in fruitsInclude) AND [(Vegetable not in vegetablesExlude) OR (Animal == 'Dog')] df.ix[df['Fruit'].isin(fruitsInclude) & (~df['Vegetable'].isin(vegetablesExclude) | (df['Animal']=='Dog'))] # subset3: All rows and specific columns where above logical conditions are true. df.ix[df['Fruit'].isin(fruitsInclude) & ~df['Vegetable'].isin(vegetablesExclude) & (df['Animal']=='Dog')] 
+14
source share

All Articles