Pandas - Group and create a new DataFrame?

This is my situation -

    In[1]: data
    Out[1]:
         Item                    Type
    0  Orange           Edible, Fruit
    1  Banana           Edible, Fruit
    2  Tomato       Edible, Vegetable
    3  Laptop  Non Edible, Electronic

    In[2]: type(data)
    Out[2]: pandas.core.frame.DataFrame

I want to create a DataFrame containing only the fruits, so I think I need a groupby on whether "Fruit" appears in Type.

I tried to do this:

grouped = data.groupby(lambda x: "Fruit" in x, axis=1)

I do not know how to do this, and I find groupby a little hard to understand. How do I get a new DataFrame with only the fruits?

2 answers

You can use

 data[data['Type'].str.contains('Fruit')] 

    import pandas as pd

    data = pd.DataFrame({'Item': ['Orange', 'Banana', 'Tomato', 'Laptop'],
                         'Type': ['Edible, Fruit', 'Edible, Fruit',
                                  'Edible, Vegetable', 'Non Edible, Electronic']})
    print(data[data['Type'].str.contains('Fruit')])

gives

         Item           Type
    0  Orange  Edible, Fruit
    1  Banana  Edible, Fruit
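If the Type column may contain missing values, the boolean mask can end up with missing entries and the selection may fail. A minimal sketch of one way to guard against that, assuming a hypothetical extra row with no Type (it is not part of the original data): `na=False` treats missing values as non-matches, and `regex=False` makes 'Fruit' a plain substring rather than a pattern.

```python
import pandas as pd

# The 'Unknown' row with a missing Type is a made-up addition for illustration.
data = pd.DataFrame({
    'Item': ['Orange', 'Banana', 'Tomato', 'Laptop', 'Unknown'],
    'Type': ['Edible, Fruit', 'Edible, Fruit', 'Edible, Vegetable',
             'Non Edible, Electronic', None],
})

# na=False: rows with a missing Type count as "does not contain".
# regex=False: match 'Fruit' literally instead of as a regular expression.
mask = data['Type'].str.contains('Fruit', na=False, regex=False)
fruits = data[mask]
print(fruits)
```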

groupby does something else entirely: it creates groups for aggregation. It basically goes from something like:

 ['a', 'b', 'a', 'c', 'b', 'b'] 

to something like:

 [['a', 'a'], ['b', 'b', 'b'], ['c']] 
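To see that grouping in pandas itself, here is a small sketch. The Series name `letters` is only illustrative; it groups the values by themselves and collects each group into a list.

```python
import pandas as pd

letters = pd.Series(['a', 'b', 'a', 'c', 'b', 'b'])

# Iterating over a groupby yields (key, sub-Series) pairs, sorted by key.
groups = {key: group.tolist() for key, group in letters.groupby(letters)}
print(list(groups.values()))
```

Each group keeps every member row, which is why groupby is suited to per-group aggregation rather than to filtering rows.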

You want apply instead, called on the Type column.

Newer versions of pandas have a query method that makes this a bit more efficient and easier.
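A rough sketch of what a query-based version might look like, assuming a reasonably recent pandas. Passing `engine="python"` is a cautious choice here, since string methods inside the expression are not something the numexpr engine evaluates.

```python
import pandas as pd

df = pd.DataFrame({
    'Item': ['Orange', 'Banana', 'Tomato', 'Laptop'],
    'Type': ['Edible, Fruit', 'Edible, Fruit', 'Edible, Vegetable',
             'Non Edible, Electronic'],
})

# The expression is evaluated against the column names of df.
fruits = df.query("Type.str.contains('Fruit')", engine="python")
print(fruits)
```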

However, what you need to do is build a boolean array with

 mask = df.Type.apply(lambda x: 'Fruit' in x) 

Then select the matching rows of the data frame with df[mask]. Or, as a single line:

 df[df.Type.apply(lambda x: 'Fruit' in x)] 

As a complete example:

    import pandas as pd

    data = [['Orange', 'Edible, Fruit'],
            ['Banana', 'Edible, Fruit'],
            ['Tomato', 'Edible, Vegetable'],
            ['Laptop', 'Non Edible, Electronic']]
    df = pd.DataFrame(data, columns=['Item', 'Type'])

    # Keep only the rows whose Type string contains 'Fruit'.
    print(df[df.Type.apply(lambda x: 'Fruit' in x)])