How to self-reference a column in a pandas DataFrame?

In Python pandas, I load a DataFrame like this:

drinks = pandas.read_csv(data_url)

where data_url is a string URL pointing to a CSV file.

When indexing the frame for all "light drinkers", defined as rows where light_drinker equals 1, I write:

drinks.light_drinker[drinks.light_drinker == 1]

Is there a more DRY way to self-reference the "parent"? That is, something like:

drinks.light_drinker[self == 1]
3 answers

Now you can use query or assign depending on what you need:

drinks.query('light_drinker == 1')

or, to derive a new column from the DataFrame:

df.assign(strong_drinker = lambda x: x.light_drinker + 100)
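As a minimal sketch of both calls, assuming a small hand-made DataFrame in place of the CSV from the question (the country column and its values are made up for illustration):

```python
import pandas as pd

# Hypothetical stand-in for the CSV data in the question.
drinks = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "light_drinker": [1, 0, 1, 0],
})

# query: the column is named inside the expression string,
# so the name `drinks` appears only once.
light = drinks.query("light_drinker == 1")

# assign: the lambda receives the DataFrame itself as `x`.
# assign returns a new DataFrame and leaves `drinks` unchanged.
with_extra = drinks.assign(strong_drinker=lambda x: x.light_drinker + 100)

print(light)
print(with_extra)
```

Because assign returns a copy, the result has to be bound to a name (or chained further) to be kept.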

Old answer

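For simple cases, where may be enough. The API proposed at the time looked like this (df.set is not an actual pandas method; assign, shown above, covers this use case):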

df.set(new_column=lambda self: self.light_drinker*2)

As far as I know, there is no self or this in pandas, but if you want something more DRY, you can use where().

drinks.where(drinks.light_drinker == 1, inplace=True)

In recent versions of pandas, .where() also accepts a callable!

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.where.html?highlight=where#pandas.DataFrame.where

So the following is now possible:

drinks.light_drinker.where(lambda x: x == 1)

Note, however, that this returns only the Series (not the DataFrame filtered by the values in light_drinker).
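A small sketch of the Series case, assuming a hand-made light_drinker Series rather than the real data. It shows that where masks non-matching entries with NaN instead of dropping them, and that adding dropna recovers the same labels as the original boolean indexing:

```python
import pandas as pd

# Hypothetical values standing in for drinks.light_drinker.
light_drinker = pd.Series([1, 0, 1, 0], name="light_drinker")

# where keeps entries that satisfy the condition and
# replaces the others with NaN; the length is unchanged.
masked = light_drinker.where(lambda x: x == 1)

# Dropping the NaN entries leaves only the matching labels.
dry = masked.dropna()

# The original, repetitive form from the question, for comparison.
classic = light_drinker[light_drinker == 1]

print(masked)
print(dry.index.tolist(), classic.index.tolist())
```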

To filter the whole DataFrame, use:

drinks.where(lambda x: x.light_drinker == 1)

Note that this preserves the shape of the original DataFrame (meaning you will get rows in which every entry is NaN, because the condition failed for the value of light_drinker at that index).
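To illustrate the shape-preserving behaviour, here is a sketch on a made-up three-row DataFrame (the beer_servings column and all values are assumptions for the example):

```python
import pandas as pd

# Hypothetical data; only light_drinker comes from the question.
drinks = pd.DataFrame({
    "beer_servings": [10, 200, 30],
    "light_drinker": [1, 0, 1],
})

# The callable receives the DataFrame and returns a boolean Series,
# which is applied row-wise across every column.
masked = drinks.where(lambda x: x.light_drinker == 1)

# Same shape as the input; the failing row is entirely NaN.
print(masked.shape)
print(masked)

# dropna removes the all-NaN rows if the original shape is not needed.
print(masked.dropna())
```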

If you do not want to preserve the shape of the DataFrame (i.e. you want to drop the NaN rows), use:

drinks.query('light_drinker == 1')

Note that the items in DataFrame.index and DataFrame.columns are placed in the query namespace by default, which means that you do not need to reference self.
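The namespace behaviour can be sketched as follows, assuming a made-up DataFrame whose index is given the hypothetical name country_id. Both the column and the named index are referred to by bare name inside the query string:

```python
import pandas as pd

# Hypothetical data with a named index (country_id is an assumed name).
drinks = pd.DataFrame(
    {"light_drinker": [1, 0, 1, 1]},
    index=pd.Index([10, 20, 30, 40], name="country_id"),
)

# Neither `drinks.light_drinker` nor `drinks.index` is spelled out:
# query resolves column names and the index name on its own.
result = drinks.query("light_drinker == 1 and country_id > 10")

print(result)
```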
