Pandas warning when using map: A value is trying to be set on a copy of a slice from a DataFrame

I have the following code, and it works. It basically renames values in a column so that they can be combined later.

    import pandas as pd

    pop = pd.read_csv('population.csv')
    pop_recent = pop[pop['Year'] == 2014]

    mapping = {
        'Korea, Rep.': 'South Korea',
        'Taiwan, China': 'Taiwan'
    }

    f = lambda x: mapping.get(x, x)

    pop_recent['Country Name'] = pop_recent['Country Name'].map(f)

Warning:

    A value is trying to be set on a copy of a slice from a DataFrame.
    Try using .loc[row_indexer,col_indexer] = value instead

    See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
      pop_recent['Country Name'] = pop_recent['Country Name'].map(f)

I googled it, but none of the examples I found seem to use map, so I'm at a loss...

2 answers

The problem is chained indexing. What you are effectively trying to do is set values on pop[pop['Year'] == 2014]['Country Name']. This will not work in most cases (as explained very well in the linked documentation), because it consists of two separate indexing calls, and one of them can return a copy of the DataFrame (I believe boolean indexing returns a copy).

Therefore, when you set values on that copy, the change is not reflected in the original DataFrame. Example:

    In [6]: df
    Out[6]:
       A  B
    0  1  2
    1  3  4
    2  4  5
    3  6  7
    4  8  9

    In [7]: df[df['A']==1]['B'] = 10
    /path/to/ipython-script.py:1: SettingWithCopyWarning:
    A value is trying to be set on a copy of a slice from a DataFrame.
    Try using .loc[row_indexer,col_indexer] = value instead

    See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
      if __name__ == '__main__':

    In [8]: df
    Out[8]:
       A  B
    0  1  2
    1  3  4
    2  4  5
    3  6  7
    4  8  9
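The session above can also be reproduced as a standalone script. This is only a minimal sketch, assuming the same five-row frame; the warning is silenced deliberately so that only the effect on df is visible.

```python
import warnings

import pandas as pd

# Same five-row frame as in the session above.
df = pd.DataFrame({"A": [1, 3, 4, 6, 8], "B": [2, 4, 5, 7, 9]})

with warnings.catch_warnings():
    # Silence the chained-assignment warning so only the effect is visible.
    warnings.simplefilter("ignore")
    # Two separate indexing calls: the assignment lands on a temporary copy.
    df[df["A"] == 1]["B"] = 10

# The original frame is untouched: row 0 still holds 2 in column B.
print(df["B"].tolist())  # → [2, 4, 5, 7, 9]
```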

As noted, instead of chained indexing you should use DataFrame.loc to select both the rows and the column to update in a single call, which avoids this problem. Example:

    pop.loc[(pop['Year'] == 2014), 'Country Name'] = pop.loc[(pop['Year'] == 2014), 'Country Name'].map(f)

Or, if that seems too long, you can create the mask (a boolean Series) beforehand, assign it to a variable, and use it in the statement above. Example:

    mask = pop['Year'] == 2014
    pop.loc[mask, 'Country Name'] = pop.loc[mask, 'Country Name'].map(f)

Demo:

    In [9]: df
    Out[9]:
       A  B
    0  1  2
    1  3  4
    2  4  5
    3  6  7
    4  8  9

    In [10]: mapping = {1: 2, 3: 4}

    In [11]: f = lambda x: mapping.get(x, x)

    In [12]: df.loc[(df['B']==2), 'A'] = df.loc[(df['B']==2), 'A'].map(f)

    In [13]: df
    Out[13]:
       A  B
    0  2  2
    1  3  4
    2  4  5
    3  6  7
    4  8  9

Demo with the mask method:

    In [18]: df
    Out[18]:
       A  B
    0  1  2
    1  3  4
    2  4  5
    3  6  7
    4  8  9

    In [19]: mask = df['B']==2

    In [20]: df.loc[mask, 'A'] = df.loc[mask, 'A'].map(f)

    In [21]: df
    Out[21]:
       A  B
    0  2  2
    1  3  4
    2  4  5
    3  6  7
    4  8  9
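Applied to the data from the question, the mask approach might look like the sketch below. The sample rows are made up to stand in for population.csv (an assumption, since the real file is not shown); only the column names 'Country Name' and 'Year' come from the question.

```python
import pandas as pd

# Made-up sample rows standing in for population.csv (an assumption).
pop = pd.DataFrame({
    "Country Name": ["Korea, Rep.", "Taiwan, China", "Japan", "Korea, Rep."],
    "Year": [2014, 2014, 2014, 2013],
})

mapping = {"Korea, Rep.": "South Korea", "Taiwan, China": "Taiwan"}

# Boolean Series selecting the 2014 rows.
mask = pop["Year"] == 2014

# Rows and column are selected in one .loc call, so the assignment
# targets pop itself rather than a temporary copy.
pop.loc[mask, "Country Name"] = pop.loc[mask, "Country Name"].map(
    lambda name: mapping.get(name, name)
)

# If a separate frame is still wanted, take it after the update.
pop_recent = pop[mask]
```

Note that only the 2014 rows are renamed; the 2013 row keeps 'Korea, Rep.' because the mask excludes it.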

I recommend resetting the index when you create pop_recent = pop[pop['Year'] == 2014].

If you want to apply a function to a DataFrame column, try the apply method from the pandas API. Simple demo:

    import pandas

    mapping = {
        'Korea, Rep.': 'South Korea',
        'Taiwan, China': 'Taiwan'
    }

    df = pandas.DataFrame({'Country': ['Korea, Rep.', 'Taiwan, China', 'Japan', 'USA'],
                           'date': [2014, 2014, 2015, 2014]})

    # reset_index() moves the old index into a new 'index' column
    df_recent = df[df['date'] == 2014].reset_index()
    df_recent['Country'] = df_recent['Country'].apply(lambda x: mapping.get(x, x))

Output:

    >>> df_recent
       index      Country  date
    0      0  South Korea  2014
    1      1       Taiwan  2014
    2      3          USA  2014
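The reason reset_index() avoids the warning here appears to be that it returns a new DataFrame rather than a slice of the original. An explicit .copy() is another common way to get an independent frame while keeping the original index labels. This is not part of the answer above, only a sketch under that assumption, reusing the same demo data.

```python
import pandas as pd

mapping = {"Korea, Rep.": "South Korea", "Taiwan, China": "Taiwan"}

df = pd.DataFrame({
    "Country": ["Korea, Rep.", "Taiwan, China", "Japan", "USA"],
    "date": [2014, 2014, 2015, 2014],
})

# .copy() makes df_recent an independent frame, so assigning to it
# is not treated as writing to a slice of df.
df_recent = df[df["date"] == 2014].copy()
df_recent["Country"] = df_recent["Country"].map(lambda x: mapping.get(x, x))

print(df_recent)
```

With this variant the original df keeps its old country names, and df_recent keeps the index labels 0, 1 and 3 instead of gaining an extra 'index' column.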