Creating a pandas column with np.where()

I am working with pandas and use np.where() to create a DataFrame column with three possible values:

fips_df['geog_type'] = np.where(fips_df.fips.str[-3:] != '000', 'county', np.where(fips_df.fips.str[:] == '00000', 'country', 'state')) 

After adding the column, the DataFrame looks like this:

    print fips_df[:5]
        fips         geog_entity fips_prefix geog_type
    0  00000       UNITED STATES          00   country
    1  01000             ALABAMA          01     state
    2  01001  Autauga County, AL          01    county
    3  01003  Baldwin County, AL          01    county
    4  01005  Barbour County, AL          01    county
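For anyone who wants to reproduce this, here is a minimal sketch that rebuilds only the five rows shown above and applies the same nested np.where. The way the frame is constructed here (including how fips_prefix is derived) is an assumption for illustration, not the original loading code.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of the first five rows printed above.
fips_df = pd.DataFrame({
    "fips": ["00000", "01000", "01001", "01003", "01005"],
    "geog_entity": ["UNITED STATES", "ALABAMA", "Autauga County, AL",
                    "Baldwin County, AL", "Barbour County, AL"],
})
# Assumed derivation: the prefix is the first two characters of the code.
fips_df["fips_prefix"] = fips_df["fips"].str[:2]

# Outer np.where: codes not ending in '000' are counties.
# Inner np.where: of the remaining codes, '00000' is the country
# and everything else is a state.
fips_df["geog_type"] = np.where(
    fips_df["fips"].str[-3:] != "000",
    "county",
    np.where(fips_df["fips"] == "00000", "country", "state"),
)

geog_types = fips_df["geog_type"].tolist()
print(geog_types)
```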

I check the new column with two assert statements. The first passes and the second fails.

    ## check the numbers of geog_type
    assert set(fips_df['geog_type'].value_counts().iteritems()) == set([('state', 51), ('country', 1), ('county', 3143)])
    assert set(fips_df.geog_type.value_counts().iteritems()) == set([('state', 51), ('country', 1), ('county', 3143)])

What is the difference between accessing a column as fips_df.geog_type and as fips_df['geog_type'] that makes my second statement fail?

2 answers

As an aside, you can create a new column with much less effort. For example:

    In [1]: import pandas as pd

    In [2]: import numpy as np

    In [3]: df = pd.DataFrame(np.random.uniform(size=10))

    In [4]: df
    Out[4]:
              0
    0  0.366489
    1  0.697744
    2  0.570066
    3  0.756647
    4  0.036149
    5  0.817588
    6  0.884244
    7  0.741609
    8  0.628303
    9  0.642807

    In [5]: categorize = lambda value: "ABC"[int(value > 0.3) + int(value > 0.6)]

    In [6]: df["new_col"] = df[0].apply(categorize)

    In [7]: df
    Out[7]:
              0 new_col
    0  0.366489       B
    1  0.697744       C
    2  0.570066       B
    3  0.756647       C
    4  0.036149       A
    5  0.817588       C
    6  0.884244       C
    7  0.741609       C
    8  0.628303       C
    9  0.642807       C
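The lambda above picks a letter from "ABC" by counting how many of the two thresholds the value exceeds. A small sketch on fixed inputs makes the mapping easier to check; the values are copied from the session output, and the names values and labels are only illustrative.

```python
import pandas as pd

# Same categorizer as in the session above: exceeding 0, 1, or 2 of the
# thresholds selects 'A', 'B', or 'C' respectively.
categorize = lambda value: "ABC"[int(value > 0.3) + int(value > 0.6)]

# A few values copied from the session output, used instead of random data.
values = pd.Series([0.036149, 0.366489, 0.570066, 0.697744])
labels = values.apply(categorize).tolist()
print(labels)  # ['A', 'B', 'B', 'C']
```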

The two forms should be the same (and are, most of the time)...

The one situation where they differ is when the DataFrame already has an attribute or method with that name. In that case the attribute is not overridden, so the column is not accessible with dot notation:

    In [1]: df = pd.DataFrame([[1, 2], [3, 4]])

    In [2]: df.A = 7

    In [3]: df.B = lambda: 42

    In [4]: df.columns = list('AB')

    In [5]: df.A
    Out[5]: 7

    In [6]: df.B()
    Out[6]: 42

    In [7]: df['A']
    Out[7]:
    0    1
    1    3
    Name: A
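The same shadowing can happen without setting anything by hand, when a column name matches a method that DataFrame already defines. A minimal sketch, assuming a made-up frame with a column called count (which collides with the DataFrame.count method) and a column called total (which does not):

```python
import pandas as pd

# Hypothetical frame: 'count' collides with DataFrame.count, 'total' does not.
df = pd.DataFrame({"count": [1, 2], "total": [3, 4]})

# Bracket access always returns the column.
bracket_is_series = isinstance(df["count"], pd.Series)

# Dot access finds the bound method first, so the column is shadowed.
dot_is_method = callable(df.count) and not isinstance(df.count, pd.Series)

# With a non-colliding name, both spellings return equal columns.
same_for_total = df.total.equals(df["total"])

print(bracket_is_series, dot_is_method, same_for_total)
```

Using bracket notation throughout sidesteps the question of whether a given name is shadowed.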

Interestingly, the dot notation for accessing columns is not mentioned in the selection syntax documentation.
