Comparing a data frame with a series and creating a new data frame on the fly in Python pandas

Question

I am writing a function that compares a data frame (df) with a series (S) and returns a new data frame. The common column is "name". I want the function to return a data frame with the same number of rows as the series (S) and the same number of columns as df. The function should look through the name column in df and find every name that also appears in S. When a match is found, the new data frame should get a row that copies the df row for that name. When no match is found, the new data frame should still get a row for that name, but with 0.0 in every cell. I have been trying to figure this out for the last 6 hours, and I suspect I am running into broadcasting issues. Here is what I have tried.

Here is some sample data.

Series:

  S[500:505]
  500                 Nanotechnology
  501                          Music
  502       Logistics & Supply Chain
  503    Computer & Network Security
  504              Computer Software
  Name: name, dtype: object

DataFrame:

          Defense & Space  Computer Software  Internet  Semiconductors  \
  0              1.0                0.0       0.0             0.0   
  1              0.0                1.0       0.5             0.5   
  2              0.0                0.5       1.0             0.5   
  3              0.0                0.5       0.5             1.0   
  4              0.5                0.0       0.0             0.0   


S.shape = (31454,)
df.shape = (100,101)

all_zeros = np.zeros((len(S),len(df.columns)))

Convert the numpy array to a data frame:

result = pd.DataFrame(data=all_zeros, columns=df.columns, index=range(len(S)))

result = result.drop('name', axis=1)

def set_cell_values(row):
    return df.iloc[1,:]

for index in range(len(df)):
    names_are_equal = df['name'][index] == result['name']
    map(lambda row: set_cell_values(row), result[names_are_equal])

python numpy pandas

Donald Vetal 09 . '14 18:35

Adriano Almeida · Answer 1 · 2014-10-09T19:16:01+0000

# with these tables
In [66]: S
Out[66]:
0    aaa
1    bbb
2    ccc
3    ddd
4    eee
Name: name, dtype: object

In [84]: df
Out[84]:
    a   b   c name
0  39  71  55  aaa
1   9  57   6  bbb
2  72  22  52  iii
3  68  97  81  jjj
4  30  64  78  kkk

# transform the series to a dataframe
Sd = pd.DataFrame(S)
# merge them with an outer join (keeps the columns and values of both tables),
# then fill the NAs with 0
In [86]: pd.merge(Sd,df, how='outer').fillna(0)
Out[86]:
  name   a   b   c
0  aaa  39  71  55
1  bbb   9  57   6
2  ccc   0   0   0
3  ddd   0   0   0
4  eee   0   0   0
5  iii  72  22  52
6  jjj  68  97  81
7  kkk  30  64  78
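
The outer join above also keeps the df rows whose names are not in S (iii, jjj, kkk), so the result has more rows than S. The question asks for exactly one row per entry of S. A minimal sketch of that variant, assuming the names in df are unique (duplicate names would multiply rows) and using toy data that mirrors the tables above, is a left join instead:

```python
import pandas as pd

# Toy data mirroring the tables above.
S = pd.Series(["aaa", "bbb", "ccc", "ddd", "eee"], name="name")
df = pd.DataFrame({
    "a": [39, 9, 72, 68, 30],
    "b": [71, 57, 22, 97, 64],
    "c": [55, 6, 52, 81, 78],
    "name": ["aaa", "bbb", "iii", "jjj", "kkk"],
})

# A left join keeps every row of S, in the order of S.
# Names that are missing from df come back as NaN and are filled with 0.0.
result = S.to_frame().merge(df, on="name", how="left").fillna(0.0)

# Put the columns back in the same order as df.
result = result[df.columns]

print(result)
```

Here `result` has `len(S)` rows and the same columns as df. Because the unmatched rows pass through NaN before being filled, the numeric columns end up as floats.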
