Update pandas DataFrame by key

I have a database of historical stock trading. The frame has columns such as ['ticker', 'date', 'cusip', 'profit', 'security_type']. Initially:

trades['cusip'] = np.nan trades['security_type'] = np.nan 

I have historical configuration files that I can load into frames with columns such as ['ticker', 'cusip', 'date', 'name', 'security_type', 'primary_exchange'].

I would like to UPDATE the frame frame using cusip and security_type from config, but only where the ticker and date match.

I thought I could do something like:

 pd.merge(trades, config, on=['ticker', 'date'], how='left') 

But this does not update the columns, it just adds configuration columns to transactions.

The following works, but I think there should be a better way. If not, I will probably do it outside of pandas.

 for date in trades['date'].unique(): config = get_config_file_as_df(date) ## config['date'] == date for ticker in trades['ticker'][trades['date'] == date]: trades['cusip'][ (trades['ticker'] == ticker) & (trades['date'] == date) ] \ = config['cusip'][config['ticker'] == ticker].values[0] trades['security_type'][ (trades['ticker'] == ticker) & (trades['date'] == date) ] \ = config['security_type'][config['ticker'] == ticker].values[0] 
+6
source share
1 answer

Suppose you have this setting:

 import pandas as pd import numpy as np import datetime as DT nan = np.nan trades = pd.DataFrame({'ticker' : ['IBM', 'MSFT', 'GOOG', 'AAPL'], 'date' : pd.date_range('1/1/2000', periods = 4), 'cusip' : [nan, nan, 100, nan] }) trades = trades.set_index(['ticker', 'date']) print(trades) # cusip # ticker date # IBM 2000-01-01 NaN # MSFT 2000-01-02 NaN # GOOG 2000-01-03 100 # <-- We do not want to overwrite this # AAPL 2000-01-04 NaN config = pd.DataFrame({'ticker' : ['IBM', 'MSFT', 'GOOG', 'AAPL'], 'date' : pd.date_range('1/1/2000', periods = 4), 'cusip' : [1,2,3,nan]}) config = config.set_index(['ticker', 'date']) # Let permute the index to show `DataFrame.update` correctly matches rows based on the index, not on the order of the rows. new_index = sorted(config.index) config = config.reindex(new_index) print(config) # cusip # ticker date # AAPL 2000-01-04 NaN # GOOG 2000-01-03 3 # IBM 2000-01-01 1 # MSFT 2000-01-02 2 

Then you can update the NaN values ​​in trades with the values ​​from config using the DataFrame.update method. Note that DataFrame.update matches rows based on indexes (which is why set_index is called above).

 trades.update(config, join = 'left', overwrite = False) print(trades) # cusip # ticker date # IBM 2000-01-01 1 # MSFT 2000-01-02 2 # GOOG 2000-01-03 100 # If overwrite = True, then 100 is overwritten by 3. # AAPL 2000-01-04 NaN 
+13
source

All Articles