This is a little hard to explain, but I will try my best. I have two tables that I need to join, but there is no unique identifier connecting them. I join on several columns, which is the best I can do, and I just want to know when the two sides of the join do not have equal numbers of rows. Right now, if the right table has 1 record that matches 2 records in the left table, that 1 record is joined to both of them. This leaves me unaware that the right table has only 1 entry versus 2 on the left.
I want to (outer) join the right table to the left, but I do not want any right-table row to be joined more than once. So if index 3 of the right table could be joined to index 1 and index 2 on the left, I want it attached to index 1 only. Likewise, if index 3 and index 4 could both be joined to indexes 1 and 2, I want index 1 matched to index 3 and index 2 matched to index 4. If there is only 1 match (index 1 → 3), then even though index 2 in the left table could also be matched to index 3, I want index 2 left unjoined.
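To make the current behaviour concrete, here is a minimal sketch. The frames `left` and `right` are made up for illustration and only stand in for my real tables:

```python
import pandas as pd

# Hypothetical frames: two left rows and one right row share match_id 2.
left = pd.DataFrame({"match_id": [2, 2], "uniq_id": [1, 2]})
right = pd.DataFrame({"match_id": [2], "uniq_id": [3]})

# A plain outer merge pairs the single right row with BOTH left rows,
# so the result hides the fact that the right side has only one record.
plain = left.merge(right, on="match_id", how="outer", suffixes=("", "_right"))
print(plain)
```

Both output rows carry the same right-hand `uniq_id`, which is the duplication I am trying to avoid.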
Examples can best describe this:
    a_df = pd.DataFrame.from_dict({1: {'match_id': 2, 'uniq_id': 1}, 2: {'match_id': 2, 'uniq_id': 2}}, orient='index')

    In [99]: a_df
    Out[99]:
       match_id  uniq_id
    1         2        1
    2         2        2

    In [100]: b_df = pd.DataFrame.from_dict({3: {'match_id': 2, 'uniq_id': 3}, 4: {'match_id': 2, 'uniq_id': 4}}, orient='index')

    In [101]: b_df
    Out[101]:
       match_id  uniq_id
    3         2        3
    4         2        4

In this example, I want to join a_df to b_df. I want b_df uniq_id 3 to match a_df uniq_id 1, and b_df uniq_id 4 to match a_df uniq_id 2.
The result will look like this:
    Out[106]:
       match_id_right  match_id  uniq_id  uniq_id_right
    1               2         2        1              3
    2               2         2        2              4
Now suppose we want to join a_df to c_df:
    In [104]: c_df = pd.DataFrame.from_dict({3: {'match_id': 2, 'uniq_id': 3}, 4: {'match_id': 3, 'uniq_id': 4}}, orient='index')

    In [105]: c_df
    Out[105]:
       match_id  uniq_id
    3         2        3
    4         3        4
In this case, a_df has two rows with match_id 2, but c_df has only one row with match_id 2.
So I just want uniq_id 1 to match uniq_id 3, leaving uniq_id 2 and uniq_id 4 unmatched:
       match_id_right  match_id  uniq_id  uniq_id_right
    1               2         2        1              3
    2             NaN         2        2            NaN
    4               3       NaN      NaN              4
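
For reference, here is a minimal sketch of one way the pairing rule above might be expressed. It assumes that the order of rows within each match_id decides who gets matched first. The helper name `merge_once` and the temporary `_n` / `_n_right` columns are made up for illustration. It numbers the rows inside each match_id group with `groupby(...).cumcount()` and then outer-merges on the key plus that counter, so the n-th left row can only pair with the n-th right row:

```python
import pandas as pd

def merge_once(left, right, key="match_id"):
    """Outer-join so each right row pairs with at most one left row (sketch)."""
    # Number the rows within each key group: 0, 1, 2, ...
    l = left.assign(_n=left.groupby(key).cumcount())
    r = right.assign(_n=right.groupby(key).cumcount()).add_suffix("_right")
    # Merge on key + counter; the differently named key columns are both kept.
    out = l.merge(
        r,
        left_on=[key, "_n"],
        right_on=[key + "_right", "_n_right"],
        how="outer",
    )
    # Drop the helper counters once the pairing is done.
    return out.drop(columns=["_n", "_n_right"])

a_df = pd.DataFrame.from_dict(
    {1: {"match_id": 2, "uniq_id": 1}, 2: {"match_id": 2, "uniq_id": 2}},
    orient="index",
)
c_df = pd.DataFrame.from_dict(
    {3: {"match_id": 2, "uniq_id": 3}, 4: {"match_id": 3, "uniq_id": 4}},
    orient="index",
)

result = merge_once(a_df, c_df)
print(result)
```

Note that `merge` does not keep the original index labels, and unmatched cells become NaN, which turns the affected integer columns into floats. The row order of an outer merge can also differ between pandas versions, so it should not be relied on.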