Assign the same random value to AB, BA pairs in a pandas DataFrame

I have a DataFrame, for example:

    Sou  Des
    1    3
    1    4
    2    3
    2    4
    3    1
    3    2
    4    1
    4    2

I need to assign a random value between 0 and 1 to each pair, and mirrored pairs such as "1-3" and "3-1" must get the same value. I expect a DataFrame like this:

    Sou  Des  Val
    1    3    0.1
    1    4    0.6
    2    3    0.9
    2    4    0.5
    3    1    0.1
    3    2    0.9
    4    1    0.6
    4    2    0.5

How can I assign the same random value to matching pairs such as "AB" and "BA" in pandas?

4 answers

First, create a helper DataFrame in which each row is sorted (np.sort with axis=1):

    In [304]: x = pd.DataFrame(np.sort(df, axis=1), df.index, df.columns)

    In [305]: x
    Out[305]:
       Sou  Des
    0    1    3
    1    1    4
    2    2    3
    3    2    4
    4    1    3
    5    2    3
    6    1    4
    7    2    4

Now we can group by its columns and draw one random value per group:

    In [306]: df['Val'] = (x.assign(c=1)
         ...:                .groupby(x.columns.tolist())
         ...:                .transform(lambda x: np.random.rand(1)))

    In [307]: df
    Out[307]:
       Sou  Des       Val
    0    1    3  0.989035
    1    1    4  0.918397
    2    2    3  0.463653
    3    2    4  0.313669
    4    3    1  0.989035
    5    3    2  0.463653
    6    4    1  0.918397
    7    4    2  0.313669
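The same idea (sort each row, then give every group one draw) can also be written without a lambda inside transform. The following is only a sketch of one possible variant, not the answer's own code: the helper column names lo and hi and the use of groupby(...).ngroup() with np.random.default_rng are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Sou": [1, 1, 2, 2, 3, 3, 4, 4],
                   "Des": [3, 4, 3, 4, 1, 2, 1, 2]})

# Helper frame with each row sorted, so (3, 1) becomes (1, 3).
# "lo" and "hi" are illustrative column names.
x = pd.DataFrame(np.sort(df[["Sou", "Des"]].to_numpy(), axis=1),
                 index=df.index, columns=["lo", "hi"])

# One integer label per unordered pair: 0, 1, 2, ...
group_id = x.groupby(["lo", "hi"]).ngroup().to_numpy()

# Draw one uniform value in [0, 1) per label, then look it up for every row.
rng = np.random.default_rng()  # pass a seed here for reproducible values
values = rng.random(group_id.max() + 1)
df["Val"] = values[group_id]
print(df)
```

Because the values are indexed by group label, rows 0 and 4 (the pairs 1-3 and 3-1) always receive the same number.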

Here is another way, using a symmetric matrix of random values:

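    s = pd.crosstab(df.Sou, df.Des)
    # random_integers is deprecated; randint with an exclusive upper bound covers the same range
    b = np.random.randint(-2000, 2001, size=(len(s), len(s)))
    sy = (b + b.T) / 2  # symmetric, so (i, j) and (j, i) share a value
    s.mul(sy).replace(0, np.nan).stack().reset_index()
    Out[292]:
       Sou  Des       0
    0    1    3   -60.0
    1    1    4  -867.0
    2    2    3   269.0
    3    2    4  1152.0
    4    3    1   -60.0
    5    3    2   269.0
    6    4    1  -867.0
    7    4    2  1152.0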

The trick here is to do a little work outside the DataFrame. You can break this down into three steps:

  • compile a list of all tuples (a,b)
  • assign a random value to each pair so that (a,b) and (b,a) have the same value
  • fill in a new column

Assuming your DataFrame is called df, we can build the set of all pairs, ordered so that a <= b. I think this is easier than trying to track both (a,b) and (b,a).

    # store every pair with the smaller value first, so (3, 1) is kept as (1, 3)
    pairs = {(a, b) if a <= b else (b, a) for a, b in df.itertuples(index=False, name=None)}

It is simple enough to assign a random number to each of these pairs and store it in a dictionary, so I will leave that to you. Call it pair_dict.
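For readers who want that step spelled out, here is a minimal sketch of one way to build pair_dict. It assumes pairs already holds the ordered tuples from the example data and uses random.random() from the standard library; any other source of values in [0, 1) would do just as well.

```python
import random

# Unordered pairs from the example data, each written with a <= b.
pairs = {(1, 3), (1, 4), (2, 3), (2, 4)}

# One independent draw from [0, 1) per pair.
pair_dict = {pair: random.random() for pair in pairs}

# Lookups always use the ordered key, so (3, 1) is queried as (1, 3).
print(pair_dict[(1, 3)])
```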

Now we just need to look up the values. Ultimately we want to write

 df['Val'] = df.apply(<some function>, axis=1) 

where our function looks up the corresponding value in pair_dict.

Instead of trying to squeeze it into a lambda (although we could), write it separately.

    def func(row):
        if row['Sou'] <= row['Des']:
            key = (row['Sou'], row['Des'])
        else:
            key = (row['Des'], row['Sou'])
        return pair_dict[key]

If you are happy with a "random" value that comes from hash(), you can get one by hashing a frozenset of the pair, since a frozenset ignores the order of its elements:

    import pandas as pd

    df = pd.DataFrame([[1,1,2,2,3,3,4,4],[3,4,3,4,1,2,1,2]]).T
    df.columns = ['Sou','Des']
    # frozenset([1, 3]) == frozenset([3, 1]), so both rows get the same hash
    df['Val'] = df.apply(lambda x: hash(frozenset([x["Sou"], x["Des"]])), axis=1)
    print(df)

which gives:

       Sou  Des         Val
    0    1    3  1580307032
    1    1    4 -1736016661
    2    2    3   741508915
    3    2    4 -1930135584
    4    3    1  1580307032
    5    3    2   741508915
    6    4    1 -1736016661
    7    4    2 -1930135584
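The hashes above are large integers, while the question asks for values between 0 and 1. As a rough sketch, the hash can be folded into [0, 1) by reducing it modulo 2**32. The helper name pair_value and the choice of 2**32 are assumptions for illustration, and the result is fixed by the inputs rather than drawn at random.

```python
def pair_value(a, b):
    """Return a value in [0, 1) that does not depend on the order of a and b."""
    h = hash(frozenset((a, b)))   # same hash for (a, b) and (b, a)
    return (h % 2**32) / 2**32    # % is non-negative in Python, so this lands in [0, 1)

# Mirrored pairs map to the same value.
print(pair_value(1, 3) == pair_value(3, 1))
```

This could be used in place of the bare hash() call inside df.apply.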

reference: Why does Python not contain hashing?
