Suppose we have a list that grows by one integer per iteration, where each integer is between 15 and 32 inclusive (call the integer rand). I want to develop an algorithm that assigns a reward of about 1 (between 0.75 and 1.25) to each rand. The rule for assigning the reward is as follows.
First, we calculate the average of the list. If rand is greater than the average, the reward should be less than 1; if rand is less than the average, the reward should be greater than 1. The greater the distance between the average and rand, the larger the increase or decrease in the reward. For example:
rand = 15, avg = 23 then reward = 1.25
rand = 32, avg = 23 then reward = 0.75
rand = 23, avg = 23 then reward = 1 , etc.
I developed the code below for this algorithm:
    import numpy as np

    rollouts = np.array([])
    i = 0

    def modify_reward(lst, rand):
        reward = 1
        constant1 = 0.25
        constant2 = 1
        std = np.std(lst)
        global avg
        avg = np.mean(lst)
        sub = np.subtract(avg, rand)
        landa = sub / std if std != 0 else 0
        coefficient = -1 + (2 / (1 + np.exp(-constant2 * landa)))
        md_reward = reward + (reward * constant1 * coefficient)
        return md_reward

    while i < 100:
        rand = np.random.randint(15, 33)
        rollouts = np.append(rollouts, rand)
        modified_reward = modify_reward(rollouts, rand)
        i += 1
        print([i, rand, avg, modified_reward])
The algorithm works reasonably well, but the examples below show a problem with it:
rand = 15, avg = 23.94 then reward = 1.17 # should be 1.25
rand = 32, avg = 23.94 then reward = 0.84 # should be 0.75
rand = 15, avg = 27.38 then reward = 1.15 # should be 1.25
rand = 32, avg = 27.38 then reward = 0.93 # should be 0.75
As you can see, the algorithm does not take into account the distance between avg and the borders (15 and 32). The closer avg moves toward the lower or the upper border, the more unbalanced modified_reward becomes.
I need modified_reward to be assigned evenly, so that rand = 15 always gives 1.25 and rand = 32 always gives 0.75, regardless of whether avg drifts toward the upper or the lower bound. Could someone suggest a modification of this algorithm that takes into account the distance between avg and the borders of the range?
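To make the requirement concrete, here is a minimal sketch of the kind of border-aware scaling I have in mind. It is only an illustration, not a tested replacement for the code above: the function name border_aware_reward and its parameters low, high, base and spread are hypothetical, and the mapping is assumed to be piecewise linear rather than sigmoid-shaped. The deviation of rand from avg is divided by the distance from avg to the border on the same side, so the coefficient reaches exactly +1 at the lower border and -1 at the upper border.

```python
def border_aware_reward(rand, avg, low=15, high=32, base=1.0, spread=0.25):
    """Hypothetical sketch: piecewise-linear reward scaled by the distance
    from avg to the border on the same side as rand.

    Assumes low <= rand <= high and low <= avg <= high.
    """
    if rand < avg:
        # Below average: normalise by the distance from avg to the lower border,
        # so rand == low gives a coefficient of +1.
        coefficient = (avg - rand) / (avg - low)
    elif rand > avg:
        # Above average: normalise by the distance from avg to the upper border,
        # so rand == high gives a coefficient of -1.
        coefficient = -(rand - avg) / (high - avg)
    else:
        # Exactly at the average: no adjustment.
        coefficient = 0.0
    return base + base * spread * coefficient


if __name__ == "__main__":
    # Same (rand, avg) pairs as in the examples above.
    for rand, avg in [(15, 23.94), (32, 23.94), (15, 27.38), (32, 27.38), (23, 23)]:
        print(rand, avg, border_aware_reward(rand, avg))
```

With this scaling the two borders map to 1.25 and 0.75 by construction, whatever the value of avg. The trade-off is that the slope differs on the two sides of avg, so a one-unit step below the average does not change the reward by the same amount as a one-unit step above it.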