I have three suggestions, depending on how much thinking and work you want to do.
Firstly, during gradient descent/ascent, each step moves along the gradient scaled by a certain coefficient, which you call the "learning rate". If, as you described, a step causes x to become negative, there are two natural interpretations: either the gradient was too large, or the learning rate was too large. Since you cannot control the gradient, take the second interpretation. Check whether your move would cause x to become negative, and if so, halve the learning rate and try again, as in the sketch below.
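Here is a minimal sketch of that idea in Python. The names (f_grad for the derivative of f, lr for the learning rate) are my own illustrative placeholders, not anything from your code:

```python
def ascend_keep_positive(f_grad, x0, lr=0.1, steps=100):
    """Gradient ascent on f that keeps the parameter x positive by
    halving the learning rate whenever a step would leave (0, inf)."""
    x = x0
    for _ in range(steps):
        step_lr = lr
        x_new = x + step_lr * f_grad(x)
        # A step that makes x non-positive means the rate was too large:
        # halve it and retry until the proposed point is positive again.
        # (This terminates for any finite gradient, since x itself is > 0.)
        while x_new <= 0:
            step_lr /= 2
            x_new = x + step_lr * f_grad(x)
        x = x_new
    return x
```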
Secondly, to elaborate on Aniko's answer: let x be your parameter and f(x) your function. Define a new function g(x) = f(e^x), and note that although the domain of f is (0, infinity), the domain of g is (-infinity, infinity). Therefore g cannot suffer from the domain problem that f suffers from. Use gradient descent to find the value x_0 that maximizes g; then e^(x_0), which is positive, maximizes f. To apply gradient descent to g, you need its derivative, which by the chain rule is f'(e^x) * e^x.
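Again a hedged sketch, assuming you can evaluate the derivative f' (f_prime below is my placeholder name for it):

```python
import math

def maximize_via_log_substitution(f_prime, y0=0.0, lr=0.1, steps=200):
    """Maximize f over (0, inf) by gradient ascent on g(y) = f(e^y),
    whose domain is all of (-inf, inf). By the chain rule,
    g'(y) = f'(e^y) * e^y."""
    y = y0
    for _ in range(steps):
        y += lr * f_prime(math.exp(y)) * math.exp(y)
    return math.exp(y)  # x_0 = e^(y_0) is positive by construction

# Example: f(x) = log(x) - x has its maximum at x = 1, and f'(x) = 1/x - 1.
x_star = maximize_via_log_substitution(lambda x: 1.0 / x - 1.0, y0=1.0)
print(x_star)  # approximately 1.0
```

Note that no clamping is needed here: whatever y the optimizer visits, e^y is always a legal input to f.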
Thirdly, it looks like you are trying to maximize just one particular function, not write a general maximization procedure. You could consider abandoning gradient descent entirely and tailoring an optimization method to the features of your particular function. We would need to know a lot more about the expected behavior of f to help you with that.
Josephine