For a neural network library, I implemented some activation functions and loss functions together with their derivatives. They can be combined arbitrarily, and the delta on the output layer simply becomes the product of the loss derivative and the activation derivative.
However, I was not able to implement the derivative of the Softmax activation function independently of any loss function. Because of the normalization, i.e. the denominator in the equation, a change in a single input activation changes all output activations, not just one.
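Written out, with $s_i = e^{x_i} / \sum_k e^{x_k}$, the full Jacobian of the softmax is

$$\frac{\partial s_i}{\partial x_j} = s_i \, (\delta_{ij} - s_j),$$

so in general every entry is nonzero, not only the diagonal.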
Here is my Softmax implementation, whose derivative fails gradient checking by about 1%. How can I implement the Softmax derivative so that it can be combined with any loss function?
```python
import numpy as np

class Softmax:

    def compute(self, incoming):
        exps = np.exp(incoming)
        return exps / exps.sum()

    def delta(self, incoming, outgoing):
        # My attempt at an element-wise derivative; this is what fails
        # the gradient check.
        exps = np.exp(incoming)
        others = exps.sum() - exps
        return 1 / (2 + exps / others + others / exps)

activation = Softmax()
cost = SquaredError()  # defined elsewhere in the library

outgoing = activation.compute(incoming)
delta_output_layer = activation.delta(incoming, outgoing) * cost.delta(outgoing)
```
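For reference, here is a minimal, self-contained sketch of the kind of gradient check I mean, comparing the full softmax Jacobian against central finite differences (the helper names here are just for this example, not my library's API):

```python
import numpy as np

def softmax(x):
    # Shift by the max for numerical stability; does not change the result.
    exps = np.exp(x - x.max())
    return exps / exps.sum()

def softmax_jacobian(x):
    # Analytic Jacobian: d s_i / d x_j = s_i * (delta_ij - s_j).
    s = softmax(x)
    return np.diag(s) - np.outer(s, s)

def numerical_jacobian(f, x, eps=1e-6):
    # Central finite differences; column j approximates d f / d x_j.
    x = x.astype(float)
    cols = []
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = eps
        cols.append((f(x + step) - f(x - step)) / (2 * eps))
    return np.stack(cols, axis=1)

x = np.array([0.5, -1.0, 2.0])
analytic = softmax_jacobian(x)
numeric = numerical_jacobian(softmax, x)
print(np.abs(analytic - numeric).max())  # should be tiny, e.g. < 1e-8
```

My element-wise `delta` above returns only one number per component, so it has no way to represent the off-diagonal entries of this Jacobian, which is presumably where the roughly 1% gradient check error comes from.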