Derivative of the softmax function

I am trying to calculate the derivative of the softmax activation function. I found this: https://math.stackexchange.com/questions/945871/derivative-of-softmax-loss-function but no one there seems to explain clearly how we get the answers for the cases i = j and i != j. Can someone please explain this? I get confused by derivatives when a summation occurs, as in the denominator of the softmax activation function.

neural-network calculus softmax derivative
2 answers

The derivative of the sum is the sum of the derivatives, i.e.

d(f1 + f2 + f3 + f4)/dx = df1/dx + df2/dx + df3/dx + df4/dx 
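For example, applying this to the softmax denominator: only the k = i term of the sum depends on o_i, so every other term differentiates to zero, i.e.

  \frac{\partial}{\partial o_i} \sum_k e^{o_k}
    = \sum_k \frac{\partial}{\partial o_i} e^{o_k}
    = \frac{\partial}{\partial o_i} e^{o_i}
    = e^{o_i}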

To get the derivatives of p_j with respect to o_i, we start with:

  d_i(p_j) = d_i(exp(o_j) / Sum_k(exp(o_k))) 

I decided to use d_i for the derivative with respect to o_i, to make it easier to read. Using the product rule, we get:

  d_i(exp(o_j)) / Sum_k(exp(o_k)) + exp(o_j) * d_i(1/Sum_k(exp(o_k))) 

Looking at the first term, the derivative will be 0 if i != j; this can be represented using the Kronecker delta function δ_ij (equal to 1 when i = j and 0 otherwise).
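Continuing from here (a sketch of the standard completion in the same notation, not part of the quoted answer): the first term contributes

  \frac{\delta_{ij} \, e^{o_j}}{\sum_k e^{o_k}} = \delta_{ij} \, p_j

while the chain rule applied to the second term gives

  e^{o_j} \cdot \left( -\frac{e^{o_i}}{\left( \sum_k e^{o_k} \right)^2} \right) = -\, p_j \, p_i

Adding the two terms:

  \frac{\partial p_j}{\partial o_i} = \delta_{ij} \, p_j - p_j \, p_i = p_j \left( \delta_{ij} - p_i \right)

which gives the two cases asked about: p_j (1 - p_j) when i = j, and -p_j p_i when i != j.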


For what it's worth, here is my derivation based on SirGuy's answer (feel free to point out errors if you find them):

[Image: the poster's hand-worked derivation of the softmax derivative, not reproduced here]
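Since the image is not reproduced here, below is a minimal NumPy sketch (my own check, not taken from the answer; the helper names softmax and softmax_jacobian are just illustrative) that compares the closed-form Jacobian p_j (δ_ij - p_i) against a finite-difference approximation:

import numpy as np

def softmax(o):
    # Shift by the max for numerical stability; the result is mathematically unchanged.
    e = np.exp(o - np.max(o))
    return e / e.sum()

def softmax_jacobian(o):
    # Closed-form Jacobian: J[j, i] = d p_j / d o_i = p_j * (delta_ij - p_i)
    p = softmax(o)
    return np.diag(p) - np.outer(p, p)

# Central-difference approximation of the same Jacobian
o = np.array([1.0, 2.0, 0.5])
eps = 1e-6
numeric = np.zeros((o.size, o.size))
for i in range(o.size):
    step = np.zeros_like(o)
    step[i] = eps
    # Column i approximates d p / d o_i
    numeric[:, i] = (softmax(o + step) - softmax(o - step)) / (2 * eps)

print(np.allclose(softmax_jacobian(o), numeric, atol=1e-8))

If the closed form above is right, this prints True.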

