Derivative of the softmax function

I am trying to calculate the derivative of the softmax activation function. I found this: https://math.stackexchange.com/questions/945871/derivative-of-softmax-loss-function but no one there seems to explain clearly how we get the answers for the cases i = j and i != j. Can someone please explain this? I get confused by derivatives when a summation occurs, as in the denominator of the softmax activation function.

neural-network calculus softmax derivative
2 answers

The derivative of the sum is the sum of the derivatives, i.e.

d(f1 + f2 + f3 + f4)/dx = df1/dx + df2/dx + df3/dx + df4/dx 
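For example, applying this to the softmax denominator: only the k = i term of the sum depends on o_i, so every other term differentiates to zero, i.e.

  \frac{\partial}{\partial o_i} \sum_k e^{o_k}
    = \sum_k \frac{\partial}{\partial o_i} e^{o_k}
    = \frac{\partial}{\partial o_i} e^{o_i}
    = e^{o_i}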

To get the derivatives of p_j with respect to o_i, we start with:

  d_i(p_j) = d_i(exp(o_j) / Sum_k(exp(o_k))) 

I decided to use d_i for the derivative with respect to o_i, to make it easier to read. Using the product rule, we get:

  d_i(exp(o_j)) / Sum_k(exp(o_k)) + exp(o_j) * d_i(1/Sum_k(exp(o_k))) 

Looking at the first term, the derivative will be 0 if i != j; this can be represented using the Kronecker delta function δ_ij (equal to 1 when i = j and 0 otherwise).
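Continuing from here (a sketch of the standard completion in the same notation, not part of the quoted answer): the first term contributes

  \frac{\delta_{ij} \, e^{o_j}}{\sum_k e^{o_k}} = \delta_{ij} \, p_j

while the chain rule applied to the second term gives

  e^{o_j} \cdot \left( -\frac{e^{o_i}}{\left( \sum_k e^{o_k} \right)^2} \right) = -\, p_j \, p_i

Adding the two terms:

  \frac{\partial p_j}{\partial o_i} = \delta_{ij} \, p_j - p_j \, p_i = p_j \left( \delta_{ij} - p_i \right)

which gives the two cases asked about: p_j (1 - p_j) when i = j, and -p_j p_i when i != j.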


For what it's worth, here is my derivation based on SirGuy's answer (feel free to point out errors if you find them):

[Image: the poster's hand-worked derivation of the softmax derivative, not reproduced here]
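Since the image is not reproduced here, below is a minimal NumPy sketch (my own check, not taken from the answer; the helper names softmax and softmax_jacobian are just illustrative) that compares the closed-form Jacobian p_j (δ_ij - p_i) against a finite-difference approximation:

import numpy as np

def softmax(o):
    # Shift by the max for numerical stability; the result is mathematically unchanged.
    e = np.exp(o - np.max(o))
    return e / e.sum()

def softmax_jacobian(o):
    # Closed-form Jacobian: J[j, i] = d p_j / d o_i = p_j * (delta_ij - p_i)
    p = softmax(o)
    return np.diag(p) - np.outer(p, p)

# Central-difference approximation of the same Jacobian
o = np.array([1.0, 2.0, 0.5])
eps = 1e-6
numeric = np.zeros((o.size, o.size))
for i in range(o.size):
    step = np.zeros_like(o)
    step[i] = eps
    # Column i approximates d p / d o_i
    numeric[:, i] = (softmax(o + step) - softmax(o - step)) / (2 * eps)

print(np.allclose(softmax_jacobian(o), numeric, atol=1e-8))

If the closed form above is right, this prints True.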

