Understanding the softmax classifier

I am trying to understand a simple implementation of the Softmax classifier from this link - CS231n - Convolutional Neural Networks for Visual Recognition. In the Softmax classifier example, the linked notes generate 300 random points in two-dimensional space, each with a class label, and the softmax classifier learns which class each point belongs to.

Here is the complete softmax classifier code (you can also see it at the link I provided).
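The loop below assumes that the data X, y and the sizes D (input dimensionality) and K (number of classes) are already defined. In the linked notes they come from a small spiral toy-data setup, roughly like this sketch (not copied verbatim from the link):

import numpy as np

# rough sketch of the data setup from the linked notes (spiral toy data)
N = 100                            # points per class
D = 2                              # dimensionality
K = 3                              # number of classes
X = np.zeros((N*K, D))             # data matrix (each row is one example)
y = np.zeros(N*K, dtype='uint8')   # class labels
for j in range(K):
   ix = np.arange(N*j, N*(j+1))
   r = np.linspace(0.0, 1, N)                                   # radius
   t = np.linspace(j*4, (j+1)*4, N) + np.random.randn(N)*0.2    # angle
   X[ix] = np.c_[r*np.sin(t), r*np.cos(t)]
   y[ix] = j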

# initialize parameters randomly
W = 0.01 * np.random.randn(D,K)
b = np.zeros((1,K))

# some hyperparameters
step_size = 1e-0
reg = 1e-3 # regularization strength

# gradient descent loop
num_examples = X.shape[0]
for i in xrange(200):

   # evaluate class scores, [N x K]
   scores = np.dot(X, W) + b 

   # compute the class probabilities
   exp_scores = np.exp(scores)
   probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True) # [N x K]

   # compute the loss: average cross-entropy loss and regularization
   corect_logprobs = -np.log(probs[range(num_examples),y])
   data_loss = np.sum(corect_logprobs)/num_examples
   reg_loss = 0.5*reg*np.sum(W*W)
   loss = data_loss + reg_loss
   if i % 10 == 0:
      print "iteration %d: loss %f" % (i, loss)

   # compute the gradient on scores
   dscores = probs
   dscores[range(num_examples),y] -= 1
   dscores /= num_examples

   # backpropate the gradient to the parameters (W,b)
   dW = np.dot(X.T, dscores)
   db = np.sum(dscores, axis=0, keepdims=True)

   dW += reg*W # regularization gradient

   # perform a parameter update
   W += -step_size * dW
   b += -step_size * db

I cannot understand how they calculated the gradient. I believe the gradient is computed in these lines:

   dW = np.dot(X.T, dscores)
   db = np.sum(dscores, axis=0, keepdims=True)
   dW += reg*W # regularization gradient

How do they arrive at dW = np.dot(X.T, dscores)? Why is db = np.sum(dscores, axis=0, keepdims=True)? And why is reg*W added to dW as the regularization gradient?

I have read the relevant section of CS231n - Convolutional Neural Networks for Visual Recognition and searched Stack Overflow, but I still do not understand how these gradient expressions are derived.

Let's start with the lines that compute the gradient on the scores:

# compute the gradient on scores
dscores = probs
dscores[range(num_examples),y] -= 1
dscores /= num_examples

First, dscores starts out as probs, the class probabilities produced by the softmax. Then 1 is subtracted from the probability of the correct class of each example, and the whole matrix is divided by num_examples so that it is the gradient of the average loss rather than the sum.

Why subtract 1? For a single example the loss is -log(p_y), where p_y is the probability the softmax assigns to the correct class. Differentiating this loss with respect to the scores gives p_k for every wrong class k and p_y - 1 for the correct class. Intuitively, increasing the score of the correct class (whose probability is still below 1) makes the loss smaller, so its gradient entry is negative, while increasing the score of any other class makes the loss larger, so those entries are positive.
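As a tiny worked example with made-up numbers, suppose the softmax produces probabilities [0.2, 0.5, 0.3] for one example whose correct class is index 0:

import numpy as np

probs_i = np.array([0.2, 0.5, 0.3])   # softmax output for a single example
y_i = 0                                # index of the correct class
dscores_i = probs_i.copy()
dscores_i[y_i] -= 1                    # gradient of -log(probs_i[y_i]) w.r.t. the scores
print(dscores_i)                       # [-0.8  0.5  0.3]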

The scores are computed as w*x + b. The derivative of w*x + b with respect to w is x, so by the chain rule the gradient arriving at the scores (dscores) gets multiplied by the inputs; doing this for every example and summing the contributions is exactly dW = np.dot(X.T, dscores).

The derivative of w*x + b with respect to b is 1, so b simply collects the gradient from the scores: db = np.sum(dscores, axis=0, keepdims=True).

Finally, the regularization term 0.5*reg*np.sum(W*W) has derivative reg*W, which is why reg*W is added to dW.
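To make the chain rule concrete, here is a small sketch that accumulates the same gradients one example at a time. It assumes X, dscores, W, b and num_examples from the code above, and it should produce the same result as the vectorized lines:

# per-example accumulation; equivalent to the vectorized expressions above
dW_loop = np.zeros_like(W)
db_loop = np.zeros_like(b)
for n in range(num_examples):
   # scores[n] = X[n].dot(W) + b, so example n contributes X[n] times its dscores row
   dW_loop += np.outer(X[n], dscores[n])   # [D x K] contribution of example n
   # d(scores[n])/db is 1 for every class, so b just collects dscores[n]
   db_loop += dscores[n]
# dW_loop matches np.dot(X.T, dscores) (before the reg*W term is added),
# and db_loop matches np.sum(dscores, axis=0, keepdims=True)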
