Loss function gradient for SVM

I am working through this class on convolutional neural networks (CS231n) and am trying to implement the gradient of the SVM loss function. I have a copy of the solution, but I am having trouble understanding why the solution is correct.

On this page of the CS231n course notes, the gradient of the loss function is defined. In my code, the analytic gradient matches a numerical gradient when the loss is implemented as follows:

    dW = np.zeros(W.shape)  # initialize the gradient as zero

    # compute the loss and the gradient
    num_classes = W.shape[1]
    num_train = X.shape[0]
    loss = 0.0
    for i in xrange(num_train):
        scores = X[i].dot(W)
        correct_class_score = scores[y[i]]
        for j in xrange(num_classes):
            if j == y[i]:
                continue
            margin = scores[j] - correct_class_score + 1  # note delta = 1
            if margin > 0:
                dW[:, y[i]] += -X[i]
                dW[:, j] += X[i]  # gradient update for incorrect rows
                loss += margin
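For reference, since the formula from the linked notes did not come through above, this is my transcription of the gradient it gives for one example (Δ is the margin and 1(·) the indicator function), which may not be word-for-word exact:

$$\nabla_{w_{y_i}} L_i = -\Big(\sum_{j \neq y_i} \mathbb{1}\big(w_j^\top x_i - w_{y_i}^\top x_i + \Delta > 0\big)\Big)\, x_i$$

$$\nabla_{w_j} L_i = \mathbb{1}\big(w_j^\top x_i - w_{y_i}^\top x_i + \Delta > 0\big)\, x_i \qquad (j \neq y_i)$$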

However, from the notes it seems that dW[:, y[i]] should be changed every time j == y[i], since that is where we subtract the correct class score in the loss. I am very confused why the code is not written like this instead:

    dW = np.zeros(W.shape)  # initialize the gradient as zero

    # compute the loss and the gradient
    num_classes = W.shape[1]
    num_train = X.shape[0]
    loss = 0.0
    for i in xrange(num_train):
        scores = X[i].dot(W)
        correct_class_score = scores[y[i]]
        for j in xrange(num_classes):
            if j == y[i]:
                if margin > 0:
                    dW[:, y[i]] += -X[i]
                continue
            margin = scores[j] - correct_class_score + 1  # note delta = 1
            if margin > 0:
                dW[:, j] += X[i]  # gradient update for incorrect rows
                loss += margin

so that the loss changes when j == y[i]. Why are both dW[:, y[i]] and dW[:, j] updated when j != y[i]?
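For context, when I say the analytic gradient matches the numerical one, I compare against a centered-difference estimate, roughly like this (a minimal sketch, not the course's grad_check_sparse utility; loss_and_grad is a hypothetical wrapper around the loop above that returns (loss, dW)):

    import numpy as np

    def numerical_gradient(loss_fn, W, h=1e-5):
        # centered finite differences: (f(W+h) - f(W-h)) / (2h) for each entry of W
        grad = np.zeros_like(W)
        it = np.nditer(W, flags=['multi_index'])
        while not it.finished:
            idx = it.multi_index
            old = W[idx]
            W[idx] = old + h
            fp = loss_fn(W)
            W[idx] = old - h
            fm = loss_fn(W)
            W[idx] = old
            grad[idx] = (fp - fm) / (2 * h)
            it.iternext()
        return grad

    # num_grad = numerical_gradient(lambda W: loss_and_grad(W)[0], W)
    # analytic_grad = loss_and_grad(W)[1]
    # print(np.max(np.abs(num_grad - analytic_grad)))  # should be close to 0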

python computer-vision svm linear-regression gradient-descent
1 answer

I don't have enough reputation to comment, so I am answering here. Whenever you compute the loss for x[i], the ith training example, and get some non-zero loss, it means you should move the weight vector of each incorrect class (j != y[i]) away from x[i], and at the same time move the weights, or hyperplane, of the correct class (j == y[i]) nearer to x[i]. By the parallelogram law, w + x lies in between w and x, so w[y[i]] tries to come nearer to x[i] whenever it finds loss > 0.

Thus, dW[:, y[i]] += -X[i] and dW[:, j] += X[i] are both accumulated in the loop, but when we update the weights we step in the direction of decreasing gradient, so we are essentially adding X[i] to the correct class's weights and moving away by X[i] from the weights that misclassify.
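To make the direction of that step concrete, here is a tiny self-contained sketch (the shapes, learning rate, and class indices are hypothetical, not part of the assignment code):

    import numpy as np

    np.random.seed(0)
    W = np.random.randn(4, 3)          # 4 features, 3 classes
    x_i = np.random.randn(4)
    y_i = 1                            # correct class
    j = 2                              # a wrong class whose margin was violated

    dW = np.zeros_like(W)
    dW[:, y_i] += -x_i                 # same accumulation as in the loop above
    dW[:, j] += x_i

    learning_rate = 0.1
    W_new = W - learning_rate * dW     # step in the direction of decreasing gradient

    # the correct-class column moved toward x_i, the wrong-class column moved away:
    print(np.dot(W_new[:, y_i] - W[:, y_i], x_i) > 0)   # True
    print(np.dot(W_new[:, j] - W[:, j], x_i) < 0)       # True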
