I am trying to implement stochastic gradient descent, but I am not sure it is completely correct.
- The cost produced by my stochastic gradient descent algorithm is sometimes very far from the cost obtained with fminunc or batch gradient descent.
- While batch gradient descent converges when I set the learning rate alpha to 0.2, I have to set alpha to 0.0001 for my stochastic implementation so that it does not diverge. Is this normal?
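I suspect the learning-rate gap comes from the step sizes: a batch iteration takes one step along the gradient averaged over all m examples, while an epoch of SGD takes m full-size single-example steps, so the same alpha effectively moves the parameters much further per epoch. A minimal NumPy sketch (the synthetic data and names here are illustrative, not my actual 10,000-example set) shows that a single-example gradient is not m times smaller than the batch gradient:

```python
import numpy as np

# Illustrative synthetic data (not the original training set)
rng = np.random.default_rng(0)
m = 100
X = np.hstack([np.ones((m, 1)), rng.standard_normal((m, 2))])
y = (rng.random(m) > 0.5).astype(float)
theta = np.zeros(3)

h = 1.0 / (1.0 + np.exp(-(X @ theta)))      # sigmoid hypothesis

# Batch gradient: individual errors average out over m examples
batch_grad = X.T @ (h - y) / m

# Per-example gradients: each one is NOT divided by m
per_example_norms = np.linalg.norm((h - y)[:, None] * X, axis=1)

print(np.linalg.norm(batch_grad))    # small
print(per_example_norms.mean())      # much larger; SGD takes m such steps per epoch
```

By the triangle inequality the mean per-example gradient norm is at least the batch gradient norm, which is why SGD at the same alpha can overshoot.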
Here are some results I got with a training set of 10,000 examples and num_iters = 100 or 500:
fminunc: Iteration
Batch gradient descent implementation for logistic regression:
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
    [J, gradJ] = lrCostFunction(theta, X, y, lambda);
    theta = theta - alpha * gradJ;
    J_history(iter) = J;
    fprintf('Iteration #%d - Cost = %f... \r\n', iter, J_history(iter));
end
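For reference, here is an equivalent NumPy sketch of the batch loop; the helper `lr_cost_function` and the synthetic data below are illustrative, not my actual data:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lr_cost_function(theta, X, y, lam):
    """Regularized logistic cost and gradient; the bias theta[0] is not regularized."""
    m = len(y)
    h = sigmoid(X @ theta)
    J = (1/m) * np.sum(-y*np.log(h) - (1-y)*np.log(1-h)) \
        + (lam/(2*m)) * np.sum(theta[1:]**2)
    temp = theta.copy()
    temp[0] = 0.0
    grad = (1/m) * (X.T @ (h - y)) + (lam/m) * temp
    return J, grad

def batch_gd(theta, X, y, lam, alpha, num_iters):
    J_history = np.zeros(num_iters)
    for it in range(num_iters):
        J, grad = lr_cost_function(theta, X, y, lam)
        theta = theta - alpha * grad
        J_history[it] = J
    return theta, J_history

# Synthetic demo: with alpha = 0.2 the cost decreases steadily
rng = np.random.default_rng(1)
m = 200
X = np.hstack([np.ones((m, 1)), rng.standard_normal((m, 2))])
true_theta = np.array([0.5, 2.0, -1.0])
y = (rng.random(m) < sigmoid(X @ true_theta)).astype(float)
theta, J_hist = batch_gd(np.zeros(3), X, y, 0.1, 0.2, 100)
```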
Implementation of stochastic gradient descent for logistic regression
m = length(y);  % number of training examples
J_history = zeros(num_iters, 1);

% STEP 1: shuffle the data
data = [y, X];
data = data(randperm(size(data,1)),:);
y = data(:,1);
X = data(:,2:end);

for iter = 1:num_iters
    for i = 1:m
        x = X(i,:);  % select one example
        [J, gradJ] = lrCostFunction(theta, x, y(i,:), lambda);
        theta = theta - alpha * gradJ;
    end
    J_history(iter) = J;  % note: cost of the last example only, not of the whole set
    fprintf('Iteration #%d - Cost = %f... \r\n', iter, J);
end
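Here is a NumPy sketch of the same stochastic loop, with one change: the recorded cost is computed over the whole training set after each epoch rather than being the cost of the last single example, which makes the learning curve comparable to the batch one. The helper names and the synthetic demo data are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def full_cost(theta, X, y, lam):
    # Cost over the WHOLE training set, for a comparable learning curve
    m = len(y)
    h = sigmoid(X @ theta)
    return (1/m) * np.sum(-y*np.log(h) - (1-y)*np.log(1-h)) \
        + (lam/(2*m)) * np.sum(theta[1:]**2)

def sgd(theta, X, y, lam, alpha, num_iters, seed=0):
    rng = np.random.default_rng(seed)
    m = len(y)
    J_history = np.zeros(num_iters)
    for it in range(num_iters):
        for i in rng.permutation(m):                 # reshuffle each epoch
            h = sigmoid(X[i] @ theta)
            grad = (h - y[i]) * X[i]                 # single-example gradient
            temp = theta.copy()
            temp[0] = 0.0
            # lambda/m with m = 1, mirroring the MATLAB call on one example
            theta = theta - alpha * (grad + lam * temp)
        J_history[it] = full_cost(theta, X, y, lam)  # full-set cost per epoch
    return theta, J_history

# Synthetic demo, with a much smaller alpha than the batch version
rng = np.random.default_rng(1)
m = 200
X = np.hstack([np.ones((m, 1)), rng.standard_normal((m, 2))])
true_theta = np.array([0.5, 2.0, -1.0])
y = (rng.random(m) < sigmoid(X @ true_theta)).astype(float)
theta, J_hist = sgd(np.zeros(3), X, y, 0.1, 0.001, 20)
```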
For reference, here is the logistic regression cost function used in my example
function [J, grad] = lrCostFunction(theta, X, y, lambda)
    m = length(y);  % number of training examples

    % Compute the regularized cost J
    hypothesis = sigmoid(X*theta);
    costFun = (-y.*log(hypothesis) - (1-y).*log(1-hypothesis));
    J = (1/m) * sum(costFun) + (lambda/(2*m)) * sum(theta(2:end).^2);

    % Compute grad using the partial derivatives
    beta = (hypothesis - y);
    grad = (1/m) * (X'*beta);
    temp = theta;
    temp(1) = 0;  % the bias term (j = 0) is not regularized
    grad = grad + (lambda/m) * temp;
    grad = grad(:);
end
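To check that the gradient formula matches the cost, a NumPy port of this function can be compared against a central-difference numerical gradient; the random data and names below are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lr_cost_function(theta, X, y, lam):
    """NumPy port of the MATLAB lrCostFunction above."""
    m = len(y)
    h = sigmoid(X @ theta)
    J = (1/m) * np.sum(-y*np.log(h) - (1-y)*np.log(1-h)) \
        + (lam/(2*m)) * np.sum(theta[1:]**2)
    temp = theta.copy()
    temp[0] = 0.0                    # no regularization for j = 0
    grad = (1/m) * (X.T @ (h - y)) + (lam/m) * temp
    return J, grad

# Compare the analytic gradient with a numerical gradient on random data
rng = np.random.default_rng(0)
X = np.hstack([np.ones((20, 1)), rng.standard_normal((20, 3))])
y = (rng.random(20) > 0.5).astype(float)
theta = rng.standard_normal(4)
_, grad = lr_cost_function(theta, X, y, 1.0)

eps = 1e-6
num_grad = np.zeros_like(theta)
for j in range(len(theta)):
    e = np.zeros_like(theta)
    e[j] = eps
    num_grad[j] = (lr_cost_function(theta + e, X, y, 1.0)[0]
                   - lr_cost_function(theta - e, X, y, 1.0)[0]) / (2*eps)

print(np.max(np.abs(grad - num_grad)))   # should be tiny if grad is correct
```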
matlab machine-learning logistic-regression gradient-descent
alexandrekow