Machine Learning - Linear Regression Using Batch Gradient Descent

I am trying to implement batch gradient descent on a dataset with one feature and several training examples (m).

When I use the normal equation, I get the correct answer, but not with the MATLAB code below, which performs batch gradient descent.

 function [theta] = gradientDescent(X, y, theta, alpha, iterations)
     m = length(y);
     delta = zeros(2,1);
     for iter = 1:1:iterations
         for i = 1:1:m
             delta(1,1) = delta(1,1) + (X(i,:)*theta - y(i,1));
             delta(2,1) = delta(2,1) + ((X(i,:)*theta - y(i,1))*X(i,2));
         end
         theta = theta - (delta*(alpha/m));
         computeCost(X, y, theta)
     end
 end

y is a vector of target values, and X is a matrix whose first column is filled with ones and whose second column contains the feature values.

I implemented this using vectorization, i.e.

 theta = theta - (alpha/m)*delta 

... where delta is a 2-element column vector initialized to zeros.

The cost function is J(theta) = (1/(2m)) * sum from i=1 to m of (h_theta(x^(i)) - y^(i))^2.

matlab machine-learning gradient linear-regression gradient-descent
1 answer

The error is very simple: the delta initialization must be inside the outer for loop. Each time you accumulate the weighted differences between the training examples and the outputs, you must start accumulating from zero.

Without this, you keep accumulating the errors from previous iterations, which reflect the error of previously learned versions of theta, and that is incorrect. Put the initialization at the beginning of the outer for loop.

Also, you have a stray call to computeCost. I assume it computes the cost function at each iteration given the current parameters, so I am going to create a new output array called costs that stores the cost at each iteration. I will call this function and assign the result to the corresponding element of that array:

 function [theta, costs] = gradientDescent(X, y, theta, alpha, iterations)
     m = length(y);
     costs = zeros(iterations,1); %// New - one cost per iteration
     % delta = zeros(2,1); %// Remove
     for iter = 1:1:iterations
         delta = zeros(2,1); %// Place here
         for i = 1:1:m
             delta(1,1) = delta(1,1) + (X(i,:)*theta - y(i,1));
             delta(2,1) = delta(2,1) + ((X(i,:)*theta - y(i,1))*X(i,2));
         end
         theta = theta - (delta*(alpha/m));
         costs(iter) = computeCost(X, y, theta); %// New
     end
 end
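As a quick sanity check, here is the same corrected loop translated to NumPy (purely illustrative; the data, step size, and iteration count are my own made-up choices). With delta reset inside the outer loop, it converges to the same answer as the normal equation:

```python
import numpy as np

# Hypothetical toy data: y = 2 + 3*x plus a little noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 50)
y = 2 + 3 * x + rng.normal(0, 0.01, 50)
X = np.column_stack([np.ones_like(x), x])  # first column of ones

alpha, iterations = 0.5, 5000
theta = np.zeros(2)
m = len(y)
for _ in range(iterations):
    delta = np.zeros(2)              # reset the accumulator every iteration
    for i in range(m):
        err = X[i] @ theta - y[i]
        delta[0] += err
        delta[1] += err * X[i, 1]
    theta = theta - (alpha / m) * delta

# Normal equation solution for comparison.
theta_normal = np.linalg.solve(X.T @ X, X.T @ y)
print(np.allclose(theta, theta_normal, atol=1e-4))
```

If you instead keep accumulating delta across iterations, the update blows up because the step incorporates stale gradients from earlier values of theta.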

A Note on Proper Vectorization

FWIW, I do not consider this implementation fully vectorized. You can eliminate the second for loop using vectorized operations. Before we do that, let me cover some theory so that we are on the same page. You are using gradient descent here in the context of linear regression. We want to find the best parameters theta, our linear regression coefficients, that minimize this cost function:

J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_{\theta}(x^{(i)}) - y^{(i)} \right)^2

m corresponds to the number of training examples available, and x^{(i)} corresponds to the i-th training example. y^{(i)} corresponds to the ground-truth value associated with the i-th training example. h is our hypothesis, and it is given as:

h_{\theta}(x) = \theta^{T} x = \theta_0 + \theta_1 x_1

Note that in the context of linear regression in 2D, we have only two values in theta that we want to compute: the intercept term and the slope.

We can minimize the cost function J to determine the best regression coefficients, which give us the best predictions by minimizing the training error. Specifically, starting from some initial theta parameters... usually a vector of zeros, we iterate from 1 up to as many iterations as we see fit, and at each iteration we update our theta parameters as follows:

\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)

For each parameter we want to update, we need to determine the gradient of the cost function with respect to that variable and evaluate it at the current value of theta. If you work through the calculus, we get:

\frac{\partial}{\partial \theta_0} J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left( h_{\theta}(x^{(i)}) - y^{(i)} \right)

\frac{\partial}{\partial \theta_1} J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left( h_{\theta}(x^{(i)}) - y^{(i)} \right) x_1^{(i)}

If you are unsure how this derivation works, I refer you to this good Mathematics Stack Exchange post that explains it:

https://math.stackexchange.com/questions/70728/partial-derivative-in-gradient-descent-for-two-variables

Now... how can we apply this to our current problem? Specifically, you can compute the delta entries by analyzing all samples together at once instead of one at a time. That is, you can simply do this:

 function [theta, costs] = gradientDescent(X, y, theta, alpha, iterations)
     m = length(y);
     costs = zeros(iterations,1);
     for iter = 1 : iterations
         delta1 = theta(1) - (alpha/m)*(sum((theta(1)*X(:,1) + theta(2)*X(:,2) - y).*X(:,1)));
         delta2 = theta(2) - (alpha/m)*(sum((theta(1)*X(:,1) + theta(2)*X(:,2) - y).*X(:,2)));
         theta = [delta1; delta2];
         costs(iter) = computeCost(X, y, theta);
     end
 end

The operations producing delta1 and delta2 are each fully vectorized in a single expression. What you are doing is computing theta^{T}*x^{(i)} for each sample i from 1, 2, ..., m, and this conveniently fits into a single sum statement.
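To see that the per-sample loop and the sum-based expressions really compute the same update, here is a one-step comparison in NumPy (illustrative only; the data and theta values are arbitrary):

```python
import numpy as np

# Arbitrary hypothetical data and parameters for the comparison.
rng = np.random.default_rng(1)
m = 20
X = np.column_stack([np.ones(m), rng.uniform(0, 1, m)])
y = rng.uniform(0, 1, m)
theta = np.array([0.5, -0.25])
alpha = 0.1

# Loop version: accumulate the gradient one sample at a time.
delta = np.zeros(2)
for i in range(m):
    err = X[i] @ theta - y[i]
    delta[0] += err
    delta[1] += err * X[i, 1]
theta_loop = theta - (alpha / m) * delta

# Sum version: one vectorized expression per parameter.
err = X[:, 0] * theta[0] + X[:, 1] * theta[1] - y
theta_sum = theta - (alpha / m) * np.array([np.sum(err * X[:, 0]),
                                            np.sum(err * X[:, 1])])

print(np.allclose(theta_loop, theta_sum))  # the two updates agree
```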

We can go even further and replace this with purely matrix operations. First, you can quickly compute theta^{T}*x^{(i)} for every input sample x^{(i)} at once using matrix multiplication. Suppose that:

X = \begin{bmatrix} x_0^{(1)} & x_1^{(1)} & \cdots & x_n^{(1)} \\ x_0^{(2)} & x_1^{(2)} & \cdots & x_n^{(2)} \\ \vdots & \vdots & \ddots & \vdots \\ x_0^{(m)} & x_1^{(m)} & \cdots & x_n^{(m)} \end{bmatrix}, \qquad \theta = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_n \end{bmatrix}

Here, X is our data matrix, consisting of m rows corresponding to the m training examples and n+1 columns corresponding to the n features plus the leading column of ones. Similarly, theta is our learned weight vector from gradient descent, with n+1 entries to account for the intercept term.

If we calculate X*theta , we get:

X\theta = \begin{bmatrix} \theta^{T} x^{(1)} \\ \theta^{T} x^{(2)} \\ \vdots \\ \theta^{T} x^{(m)} \end{bmatrix} = \begin{bmatrix} h_{\theta}(x^{(1)}) \\ h_{\theta}(x^{(2)}) \\ \vdots \\ h_{\theta}(x^{(m)}) \end{bmatrix}

As you can see, we have computed the hypothesis for every sample and placed them in a vector; each element of this vector is the hypothesis for the i-th training example. Now, recall the gradient term for each parameter in gradient descent:

\frac{\partial}{\partial \theta_j} J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left( h_{\theta}(x^{(i)}) - y^{(i)} \right) x_j^{(i)}

We want to implement this all at once for every parameter in the learned vector, and writing it in vector form gives us:

\nabla J(\theta) = \frac{1}{m} \begin{bmatrix} \sum_{i=1}^{m} \left( h_{\theta}(x^{(i)}) - y^{(i)} \right) x_0^{(i)} \\ \vdots \\ \sum_{i=1}^{m} \left( h_{\theta}(x^{(i)}) - y^{(i)} \right) x_n^{(i)} \end{bmatrix}

Finally:

\nabla J(\theta) = \frac{1}{m} X^{T} \left( X\theta - y \right)

y is already a vector of length m, so the gradient descent update at each iteration can be computed very compactly:

 theta = theta - (alpha/m)*X'*(X*theta - y); 

... so your code is now simply:

 function [theta, costs] = gradientDescent(X, y, theta, alpha, iterations)
     m = length(y);
     costs = zeros(iterations, 1);
     for iter = 1 : iterations
         theta = theta - (alpha/m)*X'*(X*theta - y);
         costs(iter) = computeCost(X, y, theta);
     end
 end
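Here is the fully matrix-based version in NumPy (again, purely illustrative; the synthetic data, step size, and iteration count are my own assumptions). It converges to the normal-equation solution, and the cost never increases along the way:

```python
import numpy as np

# Hypothetical toy data: y = 1.5 - 0.5*x plus a little noise.
rng = np.random.default_rng(2)
m = 100
x = rng.uniform(0, 2, m)
y = 1.5 - 0.5 * x + rng.normal(0, 0.05, m)
X = np.column_stack([np.ones(m), x])

def compute_cost(X, y, theta):
    """1/(2m) times the sum of squared errors, as in the text."""
    r = X @ theta - y
    return (r @ r) / (2 * len(y))

alpha, iterations = 0.3, 5000
theta = np.zeros(2)
costs = np.zeros(iterations)
for it in range(iterations):
    # Fully vectorized update: theta <- theta - (alpha/m) * X^T (X theta - y)
    theta = theta - (alpha / m) * X.T @ (X @ theta - y)
    costs[it] = compute_cost(X, y, theta)

theta_normal = np.linalg.solve(X.T @ X, X.T @ y)
print(np.allclose(theta, theta_normal, atol=1e-6))
print(np.all(np.diff(costs) <= 1e-12))  # cost is non-increasing
```

Beyond brevity, the matrix form hands the whole computation to optimized BLAS routines, which is dramatically faster than looping over samples in either MATLAB or Python.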