Neural networks: sigmoid activation function for continuous output variable

Okay, so I'm in the middle of Andrew Ng's Machine Learning course and would like to adapt the neural network that I completed as part of assignment 4.

In particular, the neural network that I completed correctly as part of the assignment was as follows:

  • Sigmoid activation function: g(z) = 1/(1+e^(-z))
  • 10 output units, each of which can take the value 0 or 1
  • 1 hidden layer
  • Backpropagation used to minimize the cost function
  • Cost function:

J(\Theta) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} \left[ y_k^{(i)} \log\big( (h_\Theta(x^{(i)}))_k \big) + (1 - y_k^{(i)}) \log\big( 1 - (h_\Theta(x^{(i)}))_k \big) \right] + \frac{\lambda}{2m} \sum_{l=1}^{L-1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}} \big( \Theta_{ji}^{(l)} \big)^2

where L = number of layers, s_l = number of units in layer l, m = number of training examples, and K = number of output units.
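For reference, the vectorized cost above corresponds to roughly the following MATLAB. Here H is the m x K matrix of output activations and Y is the m x K matrix of recoded labels; these two names are just for illustration and are not the variable names from the assignment code below.

 % Regularized cross-entropy cost (sketch; H and Y as described above).
 J = (-1/m) * sum(sum( Y .* log(H) + (1 - Y) .* log(1 - H) )) ...
     + (lambda/(2*m)) * ( sum(sum(Theta1(:,2:end).^2)) ...
                        + sum(sum(Theta2(:,2:end).^2)) );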

Now I want to adapt the exercise so that there is a single continuous output unit that can take any value in [0, 1], and I'm trying to figure out what needs to be changed. So far I have:

  • Replaced the data with my own, so that the output is a continuous variable between 0 and 1
  • Updated the references to the number of output units
  • Updated the cost function in the backpropagation algorithm to J = \frac{1}{2m} \sum_{i=1}^{m} \big( a_3^{(i)} - y^{(i)} \big)^2 + \frac{\lambda}{2m} \sum_{l=1}^{L-1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}} \big( \Theta_{ji}^{(l)} \big)^2, where a_3 = g(z_3) is the value of the output unit computed by forward propagation.

I'm sure something else must change, because the gradient check shows that the gradient computed by backpropagation no longer matches the numerical approximation. I have not changed the sigmoid gradient; it remains f(z)*(1-f(z)), where f(z) is the sigmoid function 1/(1+e^(-z)). I also have not updated the numerical approximation of the derivative; it is still (J(theta+e) - J(theta-e))/(2e).
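For clarity, the numerical approximation I am comparing against is computed element-wise, roughly like this. J_of is just an illustrative handle to the cost function and theta is the unrolled parameter vector; neither name comes from the assignment code.

 % Centered-difference gradient check (sketch).
 e = 1e-4;
 numgrad = zeros(size(theta));
 for p = 1:numel(theta)
     perturb = zeros(size(theta));
     perturb(p) = e;
     numgrad(p) = (J_of(theta + perturb) - J_of(theta - perturb)) / (2*e);
 end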

Can someone tell me what other steps will be required?

The MATLAB code is as follows:

 % FORWARD PROPAGATION
 % input layer
 a1 = [ones(m,1), X];
 % hidden layer
 z2 = a1*Theta1';
 a2 = sigmoid(z2);
 a2 = [ones(m,1), a2];
 % output layer
 z3 = a2*Theta2';
 a3 = sigmoid(z3);

 % BACKWARD PROPAGATION
 delta3 = a3 - y;
 delta2 = delta3*Theta2(:,2:end).*sigmoidGradient(z2);
 Theta1_grad = (delta2'*a1)/m;
 Theta2_grad = (delta3'*a2)/m;

 % COST FUNCTION
 J = 1/(2*m) * sum( (a3-y).^2 );

 % Implement regularization with the cost function and gradients.
 Theta1_grad(:,2:end) = Theta1_grad(:,2:end) + Theta1(:,2:end)*lambda/m;
 Theta2_grad(:,2:end) = Theta2_grad(:,2:end) + Theta2(:,2:end)*lambda/m;
 J = J + lambda/(2*m)*( sum(sum(Theta1(:,2:end).^2)) + sum(sum(Theta2(:,2:end).^2)) );

Since then, I have realized that this question is similar to the one answered at https://stackoverflow.com/a/29569/...; however, in my case I want the continuous variable to be between 0 and 1 and therefore to use a sigmoid function.

2 answers

First, your cost function should be:

 J = 1/m * sum( (a3-y).^2 ); 

I think your Theta2_grad = (delta3'*a2)/m; should match the numerical approximation once you change delta3 to delta3 = 1/2 * (a3 - y);

See the lecture slides for details.
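Applied to the snippet in your question, those two changes amount to the following sketch; everything else in your forward and backward pass stays as you posted it.

 delta3 = 1/2 * (a3 - y);        % was: delta3 = a3 - y;
 % ... rest of backpropagation and regularization unchanged ...
 J = 1/m * sum( (a3-y).^2 );     % was: J = 1/(2*m) * sum( (a3-y).^2 );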

EDIT: In case there is a slight discrepancy between our code, I have attached my code below for your reference. The code has already been checked against the numerical approximation via checkNNGradients(lambda); the relative difference is less than 1e-4 (though it does not satisfy the 1e-11 requirement from Dr. Andrew Ng).

 function [J grad] = nnCostFunctionRegression(nn_params, ...
                                              input_layer_size, ...
                                              hidden_layer_size, ...
                                              num_labels, ...
                                              X, y, lambda)

 % Unroll the parameter vector into the two weight matrices
 Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
                  hidden_layer_size, (input_layer_size + 1));
 Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), ...
                  num_labels, (hidden_layer_size + 1));

 m = size(X, 1);
 J = 0;
 Theta1_grad = zeros(size(Theta1));
 Theta2_grad = zeros(size(Theta2));

 % Forward propagation
 X = [ones(m, 1) X];
 z1 = sigmoid(X * Theta1');
 zs = z1;                    % kept from the original code, not used below
 z1 = [ones(m, 1) z1];
 z2 = z1 * Theta2';
 ht = sigmoid(z2);

 % Recode y into an m x num_labels matrix
 y_recode = zeros(length(y), num_labels);
 for i = 1:length(y)
     y_recode(i, y(i)) = 1;
 end
 y = y_recode;

 % Squared-error cost with regularization
 regularization = lambda/2/m * (sum(sum(Theta1(:,2:end).^2)) + sum(sum(Theta2(:,2:end).^2)));
 J = 1/(m) * sum(sum((ht - y).^2)) + regularization;

 % Backpropagation
 delta_3 = 1/2 * (ht - y);
 delta_2 = delta_3 * Theta2(:,2:end) .* sigmoidGradient(X * Theta1');
 delta_cap2 = delta_3' * z1;
 delta_cap1 = delta_2' * X;

 % Gradients with regularization (bias columns corrected afterwards)
 Theta1_grad = ((1/m) * delta_cap1) + ((lambda/m) * (Theta1));
 Theta2_grad = ((1/m) * delta_cap2) + ((lambda/m) * (Theta2));
 Theta1_grad(:,1) = Theta1_grad(:,1) - ((lambda/m) * (Theta1(:,1)));
 Theta2_grad(:,1) = Theta2_grad(:,1) - ((lambda/m) * (Theta2(:,1)));

 grad = [Theta1_grad(:) ; Theta2_grad(:)];
 end
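If it helps, here is a rough usage sketch, assuming the randInitializeWeights and fmincg helpers from the course exercise are on the path; the variable names are only illustrative.

 % Train the network with the cost function above (sketch).
 initial_Theta1 = randInitializeWeights(input_layer_size, hidden_layer_size);
 initial_Theta2 = randInitializeWeights(hidden_layer_size, num_labels);
 initial_nn_params = [initial_Theta1(:) ; initial_Theta2(:)];

 costFunction = @(p) nnCostFunctionRegression(p, input_layer_size, ...
                                              hidden_layer_size, ...
                                              num_labels, X, y, lambda);
 options = optimset('MaxIter', 50);
 [nn_params, cost] = fmincg(costFunction, initial_nn_params, options);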

If you want a continuous output, do not apply the sigmoid activation when computing the output value.

 a1 = [ones(m, 1) X];
 a2 = sigmoid(a1 * Theta1');
 a2 = [ones(m, 1) a2];
 a3 = a2 * Theta2';   % linear output: no sigmoid at the output layer
 ht = a3;

Normalize input before using it in nnCostFunction. Everything else remains the same.
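For the normalization, something like a simple z-score scaling works; this is only a sketch, and mu and sigma are names I am introducing here.

 % z-score normalization of the inputs (sketch).
 mu = mean(X);
 sigma = std(X);
 sigma(sigma == 0) = 1;   % avoid division by zero for constant features
 X_norm = bsxfun(@rdivide, bsxfun(@minus, X, mu), sigma);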

