XOR with neural networks (Matlab)

So, I hope this is something really dumb I'm doing, and there's a simple answer. I'm trying to train a 2x3x1 neural network to solve the XOR problem. It wasn't working, so I decided to dig in to see what was happening. Finally, I decided to assign the weights myself. This was the weight vector I came up with:

 theta1 = [11 0 -5; 0 12 -7; 18 17 -20];
 theta2 = [14 13 -28 -6];

(In Matlab notation.) I intentionally tried to make no two weights the same (except for the zeros).
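As a quick sanity check (just a sketch, using the same sigmoid and bias-row convention as the full code below), a single forward pass with those hand-picked weights already gives outputs close to [0 1 1 0]:

 % Quick sanity check (not part of the training code): one forward pass with
 % the hand-picked weights to confirm they already solve XOR.
 sigmoid = @(Z) 1.0 ./ (1.0 + exp(-Z));
 X = [0 0 1 1; 0 1 0 1; 1 1 1 1];            % inputs plus a bias row of ones
 theta1 = [11 0 -5; 0 12 -7; 18 17 -20];
 theta2 = [14 13 -28 -6];
 layer1 = [sigmoid(theta1 * X); 1 1 1 1];     % hidden layer plus bias row
 layer2 = sigmoid(theta2 * layer1)            % should be close to [0 1 1 0]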

And my Matlab code is very simple:

 function layer2 = xornn(iters)
     if nargin < 1
         iters = 50
     end
     function s = sigmoid(X)
         s = 1.0 ./ (1.0 + exp(-X));
     end
     T = [0 1 1 0];
     X = [0 0 1 1; 0 1 0 1; 1 1 1 1];
     theta1 = [11 0 -5; 0 12 -7; 18 17 -20];
     theta2 = [14 13 -28 -6];
     for i = [1:iters]
         layer1 = [sigmoid(theta1 * X); 1 1 1 1];
         layer2 = sigmoid(theta2 * layer1)
         delta2 = T - layer2;
         delta1 = layer1 .* (1-layer1) .* (theta2' * delta2);
         % Remove the bias from delta1. There's no real point in a delta on the bias.
         delta1 = delta1(1:3,:);
         theta2d = delta2 * layer1';
         theta1d = delta1 * X';
         theta1 = theta1 - 0.1 * theta1d;
         theta2 = theta2 - 0.1 * theta2d;
     end
 end

I believe that's right. I tested the various gradients (of the thetas) with the finite difference method to make sure they were correct, and they seemed to be.
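For reference, this is roughly the kind of finite-difference check I mean (just a sketch; the fwd, cost, and numgrad helper names are mine, and it assumes X, T, theta1, theta2 as defined above). It perturbs one weight of theta1 at a time and takes the numerical slope of the cross-entropy cost:

 % Finite-difference check of the theta1 gradient (sketch only).
 sigmoid = @(Z) 1.0 ./ (1.0 + exp(-Z));
 fwd  = @(t1, t2, X) sigmoid(t2 * [sigmoid(t1 * X); ones(1, size(X, 2))]);
 cost = @(t1, t2, X, T) -sum(T .* log(fwd(t1, t2, X)) + ...
                             (1 - T) .* log(1 - fwd(t1, t2, X)));
 ep = 1e-4;
 numgrad = zeros(size(theta1));
 for k = 1:numel(theta1)
     tp = theta1; tp(k) = tp(k) + ep;
     tm = theta1; tm(k) = tm(k) - ep;
     numgrad(k) = (cost(tp, theta2, X, T) - cost(tm, theta2, X, T)) / (2 * ep);
 end
 numgrad   % compare entry-by-entry with the analytic theta1d (mind the overall sign convention)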

But when I run it, it eventually just decays into returning all zeros. If I do xornn(1) (for 1 iteration), I get

 0.0027 0.9966 0.9904 0.0008 

But if I do xornn(35), I get

 0.0026 0.9949 0.9572 0.0007 

(It's creeping down in the wrong direction), and by the time I get to xornn(45), I get

 0.0018 0.0975 0.0000 0.0003 

If I run it for 10,000 iterations, it just returns all 0s.
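To actually watch it diverge, one thing I can do is print the cross-entropy cost each iteration. This is just a debugging sketch (my own addition, not in the function above), dropped inside the for-loop right after layer2 is computed:

 % Debugging sketch: track the cross-entropy cost each iteration to see
 % whether the updates are going downhill or uphill.
 C = -sum(T .* log(layer2) + (1 - T) .* log(1 - layer2));
 fprintf('iter %d: cost = %.4f\n', i, C);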

What's happening? Should I add regularization? I would have thought such a simple network wouldn't need it. But regardless, why is it moving away from the obviously good solution I handed it?

Thanks!

1 answer

AAARRGGHHH! The solution was simply a matter of changing

 theta1 = theta1 - 0.1 * theta1d;
 theta2 = theta2 - 0.1 * theta2d;

to

 theta1 = theta1 + 0.1 * theta1d;
 theta2 = theta2 + 0.1 * theta2d;

sigh

Now I need to figure out how I was somehow computing the negative of the derivative, when what I thought I was computing was... never mind. I'll post the derivation here anyway, just in case it helps someone else.

So, z is the weighted sum of inputs going into the sigmoid, and y is the sigmoid's output.

 C = -(T*log(y) + (1-T)*log(1-y))

 dC/dy = -((T/y) - (1-T)/(1-y))
       = -((T(1-y) - y(1-T)) / (y(1-y)))
       = -((T - Ty - y + Ty) / (y(1-y)))
       = -((T - y) / (y(1-y)))
       = (y - T) / (y(1-y))          # This is the source of all my woes.

 dy/dz = y(1-y)

 dC/dz = ((y - T) / (y(1-y))) * y(1-y) = y - T

So the problem was that I was accidentally computing T - y, because I forgot the negative sign in front of the cost function. Then I was subtracting what I thought was the gradient, but which was actually the negative gradient. And there it is. That was the problem.
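In other words, there are two equivalent ways to fix it (my own restatement, using the same variable names as the code):

 % Option (a): use the true output-layer gradient dC/dz2 = layer2 - T and
 % keep the subtraction (plain gradient descent).
 delta2 = layer2 - T;
 delta1 = layer1 .* (1 - layer1) .* (theta2' * delta2);
 delta1 = delta1(1:3, :);
 theta2 = theta2 - 0.1 * (delta2 * layer1');
 theta1 = theta1 - 0.1 * (delta1 * X');
 % Option (b): keep delta2 = T - layer2 (the NEGATIVE gradient, as in my
 % original code) and flip the updates to additions, which is the fix below.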

As soon as I did this:

 function layer2 = xornn(iters)
     if nargin < 1
         iters = 50
     end
     function s = sigmoid(X)
         s = 1.0 ./ (1.0 + exp(-X));
     end
     T = [0 1 1 0];
     X = [0 0 1 1; 0 1 0 1; 1 1 1 1];
     theta1 = [11 0 -5; 0 12 -7; 18 17 -20];
     theta2 = [14 13 -28 -6];
     for i = [1:iters]
         layer1 = [sigmoid(theta1 * X); 1 1 1 1];
         layer2 = sigmoid(theta2 * layer1)
         delta2 = T - layer2;
         delta1 = layer1 .* (1-layer1) .* (theta2' * delta2);
         % Remove the bias from delta1. There's no real point in a delta on the bias.
         delta1 = delta1(1:3,:);
         theta2d = delta2 * layer1';
         theta1d = delta1 * X';
         theta1 = theta1 + 0.1 * theta1d;
         theta2 = theta2 + 0.1 * theta2d;
     end
 end

xornn(50) returns 0.0028 0.9972 0.9948 0.0009, and xornn(10000) returns 0.0016 0.9989 0.9993 0.0005.
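And just to double-check, thresholding those outputs recovers the XOR truth table:

 round(xornn(10000))   % gives 0 1 1 0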

Phew! Perhaps this will help someone else in debugging their version.
