So, I hope this is something really stupid I'm doing and there's a simple answer. I'm trying to train a 2x3x1 neural network to do the XOR problem. It wasn't working, so I decided to dig in and see what was happening. Finally, I decided to assign the weights myself. These were the weights I came up with:
theta1 = [11 0 -5; 0 12 -7;18 17 -20]; theta2 = [14 13 -28 -6];
(In Matlab notation.) I deliberately tried to make sure no two weights were the same (except for the zeros).
And my code, which is very simple, in Matlab:
function layer2 = xornn(iters)
    if nargin < 1
        iters = 50
    end
    function s = sigmoid(X)
        s = 1.0 ./ (1.0 + exp(-X));
    end
    T = [0 1 1 0];
    X = [0 0 1 1; 0 1 0 1; 1 1 1 1];
    theta1 = [11 0 -5; 0 12 -7; 18 17 -20];
    theta2 = [14 13 -28 -6];
    for i = 1:iters
        layer1 = [sigmoid(theta1 * X); 1 1 1 1];
        layer2 = sigmoid(theta2 * layer1)
        delta2 = T - layer2;
        delta1 = layer1 .* (1 - layer1) .* (theta2' * delta2);
        % Remove the bias from delta1. There's no real point in a delta on the bias.
        delta1 = delta1(1:3, :);
        theta2d = delta2 * layer1';
        theta1d = delta1 * X';
        theta1 = theta1 - 0.1 * theta1d;
        theta2 = theta2 - 0.1 * theta2d;
    end
end
As far as I can tell the math is right: I checked the gradients for the various thetas with the finite difference method to make sure they were right, and they seemed to be.
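(For reference, the check I did was roughly along these lines. This is a sketch from memory; the helper name xorcost, the cross-entropy cost, and the step size h are illustrative rather than my exact code.)

function numgrad = checkgrad2(theta1, theta2, X, T)
    % Numerically estimate the gradient of the cost w.r.t. each entry of theta2
    % using central differences (theta1 can be checked the same way).
    h = 1e-4;
    numgrad = zeros(size(theta2));
    for j = 1:numel(theta2)
        tplus = theta2;  tplus(j)  = tplus(j)  + h;
        tminus = theta2; tminus(j) = tminus(j) - h;
        numgrad(j) = (xorcost(theta1, tplus, X, T) - xorcost(theta1, tminus, X, T)) / (2 * h);
    end
end

function J = xorcost(theta1, theta2, X, T)
    % Forward pass and a cross-entropy cost for the 2x3x1 network above.
    sig = @(z) 1.0 ./ (1.0 + exp(-z));
    layer1 = [sig(theta1 * X); ones(1, size(X, 2))];
    layer2 = sig(theta2 * layer1);
    J = -sum(T .* log(layer2) + (1 - T) .* log(1 - layer2));
end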
But when I run it, it eventually decays to returning all zeros. If I run xornn(1) (for 1 iteration), I get
0.0027 0.9966 0.9904 0.0008
But if I do xornn(35), I get
0.0026 0.9949 0.9572 0.0007
(It's started heading in the wrong direction), and by the time I get to xornn(45), I get
0.0018 0.0975 0.0000 0.0003
If I run it for 10,000 iterations, it just returns all zeros.
What's happening? Should I add regularization? I would have thought such a simple network wouldn't need it. But regardless, why does it move away from the obviously good solution I handed it?
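(By regularization I'd mean something simple like folding L2 weight decay into the updates, roughly like the sketch below; lambda is just a placeholder value.)

lambda = 0.01;   % placeholder regularization strength
% L2 weight decay added to the existing updates (one would normally
% leave the bias column out; this is just to show the idea)
theta1 = theta1 - 0.1 * (theta1d + lambda * theta1);
theta2 = theta2 - 0.1 * (theta2d + lambda * theta2);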
Thanks!