NaN results from a TensorFlow neural network

I have this problem: after one iteration, nearly all of my parameters (cost function, weights, hypothesis function, etc.) output "NaN". My code is similar to the TensorFlow "MNIST for Experts" tutorial ( https://www.tensorflow.org/versions/r0.9/tutorials/mnist/pros/index.html ). I have already searched for solutions and tried: reducing the learning rate, even setting it to zero; using AdamOptimizer instead of gradient descent; using the sigmoid function for the hypothesis in the last layer; and using only numpy functions. I have some negative and zero values in my input, so I cannot use logarithmic cross-entropy instead of the quadratic cost function. The result is always the same. All of my input data consists of stresses and deformations of soils.
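For reference, the log-based cost fails there because the logarithm is undefined for zero and negative arguments; any such value inside a cross-entropy term immediately yields -inf or NaN. A minimal numpy illustration of that failure mode (separate from the network code below):

import numpy as np

vals = np.array([0.7, 0.0, -0.3])   # example values including a zero and a negative number
with np.errstate(divide='ignore', invalid='ignore'):
    print(np.log(vals))             # -> roughly [-0.357, -inf, nan]
# any such entry inside -sum(y*log(h) + (1-y)*log(1-h)) poisons the whole cost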

import tensorflow as tf
import Datafiles3_pv_complete as soil
import numpy as np

m_training = int(18.0)
m_cv = int(5.0)
m_test = int(5.0)
total_examples = 28

" range for running "
range_training = xrange(0, m_training)
range_cv = xrange(m_training, (m_training + m_cv))
range_test = xrange((m_training + m_cv), total_examples)

""" Using interactive Sessions """
sess = tf.InteractiveSession()

""" creating input and output vectors """
x = tf.placeholder(tf.float32, shape=[None, 11])
y_true = tf.placeholder(tf.float32, shape=[None, 3])

""" Standard Deviation Calculation """
stdev = np.divide(2.0, np.sqrt(np.prod(x.get_shape().as_list()[1:])))

""" Weights and Biases """
def weights(shape):
    initial = tf.truncated_normal(shape, stddev=stdev)
    return tf.Variable(initial)

def bias(shape):
    initial = tf.truncated_normal(shape, stddev=1.0)
    return tf.Variable(initial)

""" Creating weights and biases for all layers """
theta1 = weights([11, 7])
bias1 = bias([1, 7])
theta2 = weights([7, 7])
bias2 = bias([1, 7])

"Last layer"
theta3 = weights([7, 3])
bias3 = bias([1, 3])

""" Hidden layer input (Sum of weights, activation functions and bias)
    z = theta^T * activation + bias
"""
def Z_Layer(activation, theta, bias):
    return tf.add(tf.matmul(activation, theta), bias)

""" Creating the sigmoid function
    sigmoid = 1 / (1 + exp(-z))
"""
def Sigmoid(z):
    return tf.div(tf.constant(1.0), tf.add(tf.constant(1.0), tf.exp(tf.neg(z))))

""" hypothesis functions - predicted output """
' layer 1 - input layer '
hyp1 = x

' layer 2 '
z2 = Z_Layer(hyp1, theta1, bias1)
hyp2 = Sigmoid(z2)

' layer 3 '
z3 = Z_Layer(hyp2, theta2, bias2)
hyp3 = Sigmoid(z3)

' layer 4 - output layer '
zL = Z_Layer(hyp3, theta3, bias3)
hypL = tf.add(tf.add(tf.pow(zL, 3), tf.pow(zL, 2)), zL)

""" Cost function """
cost_function = tf.mul(tf.div(0.5, m_training), tf.pow(tf.sub(hypL, y_true), 2))
#cross_entropy = -tf.reduce_sum(y_true*tf.log(hypL) + (1-y_true)*tf.log(1-hypL))

""" Gradient Descent """
train_step = tf.train.GradientDescentOptimizer(learning_rate=0.003).minimize(cost_function)

""" Training and Evaluation """
correct_prediction = tf.equal(tf.arg_max(hypL, 1), tf.arg_max(y_true, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

sess.run(tf.initialize_all_variables())

keep_prob = tf.placeholder(tf.float32)

""" Testing - Initialise lists """
hyp1_test = []
z2_test = []
hyp2_test = []
z3_test = []
hyp3_test = []
zL_test = []
hypL_test = []
cost_function_test = []
complete_error_test = []
theta1_test = []
theta2_test = []
theta3_test = []
bias1_test = []
bias2_test = []
bias3_test = []
""" ------------------------- """

complete_error_init = tf.abs(tf.reduce_mean(tf.sub(hypL, y_true), 1))

training_error = []
for j in range_training:
    feedj = {x: soil.input_scale[j], y_true: soil.output_scale[j], keep_prob: 1.0}

    """ ------------------------- """
    'Testing - adding to list'
    z2_init = z2.eval(feed_dict=feedj)
    z2_test.append(z2_init)

    hyp2_init = hyp2.eval(feed_dict=feedj)
    hyp2_test.append(hyp2_init)

    z3_init = z3.eval(feed_dict=feedj)
    z3_test.append(z3_init)

    hyp3_init = hyp3.eval(feed_dict=feedj)
    hyp3_test.append(hyp3_init)

    zL_init = zL.eval(feed_dict=feedj)
    zL_test.append(zL_init)

    hypL_init = hypL.eval(feed_dict=feedj)
    hypL_test.append(hypL_init)

    cost_function_init = cost_function.eval(feed_dict=feedj)
    cost_function_test.append(cost_function_init)

    complete_error = complete_error_init.eval(feed_dict=feedj)
    complete_error_test.append(complete_error)
    print 'number iterations: %g, error (S1, S2, S3): %g, %g, %g' % (j, complete_error[0], complete_error[1], complete_error[2])

    theta1_init = theta1.eval()
    theta1_test.append(theta1_init)
    theta2_init = theta2.eval()
    theta2_test.append(theta2_init)
    theta3_init = theta3.eval()
    theta3_test.append(theta3_init)
    bias1_init = bias1.eval()
    bias1_test.append(bias1_init)
    bias2_init = bias2.eval()
    bias2_test.append(bias2_init)
    bias3_init = bias3.eval()
    bias3_test.append(bias3_init)
    """ ------------------------- """

    train_accuracy = accuracy.eval(feed_dict=feedj)
    print("step %d, training accuracy %g" % (j, train_accuracy))
    train_step.run(feed_dict=feedj)
    training_error.append(1 - train_accuracy)

cv_error = []
for k in range_cv:
    feedk = {x: soil.input_scale[k], y_true: soil.output_scale[k], keep_prob: 1.0}
    cv_accuracy = accuracy.eval(feed_dict=feedk)
    print("cross-validation accuracy %g" % cv_accuracy)
    cv_error.append(1 - cv_accuracy)

for l in range_test:
    print("test accuracy %g" % accuracy.eval(feed_dict={x: soil.input_matrixs[l], y_true: soil.output_matrixs[l], keep_prob: 1.0}))

In recent weeks I have also been working on a unit model of this problem, but the same result occurred. I have no idea what to try next. I hope someone can help me.

Edit:

I checked some parameters again. The hypothesis (hyp) and the activation (z) for layers 3 and 4 (the last layer) have the same entries for every data point, i.e. the same value in every row of a given column.
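A quick way to confirm that kind of collapse is to check the spread of each column of the evaluated activations. This is only a minimal numpy sketch, assuming `hyp3_init` and `zL_init` are the arrays already collected in the lists in the code above (rows = data points, columns = units):

import numpy as np

def column_spread(arr, name):
    # max - min per column; a spread of 0 means every row holds the same
    # value in that column, i.e. the unit's output has collapsed/saturated
    spread = np.ptp(np.asarray(arr), axis=0)
    print('%s column spread: %s' % (name, spread))
    return spread

# assumed to be the arrays appended to hyp3_test / zL_test in the loop above
column_spread(hyp3_init, 'hyp3')
column_spread(zL_init, 'zL')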

2 answers

Finally, no more NaN values. The solution was to scale my input and output data. The result (accuracy) is still not good, but at least I get real values for the parameters. I had tried feature scaling earlier in other attempts (where there were probably some other errors as well) and assumed it would not help with my problem either.
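The answer does not show the exact scaling that was used; as one common option, a minimal sketch of min-max scaling both inputs and targets with numpy, applied before feeding the data to the network, could look roughly like this (the usage lines are hypothetical):

import numpy as np

def min_max_scale(data):
    # scale each column to the range [0, 1]; guard against constant columns
    data = np.asarray(data, dtype=np.float64)
    col_min = data.min(axis=0)
    col_range = data.max(axis=0) - col_min
    col_range = np.where(col_range == 0, 1.0, col_range)
    return (data - col_min) / col_range

# hypothetical usage: scale the raw soil matrices before training
# input_scale = [min_max_scale(m) for m in soil.input_matrixs]
# output_scale = [min_max_scale(m) for m in soil.output_matrixs]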


1e-3 is still quite high for the classifier you describe. NaN actually means that the weights have tended to infinity, so I would suggest exploring even lower learning rates, around 1e-7. If it keeps diverging, multiply the learning rate by 0.1 and repeat until the weights are finite.

