TensorFlow - why doesn't this softmax regression learn anything?

I'm going to do big things with TensorFlow, but I'm trying to start small.

I have small gray squares (with a little noise), and I want to classify them according to their shade (for example, 3 categories: black, gray, white). I wrote a small Python class to generate the squares and their one-hot label vectors, and modified TensorFlow's basic MNIST example to feed them in.

But it doesn't learn anything - e.g. for 3 categories it always gets ≈33% correct, no better than guessing.

import tensorflow as tf
import generate_data.generate_greyscale

data_generator = generate_data.generate_greyscale.GenerateGreyScale(28, 28, 3, 0.05)
ds = data_generator.generate_data(10000)
ds_validation = data_generator.generate_data(500)
xs = ds[0]
ys = ds[1]
num_categories = data_generator.num_categories

x = tf.placeholder("float", [None, 28*28])
W = tf.Variable(tf.zeros([28*28, num_categories]))
b = tf.Variable(tf.zeros([num_categories]))
y = tf.nn.softmax(tf.matmul(x,W) + b)
y_ = tf.placeholder("float", [None,num_categories])
cross_entropy = -tf.reduce_sum(y_*tf.log(y))
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)

init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)

# let batch_size = 100 --> therefore there are 100 batches of training data
xs = xs.reshape(100, 100, 28*28) # reshape into 100 minibatches of size 100
ys = ys.reshape((100, 100, num_categories)) # reshape into 100 minibatches of size 100

for i in range(100):
    batch_xs = xs[i]
    batch_ys = ys[i]
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))

xs_validation = ds_validation[0]
ys_validation = ds_validation[1]
print sess.run(accuracy, feed_dict={x: xs_validation, y_: ys_validation})

My data generator looks like this:

import numpy as np
import random

class GenerateGreyScale():
    def __init__(self, num_rows, num_cols, num_categories, noise):
        self.num_rows = num_rows
        self.num_cols = num_cols
        self.num_categories = num_categories
        # set a level of noisiness for the data
        self.noise = noise

    def generate_label(self):
        lab = np.zeros(self.num_categories)
        lab[random.randint(0, self.num_categories-1)] = 1
        return lab

    def generate_datum(self, lab):
        i = np.where(lab==1)[0][0]
        frac = float(1)/(self.num_categories-1) * i
        arr = np.random.uniform(max(0, frac-self.noise), min(1, frac+self.noise), self.num_rows*self.num_cols)
        return arr

    def generate_data(self, num):
        data_arr = np.zeros((num, self.num_rows*self.num_cols))
        label_arr = np.zeros((num, self.num_categories))
        for i in range(0, num):
            label = self.generate_label()
            datum = self.generate_datum(label)
            data_arr[i] = datum
            label_arr[i] = label
        #data_arr = data_arr.astype(np.float32)
        #label_arr = label_arr.astype(np.float32)
        return data_arr, label_arr
4 answers

While the dga and syncd answers were helpful, I tried non-zero weight initialization and larger datasets, but to no avail. What ultimately worked was switching to a different optimization algorithm.

I replaced:

train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)

with

train_step = tf.train.AdamOptimizer(0.0005).minimize(cross_entropy)

I also wrapped the training for-loop inside another for-loop to train for several epochs, which led to convergence like this:

===# EPOCH 0 #===
Error: 0.370000004768
===# EPOCH 1 #===
Error: 0.333999991417
===# EPOCH 2 #===
Error: 0.282000005245
===# EPOCH 3 #===
Error: 0.222000002861
===# EPOCH 4 #===
Error: 0.152000010014
===# EPOCH 5 #===
Error: 0.111999988556
===# EPOCH 6 #===
Error: 0.0680000185966
===# EPOCH 7 #===
Error: 0.0239999890327
===# EPOCH 8 #===
Error: 0.00999999046326
===# EPOCH 9 #===
Error: 0.00400000810623

EDIT - WHY IT WORKS: I suppose the problem was that I hadn't manually chosen a good learning rate schedule, and Adam was able to adapt one automatically.
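Roughly, the reworked script looks like this (a simplified sketch rather than my exact code; num_epochs and the per-epoch error printout are just for illustration, and it reuses the GenerateGreyScale class from the question):

import tensorflow as tf
import generate_data.generate_greyscale

data_generator = generate_data.generate_greyscale.GenerateGreyScale(28, 28, 3, 0.05)
xs, ys = data_generator.generate_data(10000)
xs_validation, ys_validation = data_generator.generate_data(500)
num_categories = data_generator.num_categories

x = tf.placeholder("float", [None, 28*28])
W = tf.Variable(tf.zeros([28*28, num_categories]))
b = tf.Variable(tf.zeros([num_categories]))
y = tf.nn.softmax(tf.matmul(x, W) + b)
y_ = tf.placeholder("float", [None, num_categories])
cross_entropy = -tf.reduce_sum(y_ * tf.log(y))

# the key change: Adam instead of plain gradient descent
train_step = tf.train.AdamOptimizer(0.0005).minimize(cross_entropy)

correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))

sess = tf.Session()
sess.run(tf.initialize_all_variables())

xs = xs.reshape((100, 100, 28*28))           # 100 minibatches of 100 examples
ys = ys.reshape((100, 100, num_categories))

num_epochs = 10
for epoch in range(num_epochs):              # outer loop added: several passes over the data
    for i in range(100):                     # inner loop: one pass over the 100 minibatches
        sess.run(train_step, feed_dict={x: xs[i], y_: ys[i]})
    # report validation error after each epoch
    err = 1.0 - sess.run(accuracy, feed_dict={x: xs_validation, y_: ys_validation})
    print "===# EPOCH %d #===" % epoch
    print "Error:", err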


First, try initializing your W matrix with random values, not zeros - you are not giving the optimizer anything to work with when the output is all zeros for all inputs.

Instead of:

 W = tf.Variable(tf.zeros([28*28, num_categories])) 

Try:

 W = tf.Variable(tf.truncated_normal([28*28, num_categories], stddev=0.1)) 

You are running into the problem of your gradients growing without bound, which causes the loss function to become nan.

Take a look at this question: Why does the TensorFlow example not work when batch size increases?

Also, make sure you run the model for enough steps. You only make one pass through your training data (100 steps * 100 examples), which is not enough for it to converge. Increase it to at least around 2000 steps (20 passes through your data set).

Edit (I can't comment, so I'll add my thoughts here): The bottom line of the question I linked is that you can keep using GradientDescentOptimizer as long as you use a learning rate of about 0.001. The problem is that your learning rate was too high for the loss function you used.

Alternatively, use a loss function whose gradients don't blow up with batch size: use tf.reduce_mean instead of tf.reduce_sum in the definition of cross_entropy (see the sketch below).
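To make this concrete, here is a rough sketch of the two alternatives, reusing the variable names from your code (illustrative edits, pick one of the two options, not both):

# Option 1 (per the linked question): keep the summed loss, but lower the
# learning rate so the large gradients don't overshoot.
train_step = tf.train.GradientDescentOptimizer(0.001).minimize(cross_entropy)

# Option 2: average instead of summing, so the loss and its gradients no
# longer grow with the batch size.
cross_entropy = -tf.reduce_mean(y_ * tf.log(y))
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)

# In either case, run more than a single pass over the data, e.g. ~2000 steps
# (20 passes over the 100 minibatches) instead of 100.
for _ in range(20):
    for i in range(100):
        sess.run(train_step, feed_dict={x: xs[i], y_: ys[i]})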


Found this question when I had a similar problem. I fixed mine by scaling the features.

A bit of background: I was following the TensorFlow tutorial, but I wanted to use data from Kaggle (see the data here) instead, and at the beginning I had the same problem: the model simply wasn't learning. After some rounds of troubleshooting, I realized that the Kaggle data were on a completely different scale. So I scaled the data so that it had the same scale (0 to 1) as the MNIST dataset used in the TensorFlow tutorial.
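For illustration, the rescaling looks roughly like this, assuming the Kaggle pixel values are integers in [0, 255] (raw_pixels is just a placeholder name for whatever array you load from the CSV):

import numpy as np

# stand-in for the array loaded from the Kaggle CSV: one row per image,
# integer pixel values in [0, 255]
raw_pixels = np.random.randint(0, 256, size=(5, 28*28))

# rescale to [0, 1], the range the TensorFlow tutorial's MNIST data uses
scaled_pixels = raw_pixels.astype(np.float32) / 255.0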

Just thought I'd add my two cents here... in case other beginners following the tutorial setup get stuck like I did =)

