Load image files from a directory as a training dataset in TensorFlow

I am new to TensorFlow, and I started with the official MNIST example code to learn how TensorFlow works. However, one thing that bothers me is that the MNIST example provides the source dataset as compressed files whose format is not clear to beginners. The same applies to CIFAR-10, which provides its dataset as a binary file. I think that in practical deep learning tasks, the dataset is often a directory of image files, such as *.jpg or *.png, plus a text file that records the label of each file (for example, the ImageNet dataset). Let me use MNIST as an example.

MNIST contains 50k training images of size 28 x 28. Now suppose these images are in JPG format and stored in the ./dataset/ directory, and that ./dataset/ also contains a text file label.txt that records the label of each image:

    /path/to/dataset/
        image00001.jpg
        image00002.jpg
        ...
        image50000.jpg
        label.txt

where label.txt is as follows:

    # label.txt:
    image00001.jpg 1
    image00002.jpg 0
    image00003.jpg 4
    image00004.jpg 9
    ...
    image50000.jpg 3

Now I would like to use TensorFlow to train a single-layer model on this dataset. Could someone provide a simple code snippet for this?

python tensorflow
1 answer

There are basically two things you will need. The first is ordinary Python code to load the images and generate batches, for example:

    import numpy as np
    from scipy import misc  # feel free to use another image loader

    def create_batches(batch_size):
        # list_of_images and labels must be defined beforehand
        # (see the sketch below for one way to build them from label.txt)
        images = []
        for img in list_of_images:
            images.append(misc.imread(img))
        images = np.asarray(images)
        # do something similar for the labels
        while True:
            for i in range(0, len(images), batch_size):
                yield (images[i:i + batch_size], labels[i:i + batch_size])
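The generator above assumes that list_of_images and labels already exist. Here is a minimal sketch of how they could be built from the label.txt layout described in the question; the helper name load_file_list and the directory path are just assumptions for illustration:

    import os

    def load_file_list(dataset_dir):
        # label.txt: each line is "<filename> <label>"
        image_paths, image_labels = [], []
        with open(os.path.join(dataset_dir, "label.txt")) as f:
            for line in f:
                parts = line.split()
                if len(parts) != 2:
                    continue  # skip blank or malformed lines
                filename, label = parts
                image_paths.append(os.path.join(dataset_dir, filename))
                image_labels.append(int(label))
        return image_paths, image_labels

    list_of_images, labels = load_file_list("./dataset/")

If the model expects one-hot labels rather than integer class indices, something like np.eye(10)[labels] converts them.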

Now the TensorFlow part comes in:

    import tensorflow as tf

    imgs = tf.placeholder(tf.float32, shape=[None, height, width, colors])
    lbls = tf.placeholder(tf.int32, shape=[None, label_dimension])

    with tf.Session() as sess:
        # define the rest of the graph here:
        # convolutions or linear layers, the cost function, etc.
        batch_generator = create_batches(batch_size)
        for i in range(number_of_batches):
            images, labels = next(batch_generator)
            loss_value = sess.run([loss], feed_dict={imgs: images, lbls: labels})
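For reference, here is one way the "rest of the graph" could look for the single-layer model the question asks about. This is only a minimal softmax-regression sketch in the same TF1-style API, reusing create_batches and load_file_list from above; the hyperparameters, the sparse (integer) label placeholder, and the train_op are assumptions for illustration, not part of the original answer:

    import tensorflow as tf

    height, width, colors, num_classes = 28, 28, 1, 10

    imgs = tf.placeholder(tf.float32, shape=[None, height, width, colors])
    lbls = tf.placeholder(tf.int32, shape=[None])  # integer class labels

    # a single linear (fully connected) layer on the flattened pixels
    flat = tf.reshape(imgs, [-1, height * width * colors])
    W = tf.Variable(tf.zeros([height * width * colors, num_classes]))
    b = tf.Variable(tf.zeros([num_classes]))
    logits = tf.matmul(flat, W) + b

    loss = tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(labels=lbls, logits=logits))
    train_op = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        batch_generator = create_batches(batch_size=100)
        for step in range(1000):
            images, batch_labels = next(batch_generator)
            # grayscale JPGs load as (batch, 28, 28); add the channel axis
            images = images.reshape(-1, height, width, colors)
            _, loss_value = sess.run([train_op, loss],
                                     feed_dict={imgs: images, lbls: batch_labels})

Scaling the pixel values to [0, 1] (for example images / 255.0) and printing loss_value every few hundred steps usually makes it easier to see whether training is working.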
