Delete / skip records when loading data

I found some erroneous data in my training set (incorrectly labelled examples). I have corrected the source, but I would like to continue experimenting with the same data set, so I need to skip these records.

I use TFRecordReader and load using parse_single_example and shuffle_batch. Can I provide a filter somewhere?

1 answer

The docs briefly describe how to do this using tf.train.shuffle_batch() with enqueue_many=True. If you can determine whether an example is incorrect using graph operations, you can filter the result like this (adapted from another SO answer):

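import tensorflow as tf

# Parse one record; correctly_labelled() is your own predicate, built from
# graph ops, that returns a scalar boolean tensor.
X, y = tf.parse_single_example(...)
is_correctly_labelled = correctly_labelled(X, y)
# Add a leading batch dimension so each record is a batch of size 1.
X = tf.expand_dims(X, 0)
y = tf.expand_dims(y, 0)
# Gathering with no indices yields a batch of size 0, so nothing is enqueued.
empty = tf.constant([], tf.int32)
X, y = tf.cond(is_correctly_labelled,
               lambda: [X, y],
               lambda: [tf.gather(X, empty), tf.gather(y, empty)])
# enqueue_many=True treats the first dimension as the batch dimension.
Xs, ys = tf.train.shuffle_batch(
    [X, y], batch_size, capacity, min_after_dequeue,
    enqueue_many=True)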

The tf.gather call is just a way to get a zero-sized slice. In numpy it would simply be X[[], ...].
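
To see why an empty batch causes a record to be skipped, here is a minimal sketch in plain Python rather than TensorFlow. The check inside is_correctly_labelled, the helper names, and the sample records are all hypothetical. The queue is an ordinary list that only imitates what enqueue_many=True does, namely adding each element of a mini-batch individually.

```python
import random


def is_correctly_labelled(x, y):
    # Hypothetical check for illustration only: valid labels are 0 or 1.
    return y in (0, 1)


def to_mini_batch(x, y):
    """Wrap a record as a batch of one, or as an empty batch if it is bad."""
    if is_correctly_labelled(x, y):
        return [x], [y]
    return [], []  # zero-length batch: nothing will be enqueued


def shuffled_batches(records, batch_size, seed=0):
    """Imitate enqueue_many=True: each element of every mini-batch is
    enqueued on its own, then fixed-size batches are drawn."""
    queue = []
    for x, y in records:
        xs, ys = to_mini_batch(x, y)
        queue.extend(zip(xs, ys))  # an empty mini-batch adds nothing
    random.Random(seed).shuffle(queue)
    # Yield only full batches.
    for i in range(0, len(queue) - batch_size + 1, batch_size):
        chunk = queue[i:i + batch_size]
        yield [x for x, _ in chunk], [y for _, y in chunk]


# Made-up records: "b" and "e" carry invalid labels and should vanish.
records = [("a", 0), ("b", 7), ("c", 1), ("d", 1), ("e", -1), ("f", 0)]
batches = list(shuffled_batches(records, batch_size=2))
```

The bad records never reach the queue, so downstream code sees only full batches of valid examples and needs no special handling.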
