TensorFlow negative sampling

I am trying to follow the Udacity tutorial on TensorFlow, where I came across the following two lines for embedding words:

# Look up embeddings for inputs.
embed = tf.nn.embedding_lookup(embeddings, train_dataset)

# Compute the softmax loss, using a sample of the negative labels each time.
loss = tf.reduce_mean(
    tf.nn.sampled_softmax_loss(softmax_weights, softmax_biases, embed,
                               train_labels, num_sampled, vocabulary_size))

Now I understand that the second statement is for sampling negative labels. But the question is: how does it know what the negative labels are? All I provide to the second function is the current input and its corresponding labels, along with the number of labels I want to sample (negatively). Isn't there a risk of sampling from the input set itself?

This is a complete example: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/udacity/5_word2vec.ipynb

python tensorflow

2 answers

You can find the documentation for tf.nn.sampled_softmax_loss() here. There is also a good explanation of Candidate Sampling from the TensorFlow authors here (pdf).


How does it know what the negative labels are?

TensorFlow will randomly select negative classes among all the possible classes (for you, all the possible words).
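For example, here is a minimal sketch (TF 2.x, toy values, names are illustrative) of drawing such negative classes with the sampler that sampled_softmax_loss uses by default:

import tensorflow as tf

vocabulary_size = 50000
num_sampled = 64

# One toy batch of true labels, shape [batch_size, 1].
train_labels = tf.constant([[17], [4213], [9]], dtype=tf.int64)

# The default sampler draws IDs from an approximately Zipfian (log-uniform)
# distribution over [0, vocabulary_size), so low (frequent) IDs come up more often.
sampled, true_expected, sampled_expected = tf.random.log_uniform_candidate_sampler(
    true_classes=train_labels,   # the positive word IDs
    num_true=1,                  # one true class per example
    num_sampled=num_sampled,     # how many negative classes to draw per batch
    unique=True,                 # no repeats within one draw
    range_max=vocabulary_size)   # sample over the whole vocabulary

print(sampled.shape)             # (64,) -- negative class IDs shared by the whole batch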

Isn't there a risk of sampling from the input set itself?

If you want to compute the softmax probability for your true label, you compute logits[true_label] / sum(logits[negative_sampled_labels]). Since the number of classes is huge (the vocabulary size), there is very little chance of sampling your true_label as a negative label.
In any case, I think TensorFlow removes this possibility altogether when sampling randomly. (EDIT: @Alex confirms that TensorFlow does this by default.)
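A tiny numeric sketch of the ratio above, under the assumption that it is meant as a softmax restricted to the true label plus the sampled negatives (all values here are made up):

import tensorflow as tf

logits = tf.constant([2.0, 0.5, -1.0, 0.1, 1.5])   # scores for a toy 5-word vocabulary
true_label = 0
negative_sampled_labels = [2, 4]                    # randomly drawn negatives

# Softmax over the small subset {true_label} ∪ negatives instead of all classes.
subset_logits = tf.gather(logits, [true_label] + negative_sampled_labels)
p_true = tf.nn.softmax(subset_logits)[0]
print(float(p_true))

The exclusion mentioned in the EDIT corresponds to the remove_accidental_hits argument of tf.nn.sampled_softmax_loss, which defaults to True.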


The Candidate Sampling document explains how the sampled loss function is computed:

  • Compute the loss function on a subset C of all the classes L, where C = T ⋃ S: T contains the target classes in the batch and S is a set of randomly sampled classes (see the sketch below).
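A minimal sketch of that idea with toy sizes (all names and numbers here are illustrative): gather the weight rows for C and score only those classes instead of the full vocabulary.

import tensorflow as tf

num_classes, dim, batch_size, num_sampled = 1000, 16, 4, 8

weights = tf.random.normal([num_classes, dim])
biases = tf.zeros([num_classes])
inputs = tf.random.normal([batch_size, dim])                                      # embedded batch
targets = tf.random.uniform([batch_size], maxval=num_classes, dtype=tf.int64)     # T
sampled = tf.random.uniform([num_sampled], maxval=num_classes, dtype=tf.int64)    # S (uniform here just for the sketch)

candidates = tf.concat([targets, sampled], axis=0)            # C = T ⋃ S (duplicates ignored for simplicity)
w_c = tf.gather(weights, candidates)                          # [|C|, dim]
b_c = tf.gather(biases, candidates)                           # [|C|]
logits_c = tf.matmul(inputs, w_c, transpose_b=True) + b_c     # [batch_size, |C|]
# Any softmax / loss computed over logits_c touches only |C| classes, not all num_classes.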

The code uses tf.nn.embedding_lookup to get the inputs embed, of shape [batch_size, dim].
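A quick shape check of that step, with toy constants standing in for the notebook's values:

import tensorflow as tf

vocabulary_size, embedding_size, batch_size = 50000, 128, 8

embeddings = tf.Variable(
    tf.random.uniform([vocabulary_size, embedding_size], -1.0, 1.0))
train_dataset = tf.random.uniform([batch_size], maxval=vocabulary_size, dtype=tf.int32)

embed = tf.nn.embedding_lookup(embeddings, train_dataset)
print(embed.shape)   # (8, 128), i.e. [batch_size, dim]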

Then it uses tf.nn.sampled_softmax_loss to compute the sampled loss; its arguments are (see the sketch after this list):

  • softmax_weights: a tensor of shape [num_classes, dim].
  • softmax_biases: a tensor of shape [num_classes]. The class biases.
  • embed: a tensor of shape [batch_size, dim].
  • train_labels: a tensor of shape [batch_size, 1]. The target classes T.
  • num_sampled: an int. The number of classes to sample randomly per batch, i.e. the number of classes in S.
  • vocabulary_size: the number of possible classes.
  • sampled_values: defaults to log_uniform_candidate_sampler.
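A sketch tying those arguments to concrete shapes (toy sizes; the keyword names follow the current tf.nn.sampled_softmax_loss signature, whereas the notebook passes them positionally in the older weights, biases, inputs, labels order). Passing sampled_values explicitly just makes the default log-uniform sampler visible:

import tensorflow as tf

vocabulary_size, embedding_size, batch_size, num_sampled = 50000, 128, 8, 64

softmax_weights = tf.Variable(tf.random.normal([vocabulary_size, embedding_size]))  # [num_classes, dim]
softmax_biases = tf.Variable(tf.zeros([vocabulary_size]))                            # [num_classes]
embed = tf.random.normal([batch_size, embedding_size])                               # [batch_size, dim]
train_labels = tf.random.uniform([batch_size, 1], maxval=vocabulary_size,
                                 dtype=tf.int64)                                     # target classes T

sampled_values = tf.random.log_uniform_candidate_sampler(                            # the sampled classes S
    true_classes=train_labels, num_true=1, num_sampled=num_sampled,
    unique=True, range_max=vocabulary_size)

loss = tf.reduce_mean(
    tf.nn.sampled_softmax_loss(
        weights=softmax_weights, biases=softmax_biases,
        labels=train_labels, inputs=embed,
        num_sampled=num_sampled, num_classes=vocabulary_size,
        sampled_values=sampled_values,
        remove_accidental_hits=True))    # default: drop negatives that collide with a true label
print(float(loss))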

For one batch, the target samples are just train_labels (T). It randomly picks num_sampled samples from embed (S) to use as negative samples.

It chooses uniformly from embed with respect to softmax_weights and softmax_biases. Since embed is embeddings[train_dataset] (of shape [batch_size, embedding_size]), if embeddings[train_dataset[i]] contains train_labels[i], it might be selected back, and then it is not a negative label.

According to the Candidate Sampling document (page 2), there are different variants. For NCE and negative sampling, NEG = S, which may contain part of T; for sampled logistic and sampled softmax, NEG = S − T, i.e. T is explicitly removed (see the toy sketch below).
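A toy illustration of the two conventions with plain Python sets (the values are made up):

T = {17, 42}                  # target classes in the batch
S = {3, 17, 99, 256}          # randomly sampled classes; 17 happens to collide with T

NEG_nce = S                   # NCE / negative sampling: NEG may still contain part of T
NEG_sampled_softmax = S - T   # sampled logistic / sampled softmax: T removed explicitly
print(NEG_nce, NEG_sampled_softmax)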

So, in fact, there could be a chance of sampling from train_.

