The Candidate Sampling document explains how the sampled loss function is calculated:
- Compute the loss function over a subset C of all training samples L, where C = T ⋃ S, T are the samples in the target classes, and S are samples drawn at random from all classes.
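To make the subset idea concrete, here is a minimal NumPy sketch. It is not the actual TensorFlow implementation (which also applies a log-expected-count correction and can remove accidental hits); the function name, shapes, and toy values are all illustrative assumptions:

```python
import numpy as np

def subset_softmax_loss(W, b, x, target, sampled):
    # Cross-entropy computed only over C = {target} ⋃ S,
    # instead of over all num_classes rows of W.
    C = np.concatenate(([target], sampled))           # candidate class ids
    logits = W[C] @ x + b[C]                          # shape [len(C)], not [num_classes]
    log_probs = logits - np.log(np.sum(np.exp(logits)))
    return -log_probs[0]                              # the target is the first entry of C

# Toy example: 10 classes, 4-dim inputs, 3 sampled negative classes.
rng = np.random.default_rng(0)
W, b = rng.normal(size=(10, 4)), np.zeros(10)
x = rng.normal(size=4)
loss = subset_softmax_loss(W, b, x, target=2, sampled=np.array([5, 7, 9]))
```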
The code you provided uses tf.nn.embedding_lookup to get the inputs embed of shape [batch_size, dim].
Then it uses tf.nn.sampled_softmax_loss to compute the sampled loss function (see the sketch after the parameter list):
- softmax_weights: a tensor of shape [num_classes, dim].
- softmax_biases: a tensor of shape [num_classes]. The class biases.
- embed: a tensor of shape [batch_size, dim].
- train_labels: a tensor of shape [batch_size, 1]. The target classes T.
- num_sampled: an int. The number of classes to randomly sample per batch, i.e. the number of classes in S.
- vocabulary_size: the number of possible classes.
- sampled_values: defaults to log_uniform_candidate_sampler.
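A minimal sketch of how these arguments fit together, modeled on the word2vec-style code the question describes; the sizes and the dummy batch are assumptions, and the variable names mirror the ones used in this answer:

```python
import tensorflow as tf

vocabulary_size = 50000
embedding_size = 128      # dim
batch_size = 16
num_sampled = 64          # |S|

# Model parameters.
embeddings = tf.Variable(
    tf.random.uniform([vocabulary_size, embedding_size], -1.0, 1.0))
softmax_weights = tf.Variable(
    tf.random.truncated_normal([vocabulary_size, embedding_size], stddev=0.1))
softmax_biases = tf.Variable(tf.zeros([vocabulary_size]))

# A dummy batch (in the real code this comes from the data pipeline).
train_dataset = tf.random.uniform([batch_size], 0, vocabulary_size, dtype=tf.int64)
train_labels = tf.random.uniform([batch_size, 1], 0, vocabulary_size, dtype=tf.int64)

# embed: [batch_size, dim]
embed = tf.nn.embedding_lookup(embeddings, train_dataset)

# Loss over C = T ⋃ S; S is drawn by log_uniform_candidate_sampler by default.
loss = tf.reduce_mean(
    tf.nn.sampled_softmax_loss(
        weights=softmax_weights,        # [num_classes, dim]
        biases=softmax_biases,          # [num_classes]
        labels=train_labels,            # [batch_size, 1] -> T
        inputs=embed,                   # [batch_size, dim]
        num_sampled=num_sampled,        # |S|
        num_classes=vocabulary_size))
```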
For one batch, the target samples are just train_labels (T). It randomly picks num_sampled samples from embed (S) as negative samples.
It picks uniformly from embed with respect to softmax_weights and softmax_biases. Since embed is embeddings[train_dataset] (of shape [batch_size, embedding_size]), if embeddings[train_dataset[i]] contains train_labels[i], it can be picked back in, and then it is not a negative label.
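For reference, the default sampler can be made explicit and passed back in through sampled_values. This sketch reuses the variables from the snippet above; the sampler draws num_sampled candidate class ids from [0, vocabulary_size), which is where a train_labels[i] can be drawn back in as an accidental hit:

```python
# Drawing S explicitly with the default sampler and passing it in via sampled_values.
sampled_values = tf.random.log_uniform_candidate_sampler(
    true_classes=train_labels,      # T, shape [batch_size, 1]
    num_true=1,
    num_sampled=num_sampled,        # |S|
    unique=True,                    # no repeats within one draw
    range_max=vocabulary_size)      # candidates are class ids in [0, vocabulary_size)

loss = tf.reduce_mean(
    tf.nn.sampled_softmax_loss(
        weights=softmax_weights,
        biases=softmax_biases,
        labels=train_labels,
        inputs=embed,
        num_sampled=num_sampled,
        num_classes=vocabulary_size,
        sampled_values=sampled_values))   # use the explicit S instead of the default
```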
According to Candidate Sampling, page 2, there are different types: for NCE and negative sampling, NEG = S, which may contain part of T; for sampled logistic and sampled softmax, NEG = S - T, i.e. T is removed explicitly.
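This distinction is exposed in the TF API through the remove_accidental_hits argument; to my knowledge it defaults to False for tf.nn.nce_loss (NEG = S, which may overlap T) and to True for tf.nn.sampled_softmax_loss (NEG = S - T). A sketch, again reusing the variables above:

```python
# NCE keeps accidental hits (NEG = S, which may contain part of T) ...
nce = tf.reduce_mean(tf.nn.nce_loss(
    weights=softmax_weights, biases=softmax_biases,
    labels=train_labels, inputs=embed,
    num_sampled=num_sampled, num_classes=vocabulary_size,
    remove_accidental_hits=False))   # the default for nce_loss

# ... while sampled softmax discards them (NEG = S - T).
sampled = tf.reduce_mean(tf.nn.sampled_softmax_loss(
    weights=softmax_weights, biases=softmax_biases,
    labels=train_labels, inputs=embed,
    num_sampled=num_sampled, num_classes=vocabulary_size,
    remove_accidental_hits=True))    # the default for sampled_softmax_loss
```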
In fact, it could be a random pick from train_.
Q.Liu