How to implement a packed LSTM network in PyTorch?

This PyTorch release seems to provide PackedSequence for variable-length inputs to a recurrent neural network. However, I found it somewhat difficult to use correctly.

Using pad_packed_sequence to restore the output of an RNN layer that was fed with pack_padded_sequence, we get a T x B x N tensor of outputs, where T is the maximum number of time steps, B is the batch size and N is the hidden size. I found that for the short sequences in a batch, the outputs after the sequence's last valid time step are all zeros.

Here are my questions.

  • For a sequence classification problem, where the last output of every sequence is needed, a simple outputs[-1] gives an incorrect result, since this tensor contains many zeros for the short sequences. You need to index by the sequence lengths to get the last actual output for every sequence. Is there a simpler way to do this?
  • For a many-to-many task (e.g. seq2seq), a linear layer N x O is usually added, the batched outputs T x B x O are reshaped to TB x O, and the cross-entropy loss is computed against the TB true targets (usually integers, as in a language model). In this situation, do the zeros in the batched output matter?
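To make the second bullet concrete, here is a minimal sketch of the reshape and loss computation it describes (all sizes and tensors are made up purely for illustration):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical sizes: time steps, batch, hidden size, number of classes.
T, B, N, O = 5, 3, 8, 10

rnn_out = torch.randn(T, B, N)             # stand-in for the unpacked RNN output
linear = nn.Linear(N, O)

logits = linear(rnn_out)                   # T x B x O
flat_logits = logits.view(T * B, O)        # TB x O
targets = torch.randint(0, O, (T * B,))    # TB integer targets

loss = F.cross_entropy(flat_logits, targets)
```

The padded positions are included in this flattened loss, which is exactly why the question about the zeros matters.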
1 answer

1 -

For your first question: you can get the last valid output of every sequence by indexing the unpacked tensor with the sequence lengths. The helper method last_timestep below wraps this indexing so it can be reused inside the model.

import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

class BaselineRNN(nn.Module):
    def __init__(self, **kwargs):
        ...

    def last_timestep(self, unpacked, lengths):
        # Index of the last valid output for each sequence: B x 1 x N.
        idx = (lengths - 1).view(-1, 1).expand(unpacked.size(0),
                                               unpacked.size(2)).unsqueeze(1)
        # Gather along the time dimension, then drop it: B x N.
        return unpacked.gather(1, idx).squeeze(1)

    def forward(self, x, lengths):
        embs = self.embedding(x)

        # pack the padded batch of embeddings
        packed = pack_padded_sequence(embs, list(lengths.data),
                                      batch_first=True)

        out_packed, (h, c) = self.rnn(packed)

        # unpack (pad) the output back to B x T x N
        out_unpacked, _ = pad_packed_sequence(out_packed, batch_first=True)

        # get the outputs from the last *non-masked* timestep for each sentence
        last_outputs = self.last_timestep(out_unpacked, lengths)

        # project to the classes using a linear layer
        logits = self.linear(last_outputs)

        return logits
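To see what the gather in last_timestep actually does, here is a standalone toy run (the shapes and values are made up for illustration; sequence i is filled with the value i + 1 up to its length and zero-padded afterwards):

```python
import torch

# B=3 sequences, T=4 time steps, N=2 features, batch_first layout.
lengths = torch.tensor([4, 2, 3])
unpacked = torch.zeros(3, 4, 2)
for i, n in enumerate(lengths):
    unpacked[i, :n] = i + 1

# Same indexing as last_timestep: B x 1 x N index of the last valid step.
idx = (lengths - 1).view(-1, 1).expand(unpacked.size(0),
                                       unpacked.size(2)).unsqueeze(1)
last = unpacked.gather(1, idx).squeeze(1)   # B x N
print(last)
```

Each row of `last` holds the output at that sequence's own last valid time step, not the zero padding.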

2 -

For your second question: yes, those zeros matter, and you should mask the padded positions out of the loss, for example by assigning the padded positions a dedicated padding target. PyTorch's cross-entropy loss supports this directly.

