Embedding lookup table does not mask the padding value

I use the embedding_lookup operation to generate a dense vector representation for each token in my document; these vectors feed a convolutional neural network (the architecture is similar to the one in the WildML article).

Everything works correctly, but when I pad my document by inserting an extra padding value into it, the embedding lookup generates a vector for that token too. I think this could affect the results of the classification task. What I want to achieve is similar to what Torch's LookupTableMaskZero does.
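
To make it concrete, here is a minimal sketch of the setup (the variable names are just for illustration, not my actual code; TF 1.x-style graph API assumed):

import tensorflow as tf

vocab_size, emb_dim = 100, 8
emb = tf.get_variable("emb", [vocab_size, emb_dim])

# A padded document: real token ids followed by the padding value 0.
doc = tf.constant([[5, 12, 42, 0, 0, 0]])

vectors = tf.nn.embedding_lookup(emb, doc)  # shape [1, 6, 8]
# The last three rows are NOT zero: the padding positions get real, trained
# embeddings, which then flow into the convolutional layers.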

1) Is what I want to do correct?

2) Has something like this already been implemented?

3) If not, how can I mask the padding value so that no vector is generated for it?

Thank you in advance,

Alessandro

2 answers

@Alessandro Suglia I think this feature would be useful; unfortunately, it is not supported at the moment. One way to get the same result, although slower, is to do the lookup twice, as below:

lookup_result = tf.nn.embedding_lookup(emb, index)
# Mask table: row 0 (the padding index) is zero, every other row is one.
masked_emb = tf.concat(0, [tf.zeros([1, 1]),
                           tf.ones([emb.get_shape()[0] - 1, 1])])
mask_lookup_result = tf.nn.embedding_lookup(masked_emb, index)
# Zero out the embeddings looked up for the padding index.
lookup_result = tf.mul(lookup_result, mask_lookup_result)
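
For what it's worth, in TensorFlow 1.x and later the same trick would look roughly like this (the tf.concat argument order changed and tf.mul was renamed tf.multiply); a sketch, assuming index has shape [batch, seq_len]:

lookup_result = tf.nn.embedding_lookup(emb, index)
# Row 0 of the mask table (the padding index) is zero, every other row is one.
masked_emb = tf.concat([tf.zeros([1, 1]),
                        tf.ones([emb.get_shape()[0] - 1, 1])], axis=0)
mask_lookup_result = tf.nn.embedding_lookup(masked_emb, index)  # [batch, seq_len, 1]
# Broadcasting zeroes out the embedding of every padding position.
lookup_result = tf.multiply(lookup_result, mask_lookup_result)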

It seems that in an RNN model we do not need to mask the padding values as long as we mask the loss (the loss is the same whether or not the input padding is masked; I got this result by running test code).

Of course, zero padding can speed up the computation (the multiplications by zero) when the sequence_length parameter of tf.nn.dynamic_rnn is not passed.
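
As a sketch of what I mean by masking the loss (hypothetical shapes and names, assuming a per-time-step loss for a padded batch):

import tensorflow as tf

per_step_loss = tf.random_uniform([4, 6])   # [batch, max_len], e.g. cross-entropy per step
seq_len = tf.constant([6, 3, 5, 2])         # true length of each sequence

# 1.0 for real time steps, 0.0 for padding positions.
mask = tf.sequence_mask(seq_len, maxlen=6, dtype=tf.float32)   # [4, 6]

# Drop the loss contributed by the padding and average over the real steps only.
loss = tf.reduce_sum(per_step_loss * mask) / tf.reduce_sum(mask)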

Finally, if the model interacts across the sequence (for example a CNN, where the convolution is affected by the padding embeddings), a zero embedding for the padding value is necessary.

