What does the tf.nn.embedding_lookup function do?

tf.nn.embedding_lookup(params, ids, partition_strategy='mod', name=None) 

I cannot understand what this function is responsible for. Is it something like a lookup table? What does it mean to return the parameters corresponding to each id (in ids)?

For example, in the skip-gram model, if we use tf.nn.embedding_lookup(embeddings, train_inputs), then for each train_input does it find the corresponding embedding?

+71
python tensorflow
Jan 19 '16 at 7:14
5 answers
The embedding_lookup function retrieves rows of the params tensor. The behavior is similar to using indexing with arrays in NumPy. For example:

    import numpy as np

    matrix = np.random.random([1024, 64])  # 64-dimensional embeddings
    ids = np.array([0, 5, 17, 33])
    print(matrix[ids])                     # prints a matrix of shape [4, 64]
The params argument can also be a list of tensors, in which case the ids will be distributed among the tensors. For example, given a list of 3 tensors of shape [2, 64], the default behavior is that they will represent ids as [0, 3], [1, 4], [2, 5].

partition_strategy controls how the ids are distributed among the list. Partitioning is useful for larger-scale problems, where the embedding matrix may be too big to keep in one piece.
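Here is a minimal sketch of that list-of-tensors behavior (assuming TensorFlow 1.x with a Session, as in the answers below; the tiny [2, 1] shards are made up purely for illustration):

    import tensorflow as tf  # assumes TensorFlow 1.x

    # Three shards of shape [2, 1]; together they stand in for one [6, 1] embedding matrix.
    shard0 = tf.constant([[0.0], [3.0]])  # holds ids 0 and 3
    shard1 = tf.constant([[1.0], [4.0]])  # holds ids 1 and 4
    shard2 = tf.constant([[2.0], [5.0]])  # holds ids 2 and 5

    # With the default 'mod' strategy, id i is found in shard (i % 3), row (i // 3).
    lookup = tf.nn.embedding_lookup([shard0, shard1, shard2],
                                    tf.constant([0, 1, 2, 3, 4, 5]))

    with tf.Session() as sess:
        print(sess.run(lookup))  # [[0.] [1.] [2.] [3.] [4.] [5.]]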

+90
Jan 19 '16 at 13:05

Yes, this function is hard to understand until you get the point.

In its simplest form, it is similar to tf.gather. It returns the elements of params at the indices specified by ids.

For example (if you are inside a tf.InteractiveSession()):

    params = tf.constant([10, 20, 30, 40])
    ids = tf.constant([0, 1, 2, 3])
    print(tf.nn.embedding_lookup(params, ids).eval())

will return [10 20 30 40], because the first element (index 0) of params is 10, the second element (index 1) is 20, and so on.

Similarly

    params = tf.constant([10, 20, 30, 40])
    ids = tf.constant([1, 1, 3])
    print(tf.nn.embedding_lookup(params, ids).eval())

will return [20 20 40] .

But embedding_lookup can do more than that. The params argument can be a list of tensors, rather than a single tensor.

    params1 = tf.constant([1, 2])
    params2 = tf.constant([10, 20])
    ids = tf.constant([2, 0, 2, 1, 2, 3])
    result = tf.nn.embedding_lookup([params1, params2], ids)

In this case, the indices specified in ids correspond to elements of the tensors according to a partition strategy, where the default partition strategy is “mod”.

In the “mod” strategy, index 0 corresponds to the first element of the first tensor in the list. Index 1 corresponds to the first element of the second tensor. Index 2 corresponds to the first element of the third tensor, and so on. In general, index i corresponds to the first element of the (i+1)-th tensor, for all indices 0..(n-1), where params is a list of n tensors.

Now, index n cannot correspond to tensor n+1, since the list params contains only n tensors. So index n corresponds to the second element of the first tensor. Similarly, index n+1 corresponds to the second element of the second tensor, and so on.

So in the code

    params1 = tf.constant([1, 2])
    params2 = tf.constant([10, 20])
    ids = tf.constant([2, 0, 2, 1, 2, 3])
    result = tf.nn.embedding_lookup([params1, params2], ids)

index 0 corresponds to the first element of the first tensor: 1

index 1 corresponds to the first element of the second tensor: 10

index 2 corresponds to the second element of the first tensor: 2

index 3 corresponds to the second element of the second tensor: 20

Thus, the result will be:

 [ 2 1 2 10 2 20] 
+119
Jan 29 '17 at 16:03

Another way to look at this is to imagine that you flatten the tensors into a one-dimensional array and then perform the lookup.

For example: Tensor0 = [1,2,3], Tensor1 = [4,5,6], Tensor2 = [7,8,9].

The flattened tensor will be: [1,4,7,2,5,8,3,6,9]

Now, when you look up [0,3,4,1,7], it will yield [1,2,5,4,6].

That is, if the lookup value is 7 and we have 3 tensors (or a tensor with 3 rows), then

7 % 3 = 1 (remainder) and 7 // 3 = 2 (quotient), so the element at index 2 of Tensor1, which is 6, will be returned.
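A small sketch that checks this flattening intuition against TensorFlow itself (assuming TensorFlow 1.x; the tensors and ids are the ones from this example):

    import tensorflow as tf  # assumes TensorFlow 1.x

    tensor0 = tf.constant([1, 2, 3])
    tensor1 = tf.constant([4, 5, 6])
    tensor2 = tf.constant([7, 8, 9])

    # With the default 'mod' strategy, id i maps to tensor (i % 3), element (i // 3),
    # which is the same as indexing the interleaved array [1, 4, 7, 2, 5, 8, 3, 6, 9].
    result = tf.nn.embedding_lookup([tensor0, tensor1, tensor2],
                                    tf.constant([0, 3, 4, 1, 7]))

    with tf.Session() as sess:
        print(sess.run(result))  # [1 2 5 4 6]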

+2
Oct 27 '17 at 0:05

When the params tensor has more dimensions, the ids only refer to the top (first) dimension. This may be obvious to most people, but I had to run the following code to figure it out:

    import tensorflow as tf

    embeddings = tf.constant([[[ 1,  1], [ 2,  2], [ 3,  3], [ 4,  4]],
                              [[11, 11], [12, 12], [13, 13], [14, 14]],
                              [[21, 21], [22, 22], [23, 23], [24, 24]]])
    ids = tf.constant([0, 2, 1])
    embed = tf.nn.embedding_lookup(embeddings, ids, partition_strategy='div')

    with tf.Session() as session:
        result = session.run(embed)
        print(result)

I just happened to try the "div" strategy; for a single tensor, it makes no difference.

Here is the result:

    [[[ 1  1]
      [ 2  2]
      [ 3  3]
      [ 4  4]]

     [[21 21]
      [22 22]
      [23 23]
      [24 24]]

     [[11 11]
      [12 12]
      [13 13]
      [14 14]]]
+1
Nov 25 '17 at 3:15

Adding to Asher Stern's answer: params is interpreted as a partitioning of a large embedding tensor. It can be a single tensor representing the complete embedding tensor, or a list of X tensors, all of the same shape except for the first dimension, representing sharded embedding tensors.

The function tf.nn.embedding_lookup is written with the assumption that the embedding (params) will be large. That is why we need partition_strategy.
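As a rough sketch of that use case (assuming TensorFlow 1.x; the vocabulary size, shard count, and variable names are made up for illustration), the embedding can be created as a list of variables and looked up as one logical matrix:

    import tensorflow as tf  # assumes TensorFlow 1.x

    vocab_size, dim, num_shards = 1000, 64, 4

    # One variable per shard; together they form the full [1000, 64] embedding matrix.
    shards = [
        tf.get_variable("emb_shard_%d" % i, shape=[vocab_size // num_shards, dim])
        for i in range(num_shards)
    ]

    ids = tf.constant([7, 42, 999])
    embedded = tf.nn.embedding_lookup(shards, ids, partition_strategy='mod')

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        print(sess.run(embedded).shape)  # (3, 64)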

0
Aug 16 '17 at 23:31


