Does Word2Vec have a hidden layer?

While reading one of the papers by Tomas Mikolov: http://arxiv.org/pdf/1301.3781.pdf

I have a question about the Continuous Bag-of-Words section:

"The first proposed architecture is similar to the feedforward NNLM, where the non-linear hidden layer is removed and the projection layer is shared for all words (not just the projection matrix); thus, all words get projected into the same position (their vectors are averaged)."

I find that some people point out that there is a hidden layer in the Word2Vec model, but in my opinion this model has only a projection layer. Does this projection layer do the same work as a hidden layer?

Another question: how is the input data represented in the projection layer?

"the projection layer is common to all words (and not just the projection matrix)", what does this mean?

neural-network word2vec
1 answer

From the original paper, section 3.1, it is clear that there is no hidden layer:

"The first proposed architecture is similar to the original NNLM where the non-linear hidden layer is removed and the projection layer is used for all words."

As for your second question (what it means for the projection layer to be shared): it means that you consider only one single vector, which is the centroid of the vectors of all the words in the context. So instead of passing n-1 word vectors as input, you take only one vector into account. That is why it is called Continuous Bag of Words: the word order within the context of size n-1 is lost.
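To make the averaging concrete, here is a minimal NumPy sketch of the CBOW projection step. The vocabulary size, embedding dimension, and word indices are made up for illustration; they are not from the paper.

```python
import numpy as np

# Toy sizes for illustration only: vocabulary of 10 words,
# 5-dimensional embeddings, and a context of 4 words (n-1 = 4).
vocab_size, embedding_dim = 10, 5
rng = np.random.default_rng(0)

# The "projection layer" is just this shared embedding matrix (input weights).
W_in = rng.normal(size=(vocab_size, embedding_dim))

# Indices of the context words surrounding the target word.
context_ids = [2, 5, 7, 1]

# Shared projection: look up each context word's vector and average them.
# Word order is lost here, which is why the model is a "bag of words".
projection = W_in[context_ids].mean(axis=0)   # shape: (embedding_dim,)

# There is no non-linearity; this single averaged vector goes straight
# to the output (softmax / hierarchical softmax) layer.
print(projection.shape)  # (5,)
```

So the projection is a lookup plus an average, not a hidden layer with weights and an activation function.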
