How to find the closest word for a vector using word2vec

Question

I just started using Word2vec, and I was wondering how we can find the closest word for the vector. I have this vector, which is the middle vector for a set of vectors:

array([-0.00449447, -0.00310097, 0.02421786, ...], dtype=float32)

Is there a direct way to find the most similar word in my training data for this vector?

Or is the only solution to calculate the cosine similarity between this vector and the vector of each word in my training data, and then choose the closest one?

Thanks.

+8

python data-analysis text-mining word2vec

sel Sep 24 '15 at 11:03

2 answers

Remember to pass an empty list for the negative words when calling most_similar with a raw vector:

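 import numpy as np

 # my_vector is the query vector; model is the trained word2vec model
 model_word_vector = np.array(my_vector, dtype='f')
 topn = 20
 # positive = [query vector], negative = [] (empty), topn = number of results
 most_similar_words = model.most_similar([model_word_vector], [], topn)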

+4

Andrew Krizhanovsky Mar 16 '16 at 15:28

Nicolas Ivanov · Accepted Answer · 2015-11-10T11:36:19+0000

The gensim implementation of word2vec has a most_similar() function that finds the words semantically closest to a given word:

 >>> model.most_similar(positive=['woman', 'king'], negative=['man'])
 [('queen', 0.50882536), ...]

or its vector representation:

 >>> from numpy import array, float32
 >>> your_word_vector = array([-0.00449447, -0.00310097, 0.02421786, ...], dtype=float32)
 >>> model.most_similar(positive=[your_word_vector], topn=1)

where topn determines the desired number of returned results.

However, my gut feeling is that the function does exactly what you suggested: it computes the cosine similarity between the given vector and every other vector in the vocabulary, which is quite inefficient.
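For reference, the brute-force approach described in the question can be sketched in a few lines of plain Python. This is only an illustration of the idea, not gensim's actual implementation: the names cosine_similarity, closest_words and toy_vectors are made up for this example, and the tiny 3-dimensional vectors stand in for a real model's vocabulary.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # similarity is undefined for a zero vector; treat as 0
    return dot / (norm_a * norm_b)


def closest_words(query, word_vectors, topn=1):
    """Score every word against the query and return the topn best matches."""
    scored = [(word, cosine_similarity(query, vec))
              for word, vec in word_vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


# Hypothetical toy vocabulary; a real model has thousands of
# words and a few hundred dimensions per vector.
toy_vectors = {
    "king": [0.9, 0.1, 0.0],
    "queen": [0.8, 0.3, 0.1],
    "apple": [0.0, 0.2, 0.9],
}

query = [1.8, 0.2, 0.0]  # same direction as "king", different length
result = closest_words(query, toy_vectors, topn=2)
print(result)
```

The cost is one dot product per vocabulary word, so the search is linear in the vocabulary size. With NumPy the same computation is usually done as a single matrix-vector product over length-normalized vectors, which is much faster in practice even though the complexity is unchanged.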
