In gensim, the word2vec implementation provides a most_similar() method that lets you find words semantically close to a given word:
>>> model.most_similar(positive=['woman', 'king'], negative=['man'])
[('queen', 0.50882536), ...]
or its vector representation:
>>> your_word_vector = array([-0.00449447, -0.00310097, 0.02421786, ...], dtype=float32)
>>> model.most_similar(positive=[your_word_vector], topn=1)
where topn determines the desired number of returned results.
However, my gut feeling is that this function does exactly what you suggested: it computes the cosine similarity between the given vector and every other vector in the vocabulary (which is quite inefficient ...)
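To make that suspicion concrete, here is a minimal sketch of what such a brute-force lookup could look like, using a tiny hypothetical vocabulary and NumPy (the toy vectors, the `most_similar_brute_force` helper, and the `exclude` parameter are all illustrative assumptions, not gensim's actual internals):

```python
import numpy as np

# Toy vocabulary with made-up 3-dimensional vectors, standing in for
# a trained word2vec model's vectors (purely illustrative values).
vocab = ["king", "queen", "man", "woman", "apple"]
vectors = np.array([
    [1.0, 0.0, 0.1],   # king
    [1.0, 1.0, 0.1],   # queen
    [0.0, 0.0, 1.0],   # man
    [0.0, 1.0, 1.0],   # woman
    [0.2, 0.1, 0.3],   # apple
], dtype=np.float32)

def most_similar_brute_force(query_vec, vectors, vocab, topn=1, exclude=()):
    """Cosine similarity of query_vec against every vector in vocab,
    returning the topn closest (word, similarity) pairs."""
    # Normalize all vectors and the query to unit length, so that a
    # plain dot product equals cosine similarity.
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    q = query_vec / np.linalg.norm(query_vec)
    sims = unit @ q                      # one similarity per vocab word
    order = np.argsort(-sims)            # indices sorted by descending similarity
    results = [(vocab[i], float(sims[i])) for i in order
               if vocab[i] not in exclude]
    return results[:topn]

# The classic analogy query: woman + king - man
query = vectors[vocab.index("woman")] + vectors[vocab.index("king")] \
        - vectors[vocab.index("man")]
print(most_similar_brute_force(query, vectors, vocab, topn=1,
                               exclude={"woman", "king", "man"}))
```

The key point is the `unit @ q` line: one dot product per vocabulary word, so the cost grows linearly with vocabulary size, which is exactly why such a scan feels inefficient for large models.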
Nicolas Ivanov