Word2vec with elasticsearch for text similarities

Question


I have a large collection of texts where each text grows rapidly. I need to do a similarity search.

The idea is to embed each word with word2vec and represent each text as a normalized vector obtained by summing the embeddings of its words. Subsequent additions to a text would then only refine its vector by adding the new words' vectors to it.
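A minimal sketch of that incremental scheme, assuming the word2vec lookup is available as a plain dict from word to a list of floats. The class name `TextVector` and its methods are hypothetical, introduced only for illustration. It keeps the unnormalized running sum and normalizes on demand, so later additions keep the same per-word weight as earlier ones.

```python
import math


class TextVector:
    """Running sum of word embeddings for one text (hypothetical helper)."""

    def __init__(self, dim):
        # Unnormalized sum of all word vectors seen so far.
        self.total = [0.0] * dim

    def add_words(self, words, embeddings):
        # embeddings: dict mapping word -> list of floats
        # (assumed to come from a pre-trained word2vec model).
        for word in words:
            vec = embeddings.get(word)
            if vec is None:
                continue  # skip out-of-vocabulary words
            for i, x in enumerate(vec):
                self.total[i] += x

    def normalized(self):
        # Unit-length copy of the running sum; this is what would be indexed.
        norm = math.sqrt(sum(x * x for x in self.total))
        if norm == 0.0:
            return list(self.total)
        return [x / norm for x in self.total]


# Toy 2-dimensional embeddings, for illustration only.
toy_embeddings = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}

tv = TextVector(dim=2)
tv.add_words(["cat", "dog"], toy_embeddings)
first = tv.normalized()

# The text grows: only the new words are added to the running sum.
tv.add_words(["cat"], toy_embeddings)
second = tv.normalized()
```

Only the normalized coordinates need to be stored in the search index, but the unnormalized sum has to be kept somewhere so the vector can be updated when the text grows.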

Is it possible to use Elasticsearch for cosine similarity, storing only the coordinates of each normalized text vector in a document? If so, what is the proper index structure for such a search?

+14

elasticsearch word2vec

Alec matusis Feb 23 '17 at 6:45


2 answers

angleto · Answer 1 · 2017-03-05T16:09:12+0000

This Elasticsearch plugin implements a scoring function (dot product) for vectors stored using the delimited-payload-tokenfilter.

The complexity of this search is linear in the number of documents, which is worse than tf-idf on a term query: ES first looks the terms up in an inverted index and then uses tf-idf to score only the matching documents, so tf-idf is never computed over every document in the index. With vectors, what you are searching for is the document in vector space with the lowest cosine distance, without the benefit of an inverted index.
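To illustrate why the cost is linear, here is a rough sketch of a brute-force scan in plain Python. It is not the plugin's code; the function names and the toy data are hypothetical. It assumes every stored vector is already unit length, in which case the dot product equals the cosine similarity.

```python
def dot(u, v):
    # Dot product of two equal-length vectors.
    return sum(a * b for a, b in zip(u, v))


def brute_force_search(query, doc_vectors, k=3):
    # doc_vectors: dict mapping doc_id -> unit-length vector (assumption).
    # Every document is scored, so the work grows linearly with the
    # number of documents; no inverted index narrows the candidates.
    scored = [(dot(query, vec), doc_id) for doc_id, vec in doc_vectors.items()]
    scored.sort(reverse=True)
    return scored[:k]


# Toy unit vectors, for illustration only.
docs = {
    "d1": [1.0, 0.0],
    "d2": [0.0, 1.0],
    "d3": [0.6, 0.8],
}

top = brute_force_search([1.0, 0.0], docs, k=2)
top_ids = [doc_id for _, doc_id in top]
```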

Alex Moore-Niemi · Answer 2 · 2019-06-12T22:10:44+0000

For Elasticsearch 6.4.x, StaySense has made this plugin available.
