I have a large collection of texts, and each text grows rapidly. I need to run similarity search over them.
The idea is to embed each word with word2vec and represent each text as a single normalized vector obtained by summing the embeddings of its words. Later additions to a text only refine its vector, since the embeddings of the new words are added to the existing sum.
Is it possible to use Elasticsearch for cosine similarity search if each document stores only the coordinates of its normalized text vector? If so, what is the proper index structure for such a search?
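To make the scheme concrete, here is a minimal sketch in plain Python. The `TextVector` class, the `cosine` helper and the two-dimensional toy `embeddings` dictionary are hypothetical names and data invented for illustration; real word2vec vectors would come from a trained model. The sketch also assumes the unnormalized running sum is kept somewhere, because the sum cannot be recovered from the normalized vector alone.

```python
import math


class TextVector:
    """Running sum of word embeddings for one text (hypothetical helper)."""

    def __init__(self, dim):
        # Unnormalized sum; kept so that later additions can refine it.
        self.total = [0.0] * dim

    def add_words(self, words, embeddings):
        """Add the embedding of every known word to the running sum."""
        for word in words:
            vec = embeddings.get(word)
            if vec is None:
                continue  # skip out-of-vocabulary words
            for i, x in enumerate(vec):
                self.total[i] += x

    def normalized(self):
        """Return the unit-length version of the running sum."""
        norm = math.sqrt(sum(x * x for x in self.total))
        if norm == 0.0:
            return list(self.total)
        return [x / norm for x in self.total]


def cosine(u, v):
    """Cosine similarity of two unit vectors is just their dot product."""
    return sum(a * b for a, b in zip(u, v))


# Toy 2-dimensional embeddings, made up for this example.
embeddings = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}

text = TextVector(dim=2)
text.add_words(["cat"], embeddings)
first = text.normalized()

# The text grows: the new word only updates the running sum.
text.add_words(["dog"], embeddings)
second = text.normalized()

similarity = cosine(first, second)
```

Because the stored vectors are already unit length, the cosine similarity reduces to a dot product, which is the quantity the search would need to compute.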
elasticsearch word2vec
Alec matusis