I have some example sentences that I want to run through a Doc2Vec model. My ultimate goal is a matrix of size (num_sentences, num_features).
I am using the Gensim package.
from gensim.models.doc2vec import TaggedDocument
from gensim.models import Doc2Vec
Now, I thought model.docvecs would give me a list of arrays, with the first array being the vector for sentence 1, the second array the vector for sentence 2, and so on. But instead, it has a length of 10!
I get model.docvecs[0] = array([ 0.02312995, -0.00339695, -0.01273827, 0.01944644, -0.03247212, -0.04663946, 0.01369059, 0.03289782, 0.03516903, -0.03435936], dtype=float32)
What are these docvecs? How do I get the desired result, which is a matrix of size (40, 10) in this example?
I saw this here, and the accepted answer there says: "where 99 is the identifier of the document whose vector we want." This confuses me even more, since it seems to say that model.docvecs SHOULD be indexable like a matrix in which each row is a document vector!