For data, I have a list in the same format as yours:
[['aw', 'wb', 'ce', 'uw', 'qqg'], ['g', 'e', 'ent', 'va'], ['a'], ...]
For labels I have a list: [1, 0, 0, ...]. It gives the class of each sentence above; a label can be any class (tag), not only 1 or 0.
Since the data is already a list in the format shown above, we can use TaggedDocument rather than TaggedLineDocument:
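    import gensim
    from gensim.models.doc2vec import TaggedDocument

    def myDataFlow(self, data, labels):
        # Pair each tokenized sentence with its class label as the tag.
        for words, label in zip(data, labels):
            yield TaggedDocument(words, [label])

    # Doc2Vec reads the corpus more than once and rejects a bare generator,
    # so materialize it as a list first.
    model = gensim.models.Doc2Vec(list(self.myDataFlow(data, labels)))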
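
If the corpus is too large to hold as a list, a restartable iterable is one possible alternative, since Doc2Vec needs to walk the documents several times. Below is a minimal sketch of that idea, not part of the original code. The class name `LabeledCorpus` is hypothetical, and `TaggedDocument` is a local stand-in with the same `words` and `tags` fields as gensim's, used only so the sketch runs without gensim installed.

```python
from collections import namedtuple

# Stand-in with the same fields as gensim's TaggedDocument (assumption made
# so this sketch runs on its own). In real use, replace it with:
#   from gensim.models.doc2vec import TaggedDocument
TaggedDocument = namedtuple("TaggedDocument", "words tags")


class LabeledCorpus:  # hypothetical helper name
    """Restartable iterable over (tokens, label) pairs."""

    def __init__(self, data, labels):
        if len(data) != len(labels):
            raise ValueError("data and labels must have the same length")
        self.data = data
        self.labels = labels

    def __iter__(self):
        # A fresh generator is created on every call, so the corpus can be
        # traversed as many times as training needs.
        for words, label in zip(self.data, self.labels):
            yield TaggedDocument(words, [label])


data = [["aw", "wb", "ce", "uw", "qqg"], ["g", "e", "ent", "va"], ["a"]]
labels = [1, 0, 0]
corpus = LabeledCorpus(data, labels)

first_pass = list(corpus)
second_pass = list(corpus)  # a plain generator would be exhausted here
```

With the real TaggedDocument imported, an object like `corpus` could be passed to Doc2Vec wherever the list is used above.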