Is it possible?
You already answered yourself: yes. In addition to word2veckeras, which uses gensim, there is another CBOW implementation in Keras that has no additional dependencies (for the record, I am not affiliated with that repo). You can use them as examples.
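For illustration, here is a minimal CBOW sketch in Keras, not taken from either repo; the vocabulary size, embedding dimension and window are made-up values. Context word indices go in, their embeddings are averaged, and a softmax predicts the center word:

```python
from keras.models import Sequential
from keras.layers import Embedding, Lambda, Dense
import keras.backend as K

vocab_size = 10000   # assumed vocabulary size
embed_dim = 100      # assumed embedding dimensionality
window = 2           # context words on each side of the center word

model = Sequential()
# One embedding row per word; the input is the 2*window context word indices.
model.add(Embedding(vocab_size, embed_dim, input_length=2 * window))
# Average the context embeddings into a single vector.
model.add(Lambda(lambda x: K.mean(x, axis=1), output_shape=(embed_dim,)))
# Softmax over the whole vocabulary to predict the center word.
model.add(Dense(vocab_size, activation='softmax'))
model.compile(optimizer='adam', loss='categorical_crossentropy')
```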
How can I fit the model?
Since the training data is a large corpus of sentences, the most convenient method is model.fit_generator, which "fits the model on data generated batch-by-batch by a Python generator". The generator runs indefinitely, yielding CBOW (or SG) tuples (word, context, target), but you manually specify samples_per_epoch and nb_epoch to limit the training. This way you decouple sentence analysis (tokenization, word index table, sliding window, etc.) from the actual Keras model, plus save a lot of resources. A rough sketch is shown below.
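Here is a rough sketch of such a generator for the softmax model above (it yields (context, target) batches rather than individual tuples; the `sentences` and `word_index` preprocessing is assumed to exist, index 0 is assumed to be reserved for padding, and `samples_per_epoch`/`nb_epoch` are the old Keras 1.x argument names, replaced by `steps_per_epoch`/`epochs` in newer Keras):

```python
import numpy as np
from keras.utils.np_utils import to_categorical

def cbow_batches(sentences, word_index, window=2, batch_size=128):
    """Yield (contexts, one-hot targets) batches indefinitely by sliding a window over sentences."""
    vocab_size = len(word_index) + 1
    contexts, targets = [], []
    while True:  # fit_generator expects an endless generator
        for sentence in sentences:
            ids = [word_index[w] for w in sentence if w in word_index]
            for i, center in enumerate(ids):
                ctx = ids[max(0, i - window):i] + ids[i + 1:i + 1 + window]
                if len(ctx) < 2 * window:            # pad short contexts with 0
                    ctx += [0] * (2 * window - len(ctx))
                contexts.append(ctx)
                targets.append(center)
                if len(contexts) == batch_size:
                    yield (np.array(contexts),
                           to_categorical(targets, vocab_size))
                    contexts, targets = [], []

# sentences and word_index come from your own preprocessing (hypothetical names)
model.fit_generator(cbow_batches(sentences, word_index),
                    samples_per_epoch=100000,  # cap one "epoch" manually
                    nb_epoch=10)
```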
Should I use a custom loss function?
CBOW minimizes the distance between the predicted and the true distribution of the center word, so in the simplest form categorical_crossentropy does the job. If you implement negative sampling, which is a bit more complicated but much more efficient, the loss changes to binary_crossentropy. A custom loss function is not needed.
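For the negative-sampling variant, one possible sketch (Keras 2 functional API with a Dot layer; in Keras 1 this was a merge with mode='dot') treats each (center word, context word) pair as a binary classification, with label 1 for real pairs and 0 for sampled noise words, which is why binary_crossentropy applies:

```python
from keras.models import Model
from keras.layers import Input, Embedding, Dot, Flatten, Activation

vocab_size, embed_dim = 10000, 100   # assumed sizes

word_in = Input(shape=(1,), dtype='int32')
ctx_in = Input(shape=(1,), dtype='int32')
word_emb = Embedding(vocab_size, embed_dim)(word_in)
ctx_emb = Embedding(vocab_size, embed_dim)(ctx_in)
# Dot product of the two embeddings gives the similarity score.
score = Dot(axes=-1)([word_emb, ctx_emb])
prob = Activation('sigmoid')(Flatten()(score))

ns_model = Model(inputs=[word_in, ctx_in], outputs=prob)
ns_model.compile(optimizer='adam', loss='binary_crossentropy')
```

The generator then yields ([center_words, context_words], labels) batches, where each true pair is mixed with a few randomly sampled negative words labeled 0.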
For anyone interested in the mathematical and probabilistic details of the model, I highly recommend Stanford's CS224D class. Here are the lecture notes on word2vec, CBOW and Skip-Gram.
Another useful link: an implementation of word2vec in pure numpy and C.
Maxim