Bigram to a vector

I want to build word embeddings for documents using the word2vec tool. I know how to find the vector embedding for a single word (unigram). Now I want to find a vector for a bigram. Can word2vec be used for this? If so, how?

1 answer

The following snippet will give you a vector representation of a bigram. Note that the bigram you want to convert to a vector must have an underscore instead of a space between its words. For example, bigram2vec(unigrams, "this report") is wrong; it should be bigram2vec(unigrams, "this_report"). The unigrams argument is the tokenized corpus; for more information on preparing it, see the documentation for the gensim.models.word2vec.Word2Vec class.

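from gensim.models import word2vec
from gensim.models.phrases import Phrases

def bigram2vec(unigrams, bigram_to_search):
    # unigrams: the tokenized corpus, i.e. a list of sentences, each a list of words
    # Detect frequent word pairs; Phrases joins each pair with "_" (e.g. "this_report")
    bigrams = Phrases(unigrams)
    # Train word2vec on the corpus with detected bigrams merged into single tokens
    model = word2vec.Word2Vec(bigrams[unigrams])
    # model.wv holds the learned vectors; older gensim releases
    # allowed model.vocab and model[...] directly
    if bigram_to_search in model.wv:
        return model.wv[bigram_to_search]
    return None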
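
To see why the lookup key needs the underscore, here is a rough pure-Python sketch of the kind of token merging that Phrases performs before word2vec is trained. This is not gensim's implementation: merge_bigrams, its min_count parameter as used here, and the toy corpus are all made up for illustration, and the real Phrases decides which pairs to merge with a collocation score rather than a raw count.

```python
from collections import Counter

def merge_bigrams(sentences, min_count=2, delimiter="_"):
    """Join adjacent word pairs seen at least min_count times into one token.

    Simplified stand-in for gensim's Phrases (an assumption for illustration):
    it thresholds raw pair counts instead of using a collocation score.
    """
    # Count every adjacent word pair across the whole corpus
    pair_counts = Counter(
        pair for sent in sentences for pair in zip(sent, sent[1:])
    )
    merged = []
    for sent in sentences:
        out, i = [], 0
        while i < len(sent):
            pair = tuple(sent[i:i + 2])
            if len(pair) == 2 and pair_counts[pair] >= min_count:
                out.append(delimiter.join(pair))  # e.g. "this_report"
                i += 2  # skip both words of the merged pair
            else:
                out.append(sent[i])
                i += 1
        merged.append(out)
    return merged

# Hypothetical toy corpus, already tokenized
sentences = [
    ["this", "report", "is", "long"],
    ["please", "read", "this", "report"],
    ["a", "report", "on", "this"],
]
merged = merge_bigrams(sentences)
vocab = {token for sent in merged for token in sent}
print("this_report" in vocab)  # True
print("this report" in vocab)  # False
```

After merging, the pair exists in the vocabulary only as the single token "this_report", so that is the only key a model trained on the merged corpus can return a vector for.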