Vocabulary function

I am learning about embedding input for a convolutional neural network, and I understand Word2vec. However, in his CNN text classification code, dennybritz uses the learn.preprocessing.VocabularyProcessor function. The documentation says that it maps documents to sequences of word ids. I'm not quite sure how this function works. Does it create a list of ids and then match the ids to words, or does it keep a dictionary of words and their ids and return only the ids when it is run?

+6
1 answer

Suppose you have only two documents, "I like pizza" and "I like pasta". Your entire vocabulary consists of the words (I, like, pizza, pasta). Each word in the vocabulary is assigned an index, for example (1, 2, 3, 4). Now, given a document such as "I like pasta", it can be converted to the vector [1, 2, 4]. This is what learn.preprocessing.VocabularyProcessor does. The max_document_length parameter ensures that every document is represented by a vector of exactly max_document_length entries: documents shorter than max_document_length are padded, and documents longer than max_document_length are cut off. I hope this helps.
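To make the mapping concrete, here is a minimal plain-Python sketch of the same idea. It is not the library's implementation: the helper names build_vocabulary and transform are made up for illustration, the whitespace tokenizer is a simplification, and it assumes that id 0 is reserved for padding and unknown words, which is why real words start at 1.

```python
def build_vocabulary(documents):
    """Assign each distinct word an id, in order of first appearance.

    Ids start at 1; id 0 is assumed to be reserved for padding/unknown words.
    """
    vocab = {}
    for doc in documents:
        for word in doc.split():
            if word not in vocab:
                vocab[word] = len(vocab) + 1
    return vocab


def transform(document, vocab, max_document_length):
    """Map a document to a fixed-length list of word ids."""
    # Look up each word; words missing from the vocabulary become 0.
    ids = [vocab.get(word, 0) for word in document.split()]
    # Cut off documents that are too long.
    ids = ids[:max_document_length]
    # Pad documents that are too short with 0.
    return ids + [0] * (max_document_length - len(ids))


docs = ["I like pizza", "I like pasta"]
vocab = build_vocabulary(docs)   # {'I': 1, 'like': 2, 'pizza': 3, 'pasta': 4}

padded = transform("I like pasta", vocab, max_document_length=5)
truncated = transform("I like pasta", vocab, max_document_length=2)
print(padded)     # [1, 2, 4, 0, 0]
print(truncated)  # [1, 2]
```

So, in terms of the question: the word-to-id dictionary is built once from the training documents, and afterwards only the id sequences are returned for each document.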

+15