How to create a bag of words using Weka?

I have a set of documents, and I want to represent each document as a vector. The vector should have a 1 for each word that is present in the document and a 0 for every other word (that is, words that appear in other documents in the corpus but not in this one). How can I create this vector for all documents in Weka?

Is there a quick way to do this with Weka? If possible, I would also like Weka to remove stop words and do some other preprocessing before it creates the vectors.

Thanks Abhishek S

+5
1 answer

You need the StringToWordVector filter.
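To make the target representation concrete, here is a minimal plain-Java sketch of the 0/1 vector described in the question. It is only an illustration of the kind of output such a filter produces, not Weka code and not Weka's implementation. The class name `BagOfWordsSketch`, its methods, the two sample documents, and the tiny stop-word list are all hypothetical and chosen just for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

// Plain-Java illustration only; this does not use or reproduce Weka.
final class BagOfWordsSketch {

    // Tiny illustrative stop-word list (an assumption; real lists are much longer).
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the", "is", "on", "and", "of"));

    // Lowercase the text, split on anything that is not a-z, and drop stop words.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!t.isEmpty() && !STOP_WORDS.contains(t)) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Sorted vocabulary over the whole corpus, so every document shares the same columns.
    static List<String> vocabulary(List<String> docs) {
        Set<String> vocab = new TreeSet<>();
        for (String d : docs) {
            vocab.addAll(tokenize(d));
        }
        return new ArrayList<>(vocab);
    }

    // 1 where the vocabulary word occurs in the document, 0 otherwise.
    static int[] vectorize(String doc, List<String> vocab) {
        Set<String> present = new HashSet<>(tokenize(doc));
        int[] v = new int[vocab.size()];
        for (int i = 0; i < vocab.size(); i++) {
            v[i] = present.contains(vocab.get(i)) ? 1 : 0;
        }
        return v;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("The cat sat on the mat", "A dog chased the cat");
        List<String> vocab = vocabulary(docs);
        System.out.println(vocab);
        for (String d : docs) {
            System.out.println(Arrays.toString(vectorize(d, vocab)));
        }
    }
}
```

For the two sample documents, the vocabulary is `[cat, chased, dog, mat, sat]` and the vectors are `[1, 0, 0, 1, 1]` and `[1, 1, 1, 0, 0]`. In Weka itself you would not write this by hand: the filter is `weka.filters.unsupervised.attribute.StringToWordVector`, and as far as I know its default output is this kind of presence/absence value rather than word counts.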


+7
