How to create a bag of words using Weka?

Question

I have a set of documents, and I want to represent each document as a vector. The vector should have a 1 for every word that is present in the document and a 0 for every other word (words that occur in other documents in the corpus, but not in this particular document). How can I create this vector for all documents in Weka?

Is there a quick way to do this with Weka? If possible, I would also like Weka to remove stop words and do some other preprocessing before it creates this vector.

Thanks, Abhishek S

nlp weka

London guy Oct 10 '11 at 7:26

1 answer

michaeltwofish · Accepted Answer · 2011-10-11T05:09:20+0000

You need the StringToWordVector filter.
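To make the target representation concrete, here is a minimal plain-Java sketch of the 0/1 vectors described in the question. It is not Weka code and not how the filter is implemented; the class name `BagOfWordsSketch`, the tiny stop list, and the letters-only tokenizer are all assumptions made for illustration. In Weka itself the filter lives in `weka.filters.unsupervised.attribute.StringToWordVector`, and the way stop words are configured differs between Weka versions, so check the Javadoc for the release you are using.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

// Illustrative sketch only: a binary bag-of-words built with the Java standard library.
public class BagOfWordsSketch {

    // Hypothetical, deliberately tiny stop list (Weka ships its own, much longer one).
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the", "is", "on", "and", "of"));

    // Lowercase the text, split on anything that is not a-z, and drop stop words.
    static List<String> tokenize(String doc) {
        List<String> tokens = new ArrayList<>();
        for (String t : doc.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!t.isEmpty() && !STOP_WORDS.contains(t)) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Collect the vocabulary of the whole corpus in sorted order,
    // so every document vector uses the same word-to-index mapping.
    static List<String> vocabulary(List<String> docs) {
        Set<String> vocab = new TreeSet<>();
        for (String doc : docs) {
            vocab.addAll(tokenize(doc));
        }
        return new ArrayList<>(vocab);
    }

    // 1 if the vocabulary word occurs in the document, 0 otherwise.
    static int[] vectorize(String doc, List<String> vocab) {
        Set<String> present = new HashSet<>(tokenize(doc));
        int[] vector = new int[vocab.size()];
        for (int i = 0; i < vocab.size(); i++) {
            vector[i] = present.contains(vocab.get(i)) ? 1 : 0;
        }
        return vector;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "The cat sat on the mat",
                "The dog chased the cat");
        List<String> vocab = vocabulary(docs);
        System.out.println(vocab);
        for (String doc : docs) {
            System.out.println(Arrays.toString(vectorize(doc, vocab)));
        }
    }
}
```

For the two sample documents the vocabulary comes out as `[cat, chased, dog, mat, sat]` and the vectors as `[1, 0, 0, 1, 1]` and `[1, 1, 1, 0, 0]`. As far as I know, StringToWordVector produces this kind of presence/absence output by default and only switches to counts when its word-count option is enabled.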

