I have a text file in which each line contains a label and a tweet, separated by a comma:
    positive,I love this car
    negative,I hate this book
    positive,Good product.
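A tweet can itself contain commas, so it is safer to split each row at the first comma only. Here is a minimal plain-Java sketch (no Mahout involved; `LabeledTweetParser` and its nested `LabeledTweet` holder are hypothetical names) that turns the sample rows into label/text pairs:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits each row at the FIRST comma only,
// so commas inside the tweet text are preserved.
class LabeledTweetParser {

    // Hypothetical holder for one parsed row.
    static final class LabeledTweet {
        final String label;
        final String text;

        LabeledTweet(String label, String text) {
            this.label = label;
            this.text = text;
        }

        @Override
        public String toString() {
            return label + " -> " + text;
        }
    }

    static LabeledTweet parse(String line) {
        int comma = line.indexOf(',');
        if (comma < 0) {
            throw new IllegalArgumentException("No label separator in: " + line);
        }
        String label = line.substring(0, comma).trim();
        String text = line.substring(comma + 1).trim();
        return new LabeledTweet(label, text);
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "positive,I love this car",
                "negative,I hate this book",
                "positive,Good product.");
        List<LabeledTweet> rows = new ArrayList<>();
        for (String line : lines) {
            rows.add(parse(line));
        }
        rows.forEach(System.out::println);
    }
}
```

Running `main` prints one `label -> text` line per row, e.g. `positive -> I love this car`.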
I need to convert each row into a vector. If I use the seq2sparse command, the whole document is converted into a single vector, but I need each line converted into its own vector, for example key: positive, value: the vector of the tweet. How can I achieve this in Mahout?
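One common way to get one vector per row instead of one per document is to encode each tweet separately into a fixed-size index space (the hashing trick) and key the result by its label. The sketch below is plain Java and only illustrates the idea: `PerLineHashedVectors`, the cardinality of 1000 and the naive whitespace tokenization are assumptions, and `String.hashCode` stands in for whatever hash Mahout's encoders actually use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: one hashed term-count vector per input line.
class PerLineHashedVectors {

    // Assumed dimensionality; it must be identical for every row.
    static final int CARDINALITY = 1000;

    // Sparse vector as index -> count. String.hashCode is a stand-in hash.
    static Map<Integer, Double> vectorize(String tweet) {
        Map<Integer, Double> vector = new TreeMap<>();
        for (String word : tweet.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (word.isEmpty()) {
                continue; // split() yields "" for leading whitespace
            }
            int index = Math.floorMod(word.hashCode(), CARDINALITY);
            vector.merge(index, 1.0, Double::sum);
        }
        return vector;
    }

    public static void main(String[] args) {
        String[] lines = {
            "positive,I love this car",
            "negative,I hate this book",
            "positive,Good product."
        };
        // One (label, vector) pair per line, analogous to one context.write per map() call.
        List<Map.Entry<String, Map<Integer, Double>>> output = new ArrayList<>();
        for (String line : lines) {
            int comma = line.indexOf(',');
            String label = line.substring(0, comma);
            String tweet = line.substring(comma + 1);
            output.add(Map.entry(label, vectorize(tweet)));
        }
        output.forEach(e -> System.out.println(e.getKey() + "\t" + e.getValue()));
    }
}
```

In a Hadoop job that uses `TextInputFormat`, `map()` is already called once per input line, so writing one (label, vector) pair per call gives exactly this per-tweet output rather than a single vector for the whole file.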
This is what I tried:
    StringTokenizer str = new StringTokenizer(line, ",");
    String label = str.nextToken();
    while (str.hasMoreTokens()) {
        tweetline = str.nextToken();
        System.out.println("Tweetline" + tweetline);
        StringTokenizer words = new StringTokenizer(tweetline, " ");
        while (words.hasMoreTokens()) {
            featureList.add(words.nextToken());
        }
    }
    Vector unclassifiedInstanceVector = new RandomAccessSparseVector(tweetline.split(" ").length);
    FeatureVectorEncoder vectorEncoder = new AdaptiveWordValueEncoder(label);
    vectorEncoder.setProbes(1);
    System.out.println("Feature List: " + featureList);
    for (Object feature : featureList) {
        vectorEncoder.addToVector((String) feature, unclassifiedInstanceVector);
    }
    context.write(new Text("/" + label), new VectorWritable(unclassifiedInstanceVector));
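Two details of this attempt look likely to cause trouble. First, the vector size comes from the number of words in the tweet, so every row gets a different cardinality and hashed words collide heavily. Second, the encoder is named after the label; as far as I understand, Mahout's word encoders mix the encoder name into the hash, so the same word would land on different indices in `positive` and `negative` rows (and at classification time the label is not known anyway). The plain-Java sketch below imitates this with a simplified name-seeded hash; `Objects.hash` is not Mahout's hash, and `EncoderNameDemo` and the cardinality are hypothetical.

```java
import java.util.Locale;
import java.util.Objects;

// Hypothetical demo: a name-seeded hash, a simplified stand-in for a named hashed encoder.
class EncoderNameDemo {

    // Fixed for every tweet (assumed value), instead of the tweet's word count.
    static final int CARDINALITY = 1000;

    // Index of a word for an encoder called `encoderName` (simplified, not Mahout's hash).
    static int index(String encoderName, String word) {
        return Math.floorMod(Objects.hash(encoderName, word), CARDINALITY);
    }

    // Dense term-count vector whose length does not depend on the tweet.
    static double[] encode(String encoderName, String tweet) {
        double[] vector = new double[CARDINALITY];
        for (String word : tweet.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!word.isEmpty()) {
                vector[index(encoderName, word)] += 1.0;
            }
        }
        return vector;
    }

    public static void main(String[] args) {
        // Label-named encoders: the shared word "good" can land on different indices.
        System.out.println(index("positive", "good") + " vs " + index("negative", "good"));
        // One constant name for all rows: "good" always maps to the same index.
        double[] a = encode("words", "Good product");
        double[] b = encode("words", "not good at all");
        int good = index("words", "good");
        System.out.println(a[good] + " " + b[good]);
    }
}
```

In Mahout terms this would correspond to creating one encoder with a constant name, e.g. `new AdaptiveWordValueEncoder("words")`, and a `new RandomAccessSparseVector(CARDINALITY)` with the same fixed cardinality for every tweet. If `featureList` is a field of the mapper, it should also be cleared at the start of each `map()` call; otherwise words from earlier tweets accumulate into later vectors.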
Thank you in advance