Classification using LingPipe

As part of my academic research project, I am building an application that receives a set of URLs from the Internet. The task is to classify each of these URLs into a category.

For instance, the following URL refers to cricket: http://www.espncricinfo.com/icc_cricket_worldcup2011/content/current/story/499851.html If I give this URL to the classifier, it should output the category “Sport”.

For this, I use the LingPipe classifier. I followed the classification tutorial and ran the demo in the demo folder. I downloaded the 20 Newsgroups dataset from the following link: http://people.csail.mit.edu/people/jrennie/20Newsgroups

Later, I reduced the sample size from 20 to 8 and ran the demo. It trained on the data and tested it successfully.

But do I need to train the classifier every time I want to check the category of a document? When I run the document classification, it takes 4 minutes to train and test the data.

Can I save the trained model once and then perform classification several times?

1 answer

You need to serialize the trained model to disk; you can then deserialize it to get a classifier back.

After you train the classifier, use

AbstractExternalizable.compileTo(classifier,modelFile); 

to write the model to disk.

To read it back in, you will need

 AbstractExternalizable.readObject(modelFile); 

Take a look at the Javadoc for AbstractExternalizable.

The model you read back will not accept any further training, because it has been compiled.
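To make the train-once, load-many pattern concrete, here is a minimal sketch that uses only the standard Java library. It is not LingPipe code: `ToyClassifier`, `SaveModelSketch`, `save` and `load` are hypothetical names invented for illustration, and plain `java.io` object serialization stands in for `compileTo` and `readObject`. With LingPipe you would call those two methods where this sketch uses the object streams.

```java
import java.io.*;
import java.util.*;

public class SaveModelSketch {

    /** Hypothetical toy model that counts words per category. Not a LingPipe class. */
    static class ToyClassifier implements Serializable {
        private static final long serialVersionUID = 1L;
        private final Map<String, Map<String, Integer>> counts = new HashMap<>();

        /** Adds the words of one training text to the given category. */
        void train(String category, String text) {
            Map<String, Integer> words = counts.computeIfAbsent(category, k -> new HashMap<>());
            for (String w : text.toLowerCase().split("\\s+")) {
                words.merge(w, 1, Integer::sum);
            }
        }

        /** Returns the category whose training words overlap most with the text. */
        String classify(String text) {
            String best = null;
            int bestScore = -1;
            for (Map.Entry<String, Map<String, Integer>> e : counts.entrySet()) {
                int score = 0;
                for (String w : text.toLowerCase().split("\\s+")) {
                    score += e.getValue().getOrDefault(w, 0);
                }
                if (score > bestScore) {
                    bestScore = score;
                    best = e.getKey();
                }
            }
            return best;
        }
    }

    /** Writes the trained model to disk (the role compileTo plays in the answer). */
    static void save(ToyClassifier model, File file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(model);
        }
    }

    /** Reads a saved model back (the role readObject plays in the answer). */
    static ToyClassifier load(File file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
            return (ToyClassifier) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        File modelFile = File.createTempFile("toy-model", ".bin");
        modelFile.deleteOnExit();

        // Slow step, done once: train and save.
        ToyClassifier trained = new ToyClassifier();
        trained.train("Sport", "cricket world cup match innings");
        trained.train("Tech", "java compiler library code");
        save(trained, modelFile);

        // Fast step, done on every later run: load and classify.
        ToyClassifier loaded = load(modelFile);
        System.out.println(loaded.classify("cricket match report")); // prints Sport
    }
}
```

In a real application the training half and the classifying half would probably live in separate programs, so that the 4-minute training run happens once and each later run only pays the cost of reading the model file.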
