How to use Liblinear

Question

I am new to machine learning and text mining in general. I came across a Ruby library called Liblinear: https://github.com/tomz/liblinear-ruby-swig .

What I want to do is train software to detect whether a text mentions anything related to bicycles or not.

Can someone please outline the steps I have to follow (for example, whether and how to preprocess the text), share resources, and ideally share a simple example to get me going?

Any help is appreciated, thanks!

ruby machine-learning classification text-mining

mabounassif May 24, '11 at 20:49

1 answer

Fred foo · Accepted Answer · 2011-05-24T21:01:06+0000

Classic approach:

1. Collect a representative sample of input texts, each labeled as related / unrelated.
2. Divide the sample into training and test sets.
3. Extract all terms from all documents in the training set; call this the vocabulary, V.
4. For each document in the training set, convert it to a vector of booleans, where the i-th element is true / 1 if the i-th term in the vocabulary occurs in the document.
5. Feed the vectorized training set to a learning algorithm.
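The vocabulary and vectorization steps can be sketched in plain Ruby roughly as follows. This is a minimal illustration, not the Liblinear API: the sample texts, the naive tokenizer, and the method names `tokenize`, `build_vocabulary`, and `vectorize` are all assumptions made up for the example.

```ruby
require 'set'

# Hypothetical toy training set: [text, label], 1 = related to bicycles, 0 = unrelated.
training = [
  ["I rode my bicycle to work today", 1],
  ["The bike shop fixed my flat tire", 1],
  ["The stock market fell sharply", 0],
  ["She cooked pasta for dinner", 0]
]

# Naive tokenizer (an assumption): lowercase, keep runs of letters and digits.
def tokenize(text)
  text.downcase.scan(/[a-z0-9]+/)
end

# The vocabulary V: every distinct term in the training texts, in a fixed order.
def build_vocabulary(texts)
  texts.flat_map { |text| tokenize(text) }.uniq.sort
end

# Element i is 1 if the i-th vocabulary term occurs in the document, else 0.
# Terms that are not in the vocabulary are simply ignored.
def vectorize(text, vocabulary)
  present = tokenize(text).to_set
  vocabulary.map { |term| present.include?(term) ? 1 : 0 }
end

vocabulary = build_vocabulary(training.map(&:first))
vectors = training.map { |text, _label| vectorize(text, vocabulary) }
labels = training.map(&:last)
```

The `vectors` and `labels` arrays are what would then be handed to the learning algorithm, in whatever input format the chosen library expects.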

Now, to classify a document, vectorize it in the same way as in step 4 and feed it to the classifier to get a related / unrelated label for it. For documents in the test set, compare this label with the actual one to see whether the classifier got it right. You should be able to get at least about 80% accuracy with this simple method.
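As a rough sketch of the evaluation step, the following splits a labeled sample and measures accuracy on the held-out part. The sample data, the helper names `train_test_split` and `accuracy`, and the 25% test fraction are assumptions for illustration. The `classify` lambda is only a keyword-matching placeholder standing in for a trained model; it is not a Liblinear call.

```ruby
# Hypothetical labeled sample: [text, label], 1 = related to bicycles, 0 = unrelated.
examples = [
  ["New bicycle lanes opened downtown", 1],
  ["My bike needs a new chain", 1],
  ["Cycling to work saves money", 1],
  ["He bought a mountain bike", 1],
  ["The election results are in", 0],
  ["Heavy rain is expected tomorrow", 0],
  ["The recipe calls for two eggs", 0],
  ["Interest rates rose again", 0]
]

# Shuffle with a seeded RNG so the split is reproducible, then cut off a test set.
# Returns [training_set, test_set].
def train_test_split(examples, test_fraction: 0.25, seed: 42)
  shuffled = examples.shuffle(random: Random.new(seed))
  test_size = (examples.length * test_fraction).round
  [shuffled.drop(test_size), shuffled.take(test_size)]
end

# Fraction of predictions that match the actual labels.
def accuracy(predicted, actual)
  return 0.0 if actual.empty?
  correct = predicted.zip(actual).count { |guess, truth| guess == truth }
  correct.to_f / actual.length
end

train_set, test_set = train_test_split(examples)

# Placeholder classifier (an assumption): a real one would be trained on train_set.
classify = ->(text) { text.downcase.match?(/bicycle|bike|cycling/) ? 1 : 0 }

predicted = test_set.map { |text, _label| classify.call(text) }
actual = test_set.map(&:last)
puts format("accuracy on %d test documents: %.2f", test_set.length, accuracy(predicted, actual))
```

Keeping the test documents out of training is what makes the accuracy figure meaningful: it estimates how the classifier behaves on text it has not seen.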

To improve on this method, replace the booleans with term counts normalized by document length or, even better, tf-idf scores.
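A minimal sketch of tf-idf weighting in plain Ruby is shown below. It assumes one common variant, tf = term count divided by document length and idf = log(N / df), where N is the number of training documents and df is the number of documents containing the term; other variants exist. The method names and sample documents are made up for the example.

```ruby
# Naive tokenizer (an assumption): lowercase, keep runs of letters and digits.
def tokenize(text)
  text.downcase.scan(/[a-z0-9]+/)
end

# Term frequency: count of each term divided by the document length.
def term_frequencies(tokens)
  return {} if tokens.empty?
  counts = tokens.each_with_object(Hash.new(0)) { |term, acc| acc[term] += 1 }
  counts.transform_values { |count| count.to_f / tokens.length }
end

# Inverse document frequency: log(N / df) for every term seen in the training documents.
def inverse_document_frequencies(tokenized_docs)
  doc_count = tokenized_docs.length.to_f
  doc_freq = Hash.new(0)
  tokenized_docs.each { |tokens| tokens.uniq.each { |term| doc_freq[term] += 1 } }
  doc_freq.transform_values { |df| Math.log(doc_count / df) }
end

# One weight per vocabulary term; terms absent from the document get 0.0.
def tfidf_vector(tokens, vocabulary, idf)
  tf = term_frequencies(tokens)
  vocabulary.map { |term| tf.fetch(term, 0.0) * idf.fetch(term, 0.0) }
end

# Hypothetical training documents.
docs = ["bike bike ride", "ride home"].map { |text| tokenize(text) }
vocabulary = docs.flatten.uniq.sort
idf = inverse_document_frequencies(docs)
vectors = docs.map { |tokens| tfidf_vector(tokens, vocabulary, idf) }
```

With this weighting, a term that appears in every document (here "ride") gets weight zero, while terms concentrated in few documents are emphasized.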
