How to extract keywords from a block of text in Haskell

I know this is a fairly big topic, but I need to take a block of text and extract the most interesting keywords from it. The text comes from television captions, so the subject can range from news to sports to pop culture references. The type of show the text comes from can be supplied.

My idea is to match the text against a dictionary of terms that I know are interesting.

What libraries for Haskell can help me?

Assuming that I have a dictionary of interesting terms and a database for storing them, is there a specific approach you would recommend for matching keywords in the text?

Is there an obvious approach that I'm not thinking of?

+7
2 answers

I would stem the words in the text and then look up all the stemmed terms in the dictionary. Here are just two random libraries:

Stemming: http://hackage.haskell.org/packages/archive/stemmer/0.2/doc/html/NLP-Stemmer-C.html

Search: http://hackage.haskell.org/packages/archive/sphinx/0.2.1/doc/html/Text-Search-Sphinx.html
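The stem-then-look-up idea can be sketched in a few lines of Haskell. This is only an illustration under stated assumptions: `toyStem` is a hypothetical stand-in for a real stemmer (it lowercases a word and strips one of a few English suffixes), and `buildGlossary` and `extractKeywords` are names invented here, not part of either library linked above. The dictionary is stemmed once and each token of the text is stemmed before the lookup, so "elections" matches the dictionary entry "election".

```haskell
import Data.Char (isAlpha, toLower)
import Data.List (isSuffixOf, nub)
import qualified Data.Set as Set

-- Hypothetical stand-in for a real stemmer: lowercases the word and strips
-- the first matching suffix, leaving at least three characters of stem.
toyStem :: String -> String
toyStem w = go ["ing", "es", "s"]
  where
    lw = map toLower w
    go [] = lw
    go (suf : rest)
      | suf `isSuffixOf` lw && length lw > length suf + 2 =
          take (length lw - length suf) lw
      | otherwise = go rest

-- Split text into alphabetic tokens, treating everything else as a separator.
tokenize :: String -> [String]
tokenize = words . map (\c -> if isAlpha c then c else ' ')

-- Stem every dictionary term once, up front.
buildGlossary :: [String] -> Set.Set String
buildGlossary = Set.fromList . map toyStem

-- Stemmed tokens of the text that occur in the stemmed dictionary,
-- in order of first appearance and without duplicates.
extractKeywords :: Set.Set String -> String -> [String]
extractKeywords glossary text =
  nub [s | w <- tokenize text, let s = toyStem w, s `Set.member` glossary]
```

For example, with `buildGlossary ["election", "goal", "touchdown"]`, the text "The elections were close; two goals scored." yields `["election", "goal"]`. A real stemmer would replace `toyStem` without changing the rest.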

+2

To expand on bpgergo's answer (though I have no Haskell-specific information): it is fairly simple to load the documents into a relational database and index them with Solr/Lucene or Sphinx, any of which should provide stemming in its default or suggested configuration. You can then search for which documents contain pairs, triples, etc. from your list of "interesting terms".
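If the dictionary is small enough to hold in memory, matching pairs and triples can also be sketched directly in Haskell without a search server. This is a swapped-in, in-memory alternative to the Solr/Sphinx route, not an example of either tool's API; `ngrams`, `mkGlossary`, and `matchTerms` are hypothetical names chosen for illustration. Each dictionary term is stored as a list of lowercase words, and every run of 1 to `maxN` consecutive words in the text is checked against that set.

```haskell
import Data.Char (isAlphaNum, toLower)
import Data.List (tails)
import qualified Data.Set as Set

-- Lowercase the text and split it on anything that is not a letter or digit.
tokenize :: String -> [String]
tokenize = words . map (\c -> if isAlphaNum c then toLower c else ' ')

-- All contiguous runs of exactly n elements.
ngrams :: Int -> [a] -> [[a]]
ngrams n xs = [g | t <- tails xs, let g = take n t, length g == n]

-- Store each dictionary term as its list of lowercase words,
-- so "Super Bowl" becomes ["super", "bowl"].
mkGlossary :: [String] -> Set.Set [String]
mkGlossary = Set.fromList . map tokenize

-- Dictionary terms of up to maxN words that occur in the text.
matchTerms :: Int -> Set.Set [String] -> String -> Set.Set [String]
matchTerms maxN glossary text =
  Set.fromList
    [g | n <- [1 .. maxN], g <- ngrams n toks, g `Set.member` glossary]
  where
    toks = tokenize text
```

Calling `matchTerms 3 (mkGlossary ["super bowl", "white house", "election"])` on a caption finds single words, pairs, and triples in one pass. Stemming could be layered on by stemming each token inside `tokenize`.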

You could also look into topics such as named entity recognition, statistically unusual phrase detection, and automatic tagging. LingPipe is a good place to start, as are these books:

http://alias-i.com/lingpipe/demos/tutorial/read-me.html

http://www.manning.com/marmanis/excerpt_contents.html

http://www.manning.com/alag/excerpt_contents.html
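As a rough illustration of the "statistically unusual" idea mentioned above, the sketch below scores each word of a text by how much more frequent it is there than in some background text. It is a deliberately crude assumption-laden example, not LingPipe's method: the names `countWords`, `unusualness`, and `rankUnusual` are hypothetical, it works on single words rather than phrases, and the smoothing (adding one to the background counts) exists only so that unseen words do not cause a division by zero.

```haskell
import Data.Char (isAlpha, toLower)
import Data.List (sortBy)
import Data.Ord (Down (..), comparing)
import qualified Data.Map.Strict as Map

type Counts = Map.Map String Int

-- Lowercase the text and split it on non-letters.
tokenize :: String -> [String]
tokenize = words . map (\c -> if isAlpha c then toLower c else ' ')

-- Count how often each word occurs.
countWords :: String -> Counts
countWords = Map.fromListWith (+) . map (\w -> (w, 1)) . tokenize

-- Relative frequency of a word in the document divided by its (crudely
-- smoothed) relative frequency in the background counts.
unusualness :: Counts -> Counts -> String -> Double
unusualness background doc w = docFreq / bgFreq
  where
    total m = fromIntegral (sum (Map.elems m)) :: Double
    count m = fromIntegral (Map.findWithDefault 0 w m) :: Double
    docFreq = count doc / total doc
    bgFreq = (count background + 1) / (total background + 1)

-- Words of the text, most unusual first.
rankUnusual :: Counts -> String -> [(String, Double)]
rankUnusual background text =
  sortBy
    (comparing (Down . snd))
    [(w, unusualness background doc w) | w <- Map.keys doc]
  where
    doc = countWords text
```

In practice the background counts would come from a large sample of captions, so that common words such as "the" score low and show-specific words rise to the top.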

+1
