How do content discovery engines like Zemanta and Open Calais work?

I was wondering how semantic services, such as Open Calais, give out the names of companies or people, technical concepts, keywords, etc. from a piece of text. Is it because they have a large database with which they match the text?

How does a service, for example, Zemanta, know which images can offer part of the text, for example?

+5
source share
3 answers

Michal Finkelstein of OpenCalais is here.

-, . , OpenCalais; , , : http://opencalais.com/tagging-information http://opencalais.com/how-does-calais-learn Twitter (@OpenCalais) team@opencalais.com

:

OpenCalais .

" NLP" ( ): , POS .

, (a.k.a. Entity Extraction, Named Entity Recognition). , , /. // .., .

(, , ) , / , , " " "- ", , , . , - , .

/ , ; , M & As ( ), ( ) .. , / , . , .

, , , .

,

+9

, , . , , , , , , , .

OpenNLP - , . , , , Named Entity Recognizers (NER) (, , , ) / (WSD) (: "" ) , . : ", ", " ", " ", , )

, , , NER - , NER , , ( , , ) (, : , , ), ( " ". , ) ( POS) WSD.

python () OpenNLP, NLTK (http://nltk.sourceforge.net), . Java #, .

, , , (, , , , ). , , . , , - NER (tokenize → detect detect → POS tag → WSD → NER), .

+7

Open Calais, , , , , , .. , - .

Zementa probably does something similar, but matches phrases against metadata attached to images to get related results.

This, of course, is not easy.

0
source

All Articles