How do content discovery engines like Zemanta and Open Calais work?

Question


I was wondering how semantic services such as Open Calais extract the names of companies or people, technical concepts, keywords, etc. from a piece of text. Is it because they have a large database that they match the text against?

How does a service such as Zemanta know which images to suggest for a piece of text?


python ruby semantics zemanta

Marco Aug 22 '08 at 10:51


3 answers

Michal Finkelstein · Answer 1 · 2009-05-04T19:45:20+0000

Michal Finkelstein from OpenCalais here.

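First, thanks for your interest. I'll reply here, but I also encourage you to read more on the OpenCalais forums; there is a lot of information there, including, but not limited to:
http://opencalais.com/tagging-information
http://opencalais.com/how-does-calais-learn
Also feel free to follow us on Twitter (@OpenCalais) or to email us at team@opencalais.com

Now to the answer:

OpenCalais is based on research and development in the fields of natural language processing and text analytics.

We support the full "NLP stack" (as we like to call it): from text tokenization, morphological analysis and POS tagging, to shallow parsing and identifying nominal and verbal phrases.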

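Semantics comes into play when we look for entities (a.k.a. Entity Extraction, Named Entity Recognition). For that purpose we have a sophisticated rule-based system that combines discovery rules with lexicons/dictionaries. This combination lets us identify names of companies/persons/films etc., even if they do not appear in any available list.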
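As a rough illustration of combining lexicons with discovery rules, here is a minimal Python sketch. It is not OpenCalais's implementation: the lexicon entries, the regular expressions and the function name `extract_entities` are all hypothetical, chosen only to show how a context rule can find a name that no list contains.

```python
import re

# Hypothetical lexicon: known names mapped to an entity type.
LEXICON = {"Reuters": "Company", "London": "City"}

# Hypothetical discovery rules: context patterns that reveal an entity
# even when the name itself is missing from the lexicon.
NAME = r"(?:[A-Z][a-z]+(?: [A-Z][a-z]+)*)"
RULES = [
    (re.compile(rf"\b(?:Mr|Ms|Dr)\. ({NAME})"), "Person"),
    (re.compile(rf"\b({NAME}) (?:Inc|Corp|Ltd)\b"), "Company"),
]

def extract_entities(text):
    """Return a sorted list of (name, type) pairs found in text."""
    found = set()
    # Lexicon lookup: exact whole-word matches against known names.
    for name, kind in LEXICON.items():
        if re.search(rf"\b{re.escape(name)}\b", text):
            found.add((name, kind))
    # Discovery rules: capture the name from its surrounding context.
    for pattern, kind in RULES:
        for match in pattern.finditer(text):
            found.add((match.group(1), kind))
    return sorted(found)

print(extract_entities("Dr. Jane Doe joined Acme Widgets Inc in London."))
```

Here "London" comes from the lexicon, while "Jane Doe" and "Acme Widgets" are found only through the title and company-suffix rules.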

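For the most prominent entities (such as people and companies) we also perform anaphora resolution, cross-referencing and name canonization/normalization at the article level, so we will know that "John Smith" and "Mr. Smith", for example, likely refer to the same person. So the short answer to your question is no, it is not just about matching against large databases.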
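Article-level name normalization, where a short mention such as "Mr. Smith" is resolved to a fuller mention such as "John Smith", could be sketched as follows. This is a deliberately naive heuristic (group by surname, keep the longest form) and an assumption for illustration only; the `TITLES` set and `canonicalize` function are hypothetical, and a real system would need far more evidence before merging two mentions.

```python
# Hypothetical list of honorifics to ignore when comparing mentions.
TITLES = {"Mr", "Mrs", "Ms", "Dr"}

def canonicalize(mentions):
    """Map each person mention to the longest mention sharing its surname."""
    def words(mention):
        return mention.replace(".", "").split()

    def surname(mention):
        return words(mention)[-1]

    def strip_title(mention):
        return " ".join(w for w in words(mention) if w not in TITLES)

    canonical = {}
    for mention in mentions:
        full = strip_title(mention)
        key = surname(mention)
        # Keep the longest title-free form seen for this surname.
        if len(full) > len(canonical.get(key, "")):
            canonical[key] = full
    return {m: canonical[surname(m)] for m in mentions}

print(canonicalize(["John Smith", "Mr. Smith", "Smith"]))
```

Note that this toy version would wrongly merge two different people named Smith in the same article.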

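Events/Facts are really interesting, because they take our discovery rules one level deeper; we find relations between entities and label them with the appropriate type, for example M&As (relations between two or more companies), employment changes (relations between companies and people) etc. Needless to say, Event/Fact extraction is not possible for systems that are based solely on lexicons. For the most part, our system is tuned to be precision-oriented, but we always try to keep a reasonable balance between accuracy and coverage.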
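To make the idea of typed relations between entities concrete, the sketch below labels two kinds of events with simple patterns. The patterns, the relation labels and the company and person names are hypothetical; this only shows the shape of the output, not how OpenCalais actually extracts events.

```python
import re

# Crude stand-in for an entity mention: a run of capitalized words.
ENTITY = r"[A-Z][A-Za-z]*(?: [A-Z][A-Za-z]*)*"

# Hypothetical relation patterns: (compiled pattern, relation label).
PATTERNS = [
    (re.compile(rf"({ENTITY}) (?:acquired|bought|merged with) ({ENTITY})"),
     "Acquisition"),
    (re.compile(rf"({ENTITY}) (?:hired|appointed) ({ENTITY})"),
     "EmploymentChange"),
]

def extract_relations(text):
    """Return (label, subject, object) triples for each pattern match."""
    relations = []
    for pattern, label in PATTERNS:
        for match in pattern.finditer(text):
            relations.append((label, match.group(1), match.group(2)))
    return relations

text = "Alpha Corp acquired Beta Ltd last year. Gamma Inc appointed Jane Doe as CEO."
for relation in extract_relations(text):
    print(relation)
```

A lexicon alone could recognize the four names, but only the verb patterns between them tell you which event took place.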

By the way, some new metadata capabilities are coming out soon, so stay tuned.

Regards,

rcreswick · Answer 2 · 2008-08-30T03:56:57+0000

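I'm not familiar with the specific services listed, but the field of natural language processing has developed a number of techniques that enable this sort of information extraction from general text. Once you have candidate terms, it is not too difficult to search for those terms together with other entities from the context, and then use the results of that search to decide how confident you are that the extracted term is an actual entity of interest.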

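OpenNLP is a great project if you'd like to play around with natural language processing. The capabilities you've named would probably be best accomplished with Named Entity Recognizers (NER) (algorithms that locate proper nouns, generally, and sometimes dates as well) and/or Word Sense Disambiguation (WSD) (e.g.: the word "bank" has different meanings depending on its context, and that can be very important when extracting information from text. Given the sentences "the plane banked left", "the snow bank was high" and "they robbed the bank", you can see how disambiguation plays an important part in language understanding).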
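Word sense disambiguation for "bank" can be illustrated with a toy overlap heuristic, loosely in the spirit of the simplified Lesk algorithm. The sense inventory and its signature words below are invented for this example; real systems draw them from dictionaries or annotated corpora and use statistical models.

```python
# Hypothetical mini sense inventory for "bank": each sense has a bag of
# signature words that tend to appear near it.
SENSES = {
    "financial institution": {"money", "robbed", "loan", "account", "cash"},
    "mound or ridge": {"snow", "river", "high", "steep", "mud"},
    "tilt while turning": {"plane", "left", "right", "turn", "pilot"},
}

def disambiguate(sentence):
    """Pick the sense whose signature words overlap the sentence most."""
    context = set(sentence.lower().replace(".", "").split())
    return max(SENSES, key=lambda sense: len(SENSES[sense] & context))

for s in ["the plane banked left", "the snow bank was high", "they robbed the bank"]:
    print(s, "->", disambiguate(s))
```

Ties fall back to the first sense listed, which is one reason a real disambiguator needs sense frequencies and much richer context.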

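Techniques generally build on each other, and NER is one of the more complex tasks, so to do NER successfully you will generally need accurate tokenizers (natural language tokenizers, mind you; statistical approaches tend to fare best), string stemmers (algorithms that conflate similar words to common roots, so that words like "informant" and "informer" are treated equally), sentence detection ("Mr. Jones was tall." is only one sentence, so you can't just check for punctuation), part-of-speech taggers (POS taggers) and WSD.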
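The sentence-detection pitfall ("Mr. Jones was tall." is one sentence, so punctuation alone is not enough) can be shown with a small rule-based splitter. This is only a sketch with a hypothetical abbreviation list; production sentence detectors are typically statistical rather than rule-based.

```python
# Hypothetical abbreviation list; a trained model would learn these cases.
ABBREVIATIONS = {"Mr.", "Mrs.", "Ms.", "Dr.", "Prof."}

def split_sentences(text):
    """Split on ., ! or ? unless the token is a known abbreviation."""
    sentences, current = [], []
    for token in text.split():
        current.append(token)
        ends_sentence = token.endswith((".", "!", "?"))
        if ends_sentence and token not in ABBREVIATIONS:
            sentences.append(" ".join(current))
            current = []
    if current:
        # Keep trailing text that lacks final punctuation.
        sentences.append(" ".join(current))
    return sentences

print(split_sentences("Mr. Jones was tall. He left early."))
```

Splitting naively on every period would instead produce a spurious sentence "Mr.".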

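For Python there is NLTK (http://nltk.sourceforge.net), a toolkit that covers much of the same ground as OpenNLP, but I don't have much experience with it yet. Most of my work has been with the Java and C# ports of OpenNLP, which work well.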

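All of these algorithms are language-specific, of course, and they can take significant time to run (although it is generally faster than reading the material you are processing). Since the state of the art is largely based on statistical techniques, there is also a considerable error rate to take into account. Furthermore, because the error rate affects every stage, and something like NER requires numerous stages of processing (tokenize → sentence detect → POS tag → WSD → NER), the error rates compound.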
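A back-of-the-envelope calculation shows how errors compound across a five-stage pipeline. The 95% per-stage accuracy is a made-up figure, and the sketch assumes stage errors are independent, which real pipelines only approximate.

```python
def pipeline_accuracy(stage_accuracies):
    """Multiply per-stage accuracies, assuming independent errors."""
    result = 1.0
    for accuracy in stage_accuracies:
        result *= accuracy
    return result

# Hypothetical per-stage accuracy for each of the five stages.
stages = ["tokenize", "sentence detect", "POS tag", "WSD", "NER"]
overall = pipeline_accuracy([0.95] * len(stages))
print(f"{overall:.3f}")  # about 0.774
```

Under these assumptions, five stages that are each right 95% of the time leave only about 77% of inputs correct end to end.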

Endangeredmassa · Answer 3 · 2008-08-22T17:58:23+0000

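Open Calais probably uses language parsing technology and language statistics to guess which words or phrases are names, places, companies, etc. Then it is just another step to run some kind of search for those entities and return metadata.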

Zemanta probably does something similar, but matches phrases against metadata attached to images to get related results.
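Matching extracted phrases against image metadata might look roughly like the sketch below. This is a guess at the general idea, not Zemanta's actual method; the image names, their tags and the `suggest_images` function are hypothetical.

```python
# Hypothetical image catalogue: file name mapped to its metadata tags.
IMAGES = {
    "eiffel.jpg": {"paris", "eiffel", "tower", "france"},
    "bank.jpg": {"bank", "money", "finance"},
    "cat.jpg": {"cat", "pet", "animal"},
}

def suggest_images(phrases, images=IMAGES, limit=2):
    """Rank images by how many tags overlap the words in the phrases."""
    terms = {word for phrase in phrases for word in phrase.lower().split()}
    scored = [(len(tags & terms), name) for name, tags in images.items()]
    # Drop images with no overlap, then sort by score (high first), then name.
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
    return [name for _, name in ranked[:limit]]

print(suggest_images(["Eiffel Tower", "Paris"]))
```

The phrases fed in would come from an entity extraction step, so the quality of the suggestions depends directly on the quality of that step.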

This, of course, is not easy.
