Reverse-engineering the now-famous 17-year-old's Markov-chain-based information retrieval algorithm "Apodora"

Question

While we were all twiddling our thumbs, a 17-year-old Canadian boy has apparently found an information retrieval algorithm that:

a) performs with twice the precision of the current, widely used vector space model

b) is “fairly accurate” at identifying similar words

c) makes microsearch more accurate

Here is a good interview.

Unfortunately, there is no published paper that I can find yet. Still, from what I remember of the graphical models and machine learning classes I took a few years ago, I think we should be able to reconstruct the algorithm from his submission abstract and from what he says about it in the interview.

From the interview:

Some searches find words that appear in similar contexts. That is pretty good, but it only follows the relationships to the first degree. My algorithm tries to follow the connections further. Connections that are close are considered more valuable. In theory, it follows connections to an infinite degree.

And the abstract puts it in context:

A novel information retrieval algorithm called “Apodora” is introduced, using limiting powers of Markov chain-like matrices to determine models for the documents and to make contextual statistical inferences about the semantics of words. The system is implemented and compared to the vector space model. Especially when the query is short, the novel algorithm gives results with approximately twice the precision and has interesting applications to microsearch.
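To make the two quotes concrete, here is a minimal sketch of one possible reading, and only a guess: nothing here is confirmed as the actual Apodora method. It assumes a word-to-word transition matrix `P` built from document co-occurrence, and scores relatedness as a decayed sum of its powers, so that degree-1 links count most and longer chains count progressively less. The function names, the `decay` factor, and the truncation at `max_degree` are all my own assumptions. The decay is there because, for a well-behaved (ergodic) chain, the plain limit of `P^k` forgets the starting word entirely, which would not fit "connections that are close are considered more valuable".

```python
def cooccurrence_chain(docs):
    """Build a row-normalised word-to-word transition matrix.

    Assumption: two words are linked when they share a document.
    """
    vocab = sorted({w for d in docs for w in d})
    index = {w: i for i, w in enumerate(vocab)}
    n = len(vocab)
    counts = [[0.0] * n for _ in range(n)]
    for d in docs:
        words = set(d)
        for a in words:
            for b in words:
                if a != b:
                    counts[index[a]][index[b]] += 1.0
    P = []
    for row in counts:
        total = sum(row)
        P.append([c / total if total else 0.0 for c in row])
    return vocab, P


def matmul(A, B):
    """Plain square-matrix product, to keep the sketch dependency-free."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def multi_degree_relatedness(P, decay=0.5, max_degree=50):
    """Truncated sum over k = 1..max_degree of decay**k * P**k.

    Hypothetical scoring rule: each extra degree of separation is
    discounted by `decay`, so close connections dominate.
    """
    n = len(P)
    total = [[0.0] * n for _ in range(n)]
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    weight = 1.0
    for _ in range(max_degree):
        power = matmul(power, P)
        weight *= decay
        for i in range(n):
            for j in range(n):
                total[i][j] += weight * power[i][j]
    return total


# Toy corpus: "car" and "vehicle" never share a document, but are
# linked through the chain car -> engine -> motor -> vehicle.
docs = [["car", "engine"], ["engine", "motor"],
        ["motor", "vehicle"], ["cat", "dog"]]
vocab, P = cooccurrence_chain(docs)
S = multi_degree_relatedness(P)
idx = {w: i for i, w in enumerate(vocab)}
print("first-order car->vehicle:", P[idx["car"]][idx["vehicle"]])
print("multi-degree car->vehicle:", S[idx["car"]][idx["vehicle"]])
```

In this toy corpus the first-order score between "car" and "vehicle" is zero, while the multi-degree score is positive and smaller than the scores for "engine" and "motor", which sit closer to "car" in the chain. A short query could then be expanded with such scores before matching documents, which would be consistent with the claim about short queries, but that step is equally speculative.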

I feel that anyone who knows about Markov chains or information retrieval will immediately be able to work out what he is doing.

So: what is he doing?

machine-learning nlp information-retrieval markov-chains

silverasm Aug 6 '11 at 15:24

1 answer

Aengus · Accepted Answer · 2011-08-08T21:06:33+0000

From the use of words such as “context” and the fact that he has introduced a second-order level of statistical dependence, I suspect that he is doing something related to the LDA-HMM method described in: Griffiths, T., Steyvers, M., Blei, D., and Tenenbaum, J. (2005). Integrating topics and syntax. Advances in Neural Information Processing Systems. There are some inherent limits on the resolution of the search due to model averaging. Nevertheless, I envy anyone doing this kind of work at 17, and I hope that he has done something independent and at least incrementally better. Even a different direction on the same topic would be pretty cool.
