The best algorithm for indexing offers

Imagine that I have a situation where I need to index offers. Let me explain this a little deeper.

For example, I have these suggestions:

  • Beautiful sky.
  • Beautiful dream of the sky.
  • Beautiful dream.

As far as I can imagine, the index should look something like this:

alt text http://img7.imageshack.us/img7/4029/indexarb.png

But also I would like to search for any of these words.

For example, if I search by "value", it should show that I am connected to "beautiful." if I search for “beautiful,” he should give me a connection with (previous) “The”, (hereinafter) “sky” and “dream”. If I search for “sky”, it should give (previous) connection to “beautiful”, etc ....

Any ideas? Maybe you know an existing algorithm for this kind of problem?

+5
source share
7 answers

This should be close in C #:

class Program
{
    public class Node
    {
        private string _term;
        private Dictionary<string, KeyValuePair<Node, Node>> _related = new Dictionary<string, KeyValuePair<Node, Node>>();

        public Node(string term)
        {
            _term = term;
        }

        public void Add(string phrase, Node previous, string [] phraseRemainder, Dictionary<string,Node> existing)
        {
            Node next= null;
            if (phraseRemainder.Length > 0)
            {
                if (!existing.TryGetValue(phraseRemainder[0], out next))
                {
                    existing[phraseRemainder[0]] = next = new Node(phraseRemainder[0]);
                }
                next.Add(phrase, this, phraseRemainder.Skip(1).ToArray(), existing);
            }
            _related.Add(phrase, new KeyValuePair<Node, Node>(previous, next));

        }
    }


    static void Main(string[] args)
    {
        string [] sentences = 
            new string [] { 
                "The beautiful sky",
                "Beautiful sky dream",
                "beautiful dream"
            };

        Dictionary<string, Node> parsedSentences = new Dictionary<string,Node>();

        foreach(string sentence in sentences)
        {
            string [] words = sentence.ToLowerInvariant().Split(' ');
            Node startNode;
            if (!parsedSentences.TryGetValue(words[0],out startNode))
            {
                parsedSentences[words[0]] = startNode = new Node(words[0]);
            }
            if (words.Length > 1)
                startNode.Add(sentence,null,words.Skip(1).ToArray(),parsedSentences);
        }
    }
}

, . , - , , .

0

/ . structs - .

, , .

  • .
  • .

:

  • .
  • .

SEO, , .

+5

, Inverted index. Hashmap , (sentence_id, position). . :

sentence[0] = ['the','beautiful', 'sky'];
sentence[1] = ['beautiful','sky', 'dream'];
sentence[2] = ['beautiful', 'dream'];

inverted_index = 
{
 'the': {(0,0)},
 'beautiful': {(0,1), (1,0), (2,0)},
 'sky' : {(0,2),(1,1)},
 'dream':{(1,2), (2,1)}
};

. , .

, .

+2

, . (, ), .. , .

, - , .

+1

, :

Words:
    Id     integer primary-key
    Word   varchar(20)
Following:
    WordId1 integer foreign-key Words(Id) indexed
    WordId2 integer foreign-key Words(Id) indexed

, , , , , :

The beautiful sky.
    Words (1,'the')
    Words (2, 'beautiful')
    Words (3,, 'sky')
    Following (1, 2)
    Following (2, 3)
Beautiful sky dream.
    Words (4, 'dream')
    Following (3, 4)
Beautiful dream.
    Following (2, 4)

, .

+1
source

Tree search algorithms (e.g. BST, ect)

-4
source

All Articles