Keyword Matching Algorithms

Suppose we have buyers and sellers who are trying to find each other in a market. Buyers can tag their needs with keywords; sellers can do the same for what they sell. I am interested in finding algorithm(s) that rank sellers by their relevance to a particular buyer, based on the two sets of keywords.

Here is an example:

buyer_keywords = {"furry", "four legs", "likes catnip", "has claws"} 

and then we have two potential sellers that we need to rank in order of relevance:

seller_keywords[1] = {"furry", "four legs", "arctic circle", "white"}
seller_keywords[2] = {"likes catnip", "furry", 
                      "hates mice", "yarn-lover", "whiskers"}

If we just use keyword intersection, we don't get much discrimination: both sellers intersect on 2 keywords. If we divide the intersection count by the size of the combined set (the union), seller 2 actually scores worse because it has more keywords. This seems to introduce an automatic penalty for any method that does not correct for keyword-set size (and we definitely do not want to penalize adding keywords).
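To make the problem concrete, here is a minimal sketch of the two scores described above, applied to the example keyword sets. The function names are my own, and `buyer_coverage` (dividing by the buyer's set size only) is an added assumption shown for contrast, not something proposed in the question.

```python
buyer = {"furry", "four legs", "likes catnip", "has claws"}
sellers = {
    1: {"furry", "four legs", "arctic circle", "white"},
    2: {"likes catnip", "furry", "hates mice", "yarn-lover", "whiskers"},
}

def overlap(buyer_kw, seller_kw):
    """Raw count of shared keywords."""
    return len(buyer_kw & seller_kw)

def jaccard(buyer_kw, seller_kw):
    """Shared keywords divided by the size of the combined set (union)."""
    union = buyer_kw | seller_kw
    return len(buyer_kw & seller_kw) / len(union) if union else 0.0

def buyer_coverage(buyer_kw, seller_kw):
    """Hypothetical alternative: fraction of the buyer's keywords matched.
    Extra seller keywords do not lower this score."""
    return len(buyer_kw & seller_kw) / len(buyer_kw) if buyer_kw else 0.0

for sid, kws in sellers.items():
    print(sid, overlap(buyer, kws), round(jaccard(buyer, kws), 3),
          buyer_coverage(buyer, kws))
```

Both sellers tie on raw overlap, and the union-normalized score ranks seller 2 lower purely because it lists five keywords instead of four.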

To add slightly more structure to the problem, suppose we have some true measure of the intensity of each keyword attribute (the intensities must sum to 1 for each seller), for example:

seller_keywords[1] = {"furry":.05, 
                      "four legs":.05, 
                      "arctic circle":.8, 
                      "white":.1}

seller_keywords[2] = {"likes catnip":.5, 
                      "furry":.4, 
                      "hates mice":.02, 
                      "yarn-lover":.02, 
                      "whiskers":.06}

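Summing the intensities of the keywords that match the buyer's, seller 1 scores 0.1 and seller 2 scores 0.9. Now consider a third seller: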

seller_keywords[3] = {"furry":1}
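As a rough sketch, the weighted scores for these three sellers can be computed by summing each seller's intensities over the keywords the buyer also listed. The helper name `weighted_match` and the dictionary layout are assumptions made for illustration.

```python
buyer_keywords = {"furry", "four legs", "likes catnip", "has claws"}

seller_weights = {
    1: {"furry": .05, "four legs": .05, "arctic circle": .8, "white": .1},
    2: {"likes catnip": .5, "furry": .4, "hates mice": .02,
        "yarn-lover": .02, "whiskers": .06},
    3: {"furry": 1.0},
}

def weighted_match(buyer_kw, weights):
    """Sum the seller's intensities for keywords the buyer also listed."""
    return sum(w for kw, w in weights.items() if kw in buyer_kw)

scores = {sid: weighted_match(buyer_keywords, w)
          for sid, w in seller_weights.items()}

# Seller ids sorted from highest to lowest score.
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores, ranking)
```

Under this scoring, seller 3 comes out on top with a score of 1.0 even though it matches only a single keyword, because all of its weight sits on that one match.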

, .

, ( ) , . , CS101, , .

+5
2

, ; , , . , , , :

terms[0] --> aardvark
terms[1] --> anteater
...
terms[N] --> zuckerberg

Each person is then a vector of weights over the same indices:

person1[0] = 0     # this person doesn't care about aardvarks
person1[1] = 0.05  # this person cares a bit about anteaters
...
person1[N] = 0
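Here is a small sketch of that representation: every person becomes a vector over a shared term index. Comparing vectors with cosine similarity is my assumption of one common choice, not necessarily what this answer intends; the three-word vocabulary and the second person are made up for illustration.

```python
import math

# Stand-in vocabulary; a real one would hold every known keyword.
terms = ["aardvark", "anteater", "zuckerberg"]
index = {term: i for i, term in enumerate(terms)}

def to_vector(weights):
    """Turn a {term: weight} mapping into a dense vector over `terms`."""
    vec = [0.0] * len(terms)
    for term, w in weights.items():
        vec[index[term]] = w
    return vec

def cosine(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

person1 = to_vector({"anteater": 0.05})
person2 = to_vector({"anteater": 0.5, "zuckerberg": 0.5})  # hypothetical
print(cosine(person1, person2))
```

Cosine similarity depends only on the direction of the vectors, so scaling all of one person's weights up or down does not change the score.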

N- . . , , . , 1, , - .

, tf-idf . Tf-idf (, "iPhone" ) , .

tf-idf .
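A hedged sketch of how tf-idf could be applied here: treat each seller's keyword intensity as the term frequency, and down-weight keywords that many sellers share using inverse document frequency, `log(N / df)`. Using the intensities as tf, the unsmoothed idf formula, and the three example sellers from the question are all assumptions on my part.

```python
import math

buyer_keywords = {"furry", "four legs", "likes catnip", "has claws"}

seller_weights = {
    1: {"furry": .05, "four legs": .05, "arctic circle": .8, "white": .1},
    2: {"likes catnip": .5, "furry": .4, "hates mice": .02,
        "yarn-lover": .02, "whiskers": .06},
    3: {"furry": 1.0},
}

def idf_table(all_weights):
    """idf(keyword) = log(N / number of sellers listing the keyword)."""
    n = len(all_weights)
    df = {}
    for weights in all_weights.values():
        for kw in weights:
            df[kw] = df.get(kw, 0) + 1
    return {kw: math.log(n / count) for kw, count in df.items()}

def tfidf_score(buyer_kw, weights, idf):
    """Sum intensity * idf over the keywords the buyer also listed."""
    return sum(w * idf[kw] for kw, w in weights.items() if kw in buyer_kw)

idf = idf_table(seller_weights)
scores = {sid: tfidf_score(buyer_keywords, w, idf)
          for sid, w in seller_weights.items()}
print(scores)
```

Because every seller lists "furry", its idf is zero and it stops contributing, so seller 3 no longer wins on that keyword alone.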

+7

, , . .

, , : Drupal .

, . , . , , , . ; .

0
