Suggestions for the People Similarity Algorithm

I'd like some suggestions for my "find similar people" algorithm :). I have a database that stores the following objects: person, article, keyword. For each person I have a set of keywords, each with the number of times it is referenced for that person, compiled from the keywords of that person's articles. I need to find similar people by comparing their relevant keywords. A simple solution would be to take x keywords for person y and find all people who have similar (not necessarily equal) counts for those keywords, but that doesn't seem like the best approach. Thoughts?

Thanks!

1 answer

It looks like your case is close enough to the usual search-engine similarity queries that you can use the same vector space model.

For each person, count the number of occurrences of each keyword. Treat each keyword as a dimension and the number of occurrences as the component of the person's vector along that dimension. Usually every dimension is treated the same way, but if you find that some keywords are better predictors of compatibility, you can scale the occurrences in those dimensions by some factor.
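As a minimal sketch of that representation, assume the per-person keyword counts are already available as (person, keyword, count) rows. The names `rows`, `build_vectors`, and `weights` are hypothetical and not taken from the question's schema, and the sample data is made up for illustration:

```python
from collections import defaultdict

def build_vectors(rows, weights=None):
    """Turn (person, keyword, count) rows into sparse keyword vectors.

    weights is an optional {keyword: factor} map (an assumption of this
    sketch) for scaling dimensions that prove to be better predictors;
    keywords without an entry keep a factor of 1.0.
    """
    weights = weights or {}
    vectors = defaultdict(dict)
    for person, keyword, count in rows:
        scale = weights.get(keyword, 1.0)
        # Accumulate in case the same (person, keyword) pair appears twice.
        vectors[person][keyword] = vectors[person].get(keyword, 0.0) + count * scale
    return dict(vectors)

# Made-up sample data.
rows = [
    ("alice", "python", 3),
    ("alice", "databases", 1),
    ("bob", "python", 2),
    ("bob", "networking", 4),
]
vectors = build_vectors(rows, weights={"databases": 2.0})
```

Storing each vector as a dict keeps it sparse: only the keywords a person actually has take up space, which matters when the keyword vocabulary is large.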

Then the dot product of two people's vectors gives you an estimate of how similar they are. You can also build a vector from keywords you supply yourself and find the people whose interests are closest to it.
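A raw dot product tends to favor people who simply have many keyword references, so a common variant divides it by the lengths of the two vectors (cosine similarity). The sketch below uses that variant, which is a swap-in rather than something the answer prescribes. The function names, the `exclude` parameter, and the sample data are all hypothetical:

```python
import math

def dot(u, v):
    """Dot product of two sparse vectors stored as {keyword: value} dicts."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the shorter vector
    return sum(value * v.get(key, 0.0) for key, value in u.items())

def cosine(u, v):
    """Dot product normalized by both vector lengths; 0.0 if either is empty."""
    norm = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / norm if norm else 0.0

def most_similar(query, vectors, top_n=5, exclude=None):
    """Rank people by cosine similarity to the query vector, best first."""
    scored = [
        (cosine(query, vec), name)
        for name, vec in vectors.items()
        if name != exclude
    ]
    scored.sort(reverse=True)
    return [(name, score) for score, name in scored[:top_n]]

# Made-up sample data.
vectors = {
    "alice": {"python": 3, "databases": 1},
    "bob": {"python": 2, "networking": 4},
    "carol": {"python": 1, "databases": 2},
}

# People most similar to an existing person.
similar_to_alice = most_similar(vectors["alice"], vectors, exclude="alice")

# People closest to an ad hoc set of keywords.
from_keywords = most_similar({"networking": 1}, vectors, top_n=1)
```

With this sample data, carol ranks ahead of bob for alice, because carol shares both of alice's keywords while bob's vector is dominated by a keyword alice does not have. This version scans every person linearly; for a large database an inverted index from keyword to people would narrow the candidates first.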
