Is it possible to simulate cosine similarity in Solr / Lucene?

I'm interested in possible ways to simulate the cosine similarity algorithm using Solr. I have items that are each assigned a vector, for example:

items = [
  { id: 1, vector: [0,0,0,2,3,0,0] },
  { id: 2, vector: [0,1,0,1,5,0,0] },
  { id: 3, vector: [2,3,0,0,0,1,0] },
  { id: 4, vector: [1,2,4,6,5,0,0] }
]

There is also a search vector against which the others should be ranked.

I am currently modeling this in Ruby, running through all the items and assigning each one a rank against the input vector. Here's the implementation of cosine similarity I'm using:

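module SimilarityCalculator

  # Cosine similarity: dot product divided by the product of the magnitudes.
  # Returns NaN if either vector is all zeros.
  def self.get_similarity(vector1, vector2)
    dp = dot_product(vector1, vector2)
    nm = magnitude(vector1) * magnitude(vector2)
    dp / nm
  end

  def self.dot_product(vector1, vector2)
    sum = 0.0
    vector1.each_with_index { |val, i| sum += val * vector2[i] }
    sum
  end

  # Euclidean length of the vector.
  def self.magnitude(vector)
    Math.sqrt(vector.inject(0.0) { |m, o| m + o**2 })
  end

  # A bare `private` does not apply to `def self.` methods.
  private_class_method :dot_product, :magnitude

end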

Then, to get a ranked list, I would do something like the following:

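search_vector = [1,0,0,3,5,0,0]
# Score every item against the search vector, then sort best match first.
ranked = items.map do |item|
  rank = SimilarityCalculator.get_similarity(search_vector, item[:vector])
  { id: item[:id], rank: rank }
end.sort_by { |result| -result[:rank] }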

I don't know enough about Solr to tell how this would be modeled, or even whether it is possible, but I thought I would throw it out there.

1 answer

Lucene , : Lucene? .., , ?

, , , "". , .

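One option is to turn each dimension into the token "dim_n" (where n is the position of the dimension), repeated as many times as the value in that dimension. For example: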

[1,2,0,1] ==> "dim_1 dim_2 dim_2 dim_4"
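
The mapping in the example above can be sketched in Ruby as follows. This is only an illustration: `vector_to_tokens` is a hypothetical helper name, and it assumes the vector holds non-negative integers (real-valued vectors would first need some scaling and rounding, which is not covered here). It says nothing about how Solr or Lucene would score the resulting text.

```ruby
# Hypothetical helper: encode a vector of non-negative integers as a string
# of "dim_n" tokens. The token for position n (1-based) is repeated as many
# times as the value at that position; zero entries produce no tokens.
def vector_to_tokens(vector)
  vector.each_with_index
        .flat_map { |value, i| ["dim_#{i + 1}"] * value }
        .join(" ")
end

puts vector_to_tokens([1, 2, 0, 1])
```

The resulting string could then be indexed as an ordinary text field, with the search vector encoded the same way and sent as the query.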

, .

(, Lucene ?), .
