You can do this by subclassing the `SimilarityABC` interface. I did not find any documentation for this, but it appears to be how gensim's Word Mover's Distance similarity (`WmdSimilarity`) was implemented. Here is a generic way to do it; you can probably make it more efficient by specializing it to the similarity measure you care about.
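```python
import numpy
from gensim import interfaces


class CustomSimilarity(interfaces.SimilarityABC):

    def __init__(self, corpus, custom_similarity, num_best=None, chunksize=256):
        self.corpus = corpus                        # list of tokenized documents
        self.custom_similarity = custom_similarity  # callable(doc1, doc2) -> number
        self.num_best = num_best                    # if set, self[query] returns only the top-N matches
        self.chunksize = chunksize
        self.normalize = False                      # documents are token lists, not vectors to unit-normalize

    def get_similarities(self, query):
        """
        **Do not use this function directly; use the self[query] syntax instead.**
        """
        if isinstance(query, numpy.ndarray):
            # Treat an array as document indexes into the corpus.
            query = [self.corpus[i] for i in query]
        if not isinstance(query[0], list):
            # A single tokenized document: wrap it so it is handled like a batch of one.
            query = [query]
        # One row per query document, one column per corpus document.
        result = numpy.array([
            [self.custom_similarity(document, q) for document in self.corpus]
            for q in query
        ])
        # A single query yields a 1-D array of scores.
        return result[0] if len(result) == 1 else result
```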
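The generic class calls `custom_similarity` once per (query, corpus document) pair in a Python loop. As an illustration of specializing to one measure, here is a minimal sketch for a word-overlap score, assuming documents are lists of string tokens: it builds a binary document-term matrix once, so each query becomes a single matrix-vector product. The class `OverlapIndex` and its `scores` method are hypothetical, not part of gensim.

```python
import numpy as np


class OverlapIndex:
    """Hypothetical specialized index: word-overlap scores via one matrix product."""

    def __init__(self, corpus):
        # Map every distinct token in the corpus to a column index.
        self.vocab = {}
        for doc in corpus:
            for token in doc:
                self.vocab.setdefault(token, len(self.vocab))
        # Row i has a 1 in column j if corpus document i contains token j.
        self.matrix = np.zeros((len(corpus), len(self.vocab)))
        for i, doc in enumerate(corpus):
            for token in set(doc):
                self.matrix[i, self.vocab[token]] = 1.0

    def scores(self, query):
        # Binary query vector; tokens outside the corpus vocabulary cannot overlap, so skip them.
        q = np.zeros(len(self.vocab))
        for token in set(query):
            if token in self.vocab:
                q[self.vocab[token]] = 1.0
        # Dot product of two 0/1 vectors = number of shared distinct tokens.
        return self.matrix @ q


index = OverlapIndex([['cat', 'dog'], ['cat', 'bird'], ['dog']])
print(index.scores(['bird', 'cat', 'frog']))  # [1. 2. 0.]
```

Returning an array like this from `get_similarities` in a `SimilarityABC` subclass should let `self[query]` and `num_best` work the same way as in the generic version.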
To plug in a custom similarity function:
```python
def overlap_sim(doc1, doc2):
    # similarity defined by the number of common words
    return len(set(doc1) & set(doc2))

corpus = [['cat', 'dog'], ['cat', 'bird'], ['dog']]
cs = CustomSimilarity(corpus, overlap_sim, num_best=2)
print(cs[['bird', 'cat', 'frog']])
```
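This outputs `[(1, 2.0), (0, 1.0)]`: document 1 shares two words with the query (`cat` and `bird`), document 0 shares one (`cat`), and document 2 shares none, so with `num_best=2` only the two best matches are returned as `(document_index, score)` pairs.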
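Because `num_best` is set, the raw score array from `get_similarities` is turned into that ranked list before it is returned. Here is a minimal pure-Python sketch of that ranking step, assuming it drops near-zero scores, sorts by descending score magnitude, and keeps the first N (this is meant to resemble gensim's `matutils.full2sparse_clipped`; the helper name `top_n_similar` is hypothetical, and tie ordering may differ from gensim's).

```python
def top_n_similar(scores, num_best, eps=1e-9):
    """Hypothetical helper: rank documents by score, keeping the top num_best.

    Scores with magnitude <= eps are dropped; the rest are sorted by
    descending magnitude and returned as (document_index, score) pairs.
    """
    nonzero = [(i, float(s)) for i, s in enumerate(scores) if abs(s) > eps]
    nonzero.sort(key=lambda pair: abs(pair[1]), reverse=True)
    return nonzero[:num_best]


# Overlap scores of ['bird', 'cat', 'frog'] against the three corpus documents.
print(top_n_similar([1, 2, 0], num_best=2))  # [(1, 2.0), (0, 1.0)]
```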
xeqql