It seems to me that it is easier to explain my problem with an example.
I have one table with ingredients for recipes, and I have implemented a function to calculate the Tanimoto coefficient between the ingredients. It is fast enough to calculate the coefficient between two ingredients (3 sql queries are required), but it does not scale well. To calculate the coefficient between all possible combinations of ingredients, he needs N + (N * (N-1)) / 2 queries or 500,500 queries for just 1,000 ingredients. Is there a faster way to do this? Here is what I got so far:
class Filtering(): def __init__(self): self._connection=sqlite.connect('database.db') def n_recipes(self, ingredient_id): cursor = self._connection.cursor() cursor.execute('''select count(recipe_id) from recipe_ingredient where ingredient_id = ? ''', (ingredient_id, )) return cursor.fetchone()[0] def n_recipes_intersection(self, ingredient_a, ingredient_b): cursor = self._connection.cursor() cursor.execute('''select count(drink_id) from recipe_ingredient where ingredient_id = ? and recipe_id in ( select recipe_id from recipe_ingredient where ingredient_id = ?) ''', (ingredient_a, ingredient_b)) return cursor.fetchone()[0] def tanimoto(self, ingredient_a, ingredient_b): n_a, n_b = map(self.n_recipes, (ingredient_a, ingredient_b)) n_ab = self.n_recipes_intersection(ingredient_a, ingredient_b) return float(n_ab) / (n_a + n_b - n_ab)
source share