Why is this Django (1.6) comment counter so slow?

Summary: counting related objects with annotate across two relations gives me a very slow query, much slower than running two separate count queries for each element. Database: PostgreSQL 9.3.5.


I have a model that looks something like this:

    class Collection(models.Model):
        have = models.ManyToManyField(Item, related_name='item_have', through='Have')
        want = models.ManyToManyField(Item, related_name='item_want', through='Want')
        added = models.DateTimeField()

        class Meta:
            ordering = ['-last_bump']


    class Have(models.Model):
        item = models.ForeignKey(Item)
        collection = models.ForeignKey(Collection, related_name='have_set')
        price = models.IntegerField(default=0)


    class Want(models.Model):
        want = models.ForeignKey(Item)
        collection = models.ForeignKey(Collection, related_name='want_set')
        price = models.IntegerField(default=0)

In my view, I list these Collections, and I want to show the number of haves and wants in each of them, which I do with an annotation:

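    from django.db.models import Count
    from django.views import generic


    class ListView(generic.ListView):
        model = Collection
        queryset = Collection.objects.select_related()
        paginate_by = 20

        def get_queryset(self):
            queryset = super(ListView, self).get_queryset()
            # Both counts are computed in one query over both M2M relations.
            queryset = queryset.annotate(
                have_count=Count("have", distinct=True),
                want_count=Count("want", distinct=True),
            )
            return queryset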

This, however, makes my page very slow! I have about 650 records in the DB, and django-debug-toolbar says the page makes 2 queries that take about 400-500 ms on average. I tried prefetch_related, but that does not make it any faster.

I tried another approach. In the Collection model, I added this:

    @property
    def have_count(self):
        return self.have.count()

    @property
    def want_count(self):
        return self.want.count()

and removed the annotation from my view. With this change the page makes a total of 42 database queries, but they complete in 20-25 ms.

What am I doing wrong with my annotation here? Shouldn't it be faster to do the counting in one query than to run many separate count queries?

1 answer

Why it is slow: if you simply annotate over two ManyToMany fields at once, you create an unwanted big join of all these tables together. For each collection the join produces (number of haves) × (number of wants) rows, so the size of the Cartesian product of rows to be evaluated can approach Have.objects.count() * Want.objects.count(). You then added distinct=True to collapse the duplicated rows at the end, so that the counts do not come out unacceptably inflated, but the database still has to evaluate the whole product first.
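The row multiplication can be reproduced outside Django. The following is a minimal sketch using Python's built-in sqlite3 module, not the SQL that Django actually generates; the table and column names are hypothetical stand-ins for the Collection, Have and Want tables in the question.

```python
import sqlite3

# Hypothetical, simplified schema standing in for the question's models.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE collection (id INTEGER PRIMARY KEY);
    CREATE TABLE have (id INTEGER PRIMARY KEY, collection_id INTEGER, item_id INTEGER);
    CREATE TABLE want (id INTEGER PRIMARY KEY, collection_id INTEGER, item_id INTEGER);
""")
conn.execute("INSERT INTO collection (id) VALUES (1)")
# One collection with 30 haves and 40 wants.
conn.executemany("INSERT INTO have (collection_id, item_id) VALUES (1, ?)",
                 [(i,) for i in range(30)])
conn.executemany("INSERT INTO want (collection_id, item_id) VALUES (1, ?)",
                 [(i,) for i in range(40)])

# Joining both to-many relations at once pairs every have with every want.
join_sql = """
    FROM collection c
    LEFT JOIN have h ON h.collection_id = c.id
    LEFT JOIN want w ON w.collection_id = c.id
"""
joined_rows = conn.execute("SELECT COUNT(*) " + join_sql).fetchone()[0]

# Without DISTINCT both counts are inflated to the size of the product.
naive = conn.execute(
    "SELECT COUNT(h.item_id), COUNT(w.item_id) " + join_sql + " GROUP BY c.id"
).fetchone()

# DISTINCT repairs the counts, but only after the product has been built.
distinct = conn.execute(
    "SELECT COUNT(DISTINCT h.item_id), COUNT(DISTINCT w.item_id) "
    + join_sql + " GROUP BY c.id"
).fetchone()

print(joined_rows, naive, distinct)
```

With 30 haves and 40 wants the join yields 30 × 40 = 1200 rows for a single collection, which is why the cost grows so quickly as collections fill up.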

Fix for old Django: if you annotate only one relation, e.g. queryset.annotate(have_count=Count("have")), you get the correct result quickly without distinct=True, and the same result is also fast with distinct. Run a second query of the same kind for "want", then combine the results of the two queries in Python, in memory.
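Combining the two results in memory could look roughly like the sketch below. The helper merge_counts is hypothetical, and the sample rows are made-up data standing in for what something like queryset.annotate(have_count=Count("have")).values_list("pk", "have_count") and the equivalent "want" query might return.

```python
def merge_counts(have_rows, want_rows):
    """Combine two sequences of (pk, count) pairs into {pk: (have_count, want_count)}.

    A pk that is missing from one of the sequences gets a count of 0 there.
    """
    have_counts = dict(have_rows)
    want_counts = dict(want_rows)
    all_pks = set(have_counts) | set(want_counts)
    return {
        pk: (have_counts.get(pk, 0), want_counts.get(pk, 0))
        for pk in all_pks
    }


# Made-up sample data: one (pk, count) pair per collection from each query.
have_rows = [(1, 3), (2, 0), (3, 5)]
want_rows = [(1, 2), (2, 7), (3, 0)]

merged = merge_counts(have_rows, want_rows)
print(merged[1])  # (have_count, want_count) for the collection with pk 1
```

This costs two queries per page instead of one, but each query joins only a single relation, so neither has to evaluate the product of haves and wants.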


Solution: a good solution is possible in Django >= 1.11 (released two years after your question). Use a queryset with two subqueries, one for Have and one for Want, all within a single query, but without joining all the tables together.

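    from django.db.models import Count, OuterRef, Subquery

    # Correlate each subquery with the outer Collection row; order_by() clears
    # the default Meta ordering so it does not leak into the GROUP BY.
    sq = Collection.objects.filter(pk=OuterRef('pk')).order_by()
    # Group by pk so that each subquery returns exactly one row per collection.
    have_count_subq = sq.values('pk').annotate(have_count=Count('have')).values('have_count')
    want_count_subq = sq.values('pk').annotate(want_count=Count('want')).values('want_count')
    queryset = queryset.annotate(
        have_count=Subquery(have_count_subq),
        want_count=Subquery(want_count_subq),
    )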
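The idea behind the subquery approach can be illustrated in plain SQL. The sketch below uses Python's built-in sqlite3 module with hypothetical table names; it shows the general shape of two correlated scalar subqueries and is an approximation, not the exact SQL that Django emits.

```python
import sqlite3

# Hypothetical, simplified schema standing in for the question's models.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE collection (id INTEGER PRIMARY KEY);
    CREATE TABLE have (id INTEGER PRIMARY KEY, collection_id INTEGER, item_id INTEGER);
    CREATE TABLE want (id INTEGER PRIMARY KEY, collection_id INTEGER, item_id INTEGER);
    INSERT INTO collection (id) VALUES (1), (2);
    INSERT INTO have (collection_id, item_id) VALUES (1, 10), (1, 11), (1, 12);
    INSERT INTO want (collection_id, item_id) VALUES (1, 20), (1, 21);
""")

# Each count is computed by its own correlated subquery, so the have and want
# rows are never joined to each other and no Cartesian product is built.
subquery_sql = """
    SELECT c.id,
           (SELECT COUNT(*) FROM have h WHERE h.collection_id = c.id) AS have_count,
           (SELECT COUNT(*) FROM want w WHERE w.collection_id = c.id) AS want_count
    FROM collection c
    ORDER BY c.id
"""
rows = conn.execute(subquery_sql).fetchall()
print(rows)
```

Each subquery touches only the rows of one relation for the current collection, so the work is roughly haves + wants per collection rather than haves × wants.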

Verification: you can inspect both the slow and the fixed SQL by printing str(my_queryset.query) for each of the querysets above.


Source: https://habr.com/ru/post/1215866/

