Django many-to-many field: prefetch primary keys only

I am trying to optimize database queries for a Django application. Here's a simplified example:

    from django.db import models

    class Label(models.Model):
        name = models.CharField(max_length=200)
        # ... many other fields ...

    class Thing(models.Model):
        name = models.CharField(max_length=200)
        labels = models.ManyToManyField(Label)

I have a function that retrieves all Labels and Things and puts them in a JSON data structure in which each Thing refers to its Labels by id (primary key). Something like this:

    {
        'labels': [
            { 'id': 123, 'name': 'label foo' },
            ...
        ],
        'things': [
            { 'id': 45, 'name': 'thing bar', 'labels': [ 123, ... ] },
            ...
        ]
    }

What is the most efficient way to get such a data structure using Django? Suppose I have L Labels and T Things, and the average Thing has x Labels.

Method 1:

    from django.forms.models import model_to_dict

    data = {}
    data['labels'] = [model_to_dict(label) for label in Label.objects.all()]
    data['things'] = [model_to_dict(thing) for thing in Thing.objects.all()]

This makes 1 + 1 + T database queries, since model_to_dict(thing) has to fetch the Labels for each Thing individually.

Method 2:

    data = {}
    data['labels'] = [model_to_dict(label) for label in Label.objects.all()]
    data['things'] = [model_to_dict(thing) for thing in Thing.objects.prefetch_related('labels').all()]

This makes only 1 + 1 + 1 database queries, since the fetched Things now have their Labels prefetched in one additional query.

This is still not satisfactory. prefetch_related('labels') will fetch many copies of the same Label, whereas I only need their ids. Is there a way to prefetch only the ids of the Labels? I tried prefetch_related('labels__id'), but that didn't work. I'm also worried that, since T is large (hundreds), prefetch_related('labels') leads to an SQL query with a big IN clause. L is much smaller (<10), so I could do this instead:

Method 3:

    data = {}
    data['labels'] = [model_to_dict(label) for label in Label.objects.prefetch_related('thing_set').all()]
    things = list(Thing.objects.all())
    # plug in label ids by hand, and also fetch things that have zero labels
    # somehow

This leads to a smaller IN clause, but it is still unsatisfactory because prefetch_related('thing_set') fetches duplicate copies of a Thing if that Thing has multiple Labels.

Summary:

Label and Thing are connected by a ManyToManyField. I am fetching all Labels and all Things anyway. So how can I efficiently fetch their many-to-many relationship as well?

1 answer

I figured it out. Thanks to ilvar, whose comment on the question pointed me to through tables.

If you don't specify an explicit through model, there is still an implicit through model class that you can use to directly access the table created to hold the association. It has three fields to link the models.
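To make the shape of that implicit table concrete, here is a minimal sketch using the standard-library sqlite3 module instead of Django. The table names assume a hypothetical app label "myapp", and the column names follow the thing_id and label_id attributes used on the through model; the exact schema Django generates may differ in details.

```python
import sqlite3

# In-memory stand-in for the tables behind Label, Thing and the implicit
# through model. The "myapp_" prefix is a hypothetical app label.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE myapp_label (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE myapp_thing (id INTEGER PRIMARY KEY, name TEXT);
    -- The through table: its own id plus one foreign key per side.
    CREATE TABLE myapp_thing_labels (
        id INTEGER PRIMARY KEY,
        thing_id INTEGER NOT NULL REFERENCES myapp_thing (id),
        label_id INTEGER NOT NULL REFERENCES myapp_label (id),
        UNIQUE (thing_id, label_id)
    );
    INSERT INTO myapp_label (id, name) VALUES (123, 'label foo'), (124, 'label baz');
    INSERT INTO myapp_thing (id, name) VALUES (45, 'thing bar'), (46, 'thing qux');
    INSERT INTO myapp_thing_labels (thing_id, label_id) VALUES (45, 123), (45, 124);
""")

# One query over the through table yields every (thing_id, label_id) pair
# without reading any row of the label or thing tables.
pairs = conn.execute(
    "SELECT thing_id, label_id FROM myapp_thing_labels ORDER BY thing_id, label_id"
).fetchall()
print(pairs)  # [(45, 123), (45, 124)]
```

Because the pairs are just two integers per row, no Label or Thing data is copied, which is the property the question was after.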

In short:

    from collections import defaultdict
    from django.forms.models import model_to_dict

    # Fetch all labels and things:
    labels = list(Label.objects.all())
    things = list(Thing.objects.all())

    # Fetch all label-thing pairs:
    labels_of = defaultdict(list)
    for pair in Thing.labels.through.objects.filter(label__in=labels):
        labels_of[pair.thing_id].append(pair.label_id)

    # Put everything together:
    data = {}
    data['labels'] = [model_to_dict(label) for label in labels]
    data['things'] = []
    for thing in things:
        # exclude takes a list of field names
        thing_dict = model_to_dict(thing, exclude=['labels'])
        thing_dict['labels'] = labels_of[thing.id]
        data['things'].append(thing_dict)

This makes 1 + 1 + 1 queries and does not fetch anything more than once. I can also change the first for loop to:

    for pair in Thing.labels.through.objects.filter(thing__in=things):

if I have more Labels than Things, which results in a query with a smaller IN clause.
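The grouping step can be tried without a database. The sketch below uses plain tuples as a stand-in for through-table rows; in Django, such tuples could presumably come from Thing.labels.through.objects.values_list('thing_id', 'label_id'), which avoids building model instances for each pair. The helper name group_label_ids and the sample data are hypothetical.

```python
from collections import defaultdict

def group_label_ids(pairs):
    """Map each thing id to the list of label ids attached to it."""
    labels_of = defaultdict(list)
    for thing_id, label_id in pairs:
        labels_of[thing_id].append(label_id)
    return labels_of

# Stand-in for the (thing_id, label_id) rows of the through table.
pairs = [(45, 123), (45, 124), (47, 123)]
labels_of = group_label_ids(pairs)

# Stand-in for the Thing rows, already converted to dicts.
things = [
    {'id': 45, 'name': 'thing bar'},
    {'id': 46, 'name': 'thing qux'},  # has no labels at all
    {'id': 47, 'name': 'thing baz'},
]

# A Thing that never appears in the pairs still gets an empty list,
# because defaultdict(list) creates it on first lookup.
data_things = [dict(thing, labels=labels_of[thing['id']]) for thing in things]
print(data_things)
```

This also covers the case that Method 3 left open: Things with zero Labels need no extra query, since they simply have no rows in the through table.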

The debugsqlshell management command from django-debug-toolbar is excellent for seeing the queries that a piece of code actually makes.
