Django: QuerySet order based on latest child models

Suppose I want to show a list of runners ordered by the time of their latest sprint.

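    from django.db import models


    class Runner(models.Model):
        name = models.CharField(max_length=255)


    class Sprint(models.Model):
        # on_delete is required since Django 2.0; CASCADE was the old default
        runner = models.ForeignKey(Runner, on_delete=models.CASCADE)
        time = models.PositiveIntegerField()
        created = models.DateTimeField(auto_now_add=True)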

This is a quick sketch of what I would do in SQL:

    SELECT runner.id, runner.name, sprint.time
    FROM runner
    LEFT JOIN sprint ON (sprint.runner_id = runner.id)
    WHERE sprint.id = (
        SELECT sprint_inner.id
        FROM sprint AS sprint_inner
        WHERE sprint_inner.runner_id = runner.id
        ORDER BY sprint_inner.created DESC
        LIMIT 1
    )
    OR sprint.id IS NULL
    ORDER BY sprint.time ASC
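To sanity-check the query, here is a minimal sketch that runs it against an in-memory SQLite database from Python. The sample rows and the helper name `latest_sprints` are made up for illustration; table and column names follow the models above. The last condition is written as `IS NULL`, because a comparison with `= NULL` never matches.

```python
import sqlite3

QUERY = """
SELECT runner.id, runner.name, sprint.time
FROM runner
LEFT JOIN sprint ON sprint.runner_id = runner.id
WHERE sprint.id = (
    SELECT sprint_inner.id
    FROM sprint AS sprint_inner
    WHERE sprint_inner.runner_id = runner.id
    ORDER BY sprint_inner.created DESC
    LIMIT 1
)
OR sprint.id IS NULL
ORDER BY sprint.time ASC
"""


def latest_sprints(conn):
    """Return (id, name, time) per runner, using each runner's latest sprint."""
    return conn.execute(QUERY).fetchall()


# Hypothetical sample data: Ann has two sprints, Bob one, Cy none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE runner (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE sprint (
    id INTEGER PRIMARY KEY,
    runner_id INTEGER REFERENCES runner(id),
    time INTEGER,
    created TEXT
);
INSERT INTO runner VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO sprint VALUES
    (1, 1, 12, '2020-01-01'),
    (2, 1, 15, '2020-02-01'),
    (3, 2, 11, '2020-01-15');
""")

rows = latest_sprints(conn)
print(rows)  # [(3, 'Cy', None), (2, 'Bob', 11), (1, 'Ann', 15)]
```

SQLite sorts NULL first in ascending order, so the runner without sprints lands at the top; other databases may place NULLs differently.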

The Django QuerySet documentation says:

It is permissible to specify a multi-valued field to order the results by (for example, a ManyToManyField field). Normally this won't be a sensible thing to do and it's really an advanced usage feature. However, if you know that your queryset's filtering or available data implies that there will only be one ordering piece of data for each of the main items you are selecting, the ordering may well be exactly what you want to do. Use ordering on multi-valued fields with care and make sure the results are what you expect.

I think I need to apply some filter here, but I'm not sure what exactly Django expects ...

One note, since it isn't obvious from this example: the Runner table will contain several hundred rows, and the Sprint table will start with several hundred and probably grow to several thousand later on. The data will be displayed paginated, so sorting in Python is not an option.

The only other option I see is to write the SQL myself, but I would like to avoid this at all costs.

2 answers

I don't think there is a way to do this through the ORM with a single query. You can grab a list of runners, use annotate to add the id of each runner's last sprint, then filter and order those sprints.

    >>> from django.db.models import Max

    # all runners now have a `last_race` attribute,
    # which is the `id` of the last sprint they ran
    >>> runners = Runner.objects.annotate(last_race=Max("sprint__id"))

    # a list of each runner's last sprint, ordered by the sprint time;
    # we use `select_related` to limit lookup queries later on
    >>> results = (Sprint.objects.filter(id__in=[runner.last_race for runner in runners])
    ...                          .order_by("time")
    ...                          .select_related("runner"))

    # grab the first result
    >>> first_result = results[0]

    # you can access the runner details via `.runner`, e.g. `first_result.runner.name`
    >>> isinstance(first_result.runner, Runner)
    True

    # this should only ever execute 2 queries, no matter what you do with the results
    >>> from django.db import connection
    >>> len(connection.queries)
    2

This is pretty fast and will still use database indexes and caching.

A few thousand records is not many, so this should work well at that scale. If you run into problems, I suggest you bite the bullet and use raw SQL.
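For intuition, the two-step approach above amounts to roughly the following pair of SQL queries, sketched here with Python's `sqlite3` module rather than the ORM. The sample rows and the helper name `latest_sprints_two_step` are invented for illustration, and the exact SQL Django emits will differ. Note that taking the highest sprint `id` per runner only matches the latest `created` when ids grow in insertion order.

```python
import sqlite3


def latest_sprints_two_step(conn):
    """Mimic the annotate-then-filter approach with two plain SQL queries."""
    # Query 1: highest sprint id per runner (NULL for runners without sprints).
    last_ids = [
        row[0]
        for row in conn.execute(
            "SELECT MAX(sprint.id) FROM runner "
            "LEFT JOIN sprint ON sprint.runner_id = runner.id "
            "GROUP BY runner.id"
        )
        if row[0] is not None
    ]
    if not last_ids:
        return []
    # Query 2: fetch those sprints with their runner, ordered by sprint time.
    placeholders = ", ".join("?" for _ in last_ids)
    return conn.execute(
        "SELECT runner.name, sprint.time FROM sprint "
        "JOIN runner ON runner.id = sprint.runner_id "
        f"WHERE sprint.id IN ({placeholders}) ORDER BY sprint.time",
        last_ids,
    ).fetchall()


# Hypothetical sample data: Ann has two sprints, Bob one, Cy none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE runner (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE sprint (
    id INTEGER PRIMARY KEY,
    runner_id INTEGER REFERENCES runner(id),
    time INTEGER,
    created TEXT
);
INSERT INTO runner VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO sprint VALUES
    (1, 1, 12, '2020-01-01'),
    (2, 1, 15, '2020-02-01'),
    (3, 2, 11, '2020-01-15');
""")

result = latest_sprints_two_step(conn)
print(result)  # [('Bob', 11), ('Ann', 15)]
```

Runners without any sprint drop out of this result, unlike the LEFT JOIN query in the question.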

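    from django.shortcuts import render

    def view_name(request):
        # ids of runners with at least one sprint; `flat=True` needs values_list(),
        # and ordering by `created` here would defeat distinct()
        runner_ids = Sprint.objects.values_list('runner', flat=True).order_by('runner').distinct()
        runners = []
        for runner_id in runner_ids:
            # latest sprint for this runner: one extra query per runner
            latest_sprint = (Sprint.objects.filter(runner=runner_id)
                             .select_related('runner').order_by('-created')[:1])
            for latest in latest_sprint:
                runners.append({'runner': latest.runner, 'time': latest.time})
        return render(request, 'page.html', {'runners': runners})

page.html:

    {% for runner in runners %}
        {{ runner.runner.name }} - {{ runner.time }}
    {% endfor %}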