QuerySet with .latest() for each day

I have a basic model like:

    class Stats(models.Model):
        created = models.DateTimeField(auto_now_add=True)
        growth = models.IntegerField()

I run a Celery task every 10 minutes that creates a new Stats object.

Using .latest() on the queryset gives me the single most recent Stats object.

However, I need a list with one Stats object per day.

Consider the following:

    Stats(growth=100)  # created 1/1/13 23:50
    Stats(growth=200)  # created 1/1/13 23:59
    Stats(growth=111)  # created 1/2/13 23:50
    Stats(growth=222)  # created 1/2/13 23:59

The QuerySet should return the latest object for each day: in this example, the ones with growth values of 200 and 222.

In raw SQL, I would solve this with a subquery that selects the maximum timestamp for each day and join it back against the table.

Since I don't want to use raw SQL, is there a way to do this with the Django ORM?

4 answers

Unfortunately, there is no way that I know of (I looked pretty hard) to avoid some raw SQL to accomplish this with your current model (see the end of this answer for an alternative). But you can minimise the impact by writing as little raw SQL as possible. In practice, most Django sites never need to be portable across different databases, so if you don't plan to reuse this app elsewhere or release it publicly, you should be fine.

Below is an example for SQLite. If you do need to support other backends, you can keep a mapping from database type to the appropriate date function, check which driver is in use, and substitute the right expression.

    >>> for stat in Stats.objects.all():
    ...     print stat.created, stat.growth
    ...
    2013-06-22 13:41:25.334262+00:00 3
    2013-06-22 13:41:40.473373+00:00 3
    2013-06-22 13:41:44.921247+00:00 4
    2013-06-22 13:41:47.533102+00:00 5
    2013-06-23 13:41:58.458250+00:00 6
    2013-06-23 13:42:01.282702+00:00 3
    2013-06-23 13:42:03.633236+00:00 1
    >>> last_stat_per_day = Stats.objects.extra(
    ...     select={'the_date': 'date(created)'}
    ... ).values_list('the_date').annotate(max_date=Max('created'))
    >>> last_stat_per_day
    [(u'2013-06-22', datetime.datetime(2013, 6, 22, 13, 41, 47, 533102, tzinfo=<UTC>)),
     (u'2013-06-23', datetime.datetime(2013, 6, 23, 13, 42, 3, 633236, tzinfo=<UTC>))]
    >>> max_dates = [item[1] for item in last_stat_per_day]
    >>> max_dates
    [datetime.datetime(2013, 6, 22, 13, 41, 47, 533102, tzinfo=<UTC>),
     datetime.datetime(2013, 6, 23, 13, 42, 3, 633236, tzinfo=<UTC>)]
    >>> stats = Stats.objects.filter(created__in=max_dates)
    >>> for stat in stats:
    ...     print stat.created, stat.growth
    ...
    2013-06-22 13:41:47.533102+00:00 5
    2013-06-23 13:42:03.633236+00:00 1

I originally wrote that this was only one query, but that isn't quite true: the values_list has to be evaluated to extract max_date for the next query, which forces the statement to execute. Still, that is only 2 queries, which is significantly better than the N+1 alternative.

The non-portable bit:

    last_stat_per_day = Stats.objects.extra(
        select={'the_date': 'date(created)'}
    ).values_list('the_date').annotate(max_date=Max('created'))

Using extra() is not ideal, but the raw SQL here is trivial and well suited to a database-dependent replacement; only date(created) needs to be swapped out. You can wrap the whole thing in a custom manager method if you want, so that this mess is at least contained in one place.
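For illustration only, here is a minimal sketch of such a manager. The method name latest_per_day, the DATE_SQL mapping and the use of connection.vendor are assumptions of mine, not part of the original answer, so verify the SQL snippets against your backend.

    from django.db import connection, models
    from django.db.models import Max


    class StatsManager(models.Manager):
        # Hypothetical mapping from database vendor to a "truncate to date"
        # SQL expression; check each snippet against your own backend.
        DATE_SQL = {
            'sqlite': 'date(created)',
            'postgresql': 'created::date',
            'mysql': 'DATE(created)',
        }

        def latest_per_day(self):
            date_expr = self.DATE_SQL.get(connection.vendor, 'date(created)')
            # Same two-query approach as above: latest timestamp per day,
            # then fetch the matching rows.
            per_day = (self.all()
                           .extra(select={'the_date': date_expr})
                           .values_list('the_date')
                           .annotate(max_date=Max('created')))
            max_dates = [max_date for _, max_date in per_day]
            return self.filter(created__in=max_dates)


    class Stats(models.Model):
        created = models.DateTimeField(auto_now_add=True)
        growth = models.IntegerField()

        objects = StatsManager()

Stats.objects.latest_per_day() then runs the same two queries shown in the shell session above.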

Another option is simply to add a DateField to your model, and then you don't need extra() at all. You just replace the values_list call with values_list('created_date'), drop extra() completely, and call it a day. The cost is obvious: more storage, and it is somewhat unintuitive to have both a date and a DateTime field on the same model. Keeping the two in sync can also cause problems.
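A rough sketch of that variant, assuming the new field is called created_date and that auto_now_add is replaced by a default so save() can keep both fields consistent (both of those choices are mine, not the answer's):

    from django.db import models
    from django.db.models import Max
    from django.utils import timezone


    class Stats(models.Model):
        created = models.DateTimeField(default=timezone.now, editable=False)
        created_date = models.DateField(editable=False, db_index=True)
        growth = models.IntegerField()

        def save(self, *args, **kwargs):
            # Keep the duplicated date column in sync with the timestamp.
            self.created_date = self.created.date()
            super(Stats, self).save(*args, **kwargs)


    # The per-day aggregation then needs no raw SQL at all:
    last_stat_per_day = (Stats.objects.values_list('created_date')
                         .annotate(max_date=Max('created')))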


Perhaps you can do something like:

    import datetime

    now = datetime.datetime.now()
    the_last_one = Stats.objects.filter(
        created__year=now.year, created__month=now.month, created__day=now.day
    ).order_by('-created')[0]

or something like

    the_last_one = Stats.objects.filter(
        created__year=now.year, created__month=now.month, created__day=now.day
    ).latest('created')
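If you need one object for every day in the table rather than just today, the same idea can be extended by iterating over the distinct days. Note that this issues one query per day (the N+1 pattern the first answer avoids), so it is only a sketch for small datasets:

    latest_per_day = []
    # .dates() yields one entry per distinct calendar day in the table.
    for day in Stats.objects.dates('created', 'day'):
        latest_per_day.append(
            Stats.objects.filter(
                created__year=day.year,
                created__month=day.month,
                created__day=day.day,
            ).order_by('-created')[0]
        )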

In addition to the other two answers, you might also consider storing the results in a different model (especially if the data per day does not change much after input, and you have large amounts of data). Something like:

    class DailyStat(models.Model):
        date = models.DateField(unique=True)  # Denormalisation yo
        # Could also store foreign keys to Stats instances if needed
        max_growth = models.IntegerField()
        min_growth = models.IntegerField()
        # ... and any other stats per day, e.g. average per day

And add the Celery periodic task:

    from celery.task.schedules import crontab
    from celery.task import periodic_task
    import datetime


    # Periodic task for 1am daily
    @periodic_task(run_every=crontab(minute=0, hour=1))
    def process_stats_every_day():
        # Code to populate DailyStat
        today = datetime.date.today()

        # Assumes relevant custom Manager methods exist
        # (regular Django ORM methods can achieve this)
        max_stat = Stats.objects.get_max_growth(date=today)
        min_stat = Stats.objects.get_min_growth(date=today)

        ds = DailyStat(date=today, max_growth=max_stat.growth, min_growth=min_stat.growth)
        ds.save()
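The get_max_growth / get_min_growth manager methods are only assumed to exist in the code above; one possible sketch, using plain ORM ordering, could look like this (the method names come from the answer, everything else is an assumption):

    from django.db import models


    class StatsManager(models.Manager):
        def _for_date(self, date):
            # All Stats rows recorded on the given calendar date.
            return self.filter(created__year=date.year,
                               created__month=date.month,
                               created__day=date.day)

        def get_max_growth(self, date):
            # Stats instance with the highest growth on that date.
            return self._for_date(date).order_by('-growth')[0]

        def get_min_growth(self, date):
            # Stats instance with the lowest growth on that date.
            return self._for_date(date).order_by('growth')[0]

You would then attach it to the model with objects = StatsManager().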

Get results using:

 DailyStat.objects.all() 

Of course, among other factors to consider, this approach means you have to update the DailyStat whenever a past Stats row changes, and so on (signals can help if you go this route).


TruncDay was added in newer versions of Django (1.10+), and with it you can write the same query more concisely. Note that .distinct() on specific fields is only supported by some databases, such as PostgreSQL, and the order_by() must start with the field passed to distinct():

    Stats.objects.annotate(date=TruncDay('created')).order_by('date', '-created').distinct('date')
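On a database without DISTINCT ON support, a sketch of the same idea is to combine TruncDay with the two-query pattern from the first answer (the names day, max_per_day and latest_per_day are just placeholders):

    from django.db.models import Max
    from django.db.models.functions import TruncDay

    # Latest timestamp for each calendar day, computed by the database.
    max_per_day = (Stats.objects
                   .annotate(day=TruncDay('created'))
                   .values('day')
                   .annotate(max_created=Max('created'))
                   .values_list('max_created', flat=True))

    # Fetch the full rows for those timestamps.
    latest_per_day = Stats.objects.filter(created__in=list(max_per_day))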

