Detecting N+1 queries in Django with unit testing

Detecting N+1 queries in Django with unit testing

When it comes to surfacing up N+1 problems in Django you have a lot of options.

For one, you can use the debug toolbar which is kind enough to show duplicate queries in the SQL inspector:

Debug toolbar Django duplicate query

There is also the nplusone package, which does an amazing job.

If instead you're lazy like me, you can use a unit test to detect N+1 queries, without installing any other package.

Understanding N+1 and defining a baseline

First off, you may want to define a baseline: how many queries should trigger a view in optimal conditions?

Consider the following model for example:

class Activity(models.Model):
machine = models.ForeignKey(to=Machine, on_delete=models.PROTECT)

def __str__(self):
return f"Activity on {self.machine.hostname}"

It has a foreign key to another model, Machine (not shown here). In addition, when we reference any instance of this model from within a template, its __str__ method builds a string based from the hostname property of the related model.

This should immediately raise a red flag: it is a potential N+1 problem.

To put things in context, imagine that in the view I select the machine_set for the current user (machines are also tied to each user) with:

# I'm inside a get_context_data here!
context["machine_list"] = user.machine_set.all()

In the template instead I iterate over each machine to get in turn a list of all activities tied to a single machine:

    <ul>
{% for machine in machine_list %}
{% for activity in machine.activity_set.all %}
<li>{{ activity }}</li>
{% endfor %}
{% endfor %}
</ul>

The result of this arrangement is a huge amount of duplicated queries. The baseline here is that for, let's say, three machine instances I expect only two queries:

  • one query for fetching the root collection of machines
  • one query for fetching all the related activities

Without optimization instead, Django issues four queries:

  • one query for fetching the root collection of machines
  • three duplicated queries for each loop iteration

Having defined the expected baseline, we can create a unit test to surface up the problem.

Detecting N+1 queries in Django with a unit testing

Here's the unit test:

from django.test import TestCase
from django.urls import reverse
from model_bakery import baker


class TestStuff(TestCase):
def test_does_not_make_unnecessary_queries(self):
machines = baker.make("report.Machine", _quantity=300, user=self.user)
activities_group0 = baker.make(
"report.Activity", _quantity=30, machine=machines[0]
)
activities_group1 = baker.make(
"report.Activity", _quantity=30, machine=machines[4]
)

def call_route():
self.client.get(reverse("my-route"))

"""
We expect:
- 1 query for session auth
- 1 query for the user model
- 1 query for the root collection
- 1 query for fetching all the related models
or
- 1 query for the root collection + JOIN
"""

self.assertNumQueries(4, call_route)

With assertNumQueries() we test for the number of queries issued by Django. assertNumQueries() takes:

  • the number of expected queries
  • a reference to the function to call

Here for convenience I created a utility function which calls a route inside the same test method.

If your code is optimized, no matter how many related models you have, no duplicate query must run. To understand the order of magnitude, consider that unoptimized code iterating over 300 models will make 301 queries.

If your view is protected with authentication, the number of queries will vary. In this case you should expect:

  • one query for session auth
  • one query for the user model
  • one query for the root collection
  • one query for fetching all the related models

To fix the previous example and reduce the number of queries we can use prefetch_related() on activity_set:

# I'm inside a get_context_data here!
context["machine_list"] = user.machine_set.all().prefetch_related(
"activity_set"
)

This will reduce the queries to: one query for the root collection and one query for fetching all the related models.

If your logic is different from this example, you can also use select_related(), which does a JOIN, thus reducing the queries to just one: one query for the root collection joined to the related instances.

A real world case: reducing queries in the Django admin

For a Django project with even a thousand records in the database, connected between them by foreign keys, it's relatively easy to overwhelm the Django admin. Consider the following scenario:

Reducing queries in Django admin: worst case

This is a page from the Django admin. The database has only a handful of models, and yet, the page is issuing an unreasonable amount of queries to the database.

To solve this performance problem we can place the page under a unit test, measure the queries, and then optimize with list_select_related in a subclass of admin.ModelAdmin. This can optimize tremendously the amount of queries issued to the database:

Reducing queries in Django admin

You can see an example use of list_select_related in the Django documentation.

Thanks for reading!

Valentino Gagliardi

Hi! I'm Valentino! I'm a freelance consultant with a wealth of experience in the IT industry. I spent the last years as a frontend consultant, providing advice and help, coaching and training on JavaScript, testing, and software development. Let's get in touch!