Detecting N+1 queries in Django with unit testing
When it comes to surfacing up N+1 problems in Django you have a lot of options.
For one, you can use the debug toolbar which is kind enough to show duplicate queries in the SQL inspector:
There is also the nplusone package, which does an amazing job.
If instead you're lazy like me, you can use a unit test to detect N+1 queries, without installing any other package.
Understanding N+1 and defining a baseline
First off, you may want to define a baseline: how many queries should trigger a view in optimal conditions?
Consider the following model for example:
class Activity(models.Model):
machine = models.ForeignKey(to=Machine, on_delete=models.PROTECT)
def __str__(self):
return f"Activity on {self.machine.hostname}"
It has a foreign key to another model, Machine
(not shown here). In addition, when we reference any instance of this model from within a template, its __str__
method builds a string based from the hostname
property of the related model.
This should immediately raise a red flag: it is a potential N+1 problem.
To put things in context, imagine that in the view I select the machine_set
for the current user (machines are also tied to each user) with:
# I'm inside a get_context_data here!
context["machine_list"] = user.machine_set.all()
In the template instead I iterate over each machine to get in turn a list of all activities tied to a single machine:
<ul>
{% for machine in machine_list %}
{% for activity in machine.activity_set.all %}
<li>{{ activity }}</li>
{% endfor %}
{% endfor %}
</ul>
The result of this arrangement is a huge amount of duplicated queries. The baseline here is that for, let's say, three machine instances I expect only two queries:
- one query for fetching the root collection of machines
- one query for fetching all the related activities
Without optimization instead, Django issues four queries:
- one query for fetching the root collection of machines
- three duplicated queries for each loop iteration
Having defined the expected baseline, we can create a unit test to surface up the problem.
Detecting N+1 queries in Django with a unit testing
Here's the unit test:
from django.test import TestCase
from django.urls import reverse
from model_bakery import baker
class TestStuff(TestCase):
def test_does_not_make_unnecessary_queries(self):
machines = baker.make("report.Machine", _quantity=300, user=self.user)
activities_group0 = baker.make(
"report.Activity", _quantity=30, machine=machines[0]
)
activities_group1 = baker.make(
"report.Activity", _quantity=30, machine=machines[4]
)
def call_route():
self.client.get(reverse("my-route"))
"""
We expect:
- 1 query for session auth
- 1 query for the user model
- 1 query for the root collection
- 1 query for fetching all the related models
or
- 1 query for the root collection + JOIN
"""
self.assertNumQueries(4, call_route)
With assertNumQueries()
we test for the number of queries issued by Django. assertNumQueries()
takes:
- the number of expected queries
- a reference to the function to call
Here for convenience I created a utility function which calls a route inside the same test method.
If your code is optimized, no matter how many related models you have, no duplicate query must run. To understand the order of magnitude, consider that unoptimized code iterating over 300 models will make 301 queries.
If your view is protected with authentication, the number of queries will vary. In this case you should expect:
- one query for session auth
- one query for the user model
- one query for the root collection
- one query for fetching all the related models
To fix the previous example and reduce the number of queries we can use prefetch_related()
on activity_set
:
# I'm inside a get_context_data here!
context["machine_list"] = user.machine_set.all().prefetch_related(
"activity_set"
)
This will reduce the queries to: one query for the root collection and one query for fetching all the related models.
If your logic is different from this example, you can also use select_related()
, which does a JOIN, thus reducing the queries to just one: one query for the root collection joined to the related instances.
A real world case: reducing queries in the Django admin
For a Django project with even a thousand records in the database, connected between them by foreign keys, it's relatively easy to overwhelm the Django admin. Consider the following scenario:
This is a page from the Django admin. The database has only a handful of models, and yet, the page is issuing an unreasonable amount of queries to the database.
To solve this performance problem we can place the page under a unit test, measure the queries, and then optimize with list_select_related
in a subclass of admin.ModelAdmin
. This can optimize tremendously the amount of queries issued to the database:
You can see an example use of list_select_related
in the Django documentation.
Thanks for reading!