Prefetching and Optimizing: A Django Virtual Models Approach

Lucas Vilela
September 12, 2024

The N+1 problem is a persistent performance challenge for web developers who use an Object-Relational Mapper (ORM) in frameworks like Django. In this blog post, we look at why it remains a significant concern and how to tackle it.

We will also introduce Django Rest Framework (DRF), one of the leading libraries in the Django ecosystem for building robust APIs, and show how you might encounter the N+1 issue when using it in your Django project. Finally, this article will show how we solved this performance issue at Vinta with Django Virtual Models, an open-source tool we developed.

The Persistent Adversary of Database Performance

If you are a web developer using an ORM to access the database from your web framework, you might encounter performance challenges. One of them is the N+1 problem, which occurs when one query retrieves a set of records and then an additional query fetches related data for each of those records. This leads to an unnecessary increase in the number of queries and hurts performance.

Now that the Olympic season has ended, let’s imagine our application contains information about athletes and related data, such as the games they participate in and the medals they have won.
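To make the query growth concrete, here is a minimal sketch that uses Python's built-in sqlite3 module instead of an ORM. The two-table schema, the placeholder athlete names, and the run_query helper are assumptions made for illustration only; they are not part of the application described in this post.

```python
import sqlite3

# Hypothetical two-table schema standing in for athletes and their medals.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE athlete (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE medal (
        id INTEGER PRIMARY KEY,
        athlete_id INTEGER REFERENCES athlete(id),
        medal_type TEXT
    );
    INSERT INTO athlete (id, name)
        VALUES (1, 'Athlete A'), (2, 'Athlete B'), (3, 'Athlete C');
    INSERT INTO medal (athlete_id, medal_type)
        VALUES (1, 'Gold'), (1, 'Silver'), (2, 'Bronze');
    """
)

stats = {"queries": 0}


def run_query(sql, params=()):
    """Run one SELECT and count it, the way an ORM would hit the database."""
    stats["queries"] += 1
    return conn.execute(sql, params).fetchall()


def list_athletes_naively():
    # One query for the list of athletes...
    athletes = run_query("SELECT id, name FROM athlete ORDER BY id")
    result = []
    for athlete_id, name in athletes:
        # ...plus one query per athlete for its medals: the "+N" part.
        medals = run_query(
            "SELECT medal_type FROM medal WHERE athlete_id = ? ORDER BY id",
            (athlete_id,),
        )
        result.append({"name": name, "medals": [row[0] for row in medals]})
    return result


athletes = list_athletes_naively()
print(stats["queries"])  # 4: one query for the list plus one per athlete
```

With three athletes the loop issues four queries. With a thousand athletes it would issue a thousand and one, which is why the problem tends to show up only once the data grows.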

How ORMs Address the N+1 Problem

For most of the popular ORMs, such as Active Record from Rails, Eloquent from Laravel, and Django ORM from Django, there are a few options to tackle this issue:

  • In Active Record, developers can use .includes() to eager load the associated data;
  • In Eloquent, developers can use ::with(), which functions similarly to .includes() in Active Record. It preloads the related data to avoid the N+1 problem.

For the Django ORM, there are two standard methods to mitigate this issue:

  • select_related() - A JOIN is created on the related models, resulting in a single query. It is most commonly used for OneToOne and ForeignKey relationships;
  • prefetch_related() - A separate query is made for each relationship: one for the primary objects and one for each set of related objects. This method is best suited for ManyToMany relationships and reverse ForeignKey relationships. The joining is then done in Python, so the number of queries stays fixed instead of growing with the number of records.
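The "joining in Python" strategy behind prefetch_related() can be sketched without Django. The example below is only an approximation of the idea, written against Python's built-in sqlite3 module; the schema, the placeholder names, and the list_athletes_with_prefetch function are assumptions for illustration and do not reflect Django's internal implementation.

```python
import sqlite3
from collections import defaultdict

# Hypothetical two-table schema standing in for athletes and their medals.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE athlete (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE medal (
        id INTEGER PRIMARY KEY,
        athlete_id INTEGER REFERENCES athlete(id),
        medal_type TEXT
    );
    INSERT INTO athlete (id, name)
        VALUES (1, 'Athlete A'), (2, 'Athlete B'), (3, 'Athlete C');
    INSERT INTO medal (athlete_id, medal_type)
        VALUES (1, 'Gold'), (1, 'Silver'), (2, 'Bronze');
    """
)


def list_athletes_with_prefetch(connection):
    """Return (athletes, number_of_queries) using two queries in total."""
    queries = 0

    # Query 1: the primary objects.
    athlete_rows = connection.execute(
        "SELECT id, name FROM athlete ORDER BY id"
    ).fetchall()
    queries += 1

    medals_by_athlete = defaultdict(list)
    if athlete_rows:
        # Query 2: every related row at once, filtered by the collected ids.
        ids = [row[0] for row in athlete_rows]
        placeholders = ",".join("?" for _ in ids)
        medal_rows = connection.execute(
            "SELECT athlete_id, medal_type FROM medal "
            f"WHERE athlete_id IN ({placeholders}) ORDER BY id",
            ids,
        ).fetchall()
        queries += 1
        # The "join" happens here, in Python, by grouping on the foreign key.
        for athlete_id, medal_type in medal_rows:
            medals_by_athlete[athlete_id].append(medal_type)

    result = [
        {"name": name, "medals": medals_by_athlete.get(athlete_id, [])}
        for athlete_id, name in athlete_rows
    ]
    return result, queries


athletes, query_count = list_athletes_with_prefetch(conn)
print(query_count)  # 2, regardless of how many athletes are listed
```

The trade-off compared with a JOIN is one extra round trip in exchange for not duplicating the parent row for every related row, which is why this approach suits many-valued relationships.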

Exploring the Issue with Django and DRF

In many projects, defining and structuring database models is straightforward in Django, as illustrated by our example:

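from django.db import models

# Assumed choices and max_length values; adjust them to your project.
MEDAL_TYPE_CHOICES = [("Gold", "Gold"), ("Silver", "Silver"), ("Bronze", "Bronze")]

class Athlete(models.Model):
    name = models.CharField(max_length=255)
    country = models.CharField(max_length=255)

class Competition(models.Model):
    title = models.CharField(max_length=255)
    # related_name exposes the reverse relation as athlete.competitions
    athletes = models.ManyToManyField(Athlete, related_name="competitions")

class Medal(models.Model):
    medal_type = models.CharField(max_length=16, choices=MEDAL_TYPE_CHOICES)
    competition = models.ForeignKey(Competition, on_delete=models.CASCADE)
    # related_name exposes the reverse relation as athlete.medals
    athlete = models.ForeignKey(Athlete, on_delete=models.CASCADE, related_name="medals")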

Django REST Framework provides a powerful mechanism to expose this data through serializers. Here’s how you can translate these models into DRF serializers:

from rest_framework import serializers

class CompetitionSerializer(serializers.ModelSerializer):
    class Meta:
        model = Competition
        fields = ["title"]

class MedalSerializer(serializers.ModelSerializer):
    competition = CompetitionSerializer()
    
    class Meta:
        model = Medal
        fields = ["medal_type", "competition"]

class AthleteSerializer(serializers.ModelSerializer):
    competitions = CompetitionSerializer(many=True)
    medals = MedalSerializer(many=True)

    class Meta:
        model = Athlete
        fields = ["name", "country", "competitions", "medals"]

Finally, our DRF views to present the data would look like this:

from rest_framework.generics import ListAPIView
from rest_framework.generics import RetrieveAPIView

class AthleteListView(ListAPIView):
    queryset = Athlete.objects.all()
    serializer_class = AthleteSerializer

class AthleteRetrieveView(RetrieveAPIView):
    queryset = Athlete.objects.all()
    serializer_class = AthleteSerializer

This allows us to present the data in JSON format like this:

[
    {
        "name": "Rebeca Andrade",
        "country": "Brazil",
        "competitions": [
            {
                "title": "Women's Floor - Artistic gymnastics"
            }
        ],
        "medals": [
            {
                "medal_type": "Gold",
                "competition": {
                    "title": "Women's Floor - Artistic gymnastics"
                }
            }
        ]
    }
]

With just a few lines of code, we've created a basic API that adheres to the Don't Repeat Yourself (DRY) principle using Django and DRF. This principle is a tenet of the Django ecosystem. It means avoiding code duplication to ensure maintainability.

DRY is possible thanks to the reuse of serializers to structure and expose data through the views. However, there are a few underlying costs to this approach.

Unveiling the Hidden Costs: Change Amplification and Other Challenges

When we reuse serializers across different views while adhering to the DRY principle, we must be cautious about the efficiency of the queries they execute. While serializers know which data to fetch, they don’t inherently optimize the data retrieval process.

The example above leads to an N+1 problem when listing Athlete instances and retrieving their related Competitions and Medals.

To mitigate this, we could add prefetch_related to the querysets of AthleteListView and AthleteRetrieveView:

queryset = Athlete.objects.prefetch_related("competitions", "medals__competition")

However, this solution introduces a new challenge: it breaks the DRY principle and leads to the Change Amplification issue. Every time someone changes one of the serializers, they must change all views that use it. As a result, developers would need to update multiple parts of the codebase to implement any N+1 optimizations, increasing the risk of errors and adding maintenance overhead.

It would be great if we could use serializers in views, and Django would automatically know which prefetches to perform without the need to declare and maintain them explicitly, avoiding the need to update querysets in multiple places throughout the project.
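The core idea, deriving the prefetch lookups from the serializer structure instead of repeating them in each view, can be illustrated with a framework-free toy. The SerializerSpec class and the collect_prefetch_lookups function below are hypothetical names invented for this sketch; they are not DRF or Django Virtual Models APIs and do not show how any library actually implements this.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SerializerSpec:
    """Toy stand-in for a serializer: plain fields plus nested serializers."""

    fields: List[str]
    nested: Dict[str, "SerializerSpec"] = field(default_factory=dict)


def collect_prefetch_lookups(spec: SerializerSpec, prefix: str = "") -> List[str]:
    """Walk the nested serializers and build Django-style lookup paths."""
    lookups: List[str] = []
    for name, child in spec.nested.items():
        path = f"{prefix}{name}"
        lookups.append(path)
        # Nested relations use the double-underscore lookup separator.
        lookups.extend(collect_prefetch_lookups(child, prefix=f"{path}__"))
    return lookups


# Mirror the serializers from this post: an athlete nests competitions and
# medals, and each medal nests its competition.
competition_spec = SerializerSpec(fields=["title"])
medal_spec = SerializerSpec(
    fields=["medal_type"], nested={"competition": competition_spec}
)
athlete_spec = SerializerSpec(
    fields=["name", "country"],
    nested={"competitions": competition_spec, "medals": medal_spec},
)

lookups = collect_prefetch_lookups(athlete_spec)
print(lookups)  # ['competitions', 'medals', 'medals__competition']
```

Because the lookups are computed from the serializer declaration, changing a serializer changes the prefetches everywhere it is used, with no view code to update.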

Fortunately, we have the Django Virtual Models library, which can help us with this.

Introducing Django Virtual Models: A "Menu" of Optimizations

Django Virtual Models, an open-source library created by Vinta, significantly enhances the performance and maintainability of Django and Django Rest Framework projects through an advanced prefetching layer.

But how does it achieve this? Let’s explore this further by converting our models into a “Virtual Model” class.

import django_virtual_models as v

from .models import Athlete, Competition, Medal

class VirtualMedal(v.VirtualModel):
    class Meta:
        model = Medal

class VirtualCompetition(v.VirtualModel):
    class Meta:
        model = Competition

class VirtualAthlete(v.VirtualModel):
    medals = VirtualMedal()
    competitions = VirtualCompetition()

    class Meta:
        model = Athlete

We're building a "menu" of optimization options with Django Virtual Models. In this scenario, we explicitly declare which related models can be prefetched for Athlete, so the library can pick the ones each serializer needs.

Additionally, Django Virtual Models allows us to refine these optimizations further by filtering prefetches or even adding annotations, all in a declarative and flexible manner.

from django.db.models import Count

class VirtualMedal(v.VirtualModel):
    ...
    def get_prefetch_queryset(self, **kwargs):
        # Optionally narrow the prefetched medals to a single competition
        competition_title = kwargs.get("competition_title")
        if competition_title:
            return Medal.objects.filter(competition__title=competition_title)
        return Medal.objects.all()
...
class VirtualAthlete(v.VirtualModel):
    ...
    # Adds athlete.medal_count without an extra query per athlete
    medal_count = v.Annotation(
        lambda qs, **kwargs: qs.annotate(medal_count=Count("medals", distinct=True))
    )

Another significant advantage is the effortless integration of these Virtual Models with our DRF serializers and views.

# serializers.py
class VirtualMedalSerializer(v.VirtualModelSerializer):
    ...
    class Meta:
        ...
        virtual_model = VirtualMedal

class VirtualCompetitionSerializer(v.VirtualModelSerializer):
    ...
    class Meta:
        ...
        virtual_model = VirtualCompetition

class VirtualAthleteSerializer(v.VirtualModelSerializer):
    ...
    class Meta:
        ...
        virtual_model = VirtualAthlete

# views.py
class AthleteListView(v.VirtualModelListAPIView):
    queryset = Athlete.objects.all()
    serializer_class = VirtualAthleteSerializer

This way, the library can automatically apply the right prefetches and annotations for you, which improves both performance and maintainability.

But what about SerializerMethodField, a powerful Django Rest Framework feature that allows us to customize our serializers? Can Django Virtual Models handle the prefetches inside it?

Let’s take a look at an example:

class AthleteSerializer(v.VirtualModelSerializer):
    has_won_any_medal = serializers.SerializerMethodField()
    ...
    class Meta:
        ...
        virtual_model = VirtualAthlete

    def get_has_won_any_medal(self, athlete):
        return len(athlete.medals.all()) > 0

In this example, even with the Virtual Model class used in the serializer, calling athlete.medals.all() still performs an additional query to fetch all the medals just to check if the athlete has won any. This extra query can degrade performance, especially when dealing with large datasets.

To avoid this, we can leverage another feature from the package, type hints, through the hints module. We can use type hints to ensure the necessary data is prefetched, avoiding the extra query.

Let’s improve our example:

# Python < 3.9. For >= 3.9, use `from typing import Annotated`:
from typing_extensions import Annotated
from django_virtual_models import hints
...
    def get_has_won_any_medal(self, athlete: Annotated[Athlete, hints.Virtual("medals")]):
        return len(athlete.medals.all()) > 0 # Now it's prefetched

By using the Annotated type hint along with hints.Virtual("medals"), we ensure that the medals are prefetched when the athlete data is fetched, making the has_won_any_medal method more efficient.
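If you are curious how metadata attached through Annotated can be read back at runtime, the standard library is enough to sketch it. The Virtual class and the required_lookups function below are hypothetical stand-ins written for this illustration, assuming Python 3.9 or newer; they are not the library's hints module and do not claim to reproduce its implementation.

```python
from typing import Annotated, List, get_args, get_origin, get_type_hints


class Virtual:
    """Hypothetical stand-in for a hint object: it only records field names."""

    def __init__(self, *lookups: str):
        self.lookups = lookups


class Athlete:
    """Placeholder for the Athlete model."""


class AthleteSerializer:
    """Placeholder serializer with one annotated method."""

    def get_has_won_any_medal(self, athlete: Annotated[Athlete, Virtual("medals")]):
        return len(athlete.medals) > 0


def required_lookups(method) -> List[str]:
    """Collect the lookups declared through Virtual(...) on a method's parameters."""
    # include_extras=True keeps the Annotated metadata instead of stripping it.
    hints = get_type_hints(method, include_extras=True)
    lookups: List[str] = []
    for annotation in hints.values():
        if get_origin(annotation) is Annotated:
            # get_args returns (base_type, *metadata); skip the base type.
            for metadata in get_args(annotation)[1:]:
                if isinstance(metadata, Virtual):
                    lookups.extend(metadata.lookups)
    return lookups


print(required_lookups(AthleteSerializer.get_has_won_any_medal))  # ['medals']
```

The point of the sketch is that the annotation is ordinary, inspectable data: a tool can discover what a method needs before calling it, which keeps the requirement explicit and close to the code that depends on it.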

Conclusion

Following the Zen of Python principle that "Explicit is better than implicit," Django Virtual Models give developers powerful tools to optimize APIs. By allowing explicit control over prefetching and annotations, particularly in complex serializers, you can improve performance while keeping your codebase clear and maintainable.

Now that you’ve learned about Django Virtual Models, how about taking a look at the project on GitHub and telling us your thoughts? Check it out!