Trade-Offs of DRY: Balancing Reusability and Performance in Large Django Projects
At Vinta, we understand that even the most successful Django projects can encounter performance hurdles when striving for code quality.
As applications scale and handle increasing user loads, small inefficiencies can cause significant performance degradation and hurt the user experience. Beyond a certain scale, leaning heavily on the Don't Repeat Yourself (DRY) philosophy that permeates Django development can make performance optimization harder.
That is the backdrop for our partner Flavio Juvenal's talk at PyGotham 2023. He analyzes why Django projects should shift their focus away from DRY once they reach a certain size and instead prioritize keeping together the code that changes together.
The DRY Principle: Balancing Efficiency and Complexity in Software Development
Software systems must constantly evolve, and one of the guiding principles for keeping that evolution manageable is the "Don't Repeat Yourself" (DRY) principle. As articulated by its creators, it states that "every piece of knowledge must have a single, unambiguous, authoritative representation within a system."
The ultimate goal is to reduce the cognitive load required to change your code. If, before every change, we must painstakingly track down every instance of a particular piece of code, the process becomes laborious and error-prone.
For this reason, DRY is a foundational concept in countless libraries: Django, Django REST Framework, and many others were designed with it in mind. While DRY is a powerful tool in a developer's toolkit, it comes with trade-offs and potential pitfalls, especially as your project scales.
DRY Challenges: Balancing Efficiency and Complexity
While the DRY principle is undeniably valuable in software development, it can introduce complexities that developers must navigate. These become apparent whether you start a project from scratch or join a long-running project built on a framework like Django REST Framework.
Django REST Framework is built around the DRY principle, which greatly contributes to efficient API development. It lets developers define a serializer once for a given kind of object and reuse it across multiple views. For instance, the same serializer can handle both reading and writing operations, promoting code reuse. This aspect of Django REST Framework is widely appreciated by developers.
However, it's worth asking whether this avoidance of repetition comes at a price. A team should always examine closely what it pays for adhering to DRY. Here are two challenges associated with practicing DRY:
- Change Amplification: While you strive to reduce code redundancy, you may inadvertently increase the interdependencies between modules, precisely the kind of coupling good design tries to avoid. Heavier code reuse can lead to a ripple effect, where even a minor change in one part of the codebase requires extensive modifications throughout the system. This hurts project agility and makes maintenance a nightmare.
- The N+1 Problem: Another challenge related to DRY is the N+1 problem in database access. It occurs when code runs one query to fetch a list of N objects and then one additional query per object to fetch its related data, for N+1 queries in total, instead of a fixed number of queries regardless of dataset size. That can become a significant performance bottleneck and hurt application responsiveness. Addressing it requires optimizing data retrieval with eager loading techniques such as database joins (select_related in Django) and prefetches (prefetch_related).
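To make the query arithmetic concrete, here is a minimal, framework-free sketch. FakeDB is a hypothetical in-memory stand-in for a database that only counts queries; it is not Django's ORM. The naive loop issues 1 + N queries, while the batched version always issues two, which is roughly the strategy Django's prefetch_related follows (a second query with an IN clause, with the "join" done in Python).

```python
from dataclasses import dataclass


@dataclass
class FakeDB:
    """Hypothetical in-memory stand-in for a database that counts queries."""

    authors: dict  # author_id -> author name
    books: list  # (book_id, title, author_id) tuples
    query_count: int = 0

    def all_books(self):
        self.query_count += 1  # like SELECT * FROM book
        return list(self.books)

    def author_by_id(self, author_id):
        self.query_count += 1  # like SELECT * FROM author WHERE id = %s
        return self.authors[author_id]

    def authors_by_ids(self, author_ids):
        self.query_count += 1  # like SELECT * FROM author WHERE id IN (...)
        return {a_id: self.authors[a_id] for a_id in set(author_ids)}


def list_books_naive(db):
    # 1 query for the books + 1 query per book for its author = N+1 queries.
    return [
        {"title": title, "author": db.author_by_id(author_id)}
        for _, title, author_id in db.all_books()
    ]


def list_books_prefetched(db):
    # 1 query for the books + 1 batched query for all their authors = 2 queries,
    # no matter how many books there are.
    books = db.all_books()
    authors = db.authors_by_ids(author_id for _, _, author_id in books)
    return [
        {"title": title, "author": authors[author_id]}
        for _, title, author_id in books
    ]


db = FakeDB(
    authors={1: "Ana", 2: "Bruno"},
    books=[(i, f"Book {i}", 1 + i % 2) for i in range(50)],
)
naive = list_books_naive(db)
print(db.query_count)  # 51

db.query_count = 0
prefetched = list_books_prefetched(db)
print(db.query_count)  # 2
assert naive == prefetched
```

Both functions return the same data; only the number of round trips differs, and the gap grows linearly with N.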
While DRY is a guiding principle that enhances code maintainability and reduces redundancy, developers need to be aware of these complexities and trade-offs. Below, we look at each of the two challenges, change amplification and the N+1 problem, in more detail.
Change Amplification: Solving The Ripple Effect in DRY
As you strive to decrease redundancy, you can inadvertently increase the dependencies between modules due to the heavier reuse of code abstractions. The result is an unexpected increase in complexity and cognitive load known as "change amplification": a seemingly minor change in one part of the codebase triggers extensive modifications throughout the system because of excessive dependencies. Such a scenario can quickly spiral into a maintenance nightmare and hinder your project's agility.
Anyone aiming to create better software needs to understand these potential issues. The entire team must remain vigilant and proactive in mitigating the risks associated with change amplification. That involves active discussions to determine when reusing code will have a net positive impact.
At Vinta, we strongly advocate writing RFCs (Requests for Comments) and ADRs (Architectural Decision Records) to align the entire team around these trade-offs and to document the decisions as the "sweet spot" for that specific project. However, this sweet spot evolves with the project. Decisions made at a particular point in the project's lifecycle need to be reconsidered when new epics or use cases are introduced to the system. Therefore, we strongly advise against relying solely on memory to recall these decisions.
To assist you further, we've provided several alternatives for tracking and managing team knowledge as your project evolves, ensuring you can effectively navigate the complexities of change amplification while maximizing your project's agility and maintainability.
Exploring The N+1 Problem in DRY
The N+1 problem is a significant performance challenge for growing and scaling Django projects, especially when using Django REST Framework (DRF). DRF promotes the Don't Repeat Yourself (DRY) principle, encouraging the reuse of serializers across different views. However, this can lead to inefficient database queries if not handled carefully, resulting in the N+1 problem.
From our experience, developers tend to introduce N+1 problems because querysets are built in Django views, while DRF serializers live apart from view code, in their own module, such as serializers.py inside each Django app. Since it's effortless to reuse serializers and even nest them into others, developers must remember to mirror every serializer change in the querysets of the views that use it. This implicit coupling between views' querysets and serializers is a change amplification issue, as well as an unintended trade-off between DRYness and performance. It is especially challenging in DRF, as views can use serializers indirectly through serializer nesting.
As a project grows in complexity, if a developer nests one serializer into another but forgets to add a prefetch_related in the corresponding view, a new N+1 issue is born: unnecessary database queries that become a critical performance issue, especially with large datasets. This is not only a DRF issue; the same problem occurs in vanilla Django projects, due to the implicit coupling between views' querysets and templates.
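The coupling between a view's queryset and a nested serializer can be sketched without Django. In this hypothetical example, fetch_books plays the role of a view's queryset, Book.author behaves like a lazily loaded foreign key, and the serializer functions stand in for DRF serializers; none of these names are DRF or Django APIs. Nesting the author into the book serializer, without touching the view, turns a single query into N+1.

```python
class FakeDB:
    """Hypothetical in-memory database that only counts queries."""

    def __init__(self, authors, books):
        self.authors = authors  # author_id -> author name
        self.books = books  # list of (title, author_id) tuples
        self.query_count = 0

    def query(self):
        self.query_count += 1


class Book:
    def __init__(self, db, title, author_id, prefetched_author=None):
        self._db = db
        self.title = title
        self._author_id = author_id
        self._author = prefetched_author

    @property
    def author(self):
        # Like an un-prefetched Django foreign key: first access costs a query.
        if self._author is None:
            self._db.query()
            self._author = self._db.authors[self._author_id]
        return self._author


def fetch_books(db, prefetch=()):
    """Stand-in for a view's queryset, e.g. Book.objects.prefetch_related(...)."""
    db.query()  # SELECT * FROM book
    authors = None
    if "author" in prefetch:
        db.query()  # SELECT * FROM author WHERE id IN (...)
        authors = {a_id: db.authors[a_id] for _, a_id in db.books}
    return [
        Book(db, title, a_id, authors[a_id] if authors is not None else None)
        for title, a_id in db.books
    ]


# --- "serializers.py" ---
def serialize_author(name):
    return {"name": name}


def serialize_book_v1(book):
    return {"title": book.title}


def serialize_book_v2(book):
    # Later change: the author serializer is nested into the book serializer.
    return {"title": book.title, "author": serialize_author(book.author)}


# --- "views.py" ---
def book_list_view(db, serializer, prefetch=()):
    return [serializer(book) for book in fetch_books(db, prefetch)]


def count_queries(serializer, prefetch=()):
    db = FakeDB(
        authors={1: "Ana", 2: "Bruno"},
        books=[(f"Book {i}", 1 + i % 2) for i in range(20)],
    )
    book_list_view(db, serializer, prefetch)
    return db.query_count


print(count_queries(serialize_book_v1))  # 1
print(count_queries(serialize_book_v2))  # 21: N+1 after nesting, view unchanged
print(count_queries(serialize_book_v2, prefetch=("author",)))  # 2: view updated
```

The change that introduces the problem lives in the serializer module, while the fix has to be made in the view module, which is the change amplification described above.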
There are no built-in warnings or indicators that a serializer or template is causing N+1 queries. Developers can add specialized tools to detect the problem and block N+1 queries, such as nplusone and django-zen-queries. But that doesn't solve the root cause of the problem: the trade-off between DRYness and performance.
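Detection tools make an N+1 fail loudly rather than preventing it. The sketch below mimics the idea of blocking queries while serializing, in the spirit of django-zen-queries; queries_blocked, QueryBlockedError, and the other names are hypothetical and are not that library's API. The view must fetch everything up front, and any lazy load during serialization raises.

```python
from contextlib import contextmanager


class QueryBlockedError(Exception):
    """Hypothetical error raised when a query runs while queries are blocked."""


class FakeDB:
    """Hypothetical in-memory database that counts queries and can block them."""

    def __init__(self, authors, books):
        self.authors = authors  # author_id -> author name
        self.books = books  # list of (title, author_id) tuples
        self.query_count = 0
        self.queries_allowed = True

    def query(self):
        if not self.queries_allowed:
            raise QueryBlockedError("query attempted while queries are blocked")
        self.query_count += 1

    @contextmanager
    def queries_blocked(self):
        previous = self.queries_allowed
        self.queries_allowed = False
        try:
            yield
        finally:
            self.queries_allowed = previous  # restore even if serialization fails


class Book:
    def __init__(self, db, title, author_id, author=None):
        self.db, self.title, self.author_id, self._author = db, title, author_id, author

    @property
    def author(self):
        if self._author is None:  # not prefetched: lazy load costs a query
            self.db.query()
            self._author = self.db.authors[self.author_id]
        return self._author


def fetch_books(db, prefetch_authors):
    db.query()  # SELECT * FROM book
    if not prefetch_authors:
        return [Book(db, title, a_id) for title, a_id in db.books]
    db.query()  # SELECT * FROM author WHERE id IN (...)
    return [Book(db, title, a_id, db.authors[a_id]) for title, a_id in db.books]


def serialize_book(book):
    return {"title": book.title, "author": book.author}


def book_list_view(db, prefetch_authors):
    books = fetch_books(db, prefetch_authors)  # every query must happen here...
    with db.queries_blocked():  # ...because none are allowed while serializing
        return [serialize_book(book) for book in books]


db = FakeDB({1: "Ana"}, [("Book A", 1), ("Book B", 1)])
print(book_list_view(db, prefetch_authors=True))
try:
    book_list_view(db, prefetch_authors=False)  # serializer reads an unprefetched FK
except QueryBlockedError as exc:
    print("N+1 caught:", exc)
```

The guard only reports the problem when that code path actually runs, and the view still has to be fixed by hand, so the coupling itself remains.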
Some companies go further and build custom solutions that streamline queryset optimization while preserving DRF serializer reusability. At Vinta, we created Django Virtual Models for that purpose. This open-source tool lets developers declare virtual models that specify the prefetches, annotations, and joins needed by serializers. By using virtual models instead of regular models in serializers, developers can eliminate the risk of N+1 problems while still benefiting from code reusability. Virtual models also raise special exceptions at development time to tell developers that a prefetch is missing after a serializer change. The problem described above, an N+1 introduced after nesting a new serializer into another, cannot happen in a project that uses Django Virtual Models. Vinta has been successfully using Django Virtual Models in production for over a year on several projects, eliminating N+1 issues in several DRF endpoints and significantly reducing their p95 latency to around 300ms.
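One way to remove the hand-maintained coupling is to let serializers declare the relations they read and have the view derive its prefetches from them. The sketch below is a hypothetical illustration of that idea only; Serializer, required_prefetches, and serialize are invented names, not the API of Django Virtual Models or DRF.

```python
from types import SimpleNamespace


class Serializer:
    """Hypothetical base class: serializers declare fields and nested relations."""

    fields = ()  # plain attributes copied from the object
    nested = {}  # relation name -> nested Serializer subclass (never mutated)

    @classmethod
    def required_prefetches(cls, prefix=""):
        """Collect Django-style lookup paths such as "author__publisher"."""
        paths = []
        for relation, child in cls.nested.items():
            path = f"{prefix}{relation}"
            paths.append(path)
            paths.extend(child.required_prefetches(prefix=f"{path}__"))
        return paths

    @classmethod
    def serialize(cls, obj):
        data = {name: getattr(obj, name) for name in cls.fields}
        for relation, child in cls.nested.items():
            data[relation] = child.serialize(getattr(obj, relation))
        return data


class PublisherSerializer(Serializer):
    fields = ("name",)


class AuthorSerializer(Serializer):
    fields = ("name",)
    nested = {"publisher": PublisherSerializer}


class BookSerializer(Serializer):
    fields = ("title",)
    nested = {"author": AuthorSerializer}  # nesting brings the author's needs along


# The view derives its lookups from the serializer instead of a hand-kept list.
# In Django this list could be passed to Book.objects.prefetch_related(*lookups).
lookups = BookSerializer.required_prefetches()
print(lookups)  # ['author', 'author__publisher']

book = SimpleNamespace(
    title="Book A",
    author=SimpleNamespace(name="Ana", publisher=SimpleNamespace(name="Acme Press")),
)
print(BookSerializer.serialize(book))
```

Because AuthorSerializer carries its own requirements, nesting it into another serializer updates the derived lookup list automatically, so there is no separate list in the view to forget.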
Conclusion
Balancing performance optimizations and the DRY principle is crucial to avoid falling into the trap of sacrificing maintainability and code clarity.