The dos and don'ts for writing Django migrations

Renato Vieira
February 17, 2025

In Django, migrations are a crucial feature that facilitates the process of evolving and managing a database schema over time. As the structure of the application models changes (such as adding new fields, altering existing ones, or creating new models), migrations ensure the database is in sync with those changes.

For that reason, they allow developers to automate the process of applying and rolling back schema changes, providing a version-controlled way to manage the database structure.

In this article, we’ll discuss good practices on how to deal with migrations, explaining what should and shouldn’t be done in order to keep our application running as expected.

Always use apps.get_model to access a model

When we write a data migration, Django calls our function with two arguments. The first is named apps, and its get_model method lets us retrieve our application's models (the second, schema_editor, exposes lower-level schema operations). You might be asking: “Why do we need that method? Why can’t we just import the models directly through Python?”

As mentioned before, migrations can be thought of as snapshots of our system from the time they were created. As our codebase grows, models are updated through new migrations, and fields are added to or removed from our tables. When we import the model class directly, what we get is the most up-to-date version of that model in our code, with all of its “current” fields and relationships. However, a migration from the past may not expect those fields to be there.

When working with migrations, we should always rely on the historical versions of our models, since they correctly represent our application’s state at that point in time, and that is exactly what apps.get_model provides. It gives us a model “blueprint” containing only the fields that existed when the migration was originally generated, ensuring that our queries won’t touch fields that didn’t exist yet.
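As a concrete sketch (the `blog` app, `Post` model, and `status` field here are hypothetical), a data migration function that relies on the historical model might look like this:

```python
def set_default_status(apps, schema_editor):
    # Fetch the historical version of the model, NOT a direct import
    # such as `from blog.models import Post`. The historical model
    # only carries the fields that existed when this migration was
    # generated.
    Post = apps.get_model("blog", "Post")
    Post.objects.filter(status__isnull=True).update(status="draft")


# Inside the migration file, this function would be wired up with:
#
#     operations = [migrations.RunPython(set_default_status)]
```

The function itself is plain Python; only the `apps` object it receives is Django-specific, which also makes it easy to exercise in isolation.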

We must, however, be careful about what we can do with those instances. As they’re intended to be snapshots of a database table schema at a point in time, they also don’t come packed with custom methods that have been implemented in the model class.

For methods that we created ourselves (i.e.: ones that don’t exist in the base Model class our application models inherit from), the migration will fail with an AttributeError, since historical models are rendered without any custom methods. That failure at least alerts us that we’re doing something that isn’t allowed.

However, the biggest issue arises when we rely on overridden methods, such as the save method. Our instance does have access to it, since it’s a default method used to perform write operations against the database. Its implementation, however, won’t include any customizations or checks that we added in our code: it will simply perform the write and enforce only the constraints defined at the database level, without warning us that our code-level checks were skipped.

If we need those custom checks, we can copy the relevant code into the migration and perform the validations manually there. Remember that the rule still applies: only access fields that exist in the database schema at that point in time, so we must be extra careful when porting those checks.
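For instance (the `accounts` app, `User` model, and email rule below are all made up for illustration), duplicating a validation that would normally live in an overridden save() might look like:

```python
def normalize_emails(apps, schema_editor):
    User = apps.get_model("accounts", "User")
    for user in User.objects.all():
        email = user.email.strip().lower()
        # The historical model won't run our overridden save() or
        # clean(), so the validation is repeated inline here.
        if "@" not in email:
            raise ValueError(
                f"User {user.pk} has an invalid email: {user.email!r}"
            )
        user.email = email
        # save() on a historical model only hits database-level
        # constraints; no code-level checks from the real model
        # class are executed.
        user.save()
```

Keeping the copied validation small and field-only (no calls into the current model class) is what keeps the migration safe to re-run years later.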

Never alter production migrations - even if you're confident in the changes

Migration files are supposed to be executed in a deterministic way. This means that if a migration pipeline runs today to build database A, it should yield the same results when we run it a week (or a month, or even a year) later to build database B.

Once migrations have been applied to a production database, each file represents a snapshot of the schema at that point in time, and modifying these migrations can lead to significant issues. As an extreme example, suppose that you executed a migration that created a table with some fields, deployed it to a production environment, and then remembered that you had to add an extra field to it.

If you modify the migration file to add the new field there, without rolling the migration back in that environment first, it won’t run again there, and that environment will be left without the new field in the table. Discrepancies like this quickly escalate into more complex problems that become much harder to diagnose and fix.

The rule of thumb for the example above is to always create new migrations for any schema changes that need to be made after deployment, ensuring that all environments remain consistent and predictable.
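In practice (the app and migration names below are made up), the fix is an additional migration on top of the shipped one, not an edit to it:

```shell
# Don't edit the already-applied 0002_create_invoice.py.
# Instead, add the missing field to the model in models.py and
# generate a new migration on top of the existing history:
python manage.py makemigrations billing --name add_due_date
python manage.py migrate billing
```

Every environment, whether it already ran 0002 or not, ends up applying the same ordered sequence of files and lands on the same schema.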

Always provide a reverse_code parameter to your RunPython call

Django’s migration engine is smart enough to know how to revert the migrations it creates automatically. Those migrations add, remove, or modify fields on existing tables (or create and drop whole tables), so it’s straightforward for the engine to work out what going backward means: if a field was added, the reverse operation removes it, and so on.

However, with custom data migrations that run arbitrary Python code, Django can’t know how to discard those changes, since the function we write can do pretty much anything. Django’s default behavior is therefore to block reverting such migrations, raising an IrreversibleError when it reaches this scenario, unless the developer explicitly specifies what should be done.

The RunPython constructor accepts an optional second parameter, reverse_code. It behaves like the first parameter (the forward data migration function itself): it also expects a Python function taking two arguments, apps and schema_editor. Inside this function, the developer writes code that undoes what the forward migration performed. For example, if the data migration changed column values from "a" to "b", the reverse function would change them back from "b" to "a".
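Sketching that "a" to "b" example (the `support` app, `Ticket` model, and `priority` field are hypothetical), the forward and reverse functions mirror each other:

```python
def forwards(apps, schema_editor):
    Ticket = apps.get_model("support", "Ticket")
    Ticket.objects.filter(priority="a").update(priority="b")


def backwards(apps, schema_editor):
    # Undo exactly what forwards() did.
    Ticket = apps.get_model("support", "Ticket")
    Ticket.objects.filter(priority="b").update(priority="a")


# In the migration file, both functions are passed to RunPython:
#
#     operations = [migrations.RunPython(forwards, backwards)]
#
# If reverting should simply keep the values as they are, pass
# migrations.RunPython.noop as the second argument instead.
```

Writing the reverse function at the same time as the forward one is cheap insurance: the data mapping is still fresh in your mind, and a later rollback becomes a routine operation instead of an emergency.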

The good news is that you don’t need to write a custom reverse function if your business logic doesn’t require one (i.e.: it’s fine to keep the changed values). RunPython ships with a helper that “does nothing”: the RunPython.noop static method, which can be passed as the reverse function. When the migration engine reaches that migration going backward, it executes the function (which simply returns None) and moves on.

Conclusion

Django migrations are a powerful tool for managing database schema changes, but they require careful handling to maintain system reliability. These practices might seem overly cautious at first, but they help prevent subtle bugs and maintain consistency across different environments.

By following these guidelines, you'll ensure your database migrations remain predictable and maintainable as your application grows, saving you from potentially complex debugging sessions and data inconsistencies down the line.