Handling null values on multi-column aggregations with Django
PostgreSQL excels at operations between rows, but when working with operations across both rows and columns, behaviors become less intuitive. One particularly challenging area is handling null values in multi-column aggregations, especially when using Django's ORM abstraction.
This post explores a common problem: how to make Django generate the correct SQL to sum across multiple columns while maintaining the distinction between zero and null values. Specifically, we want to solve this scenario:
If all columns of interest are null, the result should be null (not zero); otherwise, it should be the sum of non-null values.
This distinction is critical in data analytics and reporting contexts. For example, when aggregating metrics data, a null result might indicate a data collection issue that requires investigation, while a zero represents an actual measured value of zero.
Problem Definition
Consider a table with the following structure:

where:
- Profile & Date are mandatory and unique together
- Metrics are source metrics which can be Null
The application calculates target metrics by summing source metrics. For example:
- Impressions Total: metric_1 (impressions on mobile) + metric_2 (impressions on desktop)

Our challenge: on a table with multiple columns of interest, if all the columns are null, return null; otherwise, return the sum of the existing values.
Approaches and Solutions
Option 1: Using Sum with a default=0
The simplest approach might seem to be using Django's Sum
aggregation with a default value:
MetricTable.objects.filter(filter_q).annotate(
metric1_total=Sum("impressions_mobile") + Sum("impressions_desktop") + Sum("generic_impressions") + ...
)
Which generates the following SQL query:
SELECT
"profile"."id" AS "id",
"profile"."platform" AS "profile",
"genericmetrics"."date" AS "date",
(
COALESCE(SUM("genericmetrics"."impressions_mobile"), 0) +
COALESCE(SUM("genericmetrics"."impressions_desktop"), 0) +
COALESCE(SUM("genericmetrics"."generic_impressions"), 0)
) AS "impressions_total"
FROM
"genericmetrics"
INNER JOIN "profile" ON (
"genericmetrics"."profile_id" = "genericmetrics"."id"
)
GROUP BY
"profile"."id",
"profile"."platform",
"genericmetrics"."date";
Result: This solves the multi-column operation, but it defaults to 0 when all values are Null - not what we want.
Option 2: Using Case/When
A more conditional approach uses Django's Case/When
expressions:
MetricTable.objects.filter(filter_q).annotate(
metric1_total=Case(
When(
Q(
impressions_mobile__isnull=True,
impressions_desktop__isnull=True,
generic_impressions__isnull=True
),
then=Value(None)
),
# If not all null, sum the fields with default=0 to handle individual nulls
default=Sum("impressions_mobile", default=0) +
Sum("impressions_desktop", default=0) +
Sum("generic_impressions", default=0)
)
)
The query before, generates the following SQL query:
SELECT
"profile"."id" AS "id",
"profile"."platform" AS "profile",
"genericmetrics"."date" AS "date",
CASE
WHEN (
"genericmetrics"."impressions_mobile" IS NULL
AND "genericmetrics"."impressions_desktop" IS NULL
AND "genericmetrics"."generic_impressions" IS NULL
) THEN NULL
ELSE (
COALESCE(SUM("genericmetrics"."impressions_desktop"), 0) +
COALESCE(SUM("genericmetrics"."generic_impressions"), 0) +
COALESCE(SUM("genericmetrics"."impressions_mobile"), 0)
)
END AS "impressions_total"
FROM
"genericmetrics"
INNER JOIN "common_profile" ON (
"genericmetrics"."profile_id" = "common_profile"."id"
)
WHERE
"genericmetrics"."date" >= '2021-01-01'::date
AND "genericmetrics"."date" < '2022-12-31'::date
GROUP BY
"common_profile"."match_id",
"common_profile"."platform",
"genericmetrics"."date",
-- NEW GROUP BY CLAUSES
"genericmetrics"."impressions_mobile",
"genericmetrics"."impressions_desktop",
"genericmetrics"."generic_impressions",
Result: While this approach seems logical, it doesn't work correctly due to the default Django implementation. When using the Django ORM's Case/When expressions in combination with multi-column operations, it's important to ensure that the expressions within the Case statement have consistent types. If the columns have different types unexpected behaviors appear depending on the condition, such as updating GROUP BY conditions and returning multiple lines instead of a single result.
For this scenario, Django added the metrics columns in the condition in the group by, resulting in multiple columns grouped by the same fields but with different values. Check the sql for more details.
*NOTE: The behavior is the same for any other types of expressions, being OR, using ExpressionWrappers, etc.
Option 3: Using SUM with Aggregate Functions
You might try nesting aggregation functions:
MetricTable.objects.filter(filter_q).annotate(
impressions_total=Sum(
Case(
When(
Q(
impressions_mobile__isnull=True,
impressions_desktop__isnull=True,
generic_impressions__isnull=True
),
then=Value(None)
),
default=Sum('impressions_mobile', default=0) +
Sum('impressions_desktop', default=0) +
Sum('generic_impressions', default=0)
),
)
)
This theoretically should work, but it doesn’t because Django cannot compute Sums of Case/When due to a database restriction: aggregate function calls cannot be nested
. The solution to this error would be to do a subquery, but this would have a performance cost and would change the code architecture.
Option 4: Using NullIf
MetricTable.objects.filter(filter_q).annotate(
impressions_total=NullIf(
[Sum('impressions_mobile', default=0) +
Sum('impressions_desktop', default=0) +
Sum('generic_impressions', default=0)],
Case(
When(
Q(
impressions_mobile__isnull=True,
impressions_desktop__isnull=True,
generic_impressions__isnull=True
),
then=Value(0)
),
default=Value(-1),
output_field=IntegerField()
)
)
)
NullIf is a nice and confusing operator that can also solve the problem, it accepts two expressions and returns None
if they are equal, otherwise returns expression1
.
Database Functions | Django documentation
However, it has the same problem of using Case/When: the columns of the condition are added to the GROUP BY.
Final Solution: Custom Expression Class
By manually experimenting with the SQL, I discovered that the issue wasn't with PostgreSQL but with how Django generates the Case/When
clauses. The solution is to use aggregate functions on individual columns within the condition check.
Removing the columns from the GROUP BY and Using the columns in a aggregate function
The key insight is that SUM()
on a column returns NULL
only if all rows are NULL
for that column.
1. MetricTable.objects.aggregate(total=Sum("impressions_desktop"))
> {'total': 100}
2. When all values for a column are None:
MetricTable.objects
.filter(impressions_desktop__isnull=True)
.aggregate(total=Sum("impressions_desktop"))
> {'total': None}
We can use this property to create a clean custom expression:
class IsSumNull(Expression):
"""
Customized expression for use with Case/When on multi-column operations
with NULL values. This avoids the Django issue where Case/When across
multiple columns adds conditions to the GROUP BY clause.
This expression checks if the Sum of all rows for field_name is null,
without changing the final GROUP BY of the associated query.
"""
output_field = BooleanField()
def __init__(self, field_name, **kwargs):
self.field_name = field_name
super().__init__(**kwargs)
def as_sql(self, compiler, connection):
sql = f"SUM({compiler.quote_name_unless_alias(self.field_name)}) IS NULL"
return sql, []
def get_group_by_cols(self):
return []
Now we can use this expression in our queries:
MetricTable.objects
.filter(filter_q)
.annotate(
impressions_total=Case(
When(
Q(
IsSumNull('impressions_mobile'),
IsSumNull('impressions_desktop'),
IsSumNull('generic_impressions')
),
then=Value(None)
),
default=Sum('impressions_mobile', default=0) +
Sum('impressions_desktop', default=0) +
Sum('generic_impressions', default=0)
)
)
Conclusion
When working with multi-column aggregations in PostgreSQL through Django's ORM, special care is needed when handling null values. The custom IsSumNull
expression class provides a clean solution that properly distinguishes between zero and null values and avoids GROUP BY issues.