Handling null values on multi-column aggregations with Django

Rebeca Sarai

March 17, 2025

PostgreSQL excels at operations between rows, but when working with operations across both rows and columns, behaviors become less intuitive. One particularly challenging area is handling null values in multi-column aggregations, especially when using Django's ORM abstraction.

This post explores a common problem: how to make Django generate the correct SQL to sum across multiple columns while maintaining the distinction between zero and null values. Specifically, we want to solve this scenario:

If all columns of interest are null, the result should be null (not zero); otherwise, it should be the sum of non-null values.

This distinction is critical in data analytics and reporting contexts. For example, when aggregating metrics data, a null result might indicate a data collection issue that requires investigation, while a zero represents an actual measured value of zero.

Problem Definition

Consider a table with the following structure:

where:

Profile & Date are mandatory and unique together
Metrics are source metrics which can be Null

The application calculates target metrics by summing source metrics. For example:

Impressions Total: metric_1 (impressions on mobile) + metric_2 (impressions on desktop)

Our challenge: on a table with multiple columns of interest, if all the columns are null, return null; otherwise, return the sum of the existing values.

Approaches and Solutions

Option 1: Using Sum with a default=0

The simplest approach might seem to be using Django's Sum aggregation with a default value:

MetricTable.objects.filter(filter_q).annotate(
    metric1_total=Sum("impressions_mobile") + Sum("impressions_desktop") + Sum("generic_impressions") + ...
)

Which generates the following SQL query:

SELECT 
    "profile"."id" AS "id",
    "profile"."platform" AS "profile",
    "genericmetrics"."date" AS "date",
    (
        COALESCE(SUM("genericmetrics"."impressions_mobile"), 0) + 
        COALESCE(SUM("genericmetrics"."impressions_desktop"), 0) + 
        COALESCE(SUM("genericmetrics"."generic_impressions"), 0)
    ) AS "impressions_total"
FROM 
    "genericmetrics"
    INNER JOIN "profile" ON (
        "genericmetrics"."profile_id" = "genericmetrics"."id"
    )
GROUP BY 
    "profile"."id",
    "profile"."platform",
    "genericmetrics"."date";

Result: This solves the multi-column operation, but it defaults to 0 when all values are Null - not what we want.

Option 2: Using Case/When

A more conditional approach uses Django's Case/When expressions:‍

MetricTable.objects.filter(filter_q).annotate(
    metric1_total=Case(
        When(
            Q(
                impressions_mobile__isnull=True,
                impressions_desktop__isnull=True,
                generic_impressions__isnull=True
            ),
            then=Value(None)
        ),
        # If not all null, sum the fields with default=0 to handle individual nulls
        default=Sum("impressions_mobile", default=0) + 
                Sum("impressions_desktop", default=0) + 
                Sum("generic_impressions", default=0)
    )
)

The query before, generates the following SQL query:

SELECT 
    "profile"."id" AS "id",
    "profile"."platform" AS "profile",
    "genericmetrics"."date" AS "date",
    CASE
        WHEN (
            "genericmetrics"."impressions_mobile" IS NULL
            AND "genericmetrics"."impressions_desktop" IS NULL
            AND "genericmetrics"."generic_impressions" IS NULL
        ) THEN NULL
        ELSE (
            COALESCE(SUM("genericmetrics"."impressions_desktop"), 0) +
            COALESCE(SUM("genericmetrics"."generic_impressions"), 0) +
            COALESCE(SUM("genericmetrics"."impressions_mobile"), 0)
        )
    END AS "impressions_total"
FROM 
    "genericmetrics"
    INNER JOIN "common_profile" ON (
        "genericmetrics"."profile_id" = "common_profile"."id"
    )
WHERE 
    "genericmetrics"."date" >= '2021-01-01'::date
    AND "genericmetrics"."date" < '2022-12-31'::date
GROUP BY 
    "common_profile"."match_id",
    "common_profile"."platform",
    "genericmetrics"."date",
    -- NEW GROUP BY CLAUSES
    "genericmetrics"."impressions_mobile",
    "genericmetrics"."impressions_desktop",
    "genericmetrics"."generic_impressions",

Result: While this approach seems logical, it doesn't work correctly due to the default Django implementation. When using the Django ORM's Case/When expressions in combination with multi-column operations, it's important to ensure that the expressions within the Case statement have consistent types. If the columns have different types unexpected behaviors appear depending on the condition, such as updating GROUP BY conditions and returning multiple lines instead of a single result.

For this scenario, Django added the metrics columns in the condition in the group by, resulting in multiple columns grouped by the same fields but with different values. Check the sql for more details.

*NOTE: The behavior is the same for any other types of expressions, being OR, using ExpressionWrappers, etc.

Option 3: Using SUM with Aggregate Functions

You might try nesting aggregation functions:

MetricTable.objects.filter(filter_q).annotate(
    impressions_total=Sum(
        Case(
            When(
                Q(
                    impressions_mobile__isnull=True,
                    impressions_desktop__isnull=True,
                    generic_impressions__isnull=True
                ),
                then=Value(None)
            ),
            default=Sum('impressions_mobile', default=0) + 
                    Sum('impressions_desktop', default=0) + 
                    Sum('generic_impressions', default=0)
        ),
  )
)

This theoretically should work, but it doesn’t because Django cannot compute Sums of Case/When due to a database restriction: aggregate function calls cannot be nested. The solution to this error would be to do a subquery, but this would have a performance cost and would change the code architecture.

Option 4: Using NullIf

MetricTable.objects.filter(filter_q).annotate(
    impressions_total=NullIf(
        [Sum('impressions_mobile', default=0) + 
        Sum('impressions_desktop', default=0) + 
        Sum('generic_impressions', default=0)],
        Case(
            When(
                Q(
                    impressions_mobile__isnull=True,
                    impressions_desktop__isnull=True,
                    generic_impressions__isnull=True
                ),
                then=Value(0)
            ),
            default=Value(-1),
            output_field=IntegerField()
        )
    )
)

NullIf is a nice and confusing operator that can also solve the problem, it accepts two expressions and returns None if they are equal, otherwise returns expression1.

Database Functions | Django documentation

However, it has the same problem of using Case/When: the columns of the condition are added to the GROUP BY.

Final Solution: Custom Expression Class

By manually experimenting with the SQL, I discovered that the issue wasn't with PostgreSQL but with how Django generates the Case/When clauses. The solution is to use aggregate functions on individual columns within the condition check.

Removing the columns from the GROUP BY and Using the columns in a aggregate function

The key insight is that SUM() on a column returns NULL only if all rows are NULL for that column.

1. MetricTable.objects.aggregate(total=Sum("impressions_desktop"))
  > {'total': 100}
2. When all values for a column are None:
    MetricTable.objects
        .filter(impressions_desktop__isnull=True)
        .aggregate(total=Sum("impressions_desktop"))
    > {'total': None}

We can use this property to create a clean custom expression:

class IsSumNull(Expression):
    """
    Customized expression for use with Case/When on multi-column operations 
    with NULL values. This avoids the Django issue where Case/When across 
    multiple columns adds conditions to the GROUP BY clause.
    
    This expression checks if the Sum of all rows for field_name is null,
    without changing the final GROUP BY of the associated query.
    """
    
    output_field = BooleanField()
    
    def __init__(self, field_name, **kwargs):
        self.field_name = field_name
        super().__init__(**kwargs)
        
    def as_sql(self, compiler, connection):
        sql = f"SUM({compiler.quote_name_unless_alias(self.field_name)}) IS NULL"
        return sql, []
        
    def get_group_by_cols(self):
        return []

Now we can use this expression in our queries:

MetricTable.objects
    .filter(filter_q)
    .annotate(
        impressions_total=Case(
            When(
                Q(
                    IsSumNull('impressions_mobile'),
                    IsSumNull('impressions_desktop'),
                    IsSumNull('generic_impressions')
                ),
                then=Value(None)
            ),
            default=Sum('impressions_mobile', default=0) + 
                    Sum('impressions_desktop', default=0) + 
                    Sum('generic_impressions', default=0)
        )
    )

Conclusion

When working with multi-column aggregations in PostgreSQL through Django's ORM, special care is needed when handling null values. The custom IsSumNull expression class provides a clean solution that properly distinguishes between zero and null values and avoids GROUP BY issues.

Handling null values on multi-column aggregations with Django

Problem Definition

Approaches and Solutions

Option 1: Using Sum with a default=0

Option 2: Using Case/When

Option 3: Using SUM with Aggregate Functions

Option 4: Using NullIf

Final Solution: Custom Expression Class

Conclusion

Join the Tech Forward newsletter

Related articles

Advanced Django querying: sorting events by date

Vinta at DjangoCon US 2022: highlights and takeaways

Functional programming in Python: the simplest way