Python: Django, What You Should NEVER Put in Transactions

Michael T. Andemeskel

Content

  • Part 1 Summary
  • TL;DR Summary
  • How can transactions harm our apps?
  • In
  • Risky
  • Out
  • Next — How PSQL commands block each other
  • Sources

Part 1 Summary

In the prior post, we learned what happens when a function decorated with `transaction.atomic` is called and what happens in `with transaction.atomic()`. In summary:

  1. A connection to the DB is created or retrieved
  2. A transaction is created, e.g., BEGIN; is sent to the DB (if the DB is PSQL or another SQL variant)
  3. From now until the function exits or the with statement concludes — whether with an error or successfully — we will be in the transaction, and the DB will be waiting for us.
  4. This means that (at least for ACID DBs like PSQL) the transaction will hold locks on tables and rows it is changing.
  5. When a Django DB operation is executed, the DB grabs the corresponding lock and prevents any conflicting operations on that table or row.
  6. This can cause other connections to time out while waiting for the lock to be released.
  7. If the operation fails or if there is a runtime error, the DB rolls back the entire transaction and releases the locks.
  8. If the entire transaction succeeds, then all the changes are committed and available to other DB connections.
  9. The locks are released.
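
Roughly, that lifecycle looks like this in code (a minimal sketch; the models are hypothetical and the SQL in the comments is what a PSQL backend would receive):

from django.db import transaction

def transfer(from_account, to_account, amount):
    with transaction.atomic():       # BEGIN; is sent to the DB
        from_account.balance -= amount
        from_account.save()          # UPDATE ...; grabs a row lock
        to_account.balance += amount
        to_account.save()            # grabs another row lock
    # leaving the block sends COMMIT; and the locks are released
    # an exception inside the block sends ROLLBACK; instead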

Now, we will discuss what to put in a transaction and what to avoid. Due to the following transaction behaviors, certain operations are dangerous when placed in a transaction block.

  • Transactions hold on to locks until the transaction fails or is complete.
  • Transactions reverse all their DB operations when they fail.
  • Transactions request locks as soon as a DB operation is executed.

TL;DR Summary

In

  • Reversible operations on the DB — if a DB operation can’t be reversed and the transaction fails, the DB will be left in a bad state: the transaction will attempt to roll back the changes automatically, but they are irreversible, so the rollback will fail.
  • Related operations on the DB that are required — we can’t create a new bank account record without a user record, so we need to create both in the same transaction.
  • Reversible and related business logic — calculating the total balance the customer has after creating a new account record with a deposit (even this can be moved out of a transaction with clever modeling and coordination).

Risky

  • Slow queries — fetching data is usually fast, except in a few scenarios (see the examples under Risky below). These queries will slow down the transaction and extend the time it holds locks, which will have adverse effects on other users.
  • Operations on multiple tables — a transaction with operations on multiple tables can lock each table until it is done. This is especially prevalent in Django migrations — another reason to keep migrations small and focused on one or a few tables at a time.
  • Data Migrations — since a transaction holds each lock from the moment the query executes until the transaction fails or ends, a migration that touches every row in a table ends up locking the whole table, blocking reads or writes on every row.
  • (For PSQL and SQLite only*) Changing tables or columns — these operations require the strictest form of locks and, therefore, will prevent reads/writes on the entire table. These operations are the most likely to cause an outage.

Out

  • Irreversible operations — everything in the transaction should be reversible if the transaction is rolled back; if we put an API call in the transaction, it can’t be undone.
  • Blocking calls — since transactions prevent all other queries from operating on the tables/rows the transaction is changing, any code that increases the duration of the transaction will cause the DB to be locked, causing timeouts and unresponsiveness in the apps dependent on the DB.

Keep on reading for alternatives and code examples.

How can transactions harm our apps?

The primary risk is that transactions hold locks until they are done, both to prevent conflicting operations on tables and rows and to keep the transaction reversible — this is essential to make the DB operations in the transaction atomic. A long-running transaction that touches multiple tables, or a few critical ones, can therefore cause outages by hogging locks and blocking reads/writes to those tables and rows. In short, if we put the wrong code in a transaction block, we can effectively take down the DB by blocking every other connection from operating on it.
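
To make the blocking concrete, here is a minimal sketch (reusing the hypothetical Account model from the examples below; generate_closing_statement stands in for slow work): while the first transaction holds the row lock, every other connection that tries to lock the same row waits, and may time out.

from django.db import transaction

def close_account(account_id):
    with transaction.atomic():
        # SELECT ... FOR UPDATE takes a row lock that is held until
        # this transaction commits or rolls back
        account = Account.objects.select_for_update().get(id=account_id)
        generate_closing_statement(account)  # anything slow here extends the lock;
                                             # other select_for_update() calls on
                                             # this row are blocked meanwhile
        account.status = "closed"
        account.save()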

The secondary risk is that transactions need to be reversible and are EXPECTED to be reversible. The DB automatically reverses every operation if an error occurs in the transaction. Therefore, the DB operations we put in the transaction should be reversible — for the most part, we don’t need to worry about this with PSQL. But what about other code?

Oftentimes, when we change our data, we need to do follow-up tasks like firing events, updating services, sending push notifications, etc. These tasks are NOT reversible — we can’t unsend an event, a request, or a notification. If an error occurs, the data changes are rolled back, but we’ve already sent the push notification saying, “Your report has been generated; click here to view it.” What happens when the user or other services act on this false information? There will be a cascade of failures. Therefore, any code that can’t be reversed should not be in a transaction, or we risk leaving our system in a bad state when an error occurs in the transaction.

In

  • Reversible operations on the DB — if a DB operation can’t be reversed and the transaction fails, the DB will be left in a bad state: the transaction will attempt to roll back the changes automatically, but they are irreversible, so the rollback will fail.
  • Related operations on the DB that are required — we can’t create a new bank account record without a user record, so we need to create both in the same transaction.
  • Reversible and related business logic — calculating the total balance the customer has after creating a new account record with a deposit (even this can be moved out of a transaction with clever modeling and coordination).

Risky

Depending on how much data is being processed and how much traffic the DB gets, these can cause outages by holding locks for too long. All of them are fine if they don’t take too long.

  • Slow queries — fetching data is usually fast, except in the three scenarios below. These queries slow down the transaction and extend the time it holds locks, which has adverse effects on other users.

Examples

  • Queries on unindexed columns
  • Queries on large tables
  • Joins

Alternative

  • Do these queries before and outside the transaction.
from django.db import transaction
from django.db.models import Count, F, Q, Sum

# Bad: the expensive query runs inside the transaction,
# extending how long the locks are held
@transaction.atomic
def process_large_order_report(start_date, end_date, min_order_value=1000):
    # Complex query with multiple joins and aggregations
    large_orders = Order.objects.filter(
        created_at__range=(start_date, end_date),
        total_amount__gte=min_order_value,
        status='completed',
    ).select_related(
        'customer',
        'shipping_address',
        'billing_address',
    ).prefetch_related(
        'items__product__category',
        'items__product__supplier',
    ).annotate(
        item_count=Count('items'),
        total_weight=Sum('items__product__weight'),
        discount_percentage=F('discount_amount') * 100 / F('total_amount'),
    ).filter(
        # Additional complex filtering
        Q(customer__user__is_active=True) &
        (Q(items__product__category__name='Electronics') |
         Q(items__product__category__name='Furniture')) &
        ~Q(shipping_address__country='US')
    ).order_by('-total_amount')

    # do the transactional work with the large_orders queryset


# Fixed: build and evaluate the query first, then open the transaction
def process_large_order_report(start_date, end_date, min_order_value=1000):
    # Same complex query with multiple joins and aggregations as above
    large_orders = Order.objects.filter(
        created_at__range=(start_date, end_date),
        total_amount__gte=min_order_value,
        status='completed',
    ).select_related(
        'customer',
        'shipping_address',
        'billing_address',
    ).prefetch_related(
        'items__product__category',
        'items__product__supplier',
    ).annotate(
        item_count=Count('items'),
        total_weight=Sum('items__product__weight'),
        discount_percentage=F('discount_amount') * 100 / F('total_amount'),
    ).filter(
        # Additional complex filtering
        Q(customer__user__is_active=True) &
        (Q(items__product__category__name='Electronics') |
         Q(items__product__category__name='Furniture')) &
        ~Q(shipping_address__country='US')
    ).order_by('-total_amount')

    # querysets are lazy, so force evaluation here, outside the transaction
    large_orders = list(large_orders)

    # Start the transaction block
    with transaction.atomic():
        # do the transactional work with the large_orders queryset
        ...

  • Operations on multiple tables — a transaction with operations on multiple tables can lock each table until it is done. This is especially prevalent in Django migrations — another reason to keep migrations small and focused on one or a few tables at a time.

Examples

  • See Changing table or column structure

Alternative

  • Split the transaction into multiple smaller transactions and, where one must follow another, chain them with on_commit() callbacks (a sketch of chaining follows the migration example below). For migrations, split the work into several small migrations:
from django.db import migrations, models

class Migration(migrations.Migration):
    dependencies = [("migrations", "0001_initial")]

    # too many operations in a single (transactional) migration
    operations = [
        migrations.RemoveField("Author", "age"),
        migrations.AddField("Author", "rating", models.IntegerField(default=0)),
        migrations.AlterField("Book", "price", models.DecimalField(max_digits=5, decimal_places=2)),
    ]


# fixed
# 1st migration
class Migration(migrations.Migration):
    dependencies = [("migrations", "0001_initial")]

    operations = [
        migrations.RemoveField("Author", "age"),
    ]


# 2nd migration
class Migration(migrations.Migration):
    dependencies = [("migrations", "0002_initial")]

    operations = [
        migrations.AddField("Author", "rating", models.IntegerField(default=0)),
    ]


# 3rd migration
class Migration(migrations.Migration):
    dependencies = [("migrations", "0003_initial")]

    operations = [
        migrations.AlterField("Book", "price", models.DecimalField(max_digits=5, decimal_places=2)),
    ]
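
Outside of migrations, the same idea of chaining smaller transactions with on_commit() looks roughly like this (the functions are illustrative):

from django.db import transaction

def update_authors():
    with transaction.atomic():
        ...  # first batch of work; its locks are released at COMMIT
        # schedule the next batch to start only after this one commits
        transaction.on_commit(update_books)

def update_books():
    with transaction.atomic():
        ...  # second batch runs in its own short transaction
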
  • Data Migrations — since a transaction holds each lock from the moment the query executes until the transaction fails or ends, a migration that touches every row in a table ends up locking the whole table, blocking reads or writes on every row.

Alternative

  • Wrap the transaction around the individual operations, not the entire migration. By giving each row’s update its own transaction, we hold the locks only for a short period of time.
from django.db import transaction

# Bad: every row is updated inside one long transaction
def migrate_user_profiles():
    # Get all users with legacy profiles
    users_with_profiles = User.objects.filter(
        legacy_profile__isnull=False
    ).select_related('legacy_profile')

    # Process all users in a single transaction
    with transaction.atomic():
        # Track progress
        total = users_with_profiles.count()
        print(f"Migrating {total} user profiles...")

        # Process each user
        for i, user in enumerate(users_with_profiles):
            if i % 100 == 0:
                print(f"Processed {i}/{total} profiles")

            legacy = user.legacy_profile
            legacy.update_new_user_profile()


# Fixed: each row's update gets its own short transaction
def migrate_user_profiles():
    # Get all users with legacy profiles
    users_with_profiles = User.objects.filter(
        legacy_profile__isnull=False
    ).select_related('legacy_profile')

    # Track progress
    total = users_with_profiles.count()
    print(f"Migrating {total} user profiles...")

    # Process each user; locks are held only for the duration of one update
    for i, user in enumerate(users_with_profiles):
        if i % 100 == 0:
            print(f"Processed {i}/{total} profiles")

        with transaction.atomic():
            legacy = user.legacy_profile
            legacy.update_new_user_profile()

  • (For PSQL and SQLite only*) Changing tables or columns — these operations require the strictest form of locks and, therefore, will prevent reads/writes on the entire table. These operations are the most likely to cause an outage.

Example

  • Alter Column
  • Alter Table

Alternative

  • Run them later. These queries are necessary, BUT we don’t have to run them during business hours. The best policy here is to reduce the risk of outage by determining how crucial the table is, estimating how long the migration may take, executing the migration when the DB has the least traffic, and preparing a rollback plan.
  • Shrink the amount of time the transaction takes. This can be done by partitioning the table and running the migration on individual partitions (see PSQL Partitions & Django in the sources).

*Django only wraps transactions around migrations for PSQL and SQLite.

from django.db import migrations

class Migration(migrations.Migration):
    dependencies = [("migrations", "0001_initial")]

    # this migration, if on a large table, can slow down and block other operations
    # do it later
    operations = [
        migrations.RemoveField("Users", "middle_name"),
    ]
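
If a schema change on a big table has to be managed by hand (e.g., batched), Django also lets a migration opt out of the wrapping transaction by setting the atomic attribute to False (see the Django Migrations & Transactions source below):

from django.db import migrations

class Migration(migrations.Migration):
    atomic = False  # Django will not wrap this migration in a transaction

    dependencies = [("migrations", "0001_initial")]

    operations = [
        migrations.RemoveField("Users", "middle_name"),
    ]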

Out

  • Irreversible operations — everything in the transaction should be reversible if the transaction is rolled back; if we put an API call in the transaction, it can’t be undone.

Examples

  • Adding an event to a queue — trigger other events
  • Sending an API Call — status updates, trigger jobs, etc.

Alternatives

  • Do critical business logic that must happen after the data change in the transaction’s on_commit() callback — the on_commit() callback is ALWAYS called after the transaction succeeds, and all the updates from the transaction are visible inside the callback.
  • We can also make the operation reversible, i.e., create a way to delete events or send an undo API call — this is a nontrivial task for the team that owns the event queue or the API. I don’t recommend it.
from functools import partial
from django.db import transaction

# Bad: the email fires even if the transaction later rolls back
# (also note: the function is no longer named `transaction`, which
# would shadow the django.db.transaction module and crash)
def create_user(user_data, user_files):
    with transaction.atomic():
        user = User.objects.create(**user_data)
        async_notification_service.send_email(user.email, "You can login now!")
        Account.objects.create(user=user, balance=0)
        # rest of user creation process


# Still bad: nothing is committed until the block is exited
def create_user(user_data, user_files):
    with transaction.atomic():
        user = User.objects.create(**user_data)
        Account.objects.create(user=user, balance=0)
        # rest of user creation process
        # the transaction is still in progress, so it can still be rolled back;
        # putting the notification here is not a good idea, especially if the
        # receiving job immediately tries to read the data: that is a race condition
        async_notification_service.send_email(user.email, "You can login now!")


# Fixed: the callback only runs after a successful commit
def create_user(user_data, user_files):
    with transaction.atomic():
        user = User.objects.create(**user_data)
        Account.objects.create(user=user, balance=0)
        # rest of user creation process
        transaction.on_commit(partial(async_notification_service.send_email, user.email, "You can login now!"))

  • Blocking calls — since transactions prevent all other queries from operating on the tables/rows the transaction is changing, any code that increases the duration of the transaction will cause the DB to be locked, causing timeouts and unresponsiveness in the apps dependent on the DB.

Examples

  • Network calls — sending requests to APIs to get data to put into the transaction.
  • Logging falls under this too — depending on the logger, the library may send logs over the network while we are inside the transaction; even with an agent like Datadog, where a separate process ships the logs, we still pay the cost of buffering them in memory until the batch job writes them to a file.
  • Disk operations — loading a CSV to insert into a table or exporting a table to a CSV
  • CPU-heavy tasks — matrix multiplication or any heavy math/data transformation will hold up everything else: Python’s Global Interpreter Lock lets only one thread execute Python code at a time, and every operation in the transaction runs serially, so the transaction (and its locks) lasts as long as the computation.

Alternatives

  • Get the data ready before the transaction — Load disk/network data beforehand. Once the data is ready, execute the transaction with it.
  • Use placeholder data — create obviously fake placeholder data (impossible to mistake for production data) that is easy to retrieve and identify. Load it before the transaction and use it.
  • Generate data — generate values when uniqueness constraints prevent fixed placeholder data from being reused.
  • This is risky — randomly generated values can collide unpredictably, so use a good random source and seed to avoid surprise failures.
  • Remove or modify constraints — if the constraints are not allowing us to safely create our records, there is a problem with our schema. Make the constraints dependent on a state column that specifies when the record is complete, and monitor for records stuck in the incomplete state (a sketch of this follows the code below).
from functools import partial
from django.db import transaction

## bad: network calls (the file uploads) run inside the transaction
## (also note: the function is renamed from `transaction`, which shadowed
## the django.db.transaction module)
def create_user(user_data, user_files):
    with transaction.atomic():
        user = User.objects.create(**user_data)
        for file_data in user_files:
            url = Cloudinary.upload_file(file_data['data'])
            Files.objects.create(**file_data['meta_data'], user=user, url=url)
        Account.objects.create(user=user, balance=0)
        # rest of user creation process


## not bad: the uploads happen after the transaction, but if this code
## crashes midway, the user exists without their files and nothing retries
def create_user(user_data, user_files):
    with transaction.atomic():
        user = User.objects.create(**user_data)
        Account.objects.create(user=user, balance=0)
        # rest of user creation process

    for file_data in user_files:
        url = Cloudinary.upload_file(file_data['data'])
        Files.objects.create(**file_data['meta_data'], user=user, url=url)


## best: defer the uploads until the transaction has committed
def create_user(user_data, user_files):
    with transaction.atomic():
        user = User.objects.create(**user_data)
        Account.objects.create(user=user, balance=0)
        # rest of user creation process
        # partial creates a callable that bundles the function with its
        # arguments, so it is invoked with those arguments once the
        # transaction commits; unlike a lambda, partial binds the current
        # values of user_files and user right here rather than looking the
        # names up later
        transaction.on_commit(partial(create_user_files, user_files, user))


def create_user_files(user_files, user):
    for file_data in user_files:
        url = Cloudinary.upload_file(file_data['data'])
        Files.objects.create(**file_data['meta_data'], user=user, url=url)
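
And the state-column idea from the constraints alternative above, as a minimal sketch (the model and fields are hypothetical): the uniqueness constraint only applies to complete records, so a fast transaction can insert a pending placeholder and the slow work of filling it in can happen outside the transaction.

from django.db import models
from django.db.models import Q

class Document(models.Model):
    owner = models.ForeignKey("auth.User", on_delete=models.CASCADE)
    external_id = models.CharField(max_length=64, null=True, blank=True)
    # state column: marks when the record is complete
    status = models.CharField(max_length=16, default="pending")  # "pending" or "complete"

    class Meta:
        constraints = [
            # uniqueness is only enforced once status="complete", so the
            # placeholder row does not need its final external_id yet
            models.UniqueConstraint(
                fields=["external_id"],
                condition=Q(status="complete"),
                name="unique_complete_external_id",
            ),
        ]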

Next — How PSQL commands block each other

In the next post, we will dive into PSQL and find out:

  • How different commands lock tables and rows
  • Which commands are the riskiest to execute

Sources

  • Atomic
  • Django Migrations & Transactions
      • “On databases that support DDL transactions (SQLite and PostgreSQL), all migration operations will run inside a single transaction by default. In contrast, if a database doesn’t support DDL transactions (e.g. MySQL, Oracle) then all operations will run without a transaction.”
      • “You can prevent a migration from running in a transaction by setting the atomic attribute to False.”
      • “It’s also possible to execute parts of the migration inside a transaction using atomic() or by passing atomic=True to RunPython.”
  • PSQL Partitions & Django
      • “Django’s ORM doesn’t have built-in support for partitioned tables, so if you want to use partitions in your application, it’s going to take a little extra work.”
      • “One way to use partitions is to roll your own migrations that run raw SQL. This will work, but it means you’re going to have to manually manage the migrations for all changes you make to the table in the future.”
      • “Another option is to use a package called django-postgres-extra. Django-postgres-extra offers support for several PostgreSQL features that are not built into Django’s ORM, for example, support for TRUNCATE TABLE and table partitioning.”
  • PSQL Transactions
  • Python: How Django Transactions Work
