Python: How Django Transactions Work

8 min read1 day ago

Logic Diagram of Django Transactions — Entry Points

Content

TLDR — Summary of Django Transaction Logic
Logic Diagram
Transactions & ATOMIC Operations
Transactions in Django
Django autocommit mode — Turning off Automatic Transactions
Sources

TLDR — Summary of Django Transaction Logic

This is for apps running on PSQL or other similar SQL based databases (DBs); the underlying rules around transactions may not apply to other DBs. Under the hood, Django’s transaction management code does the following (this is copied from Django’s doc -scroll down- with my own bullet points for clarification):

opens a transaction when entering the outermost atomic block;

Gets or creates a connection to the database (DB) and sets it as a transaction.

requests and acquires locks when a DB operation is encountered;

All DB operations require locks — some locks are loose and will allow other connections to execute operations on the same resource, but other locks are more stringent; they are exclusive and will block both read and write operations. Creating, updating, and deleting records, as well as altering tables and columns, will acquire more stringent locks that will block other connections from executing operations. These locks are held until the END of the transaction.

creates a savepoint when entering an inner atomic block;

This is automatically done when, for example, inserting records into a table — some model operations will automatically wrap themselves in atomic blocks.
This can be manually done by calling a function that is wrapped by atomic() or using atomic() in a with statement (as a context manager).
Savepoints will use the same connection as the outer transaction — if you look at the Django code for savepoints it is just a recursive call to create another transaction.
Savepoints can be nested within savepoints — a stack keeps track of savepoints.

releases or rolls back to the savepoint when exiting an inner block;

After the insertion, the savepoint is removed either by committing the changes and making them available to the rest of the transaction (but NOT other connections) or by rolling back the changes.

when code that doesn’t touch the DB is encountered the code is executed as usual;

The locks on the DB are still held while non-DB logic is executed, e.g., if we send an HTTP request in the body of a transaction, the locks will be held until that request is sent and the response is returned. Django does not release the locks when we stop executing DB operations. It holds the locks UNTIL the outer transaction is finished.
Locks acquired in nested transactions are also held until the outer transaction is finished or the nested transaction is rolled back.

when an RuntimeError occurs or an error is raised outside of the DB (not-related to the DB operations);

The transaction is stopped, rolled back, and the locks are released.
If the error occurs in a nested transaction, the outer transaction is not affected. The rollback occurs but the outer transaction can continue uninterrupted (as long as the raised error is caught and handled).
Transactions don’t catch errors, but they depend on them to know when to roll back changes. Therefore, don’t wrap DB operations in a try/except, instead wrap them in a transaction and then wrap the transaction in a try/except. Doing so lets the DB operation fail and be rolled back automatically without impacting the outer transaction.

commits or rolls back the transaction when exiting the outermost block.

Finally, the entire transaction is committed, making the changes available to all DB users/connections. Or it is rolled back, leaving the DB in the same state as it was when the transaction was encountered.

In the next blog, I’ll write about what you should or shouldn’t put in a transaction to avoid outages caused by lock starvation.

Logic Diagram

Transactions & ATOMIC Operations

Transactions are mechanisms that allow ACID DB’s (PSQL, SQL, latest MongoDB, etc.) to be ATOMIC (the A in ACID). This means all writes to the DB are done as a single operation — no matter how complex. If the write succeeds, all the changes to the DB persist and are available to all connections simultaneously (no partial writes). If the write fails, all the changes are rolled back — again, there are no partial changes. Transactions guarantee these rules:

The data in the DB will not change in the middle of a transaction
The DB will not be left in a bad state if the transaction fails
The DB will either have all of the new data or none of

These rules allow the user of the DB to bundle complex operations together and execute them as one operation. For example, executing a transfer from one bank account to another; if we did not have transactions, then in the middle of these two write operations, there could be a withdrawal or even a close account operation that makes the transfer invalid. Transactions allow the user some form of traffic control. We can block all other conflicting operations while the transaction is in progress.

The operations are blocked through a series of locks on tables and rows. In PSQL and other SQL variants, transactions are created with BEGIN; then the locks are acquired when an operation like select/insert/delete/alter is run and the transactions ends with COMMIT or ROLLBACK. The locks are released when COMMIT or ROLLBACK is executed. Fortunately, Django allows us to create transactions without having to use these three statements (but we still need to worry about the locks; more on that in the next post).

- Start a new transaction
BEGIN;
SELECT …
INSERT INTO …
UPDATE …
DELETE FROM …
 - If everything is successful, commit the transaction
COMMIT;

The select acquires locks that block updates on the rows being read.
The insert acquires locks that block updates on the rows being created.
The update acquires locks that block updates on the rows being updated.
The delete acquires locks that block updates on the rows being deleted.
We don’t need to explicitly check for errors and call ROLLBACK — the transaction does this for us.
If there was an alter table operation in there it would block all reads and writes on the table (more on this in the next blog).

Transactions in Django

from django.db import transaction
from app.core.models import Accounts, Fee, NotificationJob

def do_migration():
  overdrawn_accounts = Accounts.objects.filter(type='overdrawn')
  for acount in overdrawn_accounts:
    create_notification(acount)


@transaction.atomic
def create_notification(acount: Accounts):
  # $5 fee for overdraft - 500 because we never store money as float!!!
  recall = Fee.objects.create(acount=acount, description='Fee for overdraft', amount=500)
  NotificationJob.objects.create(recall=recall, notification_type='all')
  acount.status = 'awaiting_payment'
  acount.save()


def do_migration2():
  overdawn_account = Accounts.objects.filter(type='overdrawn')
  for account in overdawn_account:
    with transaction.atomic():
      recall = Fee.objects.create(acount=account, description='Fee for overdraft', amount=500)
      NotificationJob.objects.create(recall=recall, notification_type='all')
      account.status = 'awaiting_payment'
      account.save()

Django automatically wraps every DB operation in a transaction for us when run in autocommit mode (more on that below). Explicit transactions can be created using the atomic() decorator or context manager (with atomic()) — when atomic() is used, individual DB operations are NOT wrapped in a transaction or committed immediately to the DB (they are executed, but the changes to the DB are not visible to other users). Instead, the entire function is wrapped in a transaction block and committed at the end (if no errors are raised) — COMMIT is executed. If there are errors, the DB changes are rolled back — ROLLBACK is executed. This is why transactions hold onto locks until the very end; we don’t want to release a lock on a table, have another connection change the table or read it, and then be forced to roll back what was just changed or read.

ANGRY_CUSTOMER_THRESHOLD = 3
@transaction.atomic
def create_notification(acount: Accounts):
  recall = Fee.objects.create(acount=acount, description='Fee for overdraft', amount=500)
  NotificationJob.objects.create(recall=recall, notification_type='all')

  try:
    # when this completes successfully, the changes will be available to the outer transaction
    with transaction.atomic():
      owner = acount.owner
      fees = Fee.objects.filter(owner=owner).count()
      if fees >= ANGRY_CUSTOMER_THRESHOLD:
        for fee in fees:
          fee.amount = 0
          fee.save()
        owner.status = 'angry'
        owner.save()

  # as long as we catch the error, the outer transaction will not be rolled back
  except Exception as e:
    logger.error(f'Error while removings fees for account {acount.id}: {e}')

  acount.status = 'awaiting_payment'
  acount.save()

We can also nest transactions by calling another function that is wrapped by atomic() or using the context manager within a transaction. This allows us to create savepoints where we can attempt risky operations without affecting the rest of the transaction — when an inner transaction is detected, a save point is created, the inner transaction is executed, and if the inner transaction fails the data is rolled back to the save point and the outer transaction continues. If the outer transaction fails, all the inner transactions are rolled back alongside the outer transaction. Transaction nesting can be prevented by setting durable=True in atomic() — this will cause the transaction to raise a RuntimeError if any inner transactions are detected.

Nested transactions DO NOT release locks unless nested transactions are rolled back or the outer transaction is committed or rolled back.
Nested transactions DO NOT catch DB errors; they simply roll back the operations in the transaction that caused the error. We still need to catch the DB error for the outer transaction to continue.
Changes made in nested transactions are available to the outer transaction and subsequent nested transactions.

Django autocommit mode — Turning off Automatic Transactions

DATABASES = {
  'default': {
    # True by default 
    'AUTOCOMMIT': False,
    # ...rest of configs
  }
}

By default, Django runs in autocommit mode, meaning every DB operation is wrapped in a transaction and immediately run (committed, hence autocommit) except if the operation is explicitly wrapped in a transaction by the user. This obfuscates the transaction logic of the underlying DB — some DBs, like PSQL, automatically wrap all operations in transactions while others don’t. If we have Django doing this for us, we don’t need to worry about the underlying DB logic. We can turn off autocommit mode, but that would leave us relying on the DB to manage operations and ensure they are ATOMIC, which is risky and inconvenient for developers. It would not be fun if, while creating an API, we had to figure out which write operations are in transactions and, therefore, will be written in total and not partially.

However, we may want to turn off autocommit mode if we know the number of connections on the DB and the sequence of operations they are executing. If we figure out how to deconflict these operations, then we can turn off autocommit mode and gain some efficiencies:

We remove two commands from every operation: BEGIN and COMMIT/ROLLBACK. BUT most DB’s automatically wrap every operation in a transaction so this may be pointless.
Locks won’t be held for too long — as soon as an operation ends, the locks it holds are released, compared to waiting for the entire transaction to end.
Deadlocks are less likely — deadlocks occur when two transactions need a lock that the other transaction holds. This leaves both transactions waiting for the other to finish, but neither can finish. Deadlocks can still occur however. Some DB operations will do multiple writes e.g. updating a column that has an index on it will update the column and the index (two writes at least). This can lead to deadlocks if another connection attempts to operate on the affected rows or columns.

The downsides are the higher risks of data corruption. It is not worth it but maybe there is a use case that I can’t think of.