Implementing Async Batch Invoice Processing with Celery

This page resolves a single, specific failure: a freight audit pipeline that parses carrier invoices synchronously and collapses — out-of-memory worker kills, lost invoices, and fractured audit lineage — the moment a high-volume submission window hits.

If your ingestion process deserializes thousands of line items in one thread, a 50 MB PDF bundle or a malformed EDI 210 is enough to spike memory, deadlock the parser, and break the audit trail. Moving to a distributed worker pool with Celery fixes this, but only when the configuration accounts for freight-specific failure modes. This guide walks through the failure, its root cause, a reproducible diagnostic, and a production-grade resolution built for the async batch processing workflows that sit between document ingestion and rate validation.

Failure Definition

The reader hitting this page sees one or more of the following symptoms during peak carrier submission:

Workers are terminated by the Linux OOM killer mid-batch, leaving reconciliation incomplete.
The audit reconciliation report shows invoices missing despite the broker reporting successful acknowledgment.
Parser threads hang indefinitely, and soft_time_limit never fires because no time limit was ever configured for the task.
Re-running a failed batch double-counts charges, corrupting the AP ledger.

Concretely, the failure surfaces as a kernel log line and a reconciliation gap:

[ 4821.33712] Out of memory: Killed process 21847 (celery) total-vm:6739216kB, anon-rss:6021140kB
audit.reconcile WARNING acknowledged=12840 persisted=12793 delta=47 (missing invoices)

A 47-invoice delta between broker acknowledgments and persisted records is the signature of a pipeline with no late-acknowledgment guarantee: each message was acknowledged on delivery, so the broker had already discarded it by the time the worker died.
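
To turn that delta into a list of specific invoices, a set difference is usually enough. The sketch below is illustrative only: it assumes the acknowledged and persisted invoice IDs can be exported as two collections, and the function name and sample IDs are hypothetical.

```python
from typing import Iterable, List


def find_missing_invoices(acknowledged: Iterable[str], persisted: Iterable[str]) -> List[str]:
    """Return invoice IDs the broker acknowledged but that were never persisted."""
    missing = set(acknowledged) - set(persisted)
    return sorted(missing)


# Hypothetical sample data standing in for broker and database exports.
acked = ["INV-1", "INV-2", "INV-3", "INV-4"]
stored = ["INV-1", "INV-3"]
print(find_missing_invoices(acked, stored))  # ['INV-2', 'INV-4']
```

The length of the returned list should equal the delta reported by audit.reconcile; if it does not, the two exports were probably taken at different times.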

Root Cause Analysis

Three production conditions combine to produce this failure, and all three are configuration gaps rather than bugs in the parsing logic:

Root cause	Mechanism	Signal
Unbounded in-memory deserialization	The whole document (multi-page PDF, full EDI envelope) is loaded into RAM before any line item is written	RSS climbs linearly with batch size until the OOM killer fires
Early acknowledgment	The default `task_acks_late=False` acks the message before work completes, so a crash loses it	`delta` between acknowledged and persisted counts
Task hoarding	A high `worker_prefetch_multiplier` lets one worker reserve more heavy tasks than it can hold in memory	One worker OOMs while siblings sit idle

Parser drift compounds this: when a carrier updates a rate sheet or EDI 210/810 layout mid-batch, a previously cheap parse can suddenly allocate far more memory than the batch was sized for, pushing an already-marginal worker over the edge.
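
The task-hoarding row can be made concrete with a little arithmetic. This sketch assumes Celery's documented rule that a worker's prefetch limit is worker_prefetch_multiplier times its concurrency; the concurrency of 8 and the function names are illustrative assumptions, not values from this pipeline.

```python
def reserved_messages(prefetch_multiplier: int, concurrency: int) -> int:
    """Upper bound on messages one worker node reserves from the broker."""
    return prefetch_multiplier * concurrency


def waiting_inside_worker(prefetch_multiplier: int, concurrency: int) -> int:
    """Reserved messages that queue inside the worker instead of on the broker,
    where an idle sibling could have picked them up."""
    return max(0, reserved_messages(prefetch_multiplier, concurrency) - concurrency)


# Celery's default multiplier is 4; a concurrency of 8 is a made-up example.
print(reserved_messages(4, 8), waiting_inside_worker(4, 8))  # 32 24
print(reserved_messages(1, 8), waiting_inside_worker(1, 8))  # 8 0
```

With a multiplier of one, nothing waits behind a running parse, which is why the resolution below pins it to that value for the heavy queue.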

Reproducible Diagnostic

This minimal script reproduces the memory blow-up without Celery, so you can confirm the cause before changing the worker topology. It loads a batch the way a naive synchronous pipeline does and watches resident memory grow:

import os
from typing import Any, Callable

import psutil


def reproduce_unbounded_growth(
    invoice_payloads: list,
    parse_invoice: Callable[[Any], Any],
) -> None:
    """Load every parsed payload into a single list to expose unbounded RSS growth.

    `parse_invoice` is your existing synchronous parser, passed in by the caller.
    """
    process = psutil.Process(os.getpid())
    accumulated = []
    for idx, payload in enumerate(invoice_payloads):
        # Naive pipelines hold every parsed line item in memory at once.
        accumulated.append(parse_invoice(payload))
        if idx % 1000 == 0:
            rss_mb = process.memory_info().rss / (1024 * 1024)
            print(f"after {idx} invoices: RSS={rss_mb:.0f}MB")
    # On a real freight batch the loop never reaches the end: the OOM
    # killer terminates the process while `accumulated` is still growing.

If RSS rises monotonically and the run dies before the loop finishes, the pipeline lacks memory boundaries. To confirm the acknowledgment gap independently, kill a worker mid-task (kill -9 the PID) with task_acks_late=False and observe that the in-flight invoice never reappears in the queue, because the broker already considered it delivered.
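
The difference between early and late acknowledgment can also be seen without a broker. The toy model below is not how Celery or any real broker is implemented; ToyBroker and run_until_crash are hypothetical names, and the model folds the requeue behaviour of task_reject_on_worker_lost into a single worker_lost() call.

```python
from collections import deque
from typing import Deque, List


class ToyBroker:
    """In-memory stand-in for a message broker, for illustration only."""

    def __init__(self, messages: List[str]) -> None:
        self.queue: Deque[str] = deque(messages)
        self.unacked: List[str] = []

    def deliver(self) -> str:
        message = self.queue.popleft()
        self.unacked.append(message)
        return message

    def ack(self, message: str) -> None:
        self.unacked.remove(message)

    def worker_lost(self) -> None:
        """Requeue anything that was delivered but never acknowledged."""
        while self.unacked:
            self.queue.appendleft(self.unacked.pop())


def run_until_crash(broker: ToyBroker, acks_late: bool) -> None:
    """Deliver one message, then simulate the worker dying mid-task."""
    message = broker.deliver()
    if not acks_late:
        broker.ack(message)  # early ack: the broker forgets the message now
    # The worker is killed here, before the task finishes.
    broker.worker_lost()


early = ToyBroker(["INV-1"])
run_until_crash(early, acks_late=False)
print(list(early.queue))  # []

late = ToyBroker(["INV-1"])
run_until_crash(late, acks_late=True)
print(list(late.queue))  # ['INV-1']
```

The empty queue in the early-ack run is the lost invoice; in the late-ack run the message is back on the queue for another worker.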

Resolution Path

The fix has four stages: isolate heavy parsing onto its own queue, stream documents under a memory guard, run a hardened chunked task, and route every failure so no invoice is silently lost.

Stage 1 — Isolate parsing onto a dedicated queue

Separate heavy parsing from lightweight validation so a parser stall cannot starve the rest of the pipeline. Enforce late acknowledgment and a prefetch of one so a worker never reserves work it cannot fit in memory.

# celery_app.py
from celery import Celery
from django.conf import settings

app = Celery("freight_audit", broker=settings.CELERY_BROKER_URL)

app.conf.update(
    task_default_queue="rate.validate",          # light work is the default
    task_routes={
        "invoice.parse_batch": {"queue": "invoice.parse"},   # heavy work isolated
        "audit.reconcile": {"queue": "audit.reconcile"},
    },
    worker_prefetch_multiplier=1,                 # never hoard heavy tasks
    task_acks_late=True,                          # ack only after success
    task_reject_on_worker_lost=True,              # requeue on hard crash
    worker_max_tasks_per_child=100,               # recycle to bound leaks
)

Stage 2 — Stream documents under a memory guard

Freight documents are dense. Parse EDI 210/810 with a line-by-line segment reader and XML with lxml.etree.iterparse(); extract PDF text in page chunks with pdfplumber. Wrap heavy deserialization in an explicit guard that fails fast rather than letting the kernel decide:

import os
import psutil


def enforce_memory_guard(threshold_mb: int = 256) -> None:
    """Raise MemoryError if worker RSS exceeds the per-worker budget."""
    process = psutil.Process(os.getpid())
    rss_mb = process.memory_info().rss / (1024 * 1024)
    if rss_mb > threshold_mb:
        raise MemoryError(f"Worker RSS {rss_mb:.1f}MB exceeds {threshold_mb}MB limit")
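
For the EDI side, a streaming segment reader might look like the sketch below. It is a minimal illustration, not a full X12 parser: it assumes `~` as the segment terminator and `*` as the element separator (real interchanges declare their delimiters in the ISA header), and iter_segments and the sample fragment are hypothetical.

```python
import io
from typing import Iterator, List, TextIO


def iter_segments(
    stream: TextIO, terminator: str = "~", separator: str = "*", block_size: int = 4096
) -> Iterator[List[str]]:
    """Yield one EDI segment at a time as a list of elements.

    Only `block_size` characters plus one partial segment are held in memory.
    """
    buffer = ""
    while True:
        block = stream.read(block_size)
        if not block:
            break
        buffer += block
        # Everything before the last terminator is complete; keep the tail.
        *complete, buffer = buffer.split(terminator)
        for raw in complete:
            raw = raw.strip()
            if raw:
                yield raw.split(separator)
    if buffer.strip():
        # Trailing segment that was not terminated.
        yield buffer.strip().split(separator)


# Made-up fragment; a tiny block size forces segments to span several reads.
sample = io.StringIO("ST*210*0001~B3**INV-1001~L1*1*125.50~SE*4*0001~")
for segment in iter_segments(sample, block_size=8):
    print(segment)
```

Calling enforce_memory_guard() every few hundred segments inside the consuming loop would combine the two techniques.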

Stage 3 — Run the hardened chunked task

The task below processes invoices in bounded chunks, retries with an explicit backoff on memory pressure, and routes fatal failures to a dead-letter queue. It is designed to drop directly into a freight audit ETL system.

# tasks.py
import logging
import math
from typing import List

from celery_app import app
from guards import enforce_memory_guard
from pipeline import process_freight_chunk, route_to_dlq

logger = logging.getLogger("freight_audit.tasks")


@app.task(
    name="invoice.parse_batch",
    acks_late=True,
    soft_time_limit=300,
    time_limit=360,
    max_retries=3,
    bind=True,
)
def parse_invoice_batch(self, invoice_ids: List[str], chunk_size: int = 50) -> dict:
    """Process freight invoices in bounded chunks with memory-aware fallbacks."""
    total_chunks = math.ceil(len(invoice_ids) / chunk_size)
    processed = 0
    failed: List[str] = []

    for idx, start in enumerate(range(0, len(invoice_ids), chunk_size)):
        batch = invoice_ids[start : start + chunk_size]
        meta = {"chunk_index": idx, "total_chunks": total_chunks, "batch_size": len(batch)}

        try:
            enforce_memory_guard(threshold_mb=256)
            results = process_freight_chunk(batch)
            processed += len(results["success_ids"])
            failed.extend(results["failed_ids"])
            logger.info("chunk processed", extra={"event": "chunk.success", **meta})
        except MemoryError as exc:
            # Memory pressure: halve the chunk and retry the whole batch smaller.
            new_chunk = max(10, chunk_size // 2)
            logger.warning("memory guard tripped", extra={"event": "chunk.memory_guard", **meta})
            raise self.retry(
                exc=exc,
                countdown=120,                       # explicit backoff
                # Override both args and kwargs so a positional call is not
                # replayed with invoice_ids supplied twice.
                args=(invoice_ids,),
                kwargs={"chunk_size": new_chunk},
            )
        except Exception as exc:
            # Fatal parse defect: preserve the invoice, do not drop it.
            logger.error("chunk fatal", extra={"event": "chunk.fatal", "error": str(exc), **meta})
            route_to_dlq(batch, error=str(exc))
            failed.extend(batch)

    return {
        "status": "completed_with_failures" if failed else "completed",
        "processed_count": processed,
        "failed_ids": failed,
    }

Note: retry_backoff=True is a valid task option, but it only applies to automatic retries configured with autoretry_for. Because this task calls self.retry() manually, the delay is set by passing an explicit countdown, as shown above; the 120-second value is a fixed backoff, not an exponential one.
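
If exponential backoff is preferred with a manual self.retry() call, the countdown can be derived from the retry count. The helper below is a sketch under assumptions: backoff_countdown and its base and cap values are hypothetical, and it relies only on the retry counter that a bound Celery task exposes as self.request.retries.

```python
import random


def backoff_countdown(retries: int, base: int = 30, cap: int = 600, jitter: bool = True) -> int:
    """Seconds to wait before the next attempt, doubling with each retry.

    `retries` is the number of retries already made (0 on the first failure).
    """
    delay = min(cap, base * (2 ** retries))
    if jitter:
        # Full jitter: pick uniformly in [0, delay] so retries do not align.
        delay = random.randint(0, delay)
    return delay


# Deterministic schedule with jitter disabled.
print([backoff_countdown(n, jitter=False) for n in range(6)])
# [30, 60, 120, 240, 480, 600]
```

Inside the task this would be used as raise self.retry(exc=exc, countdown=backoff_countdown(self.request.retries)).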

Stage 4 — Route every failure through a tiered fallback

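When a parse fails, the invoice must never vanish. The pipeline uses three tiers that each preserve audit lineage; the task above implements chunk reduction and dead-letter routing directly, and soft retries for transient errors follow the same self.retry() pattern: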

Soft retry — transient carrier-API or network errors use Celery’s retry with an explicit countdown.
Chunk reduction — on memory pressure, chunk_size is halved and the batch retried, preventing repeated OOM kills while keeping throughput.
Dead-letter routing — on a fatal defect (malformed EDI segment, corrupted PDF, missing SCAC), the raw payload plus a correlation_id and failure_reason go to an invoice.dlq queue for analyst triage, not to /dev/null.
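
A dead-letter record for the third tier could be shaped roughly as follows. This is a sketch, not the pipeline's actual route_to_dlq implementation: only correlation_id and failure_reason come from the description above, while the DeadLetter class, the failed_at field, and the sample payload are assumptions. Publishing the JSON to invoice.dlq is left out.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class DeadLetter:
    """Envelope that keeps the raw payload alongside the failure context."""

    invoice_id: str
    failure_reason: str
    raw_payload: str
    correlation_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    failed_at: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())

    def to_json(self) -> str:
        """Serialize to a JSON string suitable for a queue message body."""
        return json.dumps(asdict(self), sort_keys=True)


# Hypothetical failure; the payload is a truncated, made-up EDI fragment.
letter = DeadLetter(
    invoice_id="INV-1001",
    failure_reason="malformed EDI segment",
    raw_payload="ST*210*0001~B3**INV-1001",
)
print(letter.to_json())
```

Generating the correlation_id at the point of failure, and logging the same value in the chunk.fatal line, is what lets an analyst join the log entry to the queued record.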

Verification

Confirm the fix with pytest-celery against a local Redis broker. The tests assert that a fatal chunk lands in the dead-letter queue and that re-processing a duplicate invoice does not double-count it:

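from tasks import parse_invoice_batch
# AuditEntry is the project's audit model and `dlq` a project-specific fixture;
# import or define them wherever they live in your codebase.


def test_fatal_chunk_routes_to_dlq(celery_worker, dlq, monkeypatch):
    # Force every chunk to raise so the dead-letter path is exercised.
    monkeypatch.setattr(
        "tasks.process_freight_chunk",
        lambda batch: (_ for _ in ()).throw(ValueError("malformed EDI segment")),
    )
    result = parse_invoice_batch.delay(["INV-1", "INV-2"]).get(timeout=10)

    assert result["status"] == "completed_with_failures"
    assert set(result["failed_ids"]) == {"INV-1", "INV-2"}
    assert dlq.depth() == 2                      # nothing lost


def test_duplicate_invoice_is_idempotent(celery_worker, db):
    # A running worker is needed here too, otherwise .get() times out.
    parse_invoice_batch.delay(["INV-9"]).get(timeout=10)
    parse_invoice_batch.delay(["INV-9"]).get(timeout=10)  # replay
    assert AuditEntry.objects.filter(invoice_id="INV-9").count() == 1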

A passing run gives three signals: peak worker RSS stays under the 256 MB budget, the broker acknowledgment count matches the persisted count exactly, and a chunk.fatal log line carries a correlation_id that resolves to a row in the dead-letter table.

Preventive Configuration

Stop the failure from recurring by gating it in CI and enforcing per-carrier guards in production:

Memory profiling in CI — run a synthetic batch and assert peak RSS per worker stays below 256 MB before merge.
Timeout boundary test — inject an artificial delay into process_freight_chunk() and assert soft_time_limit fires a graceful retry with no broker message loss.
Idempotency constraint — add a database unique constraint on (invoice_id, carrier_scac, billing_date) so a replayed batch cannot create a second audit entry.
Per-carrier circuit breaker — if a single SCAC trips more than five consecutive failures, pause its ingestion and alert the rate-contract team so one malformed submission cannot exhaust the pool.
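
The idempotency constraint can be tried out with SQLite before it is added to the production schema. The sketch below uses an in-memory database; the audit_entry table, the amount_cents column, and the sample SCAC are hypothetical, and a production database would use its own conflict-handling syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE audit_entry (
        invoice_id   TEXT NOT NULL,
        carrier_scac TEXT NOT NULL,
        billing_date TEXT NOT NULL,
        amount_cents INTEGER NOT NULL,
        UNIQUE (invoice_id, carrier_scac, billing_date)
    )
    """
)


def record_audit_entry(
    conn: sqlite3.Connection,
    invoice_id: str,
    carrier_scac: str,
    billing_date: str,
    amount_cents: int,
) -> bool:
    """Insert once; return False when the row already exists (a replay)."""
    cursor = conn.execute(
        "INSERT OR IGNORE INTO audit_entry VALUES (?, ?, ?, ?)",
        (invoice_id, carrier_scac, billing_date, amount_cents),
    )
    conn.commit()
    return cursor.rowcount == 1


# First delivery, then a replay of the same invoice.
record_audit_entry(conn, "INV-9", "ABCD", "2024-01-31", 12550)
record_audit_entry(conn, "INV-9", "ABCD", "2024-01-31", 12550)
count = conn.execute("SELECT COUNT(*) FROM audit_entry").fetchone()[0]
print(count)  # 1
```

Because the database rejects the duplicate, a replayed batch is safe even if application-level deduplication is skipped or buggy.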

For observability, attach a correlation_id, carrier_scac, event_type, and processing_duration_ms to every structured log line, export celery_task_retries_total and celery_queue_length to Prometheus, and alert when retry rate exceeds 10% or queue depth exceeds 5000. Clean structured results are what the downstream rule-based rate validation stage consumes to score charges and route disputes.
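
One way to attach those fields is a JSON formatter on the standard logging module, sketched below. JsonFormatter, STRUCTURED_FIELDS, and should_alert are hypothetical names; the thresholds mirror the 10% retry rate and 5000 queue depth mentioned above, and the Prometheus export itself is not shown.

```python
import json
import logging

STRUCTURED_FIELDS = ("correlation_id", "carrier_scac", "event_type", "processing_duration_ms")


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object with the audit fields attached."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {"level": record.levelname, "message": record.getMessage()}
        for name in STRUCTURED_FIELDS:
            # Fields arrive via `extra=`; missing ones are emitted as null.
            payload[name] = getattr(record, name, None)
        return json.dumps(payload, sort_keys=True)


def should_alert(retry_rate: float, queue_depth: int) -> bool:
    """True when either alerting threshold from the text is exceeded."""
    return retry_rate > 0.10 or queue_depth > 5000


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("freight_audit.tasks")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# Sample values are made up for illustration.
logger.info(
    "chunk processed",
    extra={
        "correlation_id": "c0ffee",
        "carrier_scac": "ABCD",
        "event_type": "chunk.success",
        "processing_duration_ms": 412,
    },
)
```

Emitting null for a missing field, rather than omitting the key, keeps every line on the same schema for downstream log queries.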

FAQ

Why does `task_acks_late=True` matter specifically for freight audit batches?

Because holding the acknowledgment until the task completes is the only way to guarantee that a worker crash does not silently drop an invoice. With early acknowledgment, the broker has already marked the invoice as delivered by the time a SIGKILL lands during parsing, producing the acknowledged-vs-persisted delta that breaks audit reconciliation. Pairing task_acks_late=True with task_reject_on_worker_lost=True requeues the in-flight work instead.

Should I use asyncio instead of Celery for invoice batch processing?

For CPU- and memory-bound parsing of large EDI and PDF documents, a process-based worker pool isolates memory per worker process, and worker_max_tasks_per_child recycles those processes so the OS reclaims anything a leaked or oversized parse left behind. A single-process asyncio loop cannot do that: one runaway parse takes the whole event loop down. Reach for asyncio when the bottleneck is network I/O (carrier API fan-out), and for Celery when it is document deserialization.

How do I stop one bad carrier from exhausting the worker pool?

Track retry rate per SCAC and trip a circuit breaker after a threshold of consecutive failures, pausing ingestion for that carrier only. Combined with worker_prefetch_multiplier=1, this stops a single malformed rate sheet from cycling every worker through repeated OOM kills.
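
A per-carrier breaker of this kind can be sketched in a few lines. The version below keeps state in process memory, which is an assumption made for brevity; with several workers the counters would need a shared store. CarrierCircuitBreaker and the sample SCACs are hypothetical, and the threshold follows the "more than five consecutive failures" rule stated earlier.

```python
from collections import defaultdict
from typing import DefaultDict, Set


class CarrierCircuitBreaker:
    """Pause ingestion for a carrier after too many consecutive failures."""

    def __init__(self, threshold: int = 5) -> None:
        self.threshold = threshold
        self._consecutive_failures: DefaultDict[str, int] = defaultdict(int)
        self._paused: Set[str] = set()

    def record_success(self, scac: str) -> None:
        # A success breaks the streak but does not reopen a paused carrier.
        self._consecutive_failures[scac] = 0

    def record_failure(self, scac: str) -> None:
        self._consecutive_failures[scac] += 1
        if self._consecutive_failures[scac] > self.threshold:
            self._paused.add(scac)

    def is_paused(self, scac: str) -> bool:
        return scac in self._paused

    def reset(self, scac: str) -> None:
        """Manual reopen, for example after the rate-contract team signs off."""
        self._paused.discard(scac)
        self._consecutive_failures[scac] = 0


breaker = CarrierCircuitBreaker(threshold=5)
for _ in range(6):
    breaker.record_failure("ABCD")
print(breaker.is_paused("ABCD"), breaker.is_paused("WXYZ"))  # True False
```

Checking is_paused() before enqueueing a carrier's batch keeps the failing submissions out of the invoice.parse queue entirely, so other carriers are unaffected.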

What happens to invoices that fail every retry tier?

They are routed to the invoice.dlq queue with a correlation_id and failure_reason rather than discarded. An analyst dashboard reads that queue so each defect is triaged and re-submitted, keeping the audit trail complete end to end.

Async Batch Processing Workflows — the stage this task implements, with the full field contract it must honour.
Automating EDI 210 Freight Bill Extraction Workflows — stateful, audit-safe parsing for the documents these workers consume.
Parsing Carrier PDF Invoices with pdfplumber — page-chunked PDF extraction that keeps per-worker memory bounded.
Converting XML Carrier Invoices to pandas DataFrames — streaming XML normalization upstream of the batch layer.
Rule-Based Rate Validation & Accessorial Auditing — the downstream tier that consumes the structured results of these tasks.
