How to Map LTL Class Rates to JSON Schemas

This page resolves the failure where a carrier’s LTL class/weight rate grid maps into a JSON schema that looks valid but silently corrupts freight classes, drops weight breaks, or exhausts memory on a bulk contract load.

The Failure You Are Hitting

You wrote a mapper that turns a parsed LTL rate sheet into typed JSON records and ships it into the LTL Rate Sheet Digitization stage. It passes on the carrier sample, then degrades in production in one of three observable ways:

A row with NMFC class 77.5 or 92.5 lands in the rate store as class 77 or 92. The schema typed freight_class as int, so the fractional class was truncated at coercion time — every downstream lookup for that class now misses or matches the wrong tier.
The mapper accepts the payload, but rates are off by one weight break: the 1M (1,000 lb) column value shows up under 2M, because a merged header cell shifted the column index by one and nothing asserted column alignment.
A 500k-row national tariff drives container RSS past its limit and the worker is OOMKilled mid-load, leaving the rate dataset half-written and the contract version neither old nor new.

Unlike the structured handoff from EDI 210/810 processing, an LTL rate sheet carries no field semantics — the class/weight grid is positional. Any deviation from the layout you sampled produces wrong-but-plausible JSON instead of an exception, and the gap surfaces months later during carrier reconciliation.

Root Cause Analysis

These failures are rarely bugs in the serializer. They trace to four conditions a one-carrier sample never exercises:

Lossy type coercion on fractional classes. The NMFC scale includes fractional tiers (77.5 and 92.5) alongside whole ones such as 110. Typing freight_class as int loses them: an int() cast or a Pydantic v1-style coercer truncates 77.5 to 77, while Pydantic v2 rejects the fractional value outright. Decimal or str is required to preserve it.
Positional column drift. Carriers publish the weight-break columns (L5C, M5C, 1M, 2M, 5M, 10M, 20M, 30M) in different orders and merge the header band. A mapper keyed by column index instead of a normalized header label silently misassigns rates.
Whole-grid materialization. Pivoting a national tariff into a {class: {weight_break: rate}} matrix in one pass holds every row plus the pivot frame in memory at once, so a large contract OOMs where a regional one fits.
No structural assertion at the schema boundary. A model that coerces freely (extra="ignore", non-strict) accepts extra keys, missing breaks, and string rates as-is, so a malformed sheet validates instead of dead-lettering.
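
The first condition is easy to reproduce in isolation. The sketch below uses only the standard library and contrasts a plain int() cast with a Decimal built from the original cell text; the sample value is illustrative, not taken from any carrier sheet.

```python
from decimal import Decimal

raw_class = "77.5"  # illustrative cell text from a parsed rate sheet

# A float-then-int cast drops the fraction: the truncation described above.
truncated = int(float(raw_class))

# Building the Decimal from the original text keeps the tier exact.
preserved = Decimal(raw_class)

print(truncated)  # 77
print(preserved)  # 77.5
```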

Reproducible Diagnostic

Before changing any mapping code, confirm which failure you have. This snippet exposes the three signals that distinguish truncation from column drift from a size problem:

import json

# Diagnostic only: this loads the whole file, so run it on a sample or a roomy host.
with open("carrier_ltl_rows.json", "r", encoding="utf-8") as f:
    rows = json.load(f)

# .get() keeps the scan alive when a row is missing the class column entirely.
classes = {str(r.get("freight_class")) for r in rows}
header_keys = sorted({k for r in rows for k in r})
print(f"distinct_classes={sorted(classes)}")
print(f"row_keys={header_keys}")
print(f"row_count={len(rows)} approx_mb={len(json.dumps(rows)) / 1_048_576:.1f}")

Read the output like a decision tree:

Signal	Likely cause	Where to fix
`distinct_classes` contains `77`/`92` but never `77.5`/`92.5`	int truncation upstream	type as `Decimal` (Step 2)
`row_keys` differ between carriers or omit a break	positional column drift	normalize headers (Step 1)
`approx_mb` high and load `OOMKilled`	whole-grid materialization	stream in chunks (Step 3)
rates present but shifted one break over	merged-header index shift	header map + alignment assert (Step 1)

If distinct_classes already shows whole numbers only on a carrier you know publishes 77.5, the loss happened before this mapper — the upstream parser typed it as int and you must fix that coercion, not the JSON layout.

Resolution Path

The fix is a four-step mapper: normalize the carrier’s column labels to canonical weight-break keys, enforce a strict schema that keeps fractional classes, stream rows in chunks into the pivot, and dead-letter anything that fails. Pin dependencies first so CI and production agree exactly:

# requirements.txt
pydantic==2.10.6
pandas==2.2.3
pyarrow==18.1.0

Step 1 — Normalize carrier column labels to canonical keys

Map every carrier’s weight-break header onto one canonical vocabulary, and assert the expected set is present so a shifted or missing column fails loudly instead of silently:

# Canonical weight-break keys (lower bound, in lb) used across the rate store.
CANONICAL_BREAKS = ("L5C", "M5C", "1M", "2M", "5M", "10M", "20M", "30M")

# Each carrier ships its own labels for the same breaks; resolve them here.
CARRIER_HEADER_MAP = {
    "less than 500": "L5C", "500-999": "M5C", "1000-1999": "1M",
    "2000-4999": "2M", "5000-9999": "5M", "10000-19999": "10M",
    "20000-29999": "20M", "30000+": "30M",
}

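def normalize_breaks(raw_headers: list[str]) -> dict[str, str]:
    """Return {raw_label: canonical_key} for the weight-break columns.

    Headers with no entry in CARRIER_HEADER_MAP (for example the class column)
    are skipped; a duplicated or missing canonical break raises ValueError.
    """
    resolved: dict[str, str] = {}
    for h in raw_headers:
        key = CARRIER_HEADER_MAP.get(h.strip().lower())
        if key is None:
            continue
        if key in resolved.values():
            # Two columns claiming one break points to a merged or repeated header.
            raise ValueError(f"duplicate weight break {key!r} from header {h!r}")
        resolved[h] = key
    missing = set(CANONICAL_BREAKS) - set(resolved.values())
    if missing:
        raise ValueError(f"unmapped or missing weight breaks: {sorted(missing)}")
    return resolved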
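
Once the header map is resolved, each data row can be keyed by canonical label instead of column position. The following is a minimal sketch, assuming a row arrives as a list of cells parallel to the raw headers; rekey_row and the sample labels are hypothetical and not part of the mapper above.

```python
def rekey_row(raw_headers: list[str], cells: list[str],
              resolved: dict[str, str]) -> dict[str, str]:
    """Return {canonical_break: cell}, refusing rows that do not line up."""
    # A merged header cell can surface as a header/row length mismatch.
    if len(cells) != len(raw_headers):
        raise ValueError(
            f"row has {len(cells)} cells but header has {len(raw_headers)}"
        )
    return {
        resolved[h]: cell
        for h, cell in zip(raw_headers, cells)
        if h in resolved  # skip non-break columns such as the class label
    }

# Illustrative two-break sheet; a real map would cover every canonical break.
headers = ["Class", "1000-1999", "2000-4999"]
resolved = {"1000-1999": "1M", "2000-4999": "2M"}
row = rekey_row(headers, ["77.5", "42.18", "39.90"], resolved)
print(row)  # {'1M': '42.18', '2M': '39.90'}
```

Because the lookup goes through the label, reordering the columns in the sheet changes nothing in the output, and a short or long row fails before any rate is assigned.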

Step 2 — Enforce a strict schema that preserves fractional classes

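Type freight_class as Decimal and validate it against the real NMFC tier set so 77.5 survives and an off-list class dead-letters. strict=True plus extra="forbid" stops schema poisoning, and frozen=True guarantees immutability after validation. Under strict=True, validate_python() accepts only native Decimal and date instances for rate_per_cwt and effective_date; string values pass only when raw JSON is validated with validate_json(), so convert parsed cells to native types before they reach the model: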

from decimal import Decimal
from datetime import date
from pydantic import BaseModel, Field, ConfigDict, field_validator

# NMFC tiers include fractional classes; storing as Decimal keeps 77.5 intact.
NMFC_CLASSES = {
    Decimal(c) for c in (
        "50", "55", "60", "65", "70", "77.5", "85", "92.5", "100", "110",
        "125", "150", "175", "200", "250", "300", "400", "500",
    )
}

class LTLClassRate(BaseModel):
    model_config = ConfigDict(strict=True, extra="forbid", frozen=True)

    carrier_scac: str = Field(..., pattern=r"^[A-Z0-9]{4}$")
    contract_version: str = Field(..., pattern=r"^v\d+\.\d+$")
    origin_zip3: str = Field(..., pattern=r"^\d{3}$")
    dest_zip3: str = Field(..., pattern=r"^\d{3}$")
    freight_class: Decimal
    weight_break: str
    rate_per_cwt: Decimal = Field(..., ge=0)
    effective_date: date

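    @field_validator("freight_class", mode="before")
    @classmethod
    def keep_fractional_class(cls, v) -> Decimal:
        # Coerce via str so a float 77.5 never reaches int and floor-truncates.
        try:
            d = Decimal(str(v))
        except ArithmeticError as exc:
            # decimal.InvalidOperation (an ArithmeticError) is not converted into a
            # ValidationError by Pydantic, so re-raise it as ValueError.
            raise ValueError(f"unparseable NMFC class: {v!r}") from exc
        if d not in NMFC_CLASSES:
            raise ValueError(f"invalid NMFC class: {d}")
        return d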

    @field_validator("weight_break")
    @classmethod
    def known_break(cls, v: str) -> str:
        if v not in CANONICAL_BREAKS:
            raise ValueError(f"unknown weight break: {v}")
        return v

Step 3 — Stream rows in chunks and pivot incrementally

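Never hold the raw tariff and a pivot frame in memory together. Pass rows as a lazy iterable, validate in fixed windows, separate failures from valid records, and fold valid rows into a nested matrix keyed by (lane, class) and then weight break, one chunk at a time. Peak memory then reflects the accumulated matrix plus one chunk of raw rows, not the raw contract plus a pivot copy: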

from collections import defaultdict
from typing import Iterator, Iterable
from pydantic import TypeAdapter, ValidationError

CHUNK_SIZE = 5_000  # tuned for a flat sub-250 MB heap on a standard container
_adapter = TypeAdapter(LTLClassRate)

def chunked(rows: Iterable[dict], size: int = CHUNK_SIZE) -> Iterator[list[dict]]:
    buf: list[dict] = []
    for row in rows:
        buf.append(row)
        if len(buf) >= size:
            yield buf
            buf = []
    if buf:
        yield buf

def map_to_matrix(rows: Iterable[dict]) -> tuple[dict, list]:
    """Fold valid rows into a class x weight-break matrix; collect failures."""
    matrix: dict = defaultdict(dict)
    failures: list[dict] = []
    for chunk in chunked(rows):
        for record in chunk:
            try:
                r = _adapter.validate_python(record)
            except ValidationError as exc:
                failures.append({"payload": record, "error": exc.errors()})
                continue
            lane = f"{r.origin_zip3}-{r.dest_zip3}"
            matrix[(lane, str(r.freight_class))][r.weight_break] = r.rate_per_cwt
    return dict(matrix), failures
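
Chunking only bounds memory when the row source is itself lazy; a list produced by json.load() is already fully resident. One way to feed the mapper is a generator over newline-delimited JSON. This is a sketch under the assumption that the upstream parser can emit one JSON object per line; iter_jsonl_rows is a hypothetical helper, not part of the original mapper.

```python
import io
import json
from typing import Iterable, Iterator

def iter_jsonl_rows(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-blank line so only one row is parsed at a time."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)

# An open text file is an iterable of lines; StringIO stands in for one here.
sample = io.StringIO(
    '{"freight_class": "77.5", "weight_break": "1M"}\n'
    '\n'
    '{"freight_class": "92.5", "weight_break": "5M"}\n'
)
rows = list(iter_jsonl_rows(sample))
print(len(rows))  # 2
```

In a real load the generator would be passed straight to map_to_matrix() without the list() call. Rows read this way still carry string values, so they would need converting to the native types the strict schema expects before validation.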

Step 4 — Dead-letter failures and emit a hashed, versioned dataset

Write malformed rows to a dead-letter queue for manual reconciliation, then serialize the matrix with a content hash so the rate store can pin the exact contract version it loaded — the same provenance discipline that matching shipment lanes to contracted rate tables relies on downstream:

import json
import hashlib
from pathlib import Path
from datetime import date

def write_dlq(failures: list[dict], contract_id: str, dlq_dir: Path) -> None:
    if not failures:
        return
    dlq_dir.mkdir(parents=True, exist_ok=True)
    path = dlq_dir / f"{contract_id}_{date.today().isoformat()}.jsonl"
    with open(path, "a", encoding="utf-8") as f:
        for rec in failures:
            f.write(json.dumps(rec, default=str) + "\n")

def serialize_matrix(matrix: dict, out_dir: Path, contract_version: str) -> str:
    out_dir.mkdir(parents=True, exist_ok=True)
    # Keys are tuples; flatten to a JSON-safe, deterministically ordered payload.
    flat = {f"{lane}|{cls}": {b: str(v) for b, v in breaks.items()}
            for (lane, cls), breaks in sorted(matrix.items())}
    blob = json.dumps(flat, sort_keys=True).encode("utf-8")
    digest = hashlib.sha256(blob).hexdigest()
    (out_dir / f"ltl_rates_{contract_version}_{digest[:12]}.json").write_bytes(blob)
    return digest
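
serialize_matrix() writes the output file in place, so a crash mid-write could still leave a partial dataset, which is the half-written state described at the top of this page. A common mitigation is to write to a temporary file in the same directory and rename it into place. The sketch below assumes a local filesystem on which os.replace() is atomic; atomic_write_bytes is a hypothetical helper, not part of the original code.

```python
import os
import tempfile
from pathlib import Path

def atomic_write_bytes(target: Path, blob: bytes) -> None:
    """Write blob so readers see the old file or the new one, never a partial."""
    target.parent.mkdir(parents=True, exist_ok=True)
    # The temp file lives beside the target so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(blob)
            f.flush()
            os.fsync(f.fileno())  # push bytes to disk before the rename
        os.replace(tmp_name, target)
    except BaseException:
        # Remove the temp file if anything failed before the rename completed.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

Swapping this in for the write_bytes() call would keep the previous contract version readable until the new file is complete.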

Verification

Confirm each failure is closed rather than hidden. These assertions belong in the integration suite that runs on every new carrier template:

from datetime import date
from decimal import Decimal

# strict=True: validate_python() needs native Decimal/date values, not strings.
def test_fractional_class_survives():
    r = _adapter.validate_python({
        "carrier_scac": "ABCD", "contract_version": "v1.0",
        "origin_zip3": "300", "dest_zip3": "750",
        "freight_class": "77.5", "weight_break": "1M",
        "rate_per_cwt": Decimal("42.18"), "effective_date": date(2026, 1, 1),
    })
    assert r.freight_class == Decimal("77.5")  # not 77

def test_float_class_not_truncated():
    r = _adapter.validate_python({
        "carrier_scac": "ABCD", "contract_version": "v1.0",
        "origin_zip3": "300", "dest_zip3": "750",
        "freight_class": 92.5, "weight_break": "5M",
        "rate_per_cwt": Decimal("31.05"), "effective_date": date(2026, 1, 1),
    })
    assert r.freight_class == Decimal("92.5")

def test_unmapped_break_raises():
    import pytest
    with pytest.raises(ValueError):
        normalize_breaks(["less than 500", "500-999"])  # rest of breaks missing

In production, the proof is in the dead-letter queue: a healthy mapper writes zero or a handful of DLQ rows on a known-good carrier. A spike in "invalid NMFC class" or "unmapped or missing weight breaks" errors means the carrier changed its sheet layout. Investigate the template and extend CARRIER_HEADER_MAP; do not relax the schema.
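
To watch for such a spike, the DLQ file can be summarized by field and error type. The sketch below assumes the record shape written by write_dlq() in Step 4 and that each error entry carries the loc and type keys that Pydantic's ValidationError.errors() reports; summarize_dlq and the sample records are hypothetical.

```python
import json
from collections import Counter
from typing import Iterable

def summarize_dlq(lines: Iterable[str]) -> Counter:
    """Count dead-lettered errors by (field, error type)."""
    counts: Counter = Counter()
    for line in lines:
        if not line.strip():
            continue
        rec = json.loads(line)
        for err in rec.get("error", []):
            loc = err.get("loc") or ["<row>"]
            counts[(str(loc[0]), err.get("type", "unknown"))] += 1
    return counts

# Illustrative DLQ lines, not real carrier data.
sample = [
    json.dumps({"payload": {}, "error": [
        {"loc": ["freight_class"], "type": "value_error"}]}),
    json.dumps({"payload": {}, "error": [
        {"loc": ["freight_class"], "type": "value_error"},
        {"loc": ["weight_break"], "type": "value_error"}]}),
]
summary = summarize_dlq(sample)
print(summary.most_common(1))  # [(('freight_class', 'value_error'), 2)]
```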

Preventive Configuration

Stop the regression from returning by encoding these as configuration, not tribal knowledge:

Per-carrier header maps. Keep a SCAC -> CARRIER_HEADER_MAP registry so each carrier resolves to the column vocabulary proven against its sheet, instead of one global map that silently rots when a carrier re-labels a break.
Decimal everywhere for money and class. Forbid float and int for freight_class, rate_per_cwt, and any surcharge field at the schema level so truncation cannot reappear — the same exact-arithmetic rule that calculating dynamic fuel surcharges depends on.
CI schema gate. Run map_to_matrix() against a golden fixture in CI and assert the failure list is empty and the matrix contains every expected (lane, class) key, so a malformed template fails the build, not the night load.
Version-pinned datasets. Emit the SHA-256 digest with every load and reject a write whose contract_version does not increment, so the weight and zone cross-validation tier always reads a deterministic, traceable matrix.
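
Two of these rules can be sketched as plain lookups and comparisons. The registry contents, header_map_for, parse_version, and version_increments below are all hypothetical names for illustration; the version pattern mirrors the contract_version pattern in the Step 2 schema.

```python
import re

# Hypothetical registry: one header map per carrier SCAC (labels are illustrative).
HEADER_MAPS_BY_SCAC: dict[str, dict[str, str]] = {
    "ABCD": {"less than 500": "L5C", "500-999": "M5C"},
    "WXYZ": {"under 500 lbs": "L5C", "500 to 999 lbs": "M5C"},
}

def header_map_for(scac: str) -> dict[str, str]:
    """Fail loudly for a carrier with no proven map instead of using a global one."""
    try:
        return HEADER_MAPS_BY_SCAC[scac]
    except KeyError:
        raise LookupError(f"no header map registered for SCAC {scac}") from None

_VERSION_RE = re.compile(r"^v(\d+)\.(\d+)$")

def parse_version(version: str) -> tuple[int, int]:
    m = _VERSION_RE.match(version)
    if m is None:
        raise ValueError(f"malformed contract_version: {version!r}")
    return int(m.group(1)), int(m.group(2))

def version_increments(previous: str, candidate: str) -> bool:
    """True only when candidate is strictly newer, compared numerically."""
    return parse_version(candidate) > parse_version(previous)

print(version_increments("v1.9", "v1.10"))  # True
print(version_increments("v2.0", "v2.0"))   # False
```

Comparing parsed integer pairs rather than raw strings matters here: as strings, "v1.10" sorts before "v1.9".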

FAQ

Why does my freight class 77.5 end up as 77 in the rate store?

Something typed freight_class as int before the value reached the rate store: an int() cast or a Pydantic v1-style coercer truncates the fraction, whereas Pydantic v2 would reject the row instead. Type it as Decimal and coerce through str (Decimal(str(v))), as in Step 2, so 77.5 and 92.5 survive intact and are validated against the real NMFC tier set.

Should freight_class be a string or a Decimal in the JSON schema?

Use Decimal for validation and arithmetic, then serialize as a string ("77.5") in the stored JSON so no float ever re-enters the value. Strings alone work for storage but lose the ordering and membership checks that Decimal gives you against the NMFC tier set.
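
The round trip can be shown with the standard library alone. This is a minimal sketch with illustrative values; it is not the serializer from Step 4, only the same idea in isolation.

```python
import json
from decimal import Decimal

record = {"freight_class": Decimal("77.5"), "rate_per_cwt": Decimal("42.18")}

# Serialize Decimals as strings so the stored JSON never contains a float.
stored = json.dumps({k: str(v) for k, v in record.items()}, sort_keys=True)
print(stored)  # {"freight_class": "77.5", "rate_per_cwt": "42.18"}

# Rebuild Decimals on read; equality and ordering work again.
loaded = {k: Decimal(v) for k, v in json.loads(stored).items()}
assert loaded == record
assert loaded["freight_class"] < Decimal("85")
```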

How do I keep a 500k-row national tariff from OOMing the worker?

Do not build the full pivot in one pass. Feed rows lazily, validate them in fixed chunks, and fold each chunk into the matrix incrementally (Step 3), so peak memory reflects the accumulated matrix plus one CHUNK_SIZE window of raw rows rather than the raw contract and a pivot copy at once. Write the serialized result once at the end.

What handles a row that fails validation — does the load stop?

No. Failing rows route to a dead-letter queue (Step 4) while valid rows continue into the matrix. This keeps the load going and preserves the exact malformed payload for reconciliation instead of halting a 500k-row batch on one bad record.

LTL Rate Sheet Digitization — the parent stage this mapper plugs into.
Extracting FTL Zone-Based Pricing from Carrier PDFs — the flat-rate sibling with a simpler grid but the same provenance discipline.
Calculating Dynamic Fuel Surcharges with Python Formulas — where the Decimal-only rule for rate fields continues.
Matching Shipment Lanes to Contracted Rate Tables — the downstream consumer of the matrix this mapper emits.
Cross-Checking Billable Weight Against Actual Weight Logs — where weight-break assignment is validated against shipment reality.
