Weight & Zone Cross-Validation: Implementation Guide for Freight Audit Pipelines

Weight & Zone Cross-Validation functions as the deterministic reconciliation layer within modern freight audit architectures. Positioned strictly downstream of raw invoice normalization and upstream of dispute routing, this stage isolates pricing anomalies before they reach payment workflows. By independently computing expected zones, resolving billable weight brackets, and querying contracted rate tables, the engine enforces mathematical consistency between carrier billing logic and shipper agreements. This validation layer operates as a foundational component of the broader Rule-Based Rate Validation & Accessorial Auditing framework, ensuring that base freight charges are mathematically sound before downstream modules evaluate surcharges or penalties.

Canonical Schema & Contract Table Preparation

The validation engine requires strictly typed, normalized shipment records. Upstream ETL processes must flatten heterogeneous carrier payloads (EDI 210 segments, carrier API JSON, or OCR-extracted PDFs) into a unified schema. The following Pydantic model enforces type safety and nullability constraints at the ingestion boundary:

from pydantic import BaseModel, Field, field_validator
from typing import Optional
import re

class CanonicalShipment(BaseModel):
    shipment_id: str
    carrier_scac: str
    origin_zip: str = Field(pattern=r"^\d{5}$")
    dest_zip: str = Field(pattern=r"^\d{5}$")
    billed_weight_lbs: float
    actual_weight_lbs: Optional[float] = None
    dim_length_in: Optional[float] = None
    dim_width_in: Optional[float] = None
    dim_height_in: Optional[float] = None
    service_level: str
    billed_zone: Optional[int] = None
    billed_freight_charge: float
    contract_id: str

    @field_validator("origin_zip", "dest_zip")
    @classmethod
    def validate_zip(cls, v: str) -> str:
        if not re.match(r"^\d{5}$", v):
            raise ValueError("ZIP code must be exactly 5 digits")
        return v
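
The upstream flattening step can be sketched as a field-mapping pass that runs before the model is instantiated. The nested carrier JSON layout, the `FIELD_MAP` paths, and the ZIP+4 trimming rule below are hypothetical assumptions for illustration; real EDI 210 or carrier API mappings differ per carrier.

```python
from typing import Any

# Hypothetical mapping: canonical field -> dotted path in one carrier's JSON.
# Each carrier would need its own map; these paths are illustrative only.
FIELD_MAP = {
    "shipment_id": "shipment.id",
    "carrier_scac": "carrier.scac",
    "origin_zip": "shipper.address.postal_code",
    "dest_zip": "consignee.address.postal_code",
    "billed_weight_lbs": "charges.billed_weight",
    "service_level": "service.code",
    "billed_freight_charge": "charges.freight",
    "contract_id": "account.contract",
}


def _get_path(payload: dict[str, Any], path: str) -> Any:
    """Walk a dotted path through nested dicts; return None if any key is missing."""
    node: Any = payload
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def flatten_carrier_json(payload: dict[str, Any]) -> dict[str, Any]:
    """Project a nested carrier payload onto the canonical field names."""
    flat = {field: _get_path(payload, path) for field, path in FIELD_MAP.items()}
    # Assumption: ZIP+4 values ("94107-1234") are trimmed to the 5-digit ZIP;
    # any other malformed ZIP passes through unchanged for the model to reject.
    for key in ("origin_zip", "dest_zip"):
        value = flat[key]
        if isinstance(value, str) and len(value) == 10 and value[5] == "-":
            flat[key] = value[:5]
    return flat
```

The resulting dict can be handed to `CanonicalShipment.model_validate(flat)` (Pydantic v2), so a field the mapping could not find surfaces as a validation error at the ingestion boundary instead of a silent default.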

Contract rate tables must be pre-loaded into a columnar, query-optimized store; DuckDB tables or Parquet-backed data lakes are recommended for sub-second lookups across millions of rate combinations. Each table must map contract_id, weight_bracket, zone, and service_level to a base rate, alongside fuel surcharge percentages and minimum charge floors. All monetary values should be stored as DECIMAL(19,4) and handled in Python as decimal.Decimal (constructed from strings, not floats) to prevent binary floating-point drift during aggregation (see Python's decimal documentation).
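
The drift that DECIMAL storage avoids is easy to reproduce. The sketch below sums ten 10-cent line items as float and as decimal.Decimal, then quantizes to four places to mirror a DECIMAL(19,4) column. The helper name `to_decimal_19_4` and the ROUND_HALF_UP mode are assumptions; check how the target database rounds on cast.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Union

FOUR_PLACES = Decimal("0.0001")


def to_decimal_19_4(value: Union[str, int, Decimal]) -> Decimal:
    """Quantize to 4 places like a DECIMAL(19,4) column (scale only; digit count not checked)."""
    # Build from str/int/Decimal, never float, so the literal value is kept exactly
    return Decimal(value).quantize(FOUR_PLACES, rounding=ROUND_HALF_UP)


# Ten line items of 0.10 each, accumulated two ways
float_total = sum([0.10] * 10)
decimal_total = sum(Decimal("0.10") for _ in range(10))

print(float_total)                     # 0.9999999999999999
print(decimal_total)                   # 1.00
print(to_decimal_19_4(decimal_total))  # 1.0000
```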

Deterministic Zone Resolution

Carrier zone assignments are rarely static. Parcel networks rely on annual zip-to-zone grid updates, while LTL carriers utilize distance-based matrices and freight class routing rules. The pipeline must independently derive the expected zone before comparing it to the carrier’s billed value.

Zone resolution begins with a direct lookup against the carrier’s published zip-pair table. When a direct mapping fails, the engine falls back to Lane Matching Algorithms that compute zones via centroid distance, regional grouping, or state-to-state routing tables. The resolved zone must then be validated against service-level constraints (e.g., Ground services capped at Zone 8, Express services capped at Zone 10).

import duckdb
from dataclasses import dataclass
from typing import Optional

@dataclass
class ZoneResolutionResult:
    resolved_zone: int
    resolution_method: str  # "direct", "centroid_fallback", "state_group"
    is_valid_for_service: bool

class ZoneUnresolvedError(LookupError):
    """Raised when neither the zip-pair grid nor the fallback yields a zone."""

class ZoneResolver:
    # Highest zone each service level may bill; unknown services default to 12
    SERVICE_CAPS = {"GROUND": 8, "EXPRESS": 10, "FREIGHT": 12}

    def __init__(self, duckdb_conn: duckdb.DuckDBPyConnection):
        self.conn = duckdb_conn

    def resolve(self, origin_zip: str, dest_zip: str, service_level: str) -> ZoneResolutionResult:
        # 1. Direct lookup against the carrier's published zip-pair grid
        query = """
            SELECT zone FROM carrier_zone_grid
            WHERE origin_zip = ? AND dest_zip = ?
        """
        result = self.conn.execute(query, [origin_zip, dest_zip]).fetchone()

        if result and result[0] is not None:
            return self._validate_service(int(result[0]), service_level, "direct")

        # 2. Fallback to centroid/state logic (delegated to lane matching module)
        fallback_zone = self._compute_fallback_zone(origin_zip, dest_zip)
        if fallback_zone is None:
            # Both lookups failed: raise so the record is tagged ZONE_UNRESOLVED
            # instead of being priced against a guessed zone
            raise ZoneUnresolvedError(f"No zone for {origin_zip} -> {dest_zip}")
        return self._validate_service(fallback_zone, service_level, "centroid_fallback")

    def _validate_service(self, zone: int, service: str, method: str) -> ZoneResolutionResult:
        cap = self.SERVICE_CAPS.get(service.upper(), 12)
        return ZoneResolutionResult(
            resolved_zone=zone,
            resolution_method=method,
            is_valid_for_service=zone <= cap
        )

    def _compute_fallback_zone(self, origin: str, dest: str) -> Optional[int]:
        # Placeholder for centroid distance or state-grouping logic.
        # In production, this delegates to the lane matching pipeline and
        # returns None when no fallback rule applies (no hard-coded default zone).
        return None
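
The fallback placeholder could be filled by a centroid-distance rule: compute the great-circle (haversine) distance between origin and destination ZIP centroids, then map it onto distance bands. The approximate centroid coordinates and the band thresholds below are hypothetical placeholders, not any carrier's published zone chart; in practice both would be loaded as reference data alongside carrier_zone_grid.

```python
import math
from typing import Optional

# Hypothetical, approximate ZIP centroids (lat, lon in degrees)
ZIP_CENTROIDS = {
    "10001": (40.7506, -73.9972),
    "60601": (41.8858, -87.6181),
    "90001": (33.9731, -118.2479),
}

# Hypothetical distance bands: (upper bound in miles, zone). Not a real carrier chart.
DISTANCE_BANDS = [(150, 2), (300, 3), (600, 4), (1000, 5), (1400, 6), (1800, 7)]
MAX_ZONE = 8
EARTH_RADIUS_MI = 3958.8


def haversine_miles(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Great-circle distance between two (lat, lon) points in miles."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_MI * math.asin(math.sqrt(h))


def centroid_fallback_zone(origin: str, dest: str) -> Optional[int]:
    """Zone from centroid distance, or None if either centroid is unknown."""
    if origin not in ZIP_CENTROIDS or dest not in ZIP_CENTROIDS:
        return None  # caller should treat this as an unresolved zone
    miles = haversine_miles(ZIP_CENTROIDS[origin], ZIP_CENTROIDS[dest])
    for upper_bound, zone in DISTANCE_BANDS:
        if miles <= upper_bound:
            return zone
    return MAX_ZONE
```

The resolver's `_compute_fallback_zone` could simply return `centroid_fallback_zone(origin, dest)`. Returning None for unknown centroids, rather than a default zone, keeps the unresolved case visible to the error-handling tier.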

Weight Bracket & Dimensional Logic

Billable weight is rarely the raw scale weight. Carriers apply dimensional weight formulas ((L × W × H) / divisor) and snap results to predefined weight brackets. The pipeline must replicate this logic deterministically.

When dimensional data is present, the engine calculates the dimensional weight and compares it against the scale weight; the higher value becomes the billable weight. That value is then rounded up to the ceiling of its contracted weight bracket (e.g., 1-50, 51-100, 101-150 lbs), so a 50.2 lb shipment bills in the 51-100 lb bracket. Tolerance thresholds (typically ±1.0 lb or ±2%) are applied to account for carrier scale calibration variances. Detailed methodologies for handling scale discrepancies are documented in Cross-checking billable weight against actual weight logs.

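import math
from typing import Optional

def calculate_billable_weight(
    actual: Optional[float],
    dims: tuple[Optional[float], Optional[float], Optional[float]],
    dim_divisor: int = 166,
    bracket_step: int = 50
) -> int:
    """Return the upper bound of the contracted weight bracket for a shipment."""
    if actual is None:
        raise ValueError("Actual weight is required for billable weight calculation")

    # Dimensional weight applies only when all three dimensions are present
    dim_weight = 0.0
    if all(d is not None for d in dims):
        l, w, h = dims
        dim_weight = (l * w * h) / dim_divisor

    # The higher of scale weight and dimensional weight is billable
    raw_billable = max(actual, dim_weight)

    # Round up to the next whole pound, then up to the bracket ceiling
    # (e.g. 50.0 -> 50, 50.2 -> 51 -> 100 with 50 lb brackets)
    whole_lbs = math.ceil(raw_billable)
    return max(bracket_step, math.ceil(whole_lbs / bracket_step) * bracket_step)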
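
The tolerance rule can be sketched as a comparison between the carrier's billed_weight_lbs and the independently computed billable weight. This sketch assumes the ±1.0 lb and ±2% thresholds combine as "whichever is more permissive"; some contracts apply the stricter of the two, so both the combination rule and the helper name `weight_within_tolerance` are assumptions.

```python
def weight_within_tolerance(
    billed_lbs: float,
    expected_lbs: float,
    abs_tol_lbs: float = 1.0,
    pct_tol: float = 2.0,
) -> bool:
    """True if billed weight is within the larger of the absolute and percentage tolerances."""
    # Assumption: the more permissive of the two thresholds applies
    allowed = max(abs_tol_lbs, expected_lbs * pct_tol / 100.0)
    return abs(billed_lbs - expected_lbs) <= allowed


# Worked example: 24 x 18 x 12 in carton, 20 lb on the scale, divisor 166
dim_weight = (24 * 18 * 12) / 166   # 5184 / 166, about 31.23 lb
expected = max(20.0, dim_weight)     # dimensional weight governs
print(round(expected, 2))                       # 31.23
print(weight_within_tolerance(32.0, expected))  # True: 0.77 lb off, 1.0 lb allowed
print(weight_within_tolerance(34.0, expected))  # False: 2.77 lb off
```

Comparing against the raw billable weight, before bracket snapping, keeps a small scale discrepancy from being amplified into a full bracket jump; with the default 50 lb step this carton lands in the 50 lb bracket either way.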

Contract Rate Reconciliation Engine

With the resolved zone and snapped weight bracket established, the engine queries the contract rate table to derive the expected base charge. This expected value is then compared to the billed_freight_charge from the canonical schema, and the variance is recorded both as an absolute delta and as a percentage of the expected charge.

This stage strictly validates base freight charges. It does not parse, score, or validate accessorial fees; those are routed to Accessorial Charge Scoring to maintain strict separation of concerns.

import duckdb
from decimal import Decimal, ROUND_HALF_UP

def reconcile_rate(
    conn: duckdb.DuckDBPyConnection,
    weight_bracket: int,
    zone: int,
    service_level: str,
    contract_id: str,
    billed_charge: float,
    abs_tolerance: Decimal = Decimal("0.50")
) -> dict:
    query = """
        SELECT base_rate, fuel_surcharge_pct, min_charge
        FROM contract_rates
        WHERE contract_id = ?
          AND weight_bracket = ?
          AND zone = ?
          AND service_level = ?
    """
    row = conn.execute(query, [contract_id, weight_bracket, zone, service_level]).fetchone()

    if not row:
        # Caller tags the record CONTRACT_MISSING; no variance is computed
        raise LookupError(
            f"No contract rate found for {contract_id} | {weight_bracket} | {zone} | {service_level}"
        )

    base, fuel_pct, min_charge = row
    # Expected charge = base rate plus fuel surcharge, floored at the minimum charge
    expected = Decimal(str(base)) * (Decimal("1.0") + Decimal(str(fuel_pct)) / Decimal("100"))
    expected = max(expected, Decimal(str(min_charge)))
    expected = expected.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    if expected <= 0:
        # Guards the percentage calculation against division by zero
        raise ValueError(f"Non-positive expected charge for contract {contract_id}")

    # str() round-trip avoids carrying binary float error into Decimal
    billed = Decimal(str(billed_charge))
    variance_abs = abs(billed - expected)
    variance_pct = (variance_abs / expected * Decimal("100")).quantize(Decimal("0.01"))

    return {
        "expected_charge": float(expected),
        "variance_abs": float(variance_abs),
        "variance_pct": float(variance_pct),
        "status": "PASS" if variance_abs <= abs_tolerance else "RATE_VARIANCE"
    }
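
Because the arithmetic in reconcile_rate does not depend on the DuckDB lookup, it can be isolated into pure functions and unit-tested without a database. The sketch below, with hypothetical names `expected_charge` and `variance`, repeats the same fuel, minimum-charge, and rounding steps; the rate values in the worked example are made up.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def expected_charge(base: str, fuel_pct: str, min_charge: str) -> Decimal:
    """Base rate plus fuel surcharge, floored at the minimum charge, rounded half-up to cents."""
    gross = Decimal(base) * (Decimal("1") + Decimal(fuel_pct) / Decimal("100"))
    return max(gross, Decimal(min_charge)).quantize(CENT, rounding=ROUND_HALF_UP)


def variance(billed: str, expected: Decimal) -> tuple[Decimal, Decimal]:
    """Absolute delta and percentage of the expected charge."""
    delta = abs(Decimal(billed) - expected)
    pct = (delta / expected * Decimal("100")).quantize(CENT)
    return delta, pct


# Worked example: 42.10 base, 12.5% fuel, 15.00 minimum, billed 48.00
exp = expected_charge("42.10", "12.5", "15.00")  # 42.10 * 1.125 = 47.3625 -> 47.36
print(exp)                      # 47.36
print(variance("48.00", exp))   # (Decimal('0.64'), Decimal('1.35'))
```

With the 0.50 absolute tolerance used in reconcile_rate, the 0.64 delta in this example would fail the check and be routed for dispute.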

Pipeline Boundaries & Error Handling Strategy

Maintaining strict stage boundaries prevents logic bleed and ensures predictable audit trails. This module explicitly excludes:

  • Invoice Parsing/Normalization: Handled upstream.
  • Accessorial/Stop Charge Validation: Routed downstream.
  • Dispute Ticket Generation: Handled by the dispute routing engine.

Error Handling & Routing

The validation engine implements a tiered failure strategy:

  1. Missing Contract/Rate: Records are tagged STATUS: CONTRACT_MISSING and routed to a contract reconciliation queue. No financial variance is calculated.
  2. Unresolvable Zone: If both direct and fallback lookups fail, the record is tagged STATUS: ZONE_UNRESOLVED and logged with origin/dest metadata for manual review.
  3. Data Type/Schema Violations: Caught at the Pydantic boundary. Records are rejected immediately and routed to a dead-letter queue (DLQ) with validation error payloads.
  4. Tolerance Exceeded: Records passing schema and contract checks but exceeding variance thresholds are tagged STATUS: RATE_VARIANCE and passed to the dispute routing layer with full audit metadata.
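
The tiered strategy above amounts to a mapping from failure type to status tag and destination. A minimal sketch follows; the exception classes are defined locally so the block is self-contained, and they, the SCHEMA_REJECTED tag, and all queue names are hypothetical stand-ins for whatever the pipeline actually raises and publishes to.

```python
from dataclasses import dataclass


# Hypothetical exception types, one per failure tier
class SchemaViolationError(ValueError): ...
class ContractMissingError(LookupError): ...
class ZoneUnresolvedError(LookupError): ...


@dataclass(frozen=True)
class Route:
    status: str
    destination: str


# Hypothetical routing table; status SCHEMA_REJECTED and queue names are placeholders
ROUTES: dict[type[Exception], Route] = {
    SchemaViolationError: Route("SCHEMA_REJECTED", "dead_letter_queue"),
    ContractMissingError: Route("CONTRACT_MISSING", "contract_reconciliation_queue"),
    ZoneUnresolvedError: Route("ZONE_UNRESOLVED", "manual_review_queue"),
}


def route_failure(exc: Exception) -> Route:
    """Pick the route for a failure; unknown errors propagate rather than being misfiled."""
    for exc_type, route in ROUTES.items():
        if isinstance(exc, exc_type):
            return route
    raise exc


def route_variance(variance_abs: float, tolerance: float = 0.50) -> Route:
    """Records that pass schema and contract checks are routed on the variance result."""
    if variance_abs > tolerance:
        return Route("RATE_VARIANCE", "dispute_routing")
    return Route("PASS", "payment_workflow")
```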

All errors are logged with structured JSON payloads containing shipment_id, carrier_scac, failure_code, and stack_trace. Retry logic is disabled for deterministic validation failures; only transient infrastructure errors (e.g., DuckDB connection timeouts) trigger exponential backoff. The output schema is strictly flattened to ensure downstream consumers receive only validated, enriched records ready for financial routing.
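
The logging and retry policy can be sketched with the standard library alone. The payload keys follow the fields listed above; treating the built-in TimeoutError and ConnectionError as the only transient errors, and the helper names and delay values, are assumptions (DuckDB's own exception classes would need to be mapped in explicitly).

```python
import json
import traceback

# Assumption: only these built-in exceptions count as transient infrastructure errors
TRANSIENT_ERRORS = (TimeoutError, ConnectionError)


def error_payload(shipment_id: str, carrier_scac: str, failure_code: str,
                  exc: BaseException) -> str:
    """Serialize a failure as a structured JSON log line."""
    return json.dumps({
        "shipment_id": shipment_id,
        "carrier_scac": carrier_scac,
        "failure_code": failure_code,
        "stack_trace": "".join(
            traceback.format_exception(type(exc), exc, exc.__traceback__)
        ),
    })


def is_retryable(exc: BaseException) -> bool:
    """Deterministic validation failures are never retried; transient ones are."""
    return isinstance(exc, TRANSIENT_ERRORS)


def backoff_delays(base_s: float, attempts: int, cap_s: float) -> list[float]:
    """Exponential backoff schedule: base, 2*base, 4*base, ... capped at cap_s."""
    return [min(cap_s, base_s * 2 ** i) for i in range(attempts)]
```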