Accessorial Charge Scoring: Implementation Guide

Accessorial charge scoring is the deterministic validation stage that decides whether each accessorial line on a carrier invoice — detention, liftgate, residential delivery, redelivery, limited-access pickup — is contractually defensible before any money moves. It consumes normalized accessorial records, applies version-controlled rule matrices, and emits a single confidence_score (0.0–1.0) plus a routing verdict that downstream stages act on without re-deriving the math. This stage sits inside Rule-Based Rate Validation & Accessorial Auditing, immediately after lane resolution and weight reconciliation, and exists to convert messy carrier accessorial billing into auditable, dispute-ready records using rules and arithmetic only — no probabilistic models, no external API calls, no per-invoice human judgment.

This guide covers where the scoring stage begins and ends, the data contract it expects, the YAML threshold matrices that drive it, a step-by-step deterministic engine, the test patterns that keep it honest, its failure modes under partial data, and how its output feeds threshold alerting and dispute routing.

Prerequisites

Scoring is not a standalone script; it is a stage with hard input expectations. Everything below must be satisfied upstream or the stage will quarantine records rather than guess.

Dependency	Type	Why it is required
Normalized accessorial records	Upstream component	Raw EDI 210, carrier-portal JSON, and OCR’d PDFs must already be parsed and canonicalized by Automated Invoice Parsing & EDI/XML Ingestion. Scoring never touches raw documents.
Resolved `contract_id` + lane keys	Data contract	Origin/destination must be resolved by Lane Matching Algorithms so the correct accessorial tariff is selected.
Reconciled billable weight	Data contract	Weight-driven accessorials (e.g. over-length, excessive weight) require the verdict from Weight & Zone Cross-Validation.
Canonical accessorial codes	Reference data	Carrier-specific codes must be folded to a standard taxonomy via Accessorial Charge Taxonomy Mapping before a profile can be matched.
`pydantic>=2.0`, `PyYAML>=6.0`	Python dependency	Schema enforcement at the boundary and YAML profile loading.
`accessorial_profiles.yaml`	Config key	Version-controlled threshold matrix, deployed read-only via GitOps.

If any of these are absent, the correct behaviour is to route to QUARANTINE with an explicit reason — never to score against defaults and emit a false APPROVE.
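As a rough illustration of that rule, a prerequisite gate can run before any scoring is attempted. This is a minimal sketch: the field names in `REQUIRED_UPSTREAM_FIELDS` (in particular `reconciled_weight_lbs`) and both function names are hypothetical and would need to match the real upstream contract.

```python
from typing import Any, Dict, List, Optional

# Hypothetical field names standing in for the upstream prerequisites.
REQUIRED_UPSTREAM_FIELDS = ("contract_id", "accessorial_code", "reconciled_weight_lbs")


def missing_prerequisites(record: Dict[str, Any]) -> List[str]:
    """Return the upstream fields that are absent or empty on this record."""
    return [name for name in REQUIRED_UPSTREAM_FIELDS if record.get(name) in (None, "")]


def quarantine_reason(record: Dict[str, Any]) -> Optional[str]:
    """Explicit QUARANTINE reason, or None when the record may proceed to scoring."""
    missing = missing_prerequisites(record)
    if missing:
        return "missing upstream prerequisites: " + ", ".join(missing)
    return None
```

A record that returns a reason is routed to QUARANTINE with that string attached; only a `None` result is handed to the scoring engine.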

Pipeline Architecture & Stage Boundaries

Strict stage isolation prevents logic bleed and keeps execution idempotent. The scoring stage begins only after accessorial records have been normalized into the canonical schema below, and it ends immediately before dispute routing or payment reconciliation. It does not parse documents, synchronize carrier master data, or open dispute tickets — those concerns belong to neighbouring stages.

Inbound contract: normalized accessorial records with validated types, standardized units, enriched lane metadata, and a bound contract_id. Outbound contract: a scored payload carrying confidence_score (0.0–1.0), routing_flag (APPROVE, REVIEW, QUARANTINE), and a structured score_breakdown that records every rule that fired.

The mapping between an inbound field and the rule it powers is fixed, so reviewers can trace any score back to its inputs:

Inbound field	Drives	Rule consuming it
`accessorial_code`	Profile selection	Maps to an `accessorial_profiles` entry, or the `UNKNOWN_ACCESSORIAL` fallback
`billed_amount`	Cap validation	Compared against `contractual_cap_per_hour` / `flat_rate_max` plus tolerance
`trigger_event` / `trigger_value`	Trigger completeness	Checked against `required_triggers`
`uom`	Unit reconciliation	Ensures the cap is compared on the same basis (`per_hour`, `flat`, `per_cwt`)
`contract_id`	Profile versioning	Pins which tariff snapshot the profile was drawn from

Data Contract & Schema Enforcement

The scoring engine expects a rigidly typed accessorial payload. Schema validation runs at the pipeline boundary using Pydantic. Records that violate type constraints, omit mandatory fields, or carry negative monetary values are quarantined before scoring executes, so the engine itself only ever sees well-formed input.

Canonical accessorial schema:

{
  "shipment_id": "string (UUID)",
  "carrier_scac": "string (4-char)",
  "accessorial_code": "string (carrier or NMFC standard)",
  "accessorial_description": "string",
  "billed_amount": "decimal(10,2)",
  "uom": "string (enum: 'per_stop', 'per_cwt', 'flat', 'per_hour')",
  "trigger_event": "string (e.g., 'detention_hours', 'residential_delivery')",
  "trigger_value": "decimal(8,2)",
  "lane_origin_zip": "string (5-char)",
  "lane_dest_zip": "string (5-char)",
  "actual_weight_lbs": "decimal(8,1)",
  "billing_zone": "string",
  "invoice_date": "date",
  "raw_source_payload": "json"
}

Financial precision is non-negotiable. Every monetary and weight calculation uses fixed-point arithmetic via Python's decimal module (see the Python decimal documentation); binary floats are never used to compare a billed amount against a contractual cap.
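The short demonstration below shows why the float ban matters. It uses only the standard library, and the amounts are illustrative rather than taken from any tariff.

```python
from decimal import Decimal

# Binary floats cannot represent most decimal fractions exactly.
float_total = 0.1 + 0.2
decimal_total = Decimal("0.10") + Decimal("0.20")

print(float_total == 0.3)                # False
print(decimal_total == Decimal("0.30"))  # True

# Building a Decimal from a float captures the float's binary error;
# building it from a string keeps the value exactly as written.
from_float = Decimal(0.1)
from_string = Decimal("0.1")
print(from_float == from_string)         # False
```

This is also why the engine wraps YAML numbers in `Decimal(str(...))` instead of passing them to `Decimal` directly.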

Configuration Schema & Threshold Matrices

Scoring logic is data, not code. Each accessorial type maps to a version-controlled YAML profile holding base validity weights, contractual caps, deviation tolerances, and the operational triggers that must be present for the charge to be legitimate.

accessorial_profiles:
  DETENTION:
    base_score: 1.0
    contractual_cap_per_hour: 75.00
    deviation_tolerance_pct: 15.0
    required_triggers: ["dock_in_time", "dock_out_time", "free_time_minutes"]
    fallback_score: 0.6
  LIFTGATE:
    base_score: 0.95
    flat_rate_max: 85.00
    residential_override: true
    required_triggers: ["delivery_type"]
    fallback_score: 0.75

Configuration loading must implement schema validation, atomic file replacement, and circuit-breaker fallbacks so a malformed profile deployment degrades to a conservative default instead of halting the pipeline mid-batch. The boundary between what counts as a known accessorial and what its ceiling is is owned here and in Accessorial Charge Taxonomy Mapping; the scoring stage only reads the resolved matrix.
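One possible shape for those three safeguards is sketched below. It is an assumption-laden illustration, not the pipeline's actual loader: `validate_profiles`, `atomic_write`, `load_with_fallback`, and `CONSERVATIVE_DEFAULT` are hypothetical names, the validation covers only a few keys, and the parser is passed in as a callable (for example `yaml.safe_load`) so the sketch itself needs only the standard library.

```python
import os
import tempfile
from typing import Any, Callable, Dict, Optional

# Assumed conservative default: an empty matrix, so every code takes the unknown fallback.
CONSERVATIVE_DEFAULT: Dict[str, Any] = {}


def validate_profiles(raw: Any) -> Dict[str, Any]:
    """Check the minimal shape of the matrix; raise ValueError on anything malformed."""
    if not isinstance(raw, dict) or not isinstance(raw.get("accessorial_profiles"), dict):
        raise ValueError("missing accessorial_profiles mapping")
    profiles = raw["accessorial_profiles"]
    for code, profile in profiles.items():
        if not isinstance(profile, dict):
            raise ValueError(f"{code}: profile must be a mapping")
        base = profile.get("base_score")
        if isinstance(base, bool) or not isinstance(base, (int, float)) or not 0 <= base <= 1:
            raise ValueError(f"{code}: base_score must be a number in [0, 1]")
        if not isinstance(profile.get("required_triggers", []), list):
            raise ValueError(f"{code}: required_triggers must be a list")
    return profiles


def atomic_write(path: str, text: str) -> None:
    """Write to a temp file in the same directory, then swap it in with os.replace."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            fh.write(text)
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_with_fallback(
    path: str,
    parse: Callable[[str], Any],
    last_good: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Return validated profiles, else the last good matrix, else the conservative default."""
    try:
        with open(path, "r", encoding="utf-8") as fh:
            return validate_profiles(parse(fh.read()))
    except Exception:  # I/O, parser, and validation errors all trip the breaker
        return last_good if last_good is not None else CONSERVATIVE_DEFAULT
```

Writing the temp file in the same directory matters because `os.replace` is only atomic within a single filesystem. A production loader would also log every fallback so `config_fallback_invocations` can be counted.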

Step-by-Step Scoring Implementation

The engine evaluates each record against its profile in a fixed sequence. Missing data lowers the score deterministically rather than raising an exception, which keeps the stage running through partial carrier outages.

Stage 1 — Validate at the boundary and select a profile

Parse the inbound record with Pydantic, then resolve its profile. An unmatched code falls back to a conservative profile whose 0.5 base score can never reach APPROVE, so a person sees the charge before it is paid.

import logging
from decimal import Decimal, ROUND_HALF_UP
from typing import Any, Dict, Optional

import yaml
from pydantic import BaseModel, Field

logger = logging.getLogger(__name__)

UNKNOWN_PROFILE = {"base_score": 0.5, "required_triggers": [], "fallback_score": 0.5}


class AccessorialRecord(BaseModel):
    shipment_id: str
    carrier_scac: str
    accessorial_code: str
    billed_amount: Decimal = Field(ge=0)  # negative amounts fail validation at the boundary
    uom: str
    trigger_event: str
    trigger_value: Optional[Decimal] = None
    lane_origin_zip: str
    lane_dest_zip: str
    billing_zone: str


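def load_scoring_profiles(config_path: str) -> Dict[str, Any]:
    """Load the YAML matrix; raise loudly on a bad deploy."""
    with open(config_path, "r", encoding="utf-8") as fh:
        parsed = yaml.safe_load(fh)
    profiles = parsed.get("accessorial_profiles") if isinstance(parsed, dict) else None
    if not isinstance(profiles, dict) or not profiles:
        # An empty or misshapen file must not look like "no profiles configured".
        raise ValueError(f"no accessorial_profiles mapping found in {config_path}")
    return profiles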


def select_profile(record: AccessorialRecord, profiles: Dict[str, Any]) -> Dict[str, Any]:
    """Exact-code match, else the conservative UNKNOWN fallback."""
    return profiles.get(record.accessorial_code, UNKNOWN_PROFILE)

Common mistake: matching profiles on the carrier’s raw code instead of the canonical code. Carrier “DTN”, “DET”, and “DETEN” all mean detention; if taxonomy folding has not run upstream, every one of them silently drops to UNKNOWN_ACCESSORIAL and floods the manual queues.

Stage 2 — Enforce the contractual cap with tolerance

Compare the billed amount against the relevant ceiling on the same unit basis, allowing a configured deviation band before penalizing.

def apply_cap_rule(
    record: AccessorialRecord, profile: Dict[str, Any], breakdown: Dict[str, str]
) -> Decimal:
    """Return the score delta from cap evaluation (<= 0)."""
    cap_key = "contractual_cap_per_hour" if "contractual_cap_per_hour" in profile else "flat_rate_max"
    cap_val = profile.get(cap_key, 0)
    cap = Decimal(str(cap_val)) if cap_val else Decimal("0")

    if cap <= 0:
        breakdown["cap"] = "no cap configured"
        return Decimal("0")

    if record.billed_amount <= cap:
        breakdown["cap"] = "within limit"
        return Decimal("0")

    overage = record.billed_amount - cap
    tolerance = Decimal(str(profile.get("deviation_tolerance_pct", 0))) / Decimal("100")
    allowed = cap * tolerance

    if overage > allowed:
        breakdown["cap"] = f"exceeds cap by {overage} (tolerance {tolerance * 100}%)"
        return Decimal("-0.4")

    breakdown["cap"] = "within tolerance band"
    return Decimal("0")

Common mistake: comparing a per_hour cap against a flat-billed line. Always reconcile uom first; a $300 flat detention charge is not three hours over a $75/hr cap.
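A guard of the following kind could run ahead of `apply_cap_rule`. It is a sketch under one stated assumption: that the cap key in a profile implies its unit basis, as captured in the hypothetical `CAP_KEY_TO_UOM` mapping. The helper names are likewise illustrative.

```python
from typing import Any, Dict, Optional

# Assumed mapping from a profile's cap key to the unit basis that cap is expressed in.
CAP_KEY_TO_UOM = {"contractual_cap_per_hour": "per_hour", "flat_rate_max": "flat"}


def cap_basis(profile: Dict[str, Any]) -> Optional[str]:
    """Unit basis of the profile's cap, or None when no known cap key is present."""
    for key, uom in CAP_KEY_TO_UOM.items():
        if key in profile:
            return uom
    return None


def uom_is_reconciled(record_uom: str, profile: Dict[str, Any]) -> bool:
    """True when the billed unit matches the cap's unit, or there is no cap to compare."""
    basis = cap_basis(profile)
    return basis is None or basis == record_uom
```

When the check fails, recording the mismatch in the breakdown and skipping the cap comparison is safer than comparing numbers that are on different bases.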

Stage 3 — Verify trigger completeness

An accessorial is only legitimate when its triggering event is present in the operational data. Each missing trigger applies a fixed penalty rather than an all-or-nothing failure.

def trigger_is_met(record: AccessorialRecord, trigger: str) -> bool:
    """Deterministic trigger resolution. Extend per operational reality."""
    resolved = {
        "dock_in_time": record.trigger_value is not None and record.trigger_value > 0,
        "dock_out_time": record.trigger_value is not None and record.trigger_value > 0,
        "free_time_minutes": record.trigger_value is not None,
        "delivery_type": record.trigger_event in ("residential_delivery", "limited_access"),
    }
    return resolved.get(trigger, False)


def apply_trigger_rule(
    record: AccessorialRecord, profile: Dict[str, Any], breakdown: Dict[str, str]
) -> Decimal:
    required = profile.get("required_triggers", [])
    missing = [t for t in required if not trigger_is_met(record, t)]
    if not missing:
        breakdown["triggers"] = "all required triggers present"
        return Decimal("0")
    breakdown["triggers"] = f"missing: {', '.join(missing)}"
    return Decimal("-0.15") * len(missing)

Common mistake: treating a missing trigger as a hard reject. Carrier EDI frequently omits dock timestamps during portal outages; a per-trigger penalty lowers the score in steps, so a genuine charge with a partial gap lands in REVIEW instead of being rejected outright.
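To see how the stepped penalty plays out, the arithmetic can be run in isolation. The sketch below reuses the -0.15 penalty from `apply_trigger_rule` and the 0.85 / 0.60 routing thresholds from `route_for_score` in Stage 4. The helper `verdict_for_missing` is hypothetical, and the sketch assumes no cap penalty applies.

```python
from decimal import Decimal
from typing import Tuple

PENALTY_PER_MISSING_TRIGGER = Decimal("-0.15")  # same constant as apply_trigger_rule
APPROVE_FLOOR = Decimal("0.85")                 # same thresholds as route_for_score
REVIEW_FLOOR = Decimal("0.60")


def verdict_for_missing(base_score: Decimal, missing_count: int) -> Tuple[Decimal, str]:
    """Score and routing verdict when only trigger penalties apply."""
    score = base_score + PENALTY_PER_MISSING_TRIGGER * missing_count
    score = max(Decimal("0"), min(Decimal("1"), score))
    if score >= APPROVE_FLOOR:
        return score, "APPROVE"
    if score >= REVIEW_FLOOR:
        return score, "REVIEW"
    return score, "QUARANTINE"


for missing in range(4):
    score, verdict = verdict_for_missing(Decimal("1.0"), missing)
    print(missing, score, verdict)
```

With a base score of 1.0, one missing trigger leaves the score at 0.85 (still APPROVE), two give 0.70 (REVIEW), and three give 0.55 (QUARANTINE). Whether a single missing trigger should still approve is a tuning decision; the figures follow directly from the constants.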

Stage 4 — Compose, clamp, and route

Combine the deltas, clamp into [0.0, 1.0], quantize to two decimals, and map the result to a routing verdict.

class ScoredPayload(BaseModel):
    original: AccessorialRecord
    confidence_score: Decimal = Field(ge=0, le=1)
    routing_flag: str  # APPROVE, REVIEW, QUARANTINE
    score_breakdown: Dict[str, str]
    applied_profile: str


def route_for_score(score: Decimal) -> str:
    if score >= Decimal("0.85"):
        return "APPROVE"
    if score >= Decimal("0.60"):
        return "REVIEW"
    return "QUARANTINE"


def calculate_accessorial_score(
    record: AccessorialRecord, profiles: Dict[str, Any]
) -> ScoredPayload:
    """Deterministic, idempotent scoring with strict boundary enforcement."""
    profile = select_profile(record, profiles)
    breakdown: Dict[str, str] = {}

    score = Decimal(str(profile.get("base_score", 0.5)))
    score += apply_cap_rule(record, profile, breakdown)
    score += apply_trigger_rule(record, profile, breakdown)

    final_score = max(
        Decimal("0.0"),
        min(Decimal("1.0"), score.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)),
    )
    return ScoredPayload(
        original=record,
        confidence_score=final_score,
        routing_flag=route_for_score(final_score),
        score_breakdown=breakdown,
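        # Record the fallback name when no profile matched, so the audit trail is honest.
        applied_profile=record.accessorial_code if record.accessorial_code in profiles else "UNKNOWN_ACCESSORIAL",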
    )

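Common mistake: quantizing with whatever rounding mode the ambient decimal context carries. The default context rounds half to even, so a borderline 0.845 does not round the way ROUND_HALF_UP rounds it. Pin ROUND_HALF_UP explicitly, and quantize before clamping as shown, so the same input yields the same verdict on every replay; reproducibility is what makes the score defensible in a dispute.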
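The difference between rounding modes is easy to check with the standard library alone. The snippet below quantizes the same borderline value both ways; 0.845 is an illustrative score, not one the sample profiles necessarily produce.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

borderline = Decimal("0.845")
cent = Decimal("0.01")

half_up = borderline.quantize(cent, rounding=ROUND_HALF_UP)
half_even = borderline.quantize(cent, rounding=ROUND_HALF_EVEN)

print(half_up)    # 0.85
print(half_even)  # 0.84

# Against the 0.85 APPROVE threshold the two modes give different verdicts.
print(half_up >= Decimal("0.85"))    # True
print(half_even >= Decimal("0.85"))  # False
```

ROUND_HALF_EVEN is the rounding mode of Python's default decimal context, which is why leaving the mode implicit is risky at a routing threshold.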

Validation & Testing

Because the engine is pure, every rule is unit-testable against a fixed input with no mocks. Build fixtures from real carrier edge cases and assert on both the verdict and the breakdown, so a regression in why a score moved is caught alongside a regression in the number.

import pytest
from decimal import Decimal


def make_record(**overrides) -> AccessorialRecord:
    base = dict(
        shipment_id="11111111-1111-1111-1111-111111111111",
        carrier_scac="ABCD",
        accessorial_code="DETENTION",
        billed_amount=Decimal("75.00"),
        uom="per_hour",
        trigger_event="detention_hours",
        trigger_value=Decimal("2.0"),
        lane_origin_zip="30301",
        lane_dest_zip="60601",
        billing_zone="Z3",
    )
    base.update(overrides)
    return AccessorialRecord(**base)


PROFILES = {
    "DETENTION": {
        "base_score": 1.0,
        "contractual_cap_per_hour": 75.00,
        "deviation_tolerance_pct": 15.0,
        "required_triggers": ["dock_in_time", "dock_out_time"],
    }
}


def test_within_cap_and_triggers_present_approves():
    rec = make_record()
    result = calculate_accessorial_score(rec, PROFILES)
    assert result.routing_flag == "APPROVE"
    assert result.confidence_score == Decimal("1.00")


def test_cap_breach_beyond_tolerance_routes_to_review():
    # 120.00 is beyond the 75.00 cap plus 15% tolerance: 1.0 - 0.4 = 0.60, the REVIEW floor.
    rec = make_record(billed_amount=Decimal("120.00"))
    result = calculate_accessorial_score(rec, PROFILES)
    assert result.routing_flag == "REVIEW"
    assert "exceeds cap" in result.score_breakdown["cap"]


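def test_unknown_code_is_never_approved():
    # UNKNOWN_PROFILE starts at 0.5 with no cap and no triggers, so the score stays at 0.50.
    rec = make_record(accessorial_code="MYSTERY_FEE")
    result = calculate_accessorial_score(rec, PROFILES)
    assert result.confidence_score == Decimal("0.50")
    assert result.routing_flag in {"REVIEW", "QUARANTINE"}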


def test_scoring_is_idempotent():
    rec = make_record()
    first = calculate_accessorial_score(rec, PROFILES)
    second = calculate_accessorial_score(rec, PROFILES)
    assert first.confidence_score == second.confidence_score
    assert first.score_breakdown == second.score_breakdown

Fixture design that matters in this domain: a flat-billed line carrying a per_hour cap, a record with trigger_value=None, a negative billed_amount (must be rejected at the boundary, never scored), and a carrier whose code only resolves after taxonomy folding. The idempotency test is not optional — it is the property the whole audit trail depends on.

Performance & Tuning

Scoring is CPU-light and embarrassingly parallel because each record is independent and stateless. Throughput is governed by upstream enrichment, not by the arithmetic.

Batch size: score in batches of 500–2,000 records per worker pull. Smaller batches add queue overhead; larger batches inflate redelivery cost when a single poison record fails the boundary.
Profile caching: load accessorial_profiles.yaml once per worker into an immutable dict at startup. Re-reading the file per record is the single most common throughput regression; pin a contract_version_id so a mid-batch GitOps deploy cannot swap the matrix under an in-flight batch.
Decimal cost: Decimal is ~10x slower than float but the per-record budget (a handful of operations) keeps this negligible against I/O. Never trade it for float to chase microseconds.
Memory footprint: keep raw_source_payload out of the scoring hot path — reference it by key and rehydrate only when a record routes to dispute.
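The profile-caching advice could be sketched as follows. `freeze_profiles` and `make_cached_loader` are hypothetical helpers, and keying the cache on `contract_version_id` is one assumed way of pinning a snapshot. The loader is passed in so that any function returning the parsed matrix (for example `load_scoring_profiles`) can be wrapped.

```python
from functools import lru_cache
from types import MappingProxyType
from typing import Any, Callable, Dict, Mapping


def freeze_profiles(profiles: Dict[str, Dict[str, Any]]) -> Mapping[str, Mapping[str, Any]]:
    """Read-only view of the matrix; lists become tuples so triggers cannot be edited."""
    return MappingProxyType({
        code: MappingProxyType(
            {key: tuple(val) if isinstance(val, list) else val for key, val in profile.items()}
        )
        for code, profile in profiles.items()
    })


def make_cached_loader(load: Callable[[str], Dict[str, Dict[str, Any]]]):
    """Wrap a loader so each (path, contract_version_id) pair is read once per worker."""

    @lru_cache(maxsize=None)
    def cached(config_path: str, contract_version_id: str):
        # The version id is part of the cache key: a new deploy gets a new
        # snapshot while in-flight batches keep the one they started with.
        return freeze_profiles(load(config_path))

    return cached
```

A worker would build the cached loader once at startup and pass the batch's `contract_version_id` on every call; repeated calls with the same pair return the same frozen object without touching the file.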

Failure Modes

Five scenarios account for nearly all production incidents in this stage. Each has a deterministic root cause and a diagnostic you can run against a captured record.

1. Unknown accessorial flood. Taxonomy folding stalled upstream, so canonical codes never arrive and every record drops to the fallback profile.

unknown = [r for r in batch if r.accessorial_code not in profiles]
logger.warning("unmatched_codes", extra={"count": len(unknown),
               "codes": sorted({r.accessorial_code for r in unknown})})

Resolution: re-run Accessorial Charge Taxonomy Mapping and re-queue; do not widen the fallback to APPROVE.

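2. Unit mismatch false positives. A flat-billed line scored against a per_hour cap yields a verdict that does not reflect the contract: a legitimate flat charge can be penalized for exceeding an hourly ceiling it was never subject to, and an overbilled one can pass against a ceiling that does not apply. Diagnose by asserting uom consistency before the cap rule and counting mismatches per carrier.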

3. Silent trigger gaps. Carrier portal outage drops dock timestamps, so legitimate detention slides to QUARANTINE. Diagnose by tracking the triggers breakdown key distribution; a spike in “missing: dock_in_time” for one SCAC points at the carrier, not your rules.
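A distribution of that kind can be built with a counter over scored output. The sketch assumes each row has been flattened to a dict holding `carrier_scac` and the `triggers` message from `score_breakdown`; that row shape and the function name are illustrative.

```python
from collections import Counter
from typing import Dict, Iterable


def trigger_gap_distribution(rows: Iterable[Dict[str, str]]) -> Counter:
    """Count (carrier_scac, triggers message) pairs for records with missing triggers."""
    counts: Counter = Counter()
    for row in rows:
        message = row.get("triggers", "")
        if message.startswith("missing:"):
            counts[(row["carrier_scac"], message)] += 1
    return counts


rows = [
    {"carrier_scac": "ABCD", "triggers": "missing: dock_in_time, dock_out_time"},
    {"carrier_scac": "ABCD", "triggers": "missing: dock_in_time, dock_out_time"},
    {"carrier_scac": "WXYZ", "triggers": "all required triggers present"},
]
for (scac, message), count in trigger_gap_distribution(rows).most_common():
    print(scac, message, count)
```

A count concentrated on one SCAC suggests a carrier-side data gap; the same message spread evenly across carriers points back at the rules or at upstream parsing.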

4. Config drift mid-batch. A profile redeploy changes a cap while a batch is in flight, so two replays disagree. Resolution: bind the matrix snapshot to contract_version_id and read profiles from a read-only directory.

5. Float contamination. A billed_amount arrives as a JSON float and loses precision at the cap boundary. Diagnose by enforcing Decimal coercion in the Pydantic model and rejecting non-string numeric input at ingestion.
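A strict coercion helper along these lines is one way to keep floats out. `to_decimal_strict` is a hypothetical name and this is a sketch, not the ingestion stage's real code; the `parse_float` argument to `json.loads` is standard-library behaviour.

```python
import json
from decimal import Decimal, InvalidOperation


def to_decimal_strict(value: object) -> Decimal:
    """Coerce str/int/Decimal to a finite Decimal; refuse floats outright."""
    if isinstance(value, (bool, float)):
        raise TypeError(f"refusing {type(value).__name__} for a monetary field: {value!r}")
    if not isinstance(value, (str, int, Decimal)):
        raise TypeError(f"unsupported type for a monetary field: {type(value).__name__}")
    try:
        amount = Decimal(value)
    except InvalidOperation as exc:
        raise ValueError(f"not a decimal number: {value!r}") from exc
    if not amount.is_finite():
        raise ValueError(f"non-finite amount: {value!r}")
    return amount


# json.loads can hand numeric literals straight to Decimal, so no float is ever created.
payload = json.loads('{"billed_amount": 86.25}', parse_float=Decimal)
billed = to_decimal_strict(payload["billed_amount"])
print(billed)  # 86.25
```

Parsing with `parse_float=Decimal` preserves the digits exactly as they appear in the JSON text, which is the property the cap comparison relies on.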

All exceptions are wrapped in structured logging payloads carrying shipment_id, carrier_scac, and error_context. Silent defaults are prohibited; schema violations go to a dead-letter queue with field-level messages, never to a guessed score.

Integration Points

The scored payload is the contract this stage owes the rest of the audit. Its three-way routing flag is consumed directly downstream:

Routing flag	Consumer	Action
`APPROVE`	Payment reconciliation	Charge passes to AP accrual unchanged
`REVIEW`	Threshold Tuning & Alerting	Aggregated into variance bands and alert thresholds
`QUARANTINE`	Dispute routing	Packaged with `score_breakdown` as dispute evidence

The score_breakdown is the load-bearing field at the boundary: it is what a dispute ticket cites and what Threshold Tuning & Alerting reads to decide whether a pattern of REVIEW verdicts warrants a tightened rule. Emit counters for records_scored, routing_distribution, and config_fallback_invocations, and track scoring latency percentiles (p50, p95) — a rising p95 here almost always signals an upstream enrichment bottleneck, not a scoring regression.
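The counters and percentiles mentioned above can be prototyped with the standard library before wiring in a metrics backend. The function names are illustrative, and latencies are plain floats in milliseconds because they are measurements, not money.

```python
import statistics
from collections import Counter
from typing import Dict, Iterable, List


def routing_distribution(flags: Iterable[str]) -> Counter:
    """Tally of routing verdicts for a batch, e.g. APPROVE / REVIEW / QUARANTINE."""
    return Counter(flags)


def latency_percentiles(latencies_ms: List[float]) -> Dict[str, float]:
    """p50 and p95 from raw per-record latencies (needs at least two samples)."""
    # n=100 yields 99 cut points; index k-1 is the k-th percentile.
    cuts = statistics.quantiles(latencies_ms, n=100)
    return {"p50": cuts[49], "p95": cuts[94]}


distribution = routing_distribution(["APPROVE", "REVIEW", "APPROVE", "QUARANTINE"])
percentiles = latency_percentiles([float(ms) for ms in range(1, 101)])
print(distribution["APPROVE"], percentiles["p50"], percentiles["p95"])
```

In production these values would normally be emitted to whatever metrics system the pipeline already uses; the sketch only shows the shape of the computation.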

Deep-Dive Guides

Step-by-step walkthroughs that build on this stage:

Matching Shipment Lanes to Contracted Rate Tables Using Python — how the contract_id and lane keys this stage depends on are resolved.
Cross-Checking Billable Weight Against Actual Weight Logs — reconciling the weight figures that drive weight-based accessorials.
Building an Accessorial Charge Lookup Table in Postgres — persisting the canonical codes and caps the scoring matrix reads from.

Lane Matching Algorithms — resolves origin/destination to the correct accessorial tariff upstream of scoring.
Weight & Zone Cross-Validation — reconciles billable versus actual weight before weight-driven accessorials are scored.
Threshold Tuning & Alerting — turns aggregated REVIEW verdicts into alert thresholds and rule adjustments.
Accessorial Charge Taxonomy Mapping — folds carrier-specific codes into the canonical taxonomy this stage matches on.
Automated Invoice Parsing & EDI/XML Ingestion — produces the normalized records the scoring stage consumes.
