AI & ML · April 2, 2025 · 14 min read

Building an Agentic Compliance Platform with LangGraph and PostgreSQL Checkpointing

A technical deep dive into the architecture of an agentic AI compliance case processing platform we built for a European RegTech company. We cover the LangGraph supervisor pattern, PostgreSQL-based checkpointing for long-running workflows, and the MCP gateway for cross-system tool access.

Danylo Dudok
Principal Architect, Sparkvern
LangGraph · Agentic AI · RegTech · PostgreSQL · Python

Regulatory compliance case processing is one of those domains that looks simple from the outside and reveals extraordinary complexity once you start building for it. A filing arrives. You validate it. You report it. What could be hard about that?

In practice, a single compliance case can involve multiple rounds of document collection, cross-referencing against applicable regulatory frameworks, coordination between compliance reviewers and external parties, partial remediation actions, gap analysis against evolving regulatory standards, and reporting obligations that vary by jurisdiction. A moderately complex case might take weeks to resolve, involving dozens of decision points and interactions with multiple external systems. Automating this is not a matter of running a single inference call — it requires a system that can reason, delegate, wait, resume, and maintain state across extended time horizons.

This is why we turned to agentic AI when a European regulatory technology company asked us to modernize their compliance case processing. This post details the architecture we built using LangGraph, PostgreSQL checkpointing, and a Model Context Protocol gateway for cross-system tool access.

Why Agentic AI for Compliance Cases

Before diving into architecture, it is worth explaining why traditional automation approaches fall short for compliance case processing.

Rule-based automation (BPMN engines, decision tables) handles simple cases well — a straightforward periodic filing where the regulatory framework is well-defined and the data requirements are clear. But it falls apart for complex cases that require judgment, ambiguity resolution, and adaptive workflows. When a cross-border compliance case involves both securities regulation and data protection obligations, the correct processing path depends on information that is only available after initial assessment, and may change as new information emerges.

Simple ML classification (categorize the case, route it to the right queue) helps with triage but does not address the processing workflow itself. You still need humans to execute each step.

Agentic AI offers a middle path: autonomous agents that can reason about the case state, decide what action to take next, use tools to gather information and execute tasks, and escalate to humans when confidence is low or policy requires it. The agents handle the routine complexity autonomously, while humans focus on the genuinely difficult cases.

The LangGraph Supervisor Pattern

We structured the platform around LangGraph’s supervisor pattern, where a single orchestrator agent directs a team of specialized agents, each responsible for a distinct aspect of compliance case processing.

The supervisor agent is the entry point for every case. It receives the initial filing data (parsed from incoming documents, which we discuss below), assesses the case type and complexity, and decides which specialist agents to invoke and in what order. Critically, the supervisor does not attempt to process the case itself. Its role is strictly coordination: understanding the current state of a case, determining what needs to happen next, and dispatching to the appropriate specialist.

The supervisor maintains a case state object that tracks everything relevant to the case’s processing: the current status, which agents have been invoked and their outputs, pending actions, escalation flags, and the overall decision trajectory. This state object is the single source of truth, and every agent interaction reads from and writes to it.

We chose the supervisor pattern over a fully decentralized multi-agent approach for a specific reason: auditability. In regulatory compliance, you must be able to explain why every decision was made. A supervisor pattern produces a clear, linear decision trail: “The supervisor assessed the case, dispatched to the classification agent, received the classification, dispatched to the validation agent for regulatory framework verification, received confirmation, and then dispatched to the remediation agent.” This trail is far easier to audit than the emergent behavior of peer-to-peer agent communication.
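To make the coordination role concrete, here is a framework-agnostic sketch of the supervisor's routing logic over a case-state object. The agent names, state fields, and status values are illustrative stand-ins, not the production schema; in the real system this logic is expressed as conditional edges in a LangGraph graph.

```python
from dataclasses import dataclass, field
from typing import Optional

# Illustrative sketch of supervisor routing. Field and agent names are
# hypothetical; production expresses this as LangGraph conditional edges.

@dataclass
class CaseState:
    classification: Optional[dict] = None
    validation: Optional[dict] = None
    assessment: Optional[dict] = None
    escalated: bool = False
    trail: list = field(default_factory=list)  # linear decision trail for audit

def supervise(state: CaseState) -> str:
    """Return the next specialist to dispatch to, recording the decision."""
    if state.escalated:
        nxt = "human_review"
    elif state.classification is None:
        nxt = "classification_agent"
    elif state.validation is None:
        nxt = "validation_agent"
    elif state.assessment is None:
        nxt = "assessment_agent"
    elif state.assessment.get("gaps"):
        nxt = "remediation_agent"
    else:
        nxt = "reporting_agent"
    state.trail.append(nxt)
    return nxt
```

Because every dispatch appends to `trail`, the audit trail described above falls out of the routing function itself rather than being reconstructed after the fact.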

The Five Specialist Agents

Each specialist agent is a focused LangGraph subgraph with its own tools, prompts, and Pydantic data contracts.

Classification Agent: Categorizes incoming filings, extracts structured data, and determines the applicable regulatory framework. This agent has access to historical case data (for comparable filing classifications), regulatory taxonomy databases, and document classification models. For complex filings, it can request and analyze supporting documentation, regulatory guidance documents, or jurisdictional reference materials. Its output is a structured case classification with extracted data fields, applicable framework identifiers, confidence levels, and flags for items that require human expert review.

Validation Agent: Checks filings against regulatory schemas and business rules. This agent manages regulatory reference lookup and cross-jurisdiction coordination: identifying applicable rules across relevant jurisdictions, verifying data completeness against regulatory requirements, and flagging inconsistencies. It interfaces with regulatory body APIs and reference databases through the MCP gateway and maintains a running record of all validation outcomes.

Assessment Agent: Evaluates compliance status and risk levels with structured scores. It calculates risk assessments based on regulatory thresholds and historical patterns, assigns severity ratings across multiple compliance dimensions where applicable, and produces a consolidated risk profile. The assessment agent never issues final compliance determinations directly (that would be imprudent for an AI system). Instead, it prepares risk assessments that are queued for human approval above configurable thresholds and auto-approved below them.

Remediation Agent: Identifies compliance gaps and generates corrective action plans. When the assessment reveals deficiencies — a missing disclosure, an incomplete data submission, a filing that does not meet jurisdictional requirements — the remediation agent identifies the specific gaps, generates a corrective action plan with prioritized steps, and tracks the remediation process. This is the most long-running agent, as remediation proceedings can span months.

Reporting Agent: The most frequently invoked agent. It produces audit-ready documentation and regulatory submissions, verifies output format compliance, checks against submission deadlines and regulatory calendars, and determines the case’s readiness for final submission. This agent has access to the full regulatory framework corpus and uses retrieval-augmented generation to ground its reporting determinations in specific regulatory provisions. Every reporting decision includes a citation to the relevant regulatory section.

Pydantic Contracts Between Agents

A major source of bugs in multi-agent systems is the interface between agents. When Agent A passes unstructured text to Agent B, and Agent B misinterprets a field, the error can propagate silently through the entire workflow.

We addressed this by defining strict Pydantic data contracts for every inter-agent communication. Each agent’s output is a typed Pydantic model with validation rules. The classification agent’s output, for example, is a CaseClassification model with fields for the regulatory framework identifier (a string matching a controlled vocabulary), a list of ExtractedField objects (each with field name, value, source reference, and confidence score), a list of flags (enumerated values like REQUIRES_EXPERT_REVIEW or MULTI_JURISDICTION), and a supporting evidence list linking to documents in the case file.

When the supervisor passes this output to the validation agent, the validation agent’s input contract expects exactly this schema. If the classification agent produces output that does not conform — a missing field, an invalid confidence score, a malformed document reference — the Pydantic validation catches it immediately, before the validation agent begins processing.
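A minimal sketch of that contract, assuming Pydantic: the field names follow the prose above, but the flag set, constraints, and sample values are illustrative rather than the production models.

```python
from enum import Enum
from pydantic import BaseModel, Field, ValidationError

# Sketch of the CaseClassification contract. Field names follow the post;
# the flag set and sample values are illustrative, not the production schema.

class CaseFlag(str, Enum):
    REQUIRES_EXPERT_REVIEW = "REQUIRES_EXPERT_REVIEW"
    MULTI_JURISDICTION = "MULTI_JURISDICTION"

class ExtractedField(BaseModel):
    name: str
    value: str
    source_ref: str  # reference to the originating document in the case file
    confidence: float = Field(ge=0.0, le=1.0)

class CaseClassification(BaseModel):
    framework_id: str  # matched against a controlled vocabulary in production
    extracted_fields: list[ExtractedField]
    flags: list[CaseFlag] = []
    evidence: list[str] = []

# A nonconforming payload (confidence > 1.0) fails at the boundary,
# before the downstream agent ever sees it:
try:
    CaseClassification(
        framework_id="MIFID2",
        extracted_fields=[{"name": "isin", "value": "DE0001234567",
                           "source_ref": "doc-1", "confidence": 1.7}],
    )
except ValidationError as exc:
    print(len(exc.errors()), "validation error(s) caught at the agent boundary")
```

The error raised by Pydantic names the exact field and constraint that failed, which is what makes boundary failures actionable rather than silent.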

This contract-based approach adds development overhead (you must define and maintain the models), but the payoff is substantial. In six months of production operation, we have had zero instances of silent data corruption between agents. Every interface failure is caught at the boundary and produces a clear, actionable error message.

PostgreSQL Checkpointing for Long-Running Cases

This is arguably the most critical architectural decision in the entire platform. Compliance cases are not request-response transactions. They are long-running processes that can span days, weeks, or months. The system must be able to pause a case’s processing (waiting for a document, an external response, or a human review), resume it exactly where it left off, survive system failures without losing progress, and maintain a complete audit trail of every state transition.

LangGraph supports checkpointing natively, and we chose PostgreSQL as the checkpoint backend rather than the simpler in-memory or SQLite options. PostgreSQL gives us durability, concurrent access, and query capabilities that are essential for a production system.

Every time the supervisor makes a decision, dispatches to an agent, or receives an agent’s output, the entire case state is checkpointed to PostgreSQL. The checkpoint includes the full LangGraph graph state (node positions, edge conditions, accumulated messages), the case state object (all business data), a timestamp, and a monotonically increasing sequence number.

This design enables several capabilities. First, resumption after failure. If the platform crashes mid-processing, every case resumes from its last checkpoint when the system restarts. No case data is lost. Given that the platform processes thousands of cases daily, this guarantee is non-negotiable.

Second, human-in-the-loop workflows. When the assessment agent flags a case for human expert review, the case’s processing pauses at a checkpoint. The human reviewer accesses the case through a web interface, reviews the agent’s assessment, provides their determination, and the case resumes from the checkpoint with the human input incorporated into the state.

Third, audit trail. The sequence of checkpoints for a case constitutes a complete, immutable record of every decision and state transition. Financial regulators can review not just the final outcome but the entire processing trajectory. This has proven valuable during compliance audits.

Fourth, debugging and replay. When a case produces an unexpected outcome, we can load any historical checkpoint, inspect the state at that point, and understand exactly why the supervisor made the decision it did. This has been transformative for improving agent prompts and tool configurations.

The PostgreSQL schema for checkpoints includes tables for the thread (corresponding to a single case), the checkpoint data itself (serialized graph state and case state), and a metadata table linking checkpoints to the triggering agent and action. We index on thread ID and sequence number for fast retrieval of the latest checkpoint or the full checkpoint history for a case.
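To illustrate the storage model, here is a minimal sketch with `sqlite3` standing in for PostgreSQL. The table and column names are illustrative, and the production system uses LangGraph's Postgres checkpointer rather than hand-rolled SQL; the point is the append-only, per-thread sequence and the latest-checkpoint retrieval that resumption depends on.

```python
import json
import sqlite3

# Sketch of the checkpoint storage model. sqlite3 stands in for PostgreSQL;
# table and column names are illustrative, not LangGraph's actual schema.

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE checkpoints (
        thread_id TEXT NOT NULL,      -- one thread per case
        seq       INTEGER NOT NULL,   -- monotonically increasing per thread
        state     TEXT NOT NULL,      -- serialized graph state + case state
        created   TEXT DEFAULT CURRENT_TIMESTAMP,
        PRIMARY KEY (thread_id, seq)  -- doubles as the retrieval index
    )
""")

def checkpoint(thread_id: str, state: dict) -> int:
    """Append the next checkpoint for a case; never overwrite history."""
    (seq,) = conn.execute(
        "SELECT COALESCE(MAX(seq), 0) + 1 FROM checkpoints WHERE thread_id = ?",
        (thread_id,),
    ).fetchone()
    conn.execute(
        "INSERT INTO checkpoints (thread_id, seq, state) VALUES (?, ?, ?)",
        (thread_id, seq, json.dumps(state)),
    )
    return seq

def latest(thread_id: str) -> dict:
    """Load the most recent checkpoint: the resume point after a crash."""
    row = conn.execute(
        "SELECT state FROM checkpoints WHERE thread_id = ? ORDER BY seq DESC LIMIT 1",
        (thread_id,),
    ).fetchone()
    return json.loads(row[0])
```

Because checkpoints are append-only, the full sequence for a thread is the immutable audit trail, and `latest` alone is enough to resume a case after a failure.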

The MCP Gateway

The specialist agents need to interact with multiple external systems: the regulatory framework management system, the case database, the reporting system, external regulatory body APIs, document storage, and audit logging services. Rather than giving each agent direct access to these systems, we built a Model Context Protocol gateway that provides a unified tool interface.

The MCP gateway exposes tools organized by domain. Document tools handle retrieval, upload, and OCR for case documents stored in the document management system. Regulatory tools handle framework lookup, rule checking, and jurisdictional reference retrieval. Reporting tools handle submission preparation, format validation, and deadline tracking. Coordination tools handle external auditor communication, regulatory body queries, and legal counsel integration.

Each tool is defined with a clear input schema, output schema, and error contract. The MCP gateway handles authentication, rate limiting, error translation, and response normalization. When the reporting agent calls the framework_check tool, it does not need to know that the regulatory reference system uses SOAP with WS-Security, or that the connection requires a mutual TLS certificate. The MCP gateway abstracts all of that.

This gateway pattern also enables a critical security feature: tool-level access control. The classification agent can access document tools and regulatory tools but not reporting tools. The reporting agent can access reporting tools but not coordination management. The supervisor can access all tools. These permissions are configured in the gateway, not in the agents themselves, making it straightforward to audit and modify the access matrix.
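The access matrix can be sketched as data plus one enforcement point. Tool names, domains, and the permission sets below are illustrative; the real gateway speaks the Model Context Protocol and layers authentication, rate limiting, and error translation on top of this check.

```python
# Sketch of tool-level access control at the gateway. Tool names, domains,
# and permission sets are illustrative; the real gateway dispatches over MCP
# and adds auth, rate limiting, and error normalization around this check.

TOOL_DOMAINS = {
    "fetch_document": "document",
    "framework_check": "regulatory",
    "prepare_submission": "reporting",
    "query_regulator": "coordination",
}

# Permissions live in the gateway, not in the agents, so the access
# matrix can be audited and modified in one place.
AGENT_PERMISSIONS = {
    "classification_agent": {"document", "regulatory"},
    "reporting_agent": {"document", "regulatory", "reporting"},
    "supervisor": {"document", "regulatory", "reporting", "coordination"},
}

def call_tool(agent: str, tool: str, payload: dict) -> dict:
    domain = TOOL_DOMAINS.get(tool)
    if domain is None:
        raise KeyError(f"unknown tool: {tool}")
    if domain not in AGENT_PERMISSIONS.get(agent, set()):
        raise PermissionError(f"{agent} may not call {tool} ({domain} domain)")
    # Real implementation would dispatch over MCP; here we echo the routing.
    return {"tool": tool, "domain": domain, "payload": payload}
```

A denied call fails loudly at the gateway with the agent, tool, and domain named, which keeps policy violations out of the agents' own error handling.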

Schema-Validated Document Ingestion

Compliance filings arrive as documents, and these documents come in many formats. Regulatory bodies define standardized schemas for many filing types, but real-world adherence to these schemas is, diplomatically, inconsistent.

We built an ingestion pipeline that validates incoming documents against their expected schemas before they enter the agentic processing pipeline. Documents that pass validation are parsed into structured case objects and dispatched to the supervisor agent. Documents that fail validation go to a quarantine queue with detailed error reports indicating which fields failed validation and why.

The quarantine queue feeds a remediation interface where operations staff can view the validation errors, correct the document (often the issues are minor — a date in the wrong format, a missing required field, or a value exceeding its maximum length), and resubmit. This pattern mirrors the data quality quarantine approach we use in data engineering contexts, and it ensures that the agentic system never processes malformed input.
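The gate itself reduces to validate-then-route. The required-field list and error-report shape below are illustrative; in production, validation runs against the regulatory bodies' published filing schemas rather than a hand-written field list.

```python
# Sketch of the ingestion gate. The required-field list and error-report
# shape are illustrative; production validates against published filing
# schemas, not a hard-coded set.

REQUIRED_FIELDS = {"filing_type", "jurisdiction", "reporting_date", "entity_id"}

quarantine: list[dict] = []  # failed documents plus their error reports
accepted: list[dict] = []    # structured cases bound for the supervisor

def ingest(doc: dict) -> bool:
    """Route a parsed document: accepted pipeline or quarantine with errors."""
    errors = [f"missing required field: {f}"
              for f in sorted(REQUIRED_FIELDS - doc.keys())]
    if "reporting_date" in doc and len(str(doc["reporting_date"])) != 10:
        errors.append("reporting_date must be ISO format (YYYY-MM-DD)")
    if errors:
        quarantine.append({"doc": doc, "errors": errors})
        return False
    accepted.append(doc)
    return True
```

The per-document error list is exactly what the remediation interface displays, so operations staff see which fields failed and why without re-running validation.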

For unstructured documents (PDFs, scanned images, emails), the ingestion pipeline uses a combination of OCR and LLM-based extraction to produce structured case data. These extracted cases carry a lower confidence score and are flagged for human verification before autonomous processing begins.

End-to-End Testing Strategy

Testing an agentic system is fundamentally different from testing a traditional application. The agents are non-deterministic (LLM outputs vary), the workflows are long-running, and the interactions between agents create emergent behaviors that are difficult to predict.

We developed a three-layer testing strategy.

Unit tests validate individual agent behaviors using mocked tool responses. For each agent, we maintain a suite of test cases with fixed input states and expected output schemas. The LLM calls are configured with temperature zero and seed values for reproducibility (though this is still approximate, not exact). These tests run on every pull request.

Integration tests validate agent-to-agent communication using the actual LangGraph runtime but with mocked external systems. These tests verify that Pydantic contracts are honored, that the supervisor’s routing logic handles edge cases (what happens when the classification agent returns a flag that requires re-routing to the validation agent before proceeding to assessment?), and that checkpoint/resume works correctly.

End-to-end tests validate complete case processing workflows against a staging environment with realistic (anonymized) filing data. We maintain a library of fifty representative cases spanning all filing types and complexity levels. These tests run nightly and measure both correctness (does the case reach the expected outcome?) and performance (does it complete within the expected timeframe?).

The most valuable testing investment was the representative case library. Building and maintaining fifty well-characterized test cases, with documented expected outcomes and decision trajectories, gave us confidence in every deployment and a regression safety net that caught issues before they reached production.

Multi-Region AWS Infrastructure

The platform runs on AWS across two EU regions (Frankfurt and Ireland) for redundancy and data residency compliance. The Databricks workspaces, PostgreSQL databases, and MCP gateway components are deployed in both regions using Terragrunt for infrastructure management.

The multi-region design is active-passive: one region handles live processing while the other maintains warm standby replicas. PostgreSQL checkpoints are replicated asynchronously across regions, and the failover procedure — documented in a runbook and tested quarterly — can activate the standby region within minutes.

Databricks handles the ML model serving and any batch processing components. The agentic platform itself runs on ECS Fargate containers, which gives us fine-grained scaling control — we can scale the number of concurrent case processing threads independently of the Databricks compute.

Lessons Learned

Checkpointing is not optional for production agentic systems. Any agentic workflow that can take more than a few minutes must have durable state management. In-memory state is a prototype; PostgreSQL checkpointing is production.

Pydantic contracts between agents are worth every line of boilerplate. The upfront cost of defining typed interfaces pays back immediately in debugging speed and operational reliability. Without contracts, you spend hours tracing data corruption through agent chains. With contracts, you get an immediate, descriptive error at the boundary.

The supervisor pattern is the right default for regulated domains. Fully autonomous peer-to-peer agent architectures are exciting in research papers, but in domains where you must explain every decision to a regulator, the supervisor’s linear decision trail is invaluable.

Human-in-the-loop is not a fallback; it is a feature. The system is designed from the ground up to pause for human input at defined points. Treating human involvement as an exception rather than a design element leads to brittle systems that either over-automate (making poor decisions without human oversight) or under-automate (escalating too frequently and losing the efficiency benefits).

MCP gateways simplify agent tool management enormously. Without the gateway, each agent needs its own connection configuration, error handling, and retry logic for every external system. The gateway centralizes all of that, and its tool-level access control is essential for security in a financial services context.

Test with representative cases, not synthetic ones. Synthetic test data exercises the happy path. Real compliance filings, with their ambiguities, missing information, and edge cases, expose the behaviors that matter in production. Our fifty-case test library was one of the highest-leverage investments in the project.

Related Case Study

European RegTech Platform: Multi-Agent Compliance Orchestration Platform →

Automated routing and processing of regulatory compliance cases, manual review reduced to exception cases, complete audit trails for financial regulators
