LegalTech case study

Flight-Compensation Document Intelligence
One pass, zero bad writes

An async document-intelligence pipeline that reads, classifies, and extracts data from incoming legal mail for an EU261 flight-compensation firm, then writes structured results back to the CRM — with a safe manual-review fallback for anything uncertain.

LegalTech & Aviation, Document Intelligence, OCR, LLM Extraction
Industry
LegalTech & Aviation
Timeline
Engaged as a task force to finish, fix, and cut over a stalled pipeline
Outcome
One pass, zero bad writes
Result snapshot

One pass, zero bad writes

The full flow runs live end-to-end with a credential-isolated design and a proven safe fallback: well-formed documents are classified and extracted automatically, while garbage or uncertain inputs are flagged for human review instead of corrupting case data. Roughly half of incoming documents are born-digital and bypass OCR completely.

A live, auditable pipeline from inbound PDF to structured CRM update
Safe-by-design handling of untrusted documents with no bad writes
A clean async architecture that replaced a stalled, never-validated monolith
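
The born-digital bypass can be pictured as a small triage step that runs before any OCR call. The sketch below is illustrative only: the function name, the `TriageDecision` type, and the 200-characters-per-page threshold are assumptions, not details of the production system. It assumes the per-page text layer has already been extracted by whatever PDF library the pipeline uses.

```python
from dataclasses import dataclass

# Hypothetical threshold: minimum non-whitespace characters a page's text
# layer must contain to count as usable. Not a value from the project.
MIN_CHARS_PER_PAGE = 200


@dataclass
class TriageDecision:
    needs_ocr: bool
    reason: str


def triage(page_texts: list[str], min_chars: int = MIN_CHARS_PER_PAGE) -> TriageDecision:
    """Decide whether a PDF can skip OCR based on its embedded text layer.

    `page_texts` holds the extracted text of each page; image-only pages
    are expected to arrive as empty strings.
    """
    if not page_texts:
        return TriageDecision(True, "no pages with a text layer")
    # Count only non-whitespace characters so padding does not pass as content.
    counts = [len("".join(text.split())) for text in page_texts]
    if min(counts) < min_chars:
        return TriageDecision(True, "at least one page has too little text")
    return TriageDecision(False, "text layer present on every page")
```

Requiring every page to pass is a deliberately conservative choice in this sketch: a scanned page inside an otherwise digital PDF sends the whole document to OCR rather than silently losing content.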
/ The challenge

Where the bottleneck actually was

An EU flight-compensation firm (claims under EU Regulation 261/2004) received a constant stream of incoming PDFs: objection notices, court-cost invoices, handover letters, booking confirmations, and customer claim forms. Each one had to be read and identified, and its key fields entered into the CRM by hand. A previously built pipeline had stalled because its OCR engine never worked reliably, so nothing flowed end to end.

Every incoming PDF had to be read, identified, and keyed into the CRM by hand.
A prior pipeline had stalled on a broken OCR engine, so nothing ran end to end.
The component that opens untrusted attachments could not be allowed to hold CRM credentials.
/ What we built

A system built around the real workflow

We were brought in as a task force to finish, fix, and cut the system over, not to rebuild it. We fixed the OCR layer, then hardened a single async pass that runs OCR, a keyword pre-filter, classification, summarization, and field extraction, and emits a CRM-ready result. Born-digital PDFs skip GPU OCR entirely: their text layer is read directly. The component that opens untrusted attachments holds no CRM credentials; results travel back over HMAC-signed callbacks to a workflow that is the only component allowed to write to the CRM. Anything low-confidence or unrecognized is routed to a human instead of written blindly. The system is built for safe, auditable automation, not blind throughput.
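
To make the HMAC-signed callback idea concrete, here is a minimal sketch using only the Python standard library. The function names, the payload fields, and the choice of SHA-256 with canonical JSON are assumptions for illustration; the case study does not specify the actual signing scheme. The point it shows is that the extraction side needs only a shared signing secret, never a CRM credential.

```python
import hashlib
import hmac
import json


def sign_payload(secret: bytes, payload: dict) -> tuple[bytes, str]:
    """Serialize a result and return (raw body, hex signature)."""
    # Canonical JSON so sender and receiver sign exactly the same bytes.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    signature = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return body, signature


def verify_payload(secret: bytes, body: bytes, signature: str) -> bool:
    """Recompute the signature over the raw body and compare in constant time."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)
```

In this sketch the receiving workflow would call `verify_payload` on the raw request body before touching the CRM and reject anything that fails, so a tampered or forged callback never becomes a write.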

Module 01
Vision OCR with a text-layer triage path that skips OCR for born-digital PDFs
Module 02
A deterministic keyword pre-filter that primes — but never overrides — the classifier
Module 03
LLM classification and field extraction across six document types plus a safe "unknown" bucket
Module 04
A credential-isolated brain plus HMAC-signed callbacks to a single CRM-writing workflow
Module 05
A manual-review fallback that flags anything uncertain instead of writing it
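
Modules 02 and 05 can be sketched together: a keyword hint that is passed to the classifier as a suggestion, and a routing rule that sends uncertain results to a human. Everything named here is hypothetical, including the keyword lists, the type labels, the prompt wording, and the 0.8 confidence cut-off; the two labels shown are examples, not the firm's full set of document types.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative keyword map; the real lists and labels are not given in the source.
KEYWORD_HINTS = {
    "court_cost_invoice": ("court costs", "invoice"),
    "booking_confirmation": ("booking reference", "itinerary"),
}
CONFIDENCE_THRESHOLD = 0.8  # hypothetical cut-off


def keyword_hint(text: str) -> Optional[str]:
    """Deterministic pre-filter: return the first type whose keywords match."""
    lowered = text.lower()
    for doc_type, keywords in KEYWORD_HINTS.items():
        if any(keyword in lowered for keyword in keywords):
            return doc_type
    return None


def build_prompt(text: str, hint: Optional[str]) -> str:
    """The hint primes the classifier but is only a suggestion in the prompt."""
    suggestion = f"A keyword scan suggests '{hint}'. Verify before agreeing.\n" if hint else ""
    return f"{suggestion}Classify the following document:\n{text}"


@dataclass
class Classification:
    doc_type: str
    confidence: float


def route(result: Classification) -> str:
    """Route on the classifier's own output only; the hint never decides."""
    if result.doc_type == "unknown" or result.confidence < CONFIDENCE_THRESHOLD:
        return "manual_review"
    return "auto"
```

Because `route` never sees the hint, a keyword match alone cannot push a document into the automatic path; only a recognized type with sufficient classifier confidence can.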
Build profile
Stack
Vision OCR, LLM classification & extraction, FastAPI async pipeline, n8n orchestration, HMAC-signed callbacks
Proof source
EU flight-compensation firm
Async document-intelligence pipeline (EU261)