Insurance

How a Leading Insurer Increased Straight-Through Claim Processing by 32% by Turning Human Expertise into Continuously Improving AI

A global insurer set out to modernize its claims operations using AI agents—aiming to accelerate processing, reduce manual adjudication, and improve decision consistency at scale. While early results in controlled environments were promising, scaling automation in production proved far more complex.

The Challenge

The insurer deployed AI agents across its claims workflow—from intake and validation to fraud detection and adjudication. In staging, the agents performed well. In production, however, a different pattern emerged. Adjudicators were frequently stepping in—especially for high-value claims and fraud-flagged cases. Many of these interventions seemed to follow recognizable patterns: similar claim types, recurring provider scenarios, and known edge cases—but the evidence remained anecdotal. The organization lacked visibility into what was actually happening.

Why were agents escalating these cases?

Why were adjudicators overriding decisions?

Why were these interventions so frequent?

At the same time, improving the agents proved slow and difficult. Each issue required manual investigation, ad hoc testing, and lengthy validation cycles before fixes could be deployed. Without structured evaluation datasets grounded in real-world scenarios, iteration cycles stretched into weeks. 

The result: automation plateaued, manual effort remained high, and the expected ROI from AI remained out of reach. 

The Root Cause: Agents Missing Operational Judgment

The gap wasn’t in the models underlying the agents—it was in the operational knowledge they lacked. Real-world claims processing depended on context that lived beyond structured systems: historical provider behaviour, prior claim patterns, adjudicator judgment, and information captured in unstructured formats like email and documents.

None of this context was visible to the agents. More critically, it wasn’t visible to the organization either. Scout revealed the scale of the problem: 

41% of observed claim adjudications required human intervention, with a notable portion involving unplanned adjudicator review.

32 recurring workflow variants accounted for most end-to-end claim paths in production.

18 override clusters explained many repeated adjudicator interventions, often shaped by combinations of claim type, customer type, and claim value band.

Agent telemetry and human actions existed in separate silos. There was no unified view of how a claim moved across agents and adjudicators—nor any way to understand the reasoning behind decisions. Human overrides were treated as exceptions, not as signals. Recurring patterns were not systematically captured. And every improvement cycle started from scratch.

Without a way to capture real-world scenarios as structured evaluation data, the organization struggled to validate fixes with confidence. Iteration remained slow, and agents could not systematically learn from production.

How Scout Delivered a Solution

The insurer implemented Scout to create a unified, reasoning-aware view of agentic workflows—and to turn that understanding into continuous improvement. 

Unified Human + Agent Observability

Within 2 weeks of deployment, Scout began to surface both agent decisions and human adjudication steps in a single, end-to-end view of each claim it observed. Every workflow—across intake, validation, fraud checks, and adjudication—was stitched together into a complete execution trace.
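Conceptually, stitching works by merging separate agent and human event streams into one time-ordered trace per claim. The sketch below illustrates the idea in plain Python; the record fields (`claim_id`, `ts`, `actor`, `step`) are hypothetical and do not reflect Scout's actual schema.

```python
from collections import defaultdict
from operator import itemgetter

# Hypothetical event records; field names are illustrative, not Scout's schema.
agent_events = [
    {"claim_id": "C-1001", "ts": 1, "actor": "agent", "step": "intake_validation"},
    {"claim_id": "C-1001", "ts": 3, "actor": "agent", "step": "fraud_check"},
]
human_events = [
    {"claim_id": "C-1001", "ts": 4, "actor": "adjudicator", "step": "manual_override"},
]

def stitch_traces(*event_streams):
    """Merge separate event streams into one time-ordered trace per claim."""
    traces = defaultdict(list)
    for stream in event_streams:
        for event in stream:
            traces[event["claim_id"]].append(event)
    for events in traces.values():
        events.sort(key=itemgetter("ts"))
    return dict(traces)

trace = stitch_traces(agent_events, human_events)["C-1001"]
print([e["step"] for e in trace])
# → ['intake_validation', 'fraud_check', 'manual_override']
```

The key design point is that the human override lands in the same timeline as the agent steps that preceded it, so it can be read as a signal rather than an isolated exception.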

High-Fidelity Business Context

Low-level interactions were translated into business-relevant actions, connected to agent telemetry: claim validation steps, adjudication decisions, and fraud assessments. This allowed the organization to understand not just where work happened, but what work was being done.

Reasoning-Aware Insights

Scout analysed why agents escalated claims and why adjudicators intervened or overrode decisions. It identified recurring patterns—distinguishing expected reviews from unexpected interventions—and surfaced the underlying context driving decisions.
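One simple way to picture this pattern detection is grouping overrides by the dimensions the case study names (claim type, customer segment, value band) and counting repeats. The sketch below is a minimal illustration with made-up records, not Scout's clustering method.

```python
from collections import Counter

# Illustrative override records; the grouping keys mirror those named in the text.
overrides = [
    {"claim_type": "outpatient", "customer_segment": "repeat", "value_band": "high"},
    {"claim_type": "outpatient", "customer_segment": "repeat", "value_band": "high"},
    {"claim_type": "dental", "customer_segment": "new", "value_band": "low"},
]

def cluster_overrides(records):
    """Count overrides per (claim type, customer segment, value band) combination."""
    return Counter(
        (r["claim_type"], r["customer_segment"], r["value_band"]) for r in records
    )

top_cluster, count = cluster_overrides(overrides).most_common(1)[0]
print(top_cluster, count)  # → ('outpatient', 'repeat', 'high') 2
```

A recurring combination that overrides keep landing on is exactly the kind of cluster that separates a systematic gap in agent logic from a one-off exception.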

Eval Set Generation from Production Workflows

Scout automatically generated evaluation datasets from real claim executions. Each dataset captured the full context of a case—agent inputs, decisions, and human-corrected outcomes as ground truth. This enabled teams to systematically test improvements, expand coverage of production scenarios, and move from anecdotal debugging to data-driven iteration.
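The core idea is that each production execution already contains a test case: the inputs the agent saw, what it decided, and, where a human corrected it, the reference answer. The sketch below shows one hypothetical shape for such a record; the `EvalCase` type and field names are illustrative, not Scout's data model.

```python
from dataclasses import dataclass

# A hypothetical shape for one evaluation record; not Scout's actual data model.
@dataclass
class EvalCase:
    claim_id: str
    agent_inputs: dict   # context the agent saw at decision time
    agent_decision: str  # what the agent did in production
    ground_truth: str    # human-corrected outcome used as the reference label

def build_eval_case(trace):
    """Turn one production execution into a replayable test case.

    If an adjudicator overrode the agent, the override becomes ground truth;
    otherwise the agent's own decision is taken as correct.
    """
    return EvalCase(
        claim_id=trace["claim_id"],
        agent_inputs=trace["inputs"],
        agent_decision=trace["agent_decision"],
        ground_truth=trace.get("human_override") or trace["agent_decision"],
    )

case = build_eval_case({
    "claim_id": "C-1001",
    "inputs": {"claim_type": "outpatient", "value": 4200},
    "agent_decision": "escalate",
    "human_override": "approve",
})
print(case.ground_truth)  # → approve
```

Rerunning a candidate agent version against a set of such cases is what lets teams validate a fix against real-world scenarios before deploying it.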

Closed-Loop Improvement

Insights were directly converted into action—refining prompts, updating guardrails, improving adjudication logic, and enriching evaluation datasets. The feedback loop between operations and AI shifted from weeks to days.

Key Insights Uncovered 

18 override clusters drove over 60% of repeated manual adjudications, concentrated around specific combinations of claim type, customer segment, and claim value band.

Trusted-provider and repeat-customer scenarios accounted for a significant share of fraud overrides, where adjudicators applied contextual judgment unavailable to agents.

Over a third of intervention cases required context outside core systems, including prior claim history, notes, and exception handling patterns. 

A small subset of workflow variants drove a disproportionate share of delays and rework, creating a clear starting point for agent improvement.

The Impact 

With unified visibility and structured learning in place, the insurer transformed how its claims operations evolved.  

Recurring manual decisions were systematically identified and automated

Adjudicators spent less time on repetitive cases and more on truly complex scenarios

Agent behaviour became transparent, explainable, and continuously improvable

Most importantly, agent iteration cycles accelerated. Improvements that previously took weeks to validate and deploy could now be tested against real-world scenarios and rolled out with confidence in a matter of days. The relationship between operations and AI fundamentally shifted from reactive intervention to continuous co-development.

Results Delivered

32% increase in straight-through claim processing (STP)

28% reduction in manual adjudication effort

3x faster agent iteration cycles, from issue to validated deployment

Summary 

By unifying human and agent workflows—and grounding every decision in real-world context—the insurer moved beyond static automation to a continuously improving system. Human expertise was no longer hidden in overrides and workarounds. It became structured, observable, and reusable at scale. 

The result was not just better-performing agents, but a fundamentally different operating model—where every claim processed made the system smarter, and every human decision contributed to the next version of AI. In claims processing, the organization didn’t just automate work. It built a system that learns from it. 

See Scout in action.
Schedule your demo now!