This case study examines how AI tools perform when Compliance teams at Insurance Intermediaries firms in the United Kingdom ask questions about the Consumer Duty (PS22/9 and PRIN 2A), as published and supervised by the Financial Conduct Authority. Across three aggregated questions drawn from this regulation, AI assistants returned materially incorrect answers in every instance. The errors covered the scope of the Duty's harm-prevention standard, which business lines fall within the Duty's reach, and the current status of supervisory Dear CEO letters — all topics that Compliance teams at intermediaries routinely need to get right.
These findings do not reflect isolated or marginal inaccuracies; in each case the AI's answer would, if acted upon, produce a demonstrably wrong picture of the firm's regulatory obligations.
Compliance teams at Insurance Intermediaries firms regularly turn to AI tools when they need to move quickly: drafting or refreshing internal Consumer Duty policies, producing training decks for distribution staff, building regulatory-mapping matrices for new products, or answering targeted questions from the business when a product team asks whether a particular commercial line or group scheme falls within the Duty's scope. AI tools are also used during onboarding of new Compliance staff, when preparing board-level regulatory horizon-scanning papers, and when checking whether older supervisory communications — such as Dear CEO letters — are still live expectations or have been superseded.
Each of these moments is exactly where the failures documented in this case study would enter the firm's work-products.
The corporate use-cases sitting on top of these topics are significant. A Compliance function that has incorrectly mapped the Duty's harm-prevention threshold may produce a Consumer Duty framework that either over-commits the firm (creating disproportionate operational cost) or under-commits it (leaving a genuine regulatory gap). A wrong answer on the scope of group insurance distribution can cause entire product lines to be treated as out-of-scope when they are in-scope, or the reverse, so that remediation resource is misallocated.
A Compliance team that does not know which Dear CEO letters the FCA has withdrawn, and which remain live, may build monitoring programmes around expectations the FCA no longer holds, while missing the current obligations that matter.
If the firm acts on these incorrect AI answers, it faces a range of regulatory and commercial consequences. The FCA has broad supervisory and enforcement powers under the Consumer Duty framework, including the ability to require past-business reviews, impose requirements, issue public censures, and levy financial penalties. Operationally, processes built on wrong rule-readings must be redesigned and re-documented — at material cost in staff time and, often, external advice. Where incorrect scope decisions have resulted in retail customers not receiving protections they were owed, the firm may face redress obligations.
The individual employee who used the AI tool is not personally at fault in most circumstances, but their department, their Compliance leadership, and ultimately the firm bear every one of those costs.
All three findings in this case study relate to the same regulation — the Consumer Duty framework under PS22/9 and PRIN 2A — and all three errors follow the same underlying pattern: AI tools replace the FCA's precise, bounded rule text with a paraphrase that sounds authoritative but shifts the legal meaning in ways that matter. In the first finding, a single clear qualifier in the rule text is replaced by an invented multi-condition test. In the second, an express carve-out from the Duty's scope is reversed into an assertion of in-scope coverage.
In the third, specific dates and a named regulatory publication are misassigned and partially fabricated. Across all three, the AI responses cited real FCA sources, creating a surface credibility that would not prompt a typical reader to check further. Every one of these cited sources, on examination, either does not contain what the AI attributed to it or contains text the AI has distorted.
The concentration of errors within a single regulation is significant for Insurance Intermediaries Compliance teams. Consumer Duty is not a peripheral rule set: it is the FCA's primary framework governing how firms treat retail customers, and it attracts some of the closest supervisory attention of any conduct regulation currently in force in the United Kingdom.
Intermediaries face the Duty at multiple points in their distribution chains, across both personal lines and commercial products where retail or near-retail customers are involved, and the boundary questions — which lines are in-scope, what the standard actually requires, which legacy supervisory materials are still live — are precisely the ones AI tools answer least reliably.
The systemic risk to a firm compounds quickly. If a Compliance team produces a Consumer Duty policy framework, a training programme, and a product-scope matrix during the same review cycle, and all three draw on AI-assisted research, each of the three errors documented here could enter a different downstream work-product simultaneously. The firm would then be operating with an incorrect harm-prevention standard embedded in its policy, incorrect scope treatment applied to group insurance lines, and an outdated picture of which supervisory letters govern its customer communications — all at the same time.
Correcting these errors after the fact, once they have been embedded in board-approved policies and communicated to distribution staff, is substantially more expensive than catching them before the first draft is finalised.
The default position for Compliance teams at Insurance Intermediaries firms should be that AI tools are a starting point for research orientation, not a primary source for any regulatory position that will influence a firm work-product. The three failures documented here all involved AI tools that cited real FCA publications while materially misrepresenting their content — a pattern that is harder to catch than an obviously fabricated source, because the surface credibility of the citation discourages verification.
Where a team is working on Consumer Duty scope decisions, harm-prevention standards, or the currency of supervisory materials, the rule text and the FCA's own publications must be checked directly, not assumed to match what the AI has reported.
At a firm level, the most effective safeguards combine policy with process. A regulatory-verification policy should name AI tools explicitly as an unreliable source for precise rule positions and require that any AI-generated regulatory summary used in a Compliance work-product be traced back to a primary source before it is relied upon. Audit trails for AI output that influences board-approved policies, training materials, or regulatory submissions protect the firm if its process is later scrutinised.
Sign-off requirements — at Compliance Director or equivalent level — before AI-drafted regulatory content enters firm-wide use create a checkpoint that is proportionate to the risk. Material that the AI has drafted should be labelled differently from material the AI has merely summarised, so reviewers know where the interpretive judgement came from.
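The safeguards described above (primary-source tracing, an audit trail, a sign-off checkpoint, and a drafted-versus-summarised label) could be captured in a simple record structure. The following is a minimal sketch only, not a prescribed implementation: the class name `AIContentRecord`, the `AIRole` labels, and all method names are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class AIRole(Enum):
    """Label showing where the interpretive judgement came from (hypothetical names)."""
    DRAFTED = "drafted"        # the AI wrote the interpretive content
    SUMMARISED = "summarised"  # the AI condensed an existing document


@dataclass
class AIContentRecord:
    """Audit-trail entry for one piece of AI-generated regulatory content."""
    description: str
    ai_role: AIRole
    primary_source: Optional[str] = None  # source a person actually checked
    verified_by: Optional[str] = None
    signed_off_by: Optional[str] = None
    log: list = field(default_factory=list)

    def _note(self, event: str) -> None:
        # Timestamp every step so the process can be reconstructed later.
        self.log.append((datetime.now(timezone.utc).isoformat(), event))

    def verify(self, reviewer: str, primary_source: str) -> None:
        """Record that a reviewer traced the content to a primary source."""
        self.primary_source = primary_source
        self.verified_by = reviewer
        self._note(f"verified against {primary_source} by {reviewer}")

    def sign_off(self, approver: str) -> None:
        """Sign-off is refused until primary-source verification is recorded."""
        if self.verified_by is None:
            raise ValueError("cannot sign off before primary-source verification")
        self.signed_off_by = approver
        self._note(f"signed off by {approver}")

    def releasable(self) -> bool:
        """Content may enter firm-wide use only once verified and signed off."""
        return self.verified_by is not None and self.signed_off_by is not None
```

A record created for an AI summary would report `releasable()` as false until `verify(...)` and then `sign_off(...)` have both been called, in that order. A firm would adapt the fields and approval levels to its own governance structure.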
There are areas where AI tools add genuine value in the Compliance workflow and do not carry the same risk profile. Drafting non-regulatory narrative copy — covering letters, staff communications, intranet pages — is lower-risk because it does not assert a rule position. Using AI to produce a first-pass summary of a lengthy regulatory document, which a qualified reader then checks against the original, is a reasonable use of the technology. Generating a structured list of questions to frame further research — rather than treating the AI's answers as the research itself — is also sound practice.
The discipline is knowing which tasks sit in each category, and building that distinction into the team's AI-use guidance before a problem arises rather than after.
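One way to build that distinction into AI-use guidance is a plain lookup from task type to the check required before the output is used. This is an illustrative sketch under stated assumptions: the `Task` names and the wording of each control are hypothetical paraphrases of the categories above, and the "ordinary editorial review" control for narrative copy is an assumption not stated in this case study.

```python
from enum import Enum


class Task(Enum):
    """Hypothetical task categories, paraphrasing the uses described above."""
    NARRATIVE_COPY = "non-regulatory narrative copy"
    FIRST_PASS_SUMMARY = "first-pass summary of a regulatory document"
    RESEARCH_QUESTIONS = "list of questions to frame further research"
    RULE_POSITION = "statement of a regulatory position"


# Control required before AI output of each type is used.
REQUIRED_CONTROL = {
    # Assumption: lower-risk copy still gets ordinary editorial review.
    Task.NARRATIVE_COPY: "ordinary editorial review",
    Task.FIRST_PASS_SUMMARY: "qualified reader checks summary against the original",
    Task.RESEARCH_QUESTIONS: "answers researched from primary sources, not from the AI",
    Task.RULE_POSITION: "primary-source verification and senior sign-off",
}


def required_control(task: Task) -> str:
    """Return the control the guidance attaches to a given task type."""
    return REQUIRED_CONTROL[task]
```

Keeping the mapping explicit means a new use case has to be assigned a category, and therefore a control, before anyone relies on the output.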
RegLeg's published hallucination research is available as a free reference that Compliance teams can use before relying on any AI-generated answer in the rule areas covered by this case study. The findings indexed on the RegLeg site allow a team to check, in a matter of minutes, whether the specific Consumer Duty question they have just put to an AI tool is one that has already been shown to produce unreliable responses — and to see what the correct regulatory position actually is.
For teams working under time pressure, that pre-check is a low-cost way to avoid the most well-documented failure modes without abandoning AI tools entirely.
For Insurance Intermediaries firms that want a deeper picture of their exposure, RegLeg offers bespoke regulator deep-dives that map which AI-supported workflows in the firm's specific Compliance function carry the highest hallucination risk. Consumer Duty is a broad framework, and the questions that trip AI tools are not uniformly distributed across it — the deep-dive work identifies the specific decision points where AI assistance is most likely to introduce error, so the Compliance team can concentrate verification effort where it matters most rather than applying blanket scepticism to every AI output.
RegLeg can also review a firm's existing AI-use policy against its failure-mode catalogue and provide a prioritised, confidential gap assessment. For many Compliance teams the challenge is not that they lack an AI policy, but that the policy was written before the specific failure patterns for regulatory content were well understood. Where gaps exist, RegLeg works with the team to close them in a way that is proportionate to the firm's scale and the regulatory areas it covers.
Alongside this, RegLeg produces training material and CPD-aligned content that Compliance teams can use internally to build staff awareness of AI reliability risks — practical, jurisdiction-specific, and calibrated to the questions that Insurance Intermediaries firms are most likely to encounter in their day-to-day regulatory work.