General Insurance × Compliance — United Kingdom · updated 2026-05-26 · methodology v2.1

AI Hallucinations Affecting Compliance at General Insurance Firms in the United Kingdom

This case study examines how AI tools perform when Compliance teams at General Insurance firms in the United Kingdom query them about obligations under the Financial Conduct Authority's Consumer Duty framework (PS22/9 and PRIN 2A). Across four aggregated questions covering scope of application, the harm-avoidance standard, and the current supervisory landscape, AI assistants returned materially inaccurate answers. The errors ranged from importing additional conditions not present in the regulatory text to directly contradicting explicit exclusions, and from fabricating event dates to failing to identify a publicly available FCA document that answers the question precisely.

Every finding maps to a Consumer Duty question that a General Insurance Compliance team would routinely encounter — making these not edge-case curiosities but live risks in everyday regulatory work.

When this affects General Insurance × Compliance — United Kingdom

General Insurance Compliance teams consult AI tools in a wide range of day-to-day tasks: drafting internal policies and procedure notes, producing training materials for underwriters and claims handlers, mapping regulatory obligations onto new product propositions, preparing briefing packs for business lines facing a specific regulatory question, and scoping Consumer Duty assessments for distribution arrangements. Each of these workflows touches the Consumer Duty questions captured in this research — what the harm-avoidance standard actually requires, which business lines and products fall inside or outside scope, and which FCA supervisory letters still carry live expectations.

When an AI tool is used to answer any of these questions, the output feeds directly into firm work-products that may be reviewed by the FCA, shared with boards and committees, or relied upon by first-line colleagues making product and commercial decisions.

The commercial stakes are significant. A General Insurance firm that misunderstands the scope exclusion for group insurance may build an unnecessary Consumer Duty programme around distribution arrangements the FCA has explicitly carved out, wasting compliance resource and creating internal confusion about which obligations apply. Conversely, depending on how the answer is framed, a firm that relies on an AI tool's incorrect statement about scope may take false comfort that the rules do not apply to arrangements where they in fact do. Either way, the firm's Consumer Duty mapping documentation and governance framework will carry the error forward.

Similarly, if Compliance builds an internal briefing on the harm-avoidance standard using an AI-generated formulation that imposes three additional conditions not found in the rule text, the firm's internal standard will diverge from the FCA's actual requirement — with the gap only surfacing during a supervisory engagement.

The firm bears the regulatory consequences when AI-sourced misinformation enters its compliance processes. Regulatory action by the FCA under Consumer Duty can include requirements to remediate affected customers, public censures, financial penalties, and, where systemic failures in governance are identified, senior manager accountability under the Senior Managers and Certification Regime (SM&CR). The individual employee who consulted an AI tool is not typically the subject of enforcement action, but the department that produced the flawed work-product, the senior manager who approved it, and the firm as a whole absorb the operational, financial, and reputational cost.

Aggregate impact

All four findings in this research relate to a single regulatory framework — the FCA's Consumer Duty (PS22/9 and PRIN 2A) — and the errors cluster around two distinct failure modes. The first is condition-substitution: AI tools reformulate a rule by replacing a precise legal qualifier (such as "reasonably believes") with a looser or more elaborate paraphrase, or by inserting additional requirements that simply are not there. The second is temporal fabrication and document blindness: AI tools either invent specific dates for regulatory events or disclaim knowledge of documents that are publicly available and directly responsive to the question asked.

Both failure modes produce outputs that look authoritative and are not obviously wrong to a reader who has not independently checked the primary source.

The findings are concentrated in areas of Consumer Duty that General Insurance Compliance teams encounter frequently: scope, the harm-avoidance standard, and the current state of the FCA's supervisory expectations. These errors are not confined to one corner of the Consumer Duty framework; they span the foundational questions a Compliance team must get right when building or maintaining its programme. A team that uses AI tools to answer questions about which arrangements are in scope, what the harm-avoidance standard requires, and which FCA letters still set live expectations will encounter AI-generated errors at each of those stages.

The systemic risk compounds quickly. If a General Insurance firm's Consumer Duty gap analysis, its internal training materials, its product governance framework, and its board-level Consumer Duty assessment are each informed at any point by AI-generated answers to these questions, a single AI error at the definition stage propagates across all four work-products simultaneously. When those work-products are reviewed by the FCA — through a supervisory visit, a data request, or a section 166 review — the regulator will measure the firm against the actual text of PS22/9, not against the AI's version of it.

The gap between the two is where regulatory exposure lives.

Findings

4 findings in this case study.

  1. Harm-avoidance standard: the "reasonably believes" qualifier
  2. Withdrawal of pre-Consumer Duty Dear CEO letters: count and timing
  3. Consumer Duty scope: reinsurance, group insurance, and large-risk commercial contracts
  4. Live supervisory expectations: which Dear CEO letters remain in force

What your team should do

The default position for Compliance teams at General Insurance firms should be that AI tools are a starting point for research, not a primary source for regulatory questions. For Consumer Duty specifically — scope determinations, the harm-avoidance standard, the current state of Dear CEO letters — the findings in this research show that AI-generated answers can be plausible in tone while being materially wrong in substance. That means the risk is not obvious at the point of use: an AI response that adds conditions to a rule or inverts a scope exclusion will not flag itself as incorrect.

The only reliable control is independent verification against the FCA's published text before any AI output influences a firm work-product.

At the firm level, practical safeguards include: a regulatory-verification policy that names AI tools as an unreliable primary source for Consumer Duty rule interpretation and requires team members to cite the underlying FCA document rather than the AI's summary of it; an audit trail for any AI output that informs a Compliance deliverable, so that the verification step is documented and reviewable; sign-off requirements before AI-drafted regulatory analysis enters board papers, governance frameworks, or external communications; and a clear labelling distinction in internal documents between content that has been AI-drafted and content that has been reviewed against primary sources.
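For teams that keep the audit trail in a structured form, the safeguards above could be recorded along the following lines. This is a minimal sketch only: the record schema, field names, function names, and label wording are all hypothetical illustrations, not part of any FCA requirement or of the research methodology.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AIOutputRecord:
    """Audit-trail entry for one AI answer used in Compliance work (hypothetical schema)."""
    question: str
    ai_summary: str
    primary_source: Optional[str] = None  # the FCA document cited, e.g. "PRIN 2A"; never the AI summary
    verified_by: Optional[str] = None     # who checked the answer against the published FCA text
    signed_off_by: Optional[str] = None   # approver required before board or external use


def outstanding_controls(record: AIOutputRecord) -> List[str]:
    """Return the controls still missing before the output may inform a deliverable."""
    missing = []
    if not record.primary_source:
        missing.append("cite the underlying FCA document")
    if not record.verified_by:
        missing.append("record who verified against the primary source")
    if not record.signed_off_by:
        missing.append("obtain sign-off")
    return missing


def label(record: AIOutputRecord) -> str:
    """Label content so readers can tell AI drafts from verified material."""
    if record.primary_source and record.verified_by:
        return "Reviewed against primary source"
    return "AI-drafted, not yet verified"
```

A record with no citation, verifier, or approver returns all three outstanding controls and carries the "AI-drafted, not yet verified" label. Once every field is completed, the list is empty and the label changes. The point of the design is that the verification step leaves a reviewable trace rather than relying on memory.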

For questions about which Dear CEO letters remain live, teams should consult the FCA's own published updates — FS25/2 is the relevant document as of March 2025 — rather than relying on AI tools that have demonstrated an inability to locate or correctly characterise it.

There are areas of the Compliance workflow where AI tools carry lower risk and can add genuine value: drafting non-regulatory copy such as staff communications or internal process descriptions, summarising long documents that the team will independently verify before use, and generating initial question lists or topic maps for further primary-source research. The distinction that matters is between AI as a drafting assistant for content the team controls, and AI as an oracle for regulatory interpretation. In the latter role — particularly on Consumer Duty questions — these findings indicate that AI tools should not be trusted without verification.

How RegLeg can help

RegLeg publishes its hallucination research findings as a free reference resource that Compliance teams can consult before relying on an AI-generated answer in any of the rule areas covered. For Consumer Duty questions — scope, the harm-avoidance standard, supervisory letter status — teams can use the published findings to cross-check whether a given question sits in a known high-error zone before the AI output is incorporated into firm work. This does not replace primary-source verification, but it provides a practical signal about where AI tools have already been shown to fail, and at what severity.

For General Insurance firms that want to understand their specific exposure more precisely, RegLeg offers bespoke regulatory deep-dives that map which AI-supported workflows in that firm's Compliance function carry the highest hallucination risk. Not every workflow is equally exposed: a firm whose Consumer Duty programme relies heavily on AI-assisted policy drafting faces different risk points than one using AI primarily for training content. A targeted mapping exercise identifies where the firm's existing use of AI tools intersects with the question areas where errors are most likely, so that verification controls can be prioritised and resource allocated proportionately.

For firms that already have an AI-use policy in place, RegLeg can provide a confidential review of that policy against our failure-mode catalogue — identifying gaps in coverage, areas where the policy's controls are not calibrated to the types of error AI tools actually produce on regulatory content, and a prioritised remediation list. We also develop CPD-aligned training content that Compliance teams can deploy internally, helping first-line colleagues understand what kinds of AI output require verification and why, without requiring them to engage with the technical detail of how AI tools work.

The aim throughout is to give General Insurance Compliance teams practical tools for working alongside AI responsibly — not to discourage use, but to ensure that use is proportionate to what AI tools can reliably deliver.
