This case study examines how AI tools respond to regulatory questions relevant to the Compliance function at Life Insurance firms operating in the United Kingdom. Testing focused on Consumer Duty (PS22/9 and PRIN 2A), the Financial Conduct Authority's flagship retail protection framework. Across eight aggregated questions, AI assistants produced materially incorrect or misleading answers in every case — a consistent pattern rather than isolated incidents. The errors span substantive rule content, scope of application, procedural expectations, and the FCA's most recent supervisory communications, meaning they affect multiple layers of a Compliance team's day-to-day work.
Compliance teams at Life Insurance firms turn to AI tools at predictable points in their regular workflow: drafting or refreshing internal policy frameworks aligned to Consumer Duty outcomes, building training materials for distribution and product teams, conducting regulatory mapping when launching new protection or savings products, and supporting business lines that raise regulatory interpretation questions. AI tools are also used in supplier and distributor due-diligence, particularly when assessing whether third parties in the distribution chain are meeting their own Consumer Duty obligations. Each of these activities involves asking the AI questions very close to the ones identified in this study.
The corporate use-cases that sit on top of these queries are significant. A Compliance team that asks an AI tool about the legal basis for Consumer Duty may be drafting a board-level policy statement or a regulatory submission. A team querying whether Consumer Duty applies to group insurance distribution may be scoping which parts of the firm's product range need a full review — a decision with material resourcing and cost implications. A team asking about fair value assessment methodology may be designing the firm's annual value assessment process, which feeds directly into product management and pricing decisions.
When the AI's answer is wrong, the firm bears the consequences. The FCA has broad supervisory and enforcement powers under Consumer Duty, including the ability to require firms to remediate harm to retail customers, impose financial penalties, and publish censures that affect the firm's market standing. Operational harm is equally real: if internal processes, training content, or governance frameworks are built on incorrect rule interpretations, they may fail to detect breaches until the FCA identifies them externally.
The individual employee using the AI tool is not typically the one held to account — the firm, its senior managers, and the board carry the regulatory and reputational exposure.
The errors across all eight findings share a recognisable shape: AI tools either add conditions that the FCA's text does not impose, remove qualifiers that carry significant legal weight, assert that guidance has binding rule status, or provide confidently stated figures and dates that are factually wrong. No single type of error dominates — the failures are distributed across rule interpretation, scope determination, procedural requirements, and the FCA's published supervisory activity.
All eight findings relate to a single regulatory instrument (Consumer Duty, PS22/9 and PRIN 2A), which means a Compliance team relying on AI for Consumer Duty work faces inaccurate answers across virtually every dimension of that framework, not just one corner of it.
The clustering on Consumer Duty is significant in its own right. Consumer Duty is the FCA's most consequential retail conduct framework of the decade, and it is still relatively recent — firms are still embedding it, and many are mid-cycle on their first full annual assessments. This is precisely the moment when Compliance teams are most likely to lean on AI to accelerate their work.
Two of the eight findings concern the same underlying source document (FS25/2, published in March 2025), and multiple AI tools tested were either unaware of it or fabricated incorrect dates and event sequences around it — a sign that AI tools' knowledge of recent FCA supervisory output is unreliable precisely when it matters most.
The systemic risk compounds quickly. A Compliance team that uses AI to draft its Consumer Duty policy framework, scope its group insurance product review, design its fair value assessment process, and produce training materials may be building four separate work-products all resting on incorrect foundations sourced from the same AI session. If those work-products are shared with the board, used in regulatory returns, or form the basis of staff training across distribution and product teams, the cost of remediation — once the errors surface — extends far beyond the original Compliance department.
Regulatory action by the FCA, client redress obligations, and internal remediation of mis-trained staff are all foreseeable downstream consequences.
The default position for any Compliance team at a Life Insurance firm should be that AI tools are a starting point for orientation, not a primary source for regulatory interpretation. The findings in this study cover the FCA's most heavily used conduct framework for retail markets, and AI tools produced incorrect answers across every major dimension of it — rule text, scope, methodology, and supervisory communications. Until a regulatory answer has been verified against the FCA's published instruments and the current Handbook, it should not be used in any work-product that influences the firm's compliance posture.
At a practical level, firms should embed several safeguards into the way Compliance teams use AI. A documented regulatory-verification policy should name AI tools explicitly as an unreliable source for Consumer Duty and PRIN 2A content and require a mandatory cross-check against primary sources (the FCA Handbook, PS22/9, and FS25/2) before any AI output enters a draft policy, training material, or governance document. Any AI output that does influence a firm work-product should be logged in an audit trail noting the question asked, the AI's response, and the verification step taken.
Work-products containing AI-sourced regulatory content should carry a sign-off requirement, ideally from a qualified legal or regulatory specialist, before they are circulated to the board, business lines, or regulators. Where AI-drafted and human-verified content coexist, the two should be clearly distinguished so reviewers know which parts require independent checking.
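For teams that keep the audit trail in a structured form, the record might look something like the following. This is a minimal sketch only: the class name `AIUsageRecord`, its field names, and the two checks are hypothetical illustrations of the safeguards described above (question asked, AI response, verification step, sign-off). They are not an FCA-prescribed format or any particular firm's system.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class AIUsageRecord:
    """One audit-trail entry for AI output that influenced a work-product.

    All field names are illustrative assumptions, not a regulatory format.
    """
    question: str          # the question put to the AI tool
    ai_response: str       # the AI's answer, kept verbatim
    # Primary sources the answer was checked against,
    # e.g. "FCA Handbook PRIN 2A", "PS22/9", "FS25/2".
    sources_checked: list[str] = field(default_factory=list)
    verification_note: str = ""          # what the check found
    signed_off_by: Optional[str] = None  # reviewer, if sign-off was given
    logged_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

    def is_verified(self) -> bool:
        # Verified only if at least one primary source was checked
        # and the outcome of that check was written down.
        return bool(self.sources_checked) and bool(self.verification_note.strip())

    def may_circulate(self) -> bool:
        # Circulation needs both verification and a named sign-off.
        return self.is_verified() and self.signed_off_by is not None


# Example entry: logged, but not yet verified or signed off.
record = AIUsageRecord(
    question="Does Consumer Duty apply to group insurance distribution?",
    ai_response="(AI answer pasted verbatim)",
)
```

In this sketch a new record cannot be circulated until someone records which primary sources were checked, notes the outcome, and names the person who signed it off. That mirrors the policy's ordering: log first, verify against source, then sign off.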
There are areas where AI tools add genuine value in Compliance workflows without creating the same risk. Drafting non-regulatory copy such as internal communications or process narratives, summarising long regulatory documents for a team that will then verify key passages independently, and generating initial question lists for further legal research are all lower-risk uses. The key distinction is whether the AI is being asked to state what the law is (high risk) or to help structure and communicate work that a human expert will validate (lower risk).
For Consumer Duty topics specifically, given the density and recency of the relevant material, treating AI output as a research aid rather than a regulatory authority is the appropriate operating model.
RegLeg's published hallucination research is available as a free reference the Compliance team can consult before relying on any AI answer in these rule areas. The research covers specific questions on Consumer Duty (PS22/9 and PRIN 2A) and records, for each question, what AI tools typically say, what the FCA's text actually says, and where the divergence lies. A team member who has seen the research before using AI on a Consumer Duty query will know in advance which topics carry the highest error rate and which AI responses warrant the most careful independent verification.
For teams that want to go further, RegLeg offers bespoke regulatory deep-dives designed around the specific workflows of a Life Insurance firm's Compliance function. These map the points in a firm's typical annual cycle — fair value assessments, Consumer Duty board reporting, product governance reviews, distribution partner oversight — against the AI failure patterns most likely to surface at each stage. The output is a prioritised risk register the Compliance team can use to concentrate verification effort where it matters most, rather than applying blanket scepticism to every AI-assisted task.
RegLeg also offers a confidential review of a firm's existing AI-use policy against its failure-mode catalogue for Consumer Duty and related FCA rules. Where a firm's current policy does not adequately address the specific risks identified in this research — incorrect scope assessments, fabricated rule citations, inversion of the FCA's stated expectations — RegLeg can provide prioritised remediation guidance.
Alongside this, RegLeg develops CPD-aligned training content that Compliance teams can use internally to build awareness of AI limitations in regulatory work, giving staff a practical framework for knowing when to trust an AI response and when to go directly to source.