This case study examines how AI assistants perform when Compliance teams at Retail Banking firms in the United Kingdom ask them questions about Consumer Duty regulation, specifically PS22/9 and PRIN 2A. Across nine aggregated questions on this single regulatory framework, AI assistants produced materially incorrect, incomplete, or misleading responses in every case. The errors are not superficial: they concern the legal basis of the Duty, the scope of the retail customer definition, the binding force of specific rules versus guidance, and the current state of FCA supervisory expectations.
For Compliance functions whose work feeds into policy documents, training materials, and regulatory returns, each of these errors represents a discrete point of institutional risk.
Compliance teams at Retail Banking firms turn to AI tools at predictable points in their day-to-day workflow: drafting or refreshing internal Consumer Duty policies, producing training materials for frontline staff and business lines, conducting regulatory mapping exercises when a new product is being scoped for launch, preparing responses to business line queries about what the Duty requires in a specific sales or servicing context, and building fair value assessment frameworks ahead of annual board sign-off. All nine of the question areas covered in this case study sit squarely inside those tasks.
When a member of the Compliance team asks an AI tool whether consumer testing is mandatory, or which entities count as retail customers, or what the FCA expects from a fair value assessment, they are asking questions with direct operational consequences for the firm.
The corporate use-cases built on these topics are significant. A Retail Banking firm's Consumer Duty implementation programme (its policies, its MI framework, its product review cycle, its complaints handling approach) rests on a shared understanding of what the rules actually say. If that understanding is sourced from AI tools and those tools are consistently wrong, the programme is built on a flawed foundation.
That flaw propagates: incorrect AI output influences a policy document, the policy document shapes staff training, staff training shapes customer interactions, and customer interactions are what the FCA examines in a supervisory review.
What is at stake for the firm is substantial. The FCA has broad supervisory and enforcement powers under the Financial Services and Markets Act 2000 to act on breaches of PRIN 2A: it can require firms to remediate customer harm, impose requirements, issue public censures, and, where it finds systemic failures, refer matters for enforcement action. Firms that build Compliance processes on incorrect AI answers about the Consumer Duty face the compounded risk of regulatory action, remediation costs, and reputational damage.
The consequence rarely stops with the individual employee who used the AI tool: the firm, its senior managers under the SM&CR, and its regulated activities absorb the weight of the error.
All nine findings in this case study relate to a single regulatory framework — the FCA's Consumer Duty as set out in PS22/9 and PRIN 2A. That concentration is itself significant: it means the hallucination risk is not scattered across unrelated areas of law but is clustered precisely on the regulation that Compliance teams at Retail Banking firms are most actively implementing and monitoring right now. The pattern of errors falls into three broad categories.
First, AI tools systematically misstate the legal character of requirements — conflating guidance with binding rules, or asserting that a specific rule prescribes a methodology when the actual rule leaves methodology to the firm's judgement. Second, AI tools drop or distort critical qualifying conditions in the rule text — substituting "annual income" for "annual turnover," or omitting the "reasonably believes" threshold that determines when a firm has a defence. Third, AI tools either fabricate specific factual details (dates, document references, event sequences) or decline to answer at all on questions where the FCA has published a clear and specific answer.
The shape of these errors matters as much as their frequency. A Compliance team that receives an AI answer placing consumer testing in a binding rule will build a different — and more burdensome — internal policy than one that correctly understands testing as guidance-recommended best practice. A team that understands the charity retail customer threshold as "annual income" rather than "annual turnover" may mis-scope its Consumer Duty programme from the outset. These are not edge cases; they are foundational questions that any competent Consumer Duty programme must answer correctly.
That AI tools answered all nine questions incorrectly, and that multiple AI tools independently produced incorrect responses on five of the nine, indicates that the risk is not confined to a single tool or a single configuration.
The systemic risk to a Retail Banking firm compounds quickly. Consumer Duty compliance is not a one-off exercise: firms produce annual board reports, ongoing MI, regular product reviews, and continuous training updates. Each of those cycles creates a fresh opportunity for an incorrect AI answer to enter a work-product and be acted upon.
If several downstream outputs all rest on the same incorrect AI answer about, for example, the scope of the retail customer definition or the firm's obligations around fair value quantification, the cost of correction is not limited to one document — it extends to every process, policy, and record that was built on that answer.
For a Compliance function that is also operating under the scrutiny that the FCA has signalled it will bring to Consumer Duty supervision, the cost of getting it wrong is measured in regulatory time, remediation resource, and the credibility of the function with the Board and senior management.
The default position for any Compliance team at a Retail Banking firm should be that AI tools are a starting point, not a primary source, for questions about Consumer Duty obligations. The findings in this case study show that AI assistants are unreliable on precisely the kinds of questions a Compliance team asks most often: what a specific rule requires, how guidance relates to binding rules, who counts as a retail customer, and what the current state of FCA supervisory expectations is. These are not obscure questions — they are the operational backbone of Consumer Duty compliance.
Treating an AI response as authoritative on any of them, without independent verification against the FCA Handbook, PS22/9, or primary FCA publications, creates a structural risk in the firm's compliance programme.
Practical firm-level safeguards should include a documented AI-use policy that explicitly identifies Consumer Duty rule interpretation as a high-risk area requiring verification. Any AI output that influences a firm work-product — a policy document, a training module, a board report section, a supplier assessment — should carry an audit trail recording that the AI was used and that the output was verified against a named primary source before use. Sign-off requirements should apply before AI-drafted regulatory content enters firm-wide circulation or regulatory-facing material.
Firms should also distinguish clearly between "AI-drafted" content (prose the AI generated) and "AI-summarised" content (a précis of source material the team has already verified), because the risk profile of each is different. Compliance leaders should communicate this distinction explicitly to business lines that may be using AI tools independently.
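One way a firm might make the audit trail and sign-off requirement concrete is a small record attached to each AI-influenced work-product. The sketch below is illustrative only: the names `AIUseRecord`, `ContentKind`, and `may_circulate` are hypothetical, and the rule it encodes (verification against a named primary source plus sign-off before circulation) is a simplified reading of the safeguards described above, not a prescribed control design.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ContentKind(Enum):
    """Hypothetical labels for the drafted/summarised distinction."""
    AI_DRAFTED = "ai_drafted"        # prose the AI generated
    AI_SUMMARISED = "ai_summarised"  # precis of material the team already verified


@dataclass(frozen=True)
class AIUseRecord:
    """Audit-trail entry for one AI-influenced work-product (assumed shape)."""
    work_product: str                       # e.g. "training module"
    content_kind: ContentKind
    verified_against: Optional[str] = None  # named primary source, e.g. "PRIN 2A"
    signed_off_by: Optional[str] = None     # reviewer who approved circulation


def may_circulate(record: AIUseRecord) -> bool:
    """Allow circulation only if a primary source is named and sign-off is recorded."""
    verified = bool(record.verified_against and record.verified_against.strip())
    signed_off = bool(record.signed_off_by and record.signed_off_by.strip())
    return verified and signed_off
```

The record stores the content kind without applying a different rule to each; a firm that decides the two risk profiles warrant different controls could branch on `content_kind` inside `may_circulate`.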
There are areas of the Compliance workflow where AI tools are genuinely useful and carry lower risk. Drafting non-regulatory communications, generating first-draft questions for a regulatory deep-dive, summarising long consultation papers the team will verify independently, or producing initial checklists that a qualified compliance officer will review — all of these are reasonable uses. The risk concentrates when AI output substitutes for, rather than feeds, expert human judgement on what a rule actually says. Keeping that line clear is the practical task.
RegLeg's hallucination research is published as a free resource that Compliance teams can use before relying on any AI answer in Consumer Duty and other FCA rule areas. The findings in this case study — covering nine distinct questions across PS22/9 and PRIN 2A — represent a documented record of where AI tools have produced incorrect or misleading responses on questions that Retail Banking Compliance teams encounter in day-to-day practice.
Teams can use this research as a pre-verification check: if the question they are asking an AI tool matches a topic area covered in the research, they know in advance that independent verification is not optional. The research is updated as new regulatory developments are published and as AI tools are re-tested against current sources.
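A pre-verification check of this kind can be sketched as a simple lookup that flags a question when it touches a known high-risk topic area. Everything in the example is an assumption made for illustration: the topic labels and keywords are hypothetical, they are not RegLeg's actual catalogue, and simple substring matching is a stand-in for whatever matching a team would really use.

```python
from typing import Dict, List, Tuple

# Hypothetical topic list: illustrative only, not RegLeg's actual catalogue.
HIGH_RISK_TOPICS: Dict[str, Tuple[str, ...]] = {
    "rule versus guidance": ("mandatory", "binding", "guidance"),
    "defined terms": ("retail customer", "definition"),
    "threshold values": ("threshold", "turnover"),
    "supervisory expectations": ("fca expects", "supervisory"),
}


def matched_topics(question: str) -> List[str]:
    """Return the high-risk topic areas whose keywords appear in the question."""
    text = question.lower()
    return [
        topic
        for topic, keywords in HIGH_RISK_TOPICS.items()
        if any(keyword in text for keyword in keywords)
    ]


def verification_required(question: str) -> bool:
    """A match means independent verification is not optional."""
    return bool(matched_topics(question))
```

A match signals that verification is mandatory; the absence of a match does not show that an AI answer is safe to rely on, only that the question falls outside the listed topics.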
Beyond the published research, RegLeg works with Compliance teams at Retail Banking firms to map which AI-supported workflows in their specific environment carry the highest hallucination exposure. For Consumer Duty in particular, the risk is not evenly distributed — it concentrates on rule-versus-guidance distinctions, defined terms, threshold values, and current supervisory expectations, all of which are areas where AI tools have demonstrated consistent difficulty. A bespoke mapping exercise identifies which of the firm's existing or proposed AI use-cases sit in high-risk territory, so the team can apply proportionate verification controls without treating every AI output as equally suspect.
RegLeg also offers confidential review of a firm's existing AI-use policy against RegLeg's failure-mode catalogue, with a prioritised remediation report identifying where current policy provides adequate protection and where gaps remain. For Compliance teams who need to build or refresh internal training on AI use in regulated contexts, RegLeg can support the development of CPD-aligned content that is specific to the Consumer Duty risk landscape — practical, evidence-based, and usable in internal training programmes without further adaptation. Conversations with RegLeg are confidential and do not require the firm to disclose its current AI tools or configurations.