Retail Banking × Risk — United Kingdom · updated 2026-05-28 · methodology v2.1

AI Hallucinations Affecting Risk at Retail Banking Firms in the United Kingdom

This case study examines how AI tools respond to regulatory questions relevant to Risk teams at Retail Banking firms in the United Kingdom. It covers one regulation — the Financial Conduct Authority's Consumer Duty (PS22/9 and PRIN 2A) — and documents one aggregated question where AI assistants produced materially incorrect answers. The finding is drawn from structured testing of AI tools against primary regulatory source material, with outputs compared directly against regulator-published text. Risk professionals who consult AI assistants on Consumer Duty obligations should treat this case study as a factual record of failure patterns, not a theoretical caution.

When this affects Retail Banking × Risk — United Kingdom

Risk teams at Retail Banking firms routinely engage with Consumer Duty obligations across a wide range of internal processes. These include drafting or refreshing the firm's Consumer Duty implementation framework, producing risk assessments for new and existing product lines, developing training materials for frontline and second-line colleagues, and providing regulatory mapping support when business lines are designing new customer journeys or pricing changes.

It is entirely normal — and increasingly common — for a Risk analyst or manager to turn to an AI tool to get a quick orientation on a rule, check the wording of a specific obligation, or generate a first draft of a policy section. The Consumer Duty's harm-prevention provisions are among the most frequently referenced, because they sit at the centre of product, pricing, and complaints governance.

The corporate use-cases that depend on accurate answers here are substantial. A Risk team's interpretation of when a firm's harm-prevention obligation is discharged — and what role customer consent or risk-acceptance plays — feeds directly into product approval processes, customer vulnerability frameworks, complaints-handling standards, and board-level Consumer Duty attestations.

If an AI tool produces a subtly wrong answer about the conditions under which a firm is relieved of liability when a customer knowingly accepts a risk, that answer may be incorporated into a policy document, a training deck, a supplier due-diligence questionnaire, or a regulatory mapping exercise before anyone checks the source rule.

The firm — not the individual employee who used the AI — bears the consequences when that error reaches a regulator. The Financial Conduct Authority has broad supervisory and enforcement powers under the Consumer Duty: it can require remediation, impose financial penalties, issue public censures, and compel changes to business models. A Risk team that builds its internal framework on an AI-generated misstatement of the Duty's harm-prevention rules is, in practical terms, building a compliance position on a flawed foundation.

The reputational and financial costs of discovering that error after a regulatory review, a complaint, or an enforcement action are considerably higher than the cost of verifying the source rule before the work-product is finalised.

Aggregate impact

The finding documented here illustrates a specific and consequential type of AI error: replacing a simple, clear regulatory threshold with a more elaborate multi-condition test that does not appear in the source rule. The FCA's actual text sets a single qualifier: the firm's reasonable belief that the customer understands and accepts the risk. AI tools tested on this topic replaced that single qualifier with a cluster of conditions (acting in good faith, having supported understanding, having avoided harm caused by the firm's own conduct, and otherwise complying with the Duty) that the regulation does not require.

The resulting answer sounds authoritative and internally consistent, but it describes a different legal standard from the one the FCA actually published.

This error type — adding plausible-sounding but fabricated conditions to a regulatory test — is particularly dangerous in a Risk context because it inflates the firm's apparent obligations rather than understating them. A Risk team that adopts the AI's version of the rule may believe it is being conservative when it is in fact applying an invented framework. The error is also hard to detect in review: the AI's version is more detailed than the real rule, not less, so it can appear to reflect a thorough reading rather than a distortion of the source text.

For a Retail Banking firm whose Risk function covers multiple product lines, business units, and regulatory change programmes simultaneously, even a single misconstrued rule can propagate across many work-products in a short time. A Consumer Duty harm-prevention standard that is wrongly stated in the firm's central policy framework will recur in every downstream document that references it — training materials, product governance assessments, management information templates, and board reports. Correcting that error once it has been embedded is resource-intensive and may require formal regulatory notification if the mismatch has influenced a reportable assessment or a customer-facing process.

Findings

1 finding in this case study.

Foreseeable harm and customer risk acceptance under the Consumer Duty

What your team should do

The default position for Risk teams at Retail Banking firms should be that AI tools are a starting point, not a reliable primary source, for any question about specific regulatory obligations under the Consumer Duty or related FCA rules. This is not a wholesale rejection of AI in Risk workflows — it is a targeted policy position for a specific category of question. Where AI tools are used to orient a team member on a topic, generate a first draft of a non-regulatory section, or produce a list of questions for further investigation, they can add real efficiency.

The point at which they become a liability is when their output is treated as an accurate statement of what the regulator has actually said.

Practical firm-level safeguards should address the gap between AI output and verified regulatory text. A regulatory-verification policy should explicitly name AI-generated summaries of FCA rules as requiring primary-source confirmation before use in any firm work-product. Any AI output that influences a policy document, training material, product governance assessment, or regulatory mapping exercise should be logged as AI-assisted, with a corresponding record of who verified the source rule and when. Sign-off requirements should sit with a suitably experienced person — typically a senior Risk manager or Legal colleague — before AI-assisted regulatory content is circulated firm-wide or included in board-level material.

Where AI-drafted sections appear in regulatory-facing documents, they should be clearly distinguished from content that has been independently verified against primary sources.
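As an illustration only, the logging and sign-off safeguards described above could be captured in a simple record structure. The sketch below is a minimal example under stated assumptions: the class name, field names, and the two checks are hypothetical and are not drawn from any FCA rule or RegLeg tool. A firm would adapt them to its own governance system.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class AIAssistedRecord:
    """One log entry for a work-product that drew on AI output (hypothetical schema)."""

    work_product: str                    # e.g. a policy section or training deck
    rule_reference: str                  # primary source to be checked, e.g. "PRIN 2A"
    verified_by: Optional[str] = None    # who confirmed the source rule
    verified_on: Optional[date] = None   # when the source rule was confirmed
    signed_off_by: Optional[str] = None  # senior Risk or Legal sign-off

    def is_verified(self) -> bool:
        # Verified only when both the verifier and the date are recorded.
        return self.verified_by is not None and self.verified_on is not None

    def ready_for_circulation(self) -> bool:
        # Circulation needs primary-source verification and a sign-off.
        return self.is_verified() and self.signed_off_by is not None
```

A new record starts unverified, and `ready_for_circulation()` returns `True` only once the verifier, the verification date, and the sign-off have all been filled in. Keeping the two checks separate mirrors the distinction in the text between confirming the source rule and approving the content for firm-wide or board-level use.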

AI tools remain genuinely useful within the Risk workflow for tasks that do not depend on the precision of a regulatory threshold. Drafting introductory copy for internal guidance notes, summarising long regulatory consultation papers that the team will then read in full, generating structured question lists for regulatory mapping workshops, and producing first-draft scenarios for training exercises are all areas where the efficiency gain is real and the risk of error is manageable.

The discipline is to know which category a task falls into before reaching for an AI tool — and to build that distinction into the team's standard operating procedures rather than leaving it to individual judgement.

How RegLeg can help

RegLeg publishes its Hallucination Research findings openly, and Risk teams at Retail Banking firms can use this material as a free pre-check before relying on any AI-generated answer in the Consumer Duty and wider FCA rule space. If a team member has used an AI tool to inform a policy position or draft a regulatory summary, checking that question against RegLeg's published findings takes a matter of minutes and provides an independent signal about whether that topic area is one where AI tools are known to produce incorrect output.

This is a practical complement to the firm's own verification processes, not a replacement for them.

For firms that want to go further, RegLeg offers bespoke regulator deep-dives tailored to the specific AI-supported workflows in use at a Retail Banking firm. These map the points in a typical Risk team's operating cycle — product approval, policy refresh, regulatory change management, Consumer Duty attestation — against RegLeg's failure-mode catalogue, so that leadership can direct verification effort to the highest-exposure touchpoints rather than applying blanket review to every AI output. The output is a prioritised risk register of AI-use scenarios, specific to the firm's workflow and the regulators it faces.

RegLeg also offers confidential review of a firm's existing AI-use policy against the hallucination failure modes documented in its research, with prioritised remediation recommendations. For Risk teams whose firms have adopted AI tools more broadly, this review can identify gaps between current policy and the specific categories of error that testing has shown to recur. Alongside this, RegLeg produces training material and CPD-aligned content that Risk teams can deploy internally — helping colleagues at all levels understand where AI tools add value in a regulatory context and where independent verification is non-negotiable.

These materials are designed to be adapted to a firm's existing training infrastructure rather than delivered as off-the-shelf modules.
