This case study examines how AI tools respond to regulatory questions relevant to Legal teams operating within Financial Advisory firms in the United Kingdom. The testing covered the Financial Conduct Authority's Consumer Duty framework (PS22/9 and PRIN 2A), one of the most significant retail conduct regimes introduced in recent years. Across the two aggregated questions put to AI assistants on this regulation, errors were observed in both: a 100% error rate on the questions tested, albeit on a small sample. The errors ranged from substituting legally distinct concepts for defined regulatory terms to presenting unverifiable reconstructions of the differences between draft and final rules as documented fact.
Legal teams relying on AI tools for Consumer Duty work without systematic verification face a meaningful risk of building internal processes, policies, or advice on inaccurate regulatory foundations.
Legal teams at Financial Advisory firms routinely turn to AI tools when scoping the boundaries of a regulatory obligation — for example, when determining which clients or counterparties fall within the Consumer Duty's reach, or when mapping how a regulatory framework evolved between consultation and final rules. These questions arise in practice when the team is drafting or updating internal compliance policies, preparing training materials for front-office staff, reviewing product governance frameworks ahead of a new product launch, or supporting a business line that has received a client complaint or regulatory query.
The apparent speed and fluency of AI assistants make them an attractive first port of call for exactly this kind of threshold and scope analysis.
The corporate use cases built on these topics are substantial. A Financial Advisory firm's Legal function may use AI-sourced regulatory scope analysis to define which client segments trigger Consumer Duty obligations, to brief the Compliance function on changes between draft and final rules, or to structure a firm-wide training programme. That analysis feeds into policy documents, board papers, client-facing disclosures, and regulatory submissions. If the AI's answer is wrong at the scope-definition stage, every downstream work product inherits that error.
The consequences for the firm if an AI-sourced error enters operational use are serious and potentially compounding. The FCA has wide enforcement powers available for breaches of the Consumer Duty, including the ability to impose financial penalties, require remediation programmes, and issue public censures. A firm that has incorrectly scoped which clients qualify as retail customers, for instance by applying the wrong financial threshold test to charities, may systematically under-apply Consumer Duty protections, exposing it to supervisory action and client redress obligations.
Similarly, a firm whose internal understanding of what changed between CP21/36 and PS22/9 rests on fabricated distinctions may have miscalibrated its implementation, with costs that surface only at the point of an FCA review. The individual employee who consulted the AI tool is not typically personally liable; the firm, its leadership, and its Legal and Compliance functions bear the regulatory and reputational consequences.
Both findings in this case study relate to the FCA's Consumer Duty framework, and together they reveal a consistent pattern: AI tools struggle with the precise legal language that defines regulatory scope and with the granular detail of how rules changed between consultation and final publication. In the first finding, an AI tool substituted "annual income" for the legally defined concept of "annual turnover" when describing the charity threshold under PRIN 2A — a substitution that appears minor but maps to a different legal concept and could produce a different result in borderline cases.
In the second, an AI tool produced a detailed and apparently authoritative account of specific changes made between the consultation paper and the final rules, when the basis for several of those attributed changes cannot be verified and appears to be inference rather than documented fact. The errors are not random noise; they follow a recognisable pattern of AI tools presenting plausible-sounding but legally imprecise or unverifiable content with no visible signal of uncertainty.
Both errors cluster on the same regulation and regulator, which means a Legal team at a Financial Advisory firm that routinely uses AI tools to support Consumer Duty work faces compounding exposure across two of the most consequential types of question: who is in scope, and what exactly does the rule require. These are precisely the questions that anchor a firm's entire Consumer Duty implementation — the scope question determines which clients receive protections, and the CP-to-PS difference question determines whether the firm's implementation was calibrated against the right final-rule text.
Errors in either area are unlikely to remain isolated; they propagate into policies, training, client communications, and regulatory reporting.
The systemic risk is amplified because both types of error are likely to appear credible on first reading. An AI tool that misquotes a defined term by one word, or that presents fabricated draft-versus-final distinctions alongside genuine ones, produces output that a non-specialist user has no obvious reason to question.
For a Legal team operating at pace, supporting multiple business lines and managing regulatory change alongside day-to-day advisory work, the cost of catching these errors falls on whoever in the team has the regulatory depth to spot a substituted term or to know that a specific claimed CP-to-PS change has not been verified against primary sources. If that check is not built into the team's workflow, the error travels silently into firm-wide use.
The default position for Legal teams at Financial Advisory firms should be that AI tools are a starting point for regulatory research, not a primary source. For Consumer Duty questions in particular — including scope thresholds, defined terms, and the precise content of final rules — AI output should be treated as a prompt for further verification rather than an answer.
Any AI response that cites a specific threshold, a defined term, or a claimed difference between a consultation paper and a final rule should be checked against the FCA Handbook, the relevant policy statement, or finalised guidance before it is used to support a firm work product. This is not a general caution about AI accuracy: the findings in this case study show that the errors in these specific areas appear credible on first reading and are unlikely to be caught by a reader who does not already know the correct answer.
At the firm level, Legal teams should establish a regulatory-verification policy that explicitly names AI tools as an unreliable source for Consumer Duty scope and rule-content questions, and that requires primary-source sign-off before AI output enters firm-wide use. Where AI output has influenced a policy document, training material, or compliance mapping, the audit trail should record what the AI said, what primary source was used to verify or correct it, and who authorised the final version. "AI-drafted" and "AI-summarised" content should be distinguished in regulatory-facing material, so that reviewers and sign-off holders can apply appropriate scrutiny.
These controls are proportionate to the FCA's enforcement posture on the Consumer Duty, given that the regulator has signalled active supervision of implementation quality.
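For teams that keep this audit trail in a structured log rather than free text, the record described above can be sketched as a simple data structure. This is a minimal, hypothetical sketch: the class, field, and label names are illustrative assumptions, not a RegLeg or FCA schema. It captures what the AI said, which primary source was used to verify or correct it, who authorised the final version, and whether the content was AI-drafted or AI-summarised, and it treats content as ready for firm-wide use only once primary-source sign-off is recorded.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class ContentOrigin(Enum):
    # Hypothetical labels mirroring the "AI-drafted" / "AI-summarised" distinction.
    AI_DRAFTED = "AI-drafted"
    AI_SUMMARISED = "AI-summarised"


@dataclass
class VerificationRecord:
    """One audit-trail entry for AI output used in a work product (hypothetical schema)."""

    work_product: str                     # e.g. policy document, training material, compliance mapping
    origin: ContentOrigin                 # how the AI contributed to the content
    ai_statement: str                     # what the AI said, recorded verbatim
    primary_source: Optional[str] = None  # e.g. the Handbook provision or policy statement checked
    outcome: Optional[str] = None         # e.g. "verified" or "corrected"
    authorised_by: Optional[str] = None   # who signed off the final version
    authorised_on: Optional[date] = None  # when the sign-off was given

    def ready_for_firm_use(self) -> bool:
        # Sign-off requires a primary source, a recorded outcome, an authoriser and a date.
        required = (self.primary_source, self.outcome, self.authorised_by)
        return all(required) and self.authorised_on is not None


record = VerificationRecord(
    work_product="Consumer Duty scope policy (draft)",
    origin=ContentOrigin.AI_SUMMARISED,
    ai_statement="(verbatim AI answer recorded here)",
)
print(record.ready_for_firm_use())  # False: not yet checked against a primary source

record.primary_source = "(Handbook provision checked)"
record.outcome = "corrected"
record.authorised_by = "(name of sign-off holder)"
record.authorised_on = date(2024, 1, 31)
print(record.ready_for_firm_use())  # True: primary-source sign-off recorded
```

The design choice is deliberately conservative: a record defaults to "not ready", so AI output cannot enter firm-wide use simply because nobody filled in the verification fields.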
AI tools remain useful in Legal workflows for tasks where the output does not need to be legally precise in ways that carry regulatory risk: drafting non-regulatory internal communications, generating first-draft questions for a research agenda, or producing initial summaries of long documents that the team can then verify against the originals. The key distinction is between tasks where an error in the AI's output would travel downstream into a firm decision and tasks where the team is using AI to save orientation time, with primary-source verification as the next step regardless.
Keeping that distinction explicit, and building it into workflow design, is the most practical safeguard available to Legal teams working at pace in a high-change regulatory environment.
RegLeg publishes hallucination research on a regulation-by-regulation basis, and the findings covering the FCA's Consumer Duty framework are available without charge as a pre-flight check for any Legal team before it relies on AI output in this area. If your team is using AI tools to support Consumer Duty work — whether for scope analysis, policy drafting, training design, or implementation mapping — the published research gives you a clear picture of where AI tools have been observed to fail on this regulation, and what those failures look like.
Teams that have incorporated the research into their AI-use workflow have been able to target verification effort more precisely rather than reviewing AI output uniformly.
For Financial Advisory firms that want a more tailored view, RegLeg offers bespoke regulator deep-dives that map which AI-supported workflows in your specific business model carry the highest hallucination exposure. A Financial Advisory firm's Consumer Duty obligations do not look identical to those of a deposit-taker or an insurer; the workflows where Legal teams are most likely to use AI also differ by firm type.
A tailored mapping exercise produces a prioritised list of the question areas where AI output in your Legal function carries material risk, and where it can be used more freely — giving the team a proportionate and defensible basis for AI use rather than a blanket policy.
RegLeg also offers confidential review of existing AI-use policies against its failure-mode catalogue, with prioritised remediation recommendations. If your firm has already developed an AI-use policy for the Legal function, or if the Compliance function has produced guidance on AI tools, RegLeg can assess whether the policy addresses the specific failure modes observed in Consumer Duty and related FCA regulatory areas, and identify any gaps. In addition, RegLeg can supply CPD-aligned training material that Legal teams can use internally to build the regulatory-verification habits that reduce hallucination risk in day-to-day practice.
All engagements are confidential and are structured to complement, not replace, your firm's existing legal and compliance arrangements.