This case study examines how AI tools perform when Compliance teams at Captive Insurance firms in the United Kingdom ask regulatory questions about obligations arising under the Financial Conduct Authority's Consumer Duty framework (PS22/9 and PRIN 2A). One aggregated question was tested against this regulation, and the AI assistants produced incorrect or misleading responses to it. The errors identified are directly relevant to Compliance functions responsible for scoping regulatory obligations, drafting internal policies, and advising business lines on whether particular activities fall within or outside the Consumer Duty's reach.
Captive Insurance structures introduce specific jurisdictional and product-type considerations that make accurate regulatory scoping especially important — and especially susceptible to AI-generated errors.
Compliance teams at Captive Insurance firms routinely consult AI tools when scoping whether a new product, arrangement, or distribution channel falls within the Consumer Duty's reach. Typical triggers include onboarding a new fronting insurer, restructuring a group insurance programme for a parent company or affiliated entity, extending cover to new members of an existing group policy, or assessing reinsurance treaties that sit alongside retail-facing products.
In each of these scenarios, a Compliance officer may ask an AI tool a threshold question (does the Consumer Duty apply here?) and use the answer to decide whether the full Consumer Duty obligations, including the cross-cutting rules and the outcome rules in PRIN 2A, need to be built into the firm's processes.
These threshold scoping decisions feed directly into downstream work products: regulatory mapping documents used in product governance sign-off, training materials issued to underwriting and distribution teams, due-diligence questionnaires sent to distributors, and board-level horizon-scanning papers. If the AI's answer to a scoping question is wrong — asserting that the Consumer Duty applies to an activity the FCA has expressly carved out, or conversely implying it does not apply when it does — every downstream document that relies on that answer carries the same error forward.
The firm, not the individual who ran the query, bears the cost of those errors. If a Compliance team builds a regulatory map on an incorrect AI-generated scope assessment, and that map underpins internal policy, product governance sign-off, or a response to an FCA supervisory request, the firm may be exposed to regulatory action, including public censure, remediation requirements, and financial penalties. Where the incorrect scoping also affects how the firm interacts with policyholders or group scheme members, there may be additional reputational and client-harm consequences that compound over time.
The finding in this case study illustrates a characteristic pattern in how AI tools handle regulatory carve-outs: rather than accurately reproducing an express exclusion, the AI over-extends a general principle (in this instance, distribution-chain logic) beyond the boundaries the regulator has drawn. The FCA expressly excluded from the scope of the Consumer Duty activities connected with the distribution of group insurance policies and the extension of those policies to new members.
The AI tools we tested did not reflect that exclusion; instead, they constructed an affirmative case for in-scope treatment, citing sources that do not support the claimed position.
The finding concerns the Consumer Duty, the most recently enacted major conduct regulation the FCA has introduced and one of its most consequential. This matters for Captive Insurance Compliance teams because the Consumer Duty is a live obligation under active FCA supervisory focus, and because the captive and group insurance structures most commonly used in this sub-sector sit in precisely the boundary zones (reinsurance, group policies, large commercial risks) where the FCA's express carve-outs apply.
An AI tool that misreads those boundaries can push Compliance teams toward over-compliance, committing resources to obligations that do not apply, or, more dangerously, toward treating in-scope activities as out of scope and building internal governance frameworks on that incorrect understanding of what the rules actually require.
The systemic risk compounds quickly. A single incorrect scoping answer, embedded in a regulatory mapping document, can propagate into product governance committee papers, board reporting, distribution agreements, and training sign-off records before anyone checks the primary source. At a Captive Insurance firm where the Compliance function is typically lean and the business lines it supports are commercially sophisticated, the pressure to move quickly from AI output to finished work product is high.
That pressure is exactly the condition under which an unverified AI error causes the most damage — not because anyone acted negligently, but because the verification step was assumed to have happened somewhere in the chain when it had not.
This case study contains one finding.
The default position for Compliance teams should be that AI tools are a starting point for orientation — not a primary source — when the question concerns whether a specific activity falls within or outside a regulatory obligation. This is especially true for boundary questions under the Consumer Duty, where express exclusions exist for product types and distribution structures that are common in the Captive Insurance sub-sector.
Any AI-generated scoping answer that touches reinsurance, group insurance distribution, large-risk commercial contracts, or connected activities should be verified against the FCA's published policy statement (PS22/9) and the relevant PRIN 2A text in the FCA Handbook before it enters any work product. Treat the AI's answer as a prompt to check the source, not as a substitute for checking it.
At the firm level, the Compliance function should establish a clear internal policy that designates AI tools as unreliable sources for regulatory scoping determinations in these areas, and requires a named reviewer to confirm primary-source alignment before any AI-generated regulatory position is used in governance documents, board papers, product approval records, or distributor-facing materials. Where AI output has influenced a work product, the audit trail should record both the AI's answer and the primary-source check that confirmed or corrected it.
This is not an unusually high bar — it is the same discipline that good Compliance practice already requires for any secondary summary of regulatory text — but it needs to be made explicit for AI outputs because the fluency and apparent confidence of AI responses can suppress the instinct to verify.
AI tools remain genuinely useful in the Compliance workflow for tasks that do not require their output to be treated as accurate regulatory text: drafting non-regulatory internal communications, generating first-draft question sets for further primary-source research, summarising long documents that the team will then verify independently, and producing structured outlines for policies that a qualified reviewer will complete. The discipline is to be clear, at the point of use, whether a task is of this kind or one whose output will be relied on as an accurate statement of regulatory requirements, and to ensure that output produced for the first kind of task never migrates into the second without the verification step being completed and recorded.
RegLeg's published Hallucination Research is available as a free reference tool for Compliance teams who want to check, before relying on an AI-generated answer, whether that answer covers a topic area where AI tools are known to produce inaccurate responses. For Captive Insurance Compliance functions, the research covers the Consumer Duty in detail — including the boundary questions around group insurance distribution, reinsurance, and large-risk commercial exclusions that are most relevant to typical captive structures.
Teams can use the published findings to sense-check AI outputs before those outputs enter work products, and to brief senior stakeholders on which regulatory topic areas carry elevated AI-reliability risk.
For firms that want a more tailored assessment, RegLeg offers bespoke deep-dives that map the specific AI-supported workflows used by a Captive Insurance Compliance function against the failure-mode patterns identified in the research. This includes identifying which internal processes — product governance, regulatory mapping, training material production, distributor due-diligence — carry the highest exposure, and what verification checkpoints would most efficiently reduce that exposure. The output is a practical risk map the Compliance team can use to prioritise where additional human review is needed without adding disproportionate overhead to every AI-assisted task.
RegLeg also offers confidential review of a firm's existing AI-use policy or AI-governance framework against the RegLeg failure-mode catalogue, with prioritised remediation recommendations. For Compliance teams that have already begun using AI tools in regulatory workflows but have not yet formalised their verification and sign-off procedures, this provides a structured way to identify gaps and close them before they become supervisory findings.
Training materials and CPD-aligned content are also available for teams that want to build AI-literacy into their Compliance function's professional development programme — covering both the practical use cases where AI adds value and the specific categories of regulatory question where it should not be trusted without independent verification.