AI on Implementation Monitoring of the PFMI: Level 3 Assessment on General Business Risk, for Public Auditors in International Jurisdictions

Executive Summary

Across the three questions put to AI assistants on Principle 15 of the PFMI (the general business risk standard) and its requirements for liquid net assets funded by equity (LNAFE), every answer contained a structural misreading of the Key Consideration (KC) architecture. AI tools consistently collapsed or transposed the distinct obligations in KC2, KC3, and KC4, producing hybrid constructs that do not correspond to any actual provision in the text.

The dominant pattern is confident initial delivery of a fabricated rule, followed by retraction or contradiction when challenged — an instability that should concern any public auditor relying on AI output to inform sign-off positions or advisory opinions on FMI compliance. For public auditors engaged in L3 assessment work, ISDA/FIA response analysis, or CCP internal audit mandates referencing this standard, the errors cluster precisely on the quantitative compliance thresholds and eligibility conditions that matter most to a regulatory opinion.

How AI gets this regulation wrong

The failures on this regulation followed a common arc: AI tools gave an answer with apparent precision, then reversed it, contradicted themselves, or, in one case, held to the error when the question was pressed. The underlying error in each case was not vagueness but structural invention. AI tools merged obligations from separate Key Considerations into artificial composite rules, or reassigned specific quantitative floors between KCs, producing responses that sound authoritative but describe requirements that do not exist in the text.

AI's Failure Mode	Count	Affected findings
Exposed Fabrication	2	Finding #1 · Finding #2

What that means for your practice

For public auditors, all three failures carry the same practical consequence: a deliverable built on a misread standard. Whether the output is an advisory opinion on a CCP's LNAFE buffer methodology, an L3 assessment gap analysis, or an internal audit finding on compliance with the six-month floor, an AI-sourced misattribution of which Key Consideration governs which obligation will not survive contact with the source text — but may reach a client or regulator before that check is made.

Risk Impact	Count	Affected findings
Wrong deliverable	2	Finding #1 · Finding #2

When this affects Public Auditors

Public auditors most commonly reach for AI assistance on this regulation when scoping an FMI audit or internal review that touches the general business risk framework — specifically when they need to confirm the precise LNAFE floor, understand how the Basel/CRD capital carve-out interacts with the equity eligibility test, or map an FMI's stated methodology against the specific KC obligations.

The same need arises when reviewing management representations about compliance with the six-month minimum, assessing whether a CCP's scenario-analysis approach is drawing on the right KC as its source authority, or preparing a gap note ahead of an L3 assessment engagement.

The exposure is sharpest when AI output is used to frame the compliance threshold itself — not as background context but as the operative standard. If an auditor accepts a fabricated "greater of" dual-track minimum as the KC3 floor, or treats a KC4 liquidity test as the governing condition for Basel equity inclusion, the resulting opinion or sign-off misstates the actual regulatory requirement. In an international jurisdiction context, that error may not be caught by a domestic-law overlay — the PFMI text is the primary reference, and the KC architecture is the relevant unit of analysis.
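To make the operative KC3 threshold concrete, the sketch below checks a reported LNAFE figure against the six-month floor alone. It is a minimal illustration, not a compliance test: it assumes current operating expenses are supplied as an annual figure and pro-rated linearly to six months, and the function names and figures are hypothetical. It deliberately contains no "greater of" leg, and it does not model which resources qualify as LNAFE or whether Basel-regime equity may be included; both judgements require the PFMI text and the FMI's own documentation.

```python
from decimal import Decimal


# Hypothetical helper. KC3 sets a minimum of "at least six months of
# current operating expenses"; this sketch ASSUMES the input is an
# annual operating-expense figure and pro-rates it linearly.
def kc3_six_month_floor(annual_operating_expenses: Decimal) -> Decimal:
    if annual_operating_expenses < 0:
        raise ValueError("operating expenses cannot be negative")
    return annual_operating_expenses * Decimal(6) / Decimal(12)


# Hypothetical single-track comparison against the six-month floor only.
# No scenario-analysis amount is combined with the floor here.
def meets_kc3_floor(lnafe: Decimal, annual_operating_expenses: Decimal) -> bool:
    return lnafe >= kc3_six_month_floor(annual_operating_expenses)


if __name__ == "__main__":
    opex = Decimal("120000000")   # illustrative annual operating expenses
    lnafe = Decimal("55000000")   # illustrative reported LNAFE
    print(kc3_six_month_floor(opex))     # 60000000
    print(meets_kc3_floor(lnafe, opex))  # False
```

Meeting the six-month floor is a minimum condition; it does not by itself establish compliance with the rest of Principle 15.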

The November 2025 CPMI-IOSCO L3 assessment publication makes this particularly live: auditors and advisers reviewing the findings document or advising clients responding to consultation outcomes need to accurately locate which KC governs which obligation, since the assessment findings reference that architecture directly. An AI that reassigns the six-month floor from KC3 to KC2, or that invents a scenario-analysis leg inside KC3, will produce confusion that propagates into client briefings, response drafts, and audit committee presentations.

The findings at a glance

The table below lists each finding by title, error type, and citation ID, drawn from the three Principle 15 queries tested on this regulation.

#	Finding title	Type	Citation ID
1	KC3 Basel carve-out condition fabricated with KC4 liquidity test	Hallucination	RLB-F-INT-BIS-CPMI-IOSCO-PFMI-L3-GENERAL-BUSINESS-RISK-2025-Q002
2	Six-month LNAFE floor inflated into invented dual-track minimum	Hallucination	RLB-F-INT-BIS-CPMI-IOSCO-PFMI-L3-GENERAL-BUSINESS-RISK-2025-Q003

Aggregate impact

All three failures concern Principle 15 KC3, the LNAFE requirement, and together reveal that AI tools have a systematically distorted model of how the Key Considerations in Principle 15 are structured. The specific errors differ in form but share the same root: AI tools appear to treat KC2 (scenario-based sizing), KC3 (quantitative floor and equity eligibility), and KC4 (asset quality) as interchangeable or combinable, freely reassigning obligations between them.

One AI invented a "greater of" dual-track floor that imports KC2's scenario-analysis sizing into KC3. Another denied that KC3 contains a Basel capital carve-out at all, then retracted on challenge. A third attributed the six-month floor to KC2 rather than KC3, and maintained that attribution even when pressed.

The pattern is not random noise. It suggests that AI tools trained on secondary commentary, assessment reports, or implementation guidance may have absorbed a simplified mental model of Principle 15 in which the quantitative and qualitative obligations are loosely associated rather than precisely allocated to specific KCs. For a practitioner audience that operates at the level of KC-specific compliance analysis, this is a fundamental failure: the KC architecture is not decorative — it determines which obligation a particular FMI representation speaks to, which assessment criterion applies, and which remediation pathway is available.

For international public auditors, the aggregate risk is that AI-assisted drafting will produce technically fluent prose that mislocates the governing standard, generating opinions or gap analyses that would not survive a regulator's textual review. Because the AI responses were confident on first delivery and only sometimes retracted when challenged, a practitioner who does not independently verify against the PFMI source text has no reliable signal that the answer is wrong.

What your team should do

The default position for any team member using AI to research Principle 15 KC obligations should be: AI output on this regulation cannot be trusted without direct verification against the PFMI text, specifically the KC-by-KC breakdown in Principle 15. This applies regardless of how precise or well-sourced the AI response appears — the failure pattern here is confident fabrication, not hedged uncertainty, and the errors concern the specific provisions (the six-month floor, the Basel carve-out condition, the asset eligibility test) that an auditor or adviser is most likely to rely on directly.

Practically, this means every KC attribution in a draft opinion, gap analysis, or audit finding should carry a source citation to the PFMI text itself, not to AI-generated summaries or secondary commentary. Where AI tools are used for initial drafting speed, the KC references should be treated as placeholders requiring mandatory verification — not as confirmed regulatory text. For engagement scoping or client briefings that reference the November 2025 L3 assessment, the same discipline applies: the assessment document references specific KC obligations, and any AI summary of those findings should be checked against the source.
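The verification step described above can be supported, though not replaced, by a simple reference table. The sketch below encodes the KC allocation this report uses (KC2 for scenario-based sizing, KC3 for the six-month floor and the Basel capital carve-out, KC4 for asset quality and liquidity) and flags draft attributions whose cited KC does not match, or whose stated source is not the PFMI text. The obligation labels, the `KC_ALLOCATION` table, the `Attribution` record, and `flag_misattributions` are all hypothetical; the table itself must be checked against the PFMI text before use, and an empty result only means the draft agrees with the table, not with the standard.

```python
from dataclasses import dataclass

# Hypothetical reference table mirroring the KC allocation described in
# this report. Verify every entry against the PFMI text before relying on it.
KC_ALLOCATION: dict[str, str] = {
    "six_month_lnafe_floor": "KC3",
    "basel_capital_carve_out": "KC3",
    "scenario_based_sizing": "KC2",
    "asset_quality_and_liquidity": "KC4",
}


@dataclass(frozen=True)
class Attribution:
    """One KC attribution taken from a draft (e.g. an AI-assisted gap note)."""
    obligation: str
    cited_kc: str
    source: str  # where the draft says the attribution came from


def flag_misattributions(draft: list[Attribution]) -> list[str]:
    """Return a review note for every attribution that needs attention."""
    notes: list[str] = []
    for item in draft:
        expected = KC_ALLOCATION.get(item.obligation)
        if expected is None:
            notes.append(f"{item.obligation}: not in reference table, verify manually")
        elif item.cited_kc != expected:
            notes.append(
                f"{item.obligation}: draft cites {item.cited_kc}, "
                f"reference table says {expected}"
            )
        # Treat any source other than the PFMI text itself as a placeholder.
        if item.source != "PFMI text":
            notes.append(f"{item.obligation}: sourced from {item.source!r}, cite the PFMI text")
    return notes


if __name__ == "__main__":
    draft = [
        Attribution("six_month_lnafe_floor", "KC2", "AI summary"),
        Attribution("basel_capital_carve_out", "KC3", "PFMI text"),
    ]
    for note in flag_misattributions(draft):
        print(note)
```

In this example the first attribution produces two notes (wrong KC, non-primary source) and the second produces none.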

AI tools are genuinely useful on this regulation for tasks that do not depend on KC-level precision: understanding the general business risk framework's purpose, identifying which FMI types are in scope, or summarising the policy context for a client unfamiliar with the PFMI architecture. They are also useful for drafting surrounding prose — contextual sections, executive introductions, or policy background paragraphs — where a KC-attribution error would not propagate into a compliance conclusion.

The bright line is anywhere an AI response is being used to characterise a specific obligation, threshold, or eligibility condition under a named Key Consideration: at that point, the PFMI text is the only reliable source.

How RLB Can Help

RegLeg's published Hallucination Research gives public auditors a practical pre-flight check before placing weight on AI-assisted analysis of regulatory questions. The research catalogues the specific failure modes — misquoted thresholds, conflated jurisdictional requirements, fabricated citation trails — that AI tools produce most often in public-sector and cross-border audit contexts. Auditors can use these findings to calibrate their review steps before any AI output enters a working paper, providing a documented basis for the professional scepticism their standards already require.

Where an audit team or firm has multiple practitioners working across the same regulatory portfolio, RLB can deliver bespoke deep-dives on individual regulations. These sessions go beyond the published research to map failure modes specific to the instruments, guidance notes, and enforcement expectations most relevant to the team's current engagements. The output is practical rather than theoretical: teams leave with concrete review checkpoints aligned to the regulations they are actually auditing against.

RLB also develops training material and CPD-aligned content built around the failure-mode catalogue, so that auditors at all experience levels understand what to look for and why. For firms that have already deployed AI tools and drafted internal use policies, RLB offers confidential reviews of those policies against the same catalogue — identifying gaps between what the policy assumes AI tools will do reliably and what the research shows they frequently get wrong.