AI on Implementation Monitoring of the PFMI: Level 3 Assessment on General Business Risks for Risk teams at Management & Risk Consulting firms in international jurisdictions

Executive Summary

Risk teams at Management & Risk Consulting firms advising FMI clients on PFMI Principle 15 general business risk face a specific and consequential failure pattern when using AI tools. Every AI failure in this cell involves the liquid net assets funded by equity (LNAFE) requirements, and in every case the AI produced a structurally incorrect answer and then, when pressed, retracted or contradicted itself.

Across three tested questions, AI tools consistently cross-contaminated Key Consideration 2 and Key Consideration 3 under Principle 15: inventing a "greater of" dual-track minimum that does not exist in KC3, inserting a KC4 liquidity test as a qualifying condition for the Basel capital carve-out, and in one instance flatly denying that KC3 contains a Basel carve-out at all.

The November 2025 CPMI-IOSCO Level 3 Assessment is a live supervisory document against which regulators will hold FMIs to account; consulting firms that brief CCP or payment-system clients on Principle 15 compliance using AI-generated LNAFE analysis risk shipping policy architecture that does not match the rule text — an error with direct remediation and reputational consequences.

How AI gets this regulation wrong

The failures on this regulation are not scattered across Principle 15 — they concentrate entirely on how AI tools represent the structure and content of the LNAFE Key Considerations, specifically the boundary between KC2 and KC3. AI tools repeatedly fabricated conditions by importing obligations from adjacent Key Considerations, denied provisions that appear verbatim in the rule text, and then reversed position under challenge — the classic pattern of a model that has reorganised the regulatory architecture in its internal representation and cannot reliably distinguish which quantitative test lives where.

AI's Failure Mode	Count	Affected findings
Exposed Fabrication	2	Finding#1 · Finding#2

What that means for your team

Every finding in this cell lands in the same risk category: wrong deliverable. The AI failures here do not manifest as vague or unhelpful answers — they produce technically coherent, confidently framed LNAFE analysis that will pass a junior review precisely because the errors are structural rather than obvious. For a Risk team at a consulting firm, that means the contamination travels into client-facing policy gap assessments, Principle 15 implementation roadmaps, and L3 assessment readiness reports before anyone has reason to check the KC-level text.

Risk Impact	Count	Affected findings
Wrong deliverable	2	Finding#1 · Finding#2

When this affects your department

The typical entry point is a CCP or payment-system client preparing for a CPMI-IOSCO L3 assessment or conducting an internal Principle 15 gap analysis against the November 2025 findings. The Risk team's deliverable — whether a policy memo, a capital adequacy benchmark, or a board-level readiness briefing — will almost certainly require a precise characterisation of the LNAFE floor and the conditions under which Basel-compliant regulatory capital can count toward it. AI tools are a natural accelerant here: the questions are technical, the rule text is dense, and junior consultants under time pressure will reach for a fast answer.

The problem is that the specific technical questions where AI tools fail most spectacularly are exactly the ones that look most answerable — how much LNAFE must an FMI hold, how is it calculated, what qualifies.

The structural risk for the consulting firm is that an incorrect LNAFE characterisation embedded in a client deliverable is not caught at review unless a reviewer independently checks the KC-level text. A "greater of" framing that imports KC2's scenario-analysis test into KC3's simple six-month floor sounds like conservative risk management — it is plausible on its face and will not trigger a red flag in a normal peer review cycle. Similarly, a KC attribution error (attributing the six-month minimum to KC2 instead of KC3) is invisible in a deliverable unless the reviewer already knows the Principle 15 structure cold.
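To make the divergence concrete, here is a minimal Python sketch with hypothetical figures. It assumes only what the paragraph above states: KC3 sets a floor of six months of current operating expenses, and the "greater of" variant is the fabricated framing, not rule text. The function names, the monthly expense figure, and the scenario amount are all illustrative assumptions.

```python
# Illustrative only: all figures and names below are hypothetical.

def kc3_six_month_floor(monthly_operating_expenses: int) -> int:
    """KC3 floor as described in this report: six months of current operating expenses."""
    return 6 * monthly_operating_expenses


def fabricated_greater_of_minimum(monthly_operating_expenses: int,
                                  kc2_scenario_amount: int) -> int:
    """The invented 'greater of' framing: it folds a KC2-style scenario
    amount into the number that gets labelled as the KC3 minimum."""
    return max(kc3_six_month_floor(monthly_operating_expenses), kc2_scenario_amount)


# Hypothetical inputs, in whole currency units.
monthly_opex = 2_000_000
scenario_amount = 15_000_000

floor = kc3_six_month_floor(monthly_opex)
mislabelled = fabricated_greater_of_minimum(monthly_opex, scenario_amount)

print(f"KC3 six-month floor:           {floor:,}")
print(f"'Greater of' figure (invented): {mislabelled:,}")
print(f"Overstatement of the KC3 floor: {mislabelled - floor:,}")
```

With these inputs the fabricated framing reports 15,000,000 as the "KC3 minimum", where the six-month floor is 12,000,000. The sketch does not model what an FMI should hold overall, since KC2 sizing is a separate consideration; it only shows how a mislabelled number diverges from the floor while still looking conservative.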

Both errors are exactly what the L3 assessment process is designed to surface in FMIs — and consulting firms that have quietly replicated those errors in their own advice are exposed when the regulator's findings land.

For firms advising clients across multiple jurisdictions — particularly those with CCPs or CSDs operating in post-trade infrastructure subject to CPMI-IOSCO standards — the reputational stakes compound. A single wrong LNAFE floor in a capital policy memo can cascade into a remediation programme, a supervisory escalation, or a required restatement of the FMI's general business risk framework. The consulting firm's liability depends on engagement terms, but the reputational cost of having produced structurally incorrect Principle 15 advice on a regulation that was actively under Level 3 supervisory review is not recoverable by pointing to AI-assisted drafting.

The findings at a glance

Both findings involve AI tools mischaracterising the LNAFE requirements under Principle 15. They differ in which specific KC boundary was crossed, but are consistent in producing structurally incorrect output that the AI then partially or fully retracted under challenge.

#	Finding title	Type	Citation ID
1	KC3 Basel carve-out replaced with invented KC4 liquidity test	Hallucination	RLB-F-INT-BIS-CPMI-IOSCO-PFMI-L3-GENERAL-BUSINESS-RISK-2025-Q002
2	LNAFE floor fabricated as 'greater of' dual-track minimum	Hallucination	RLB-F-INT-BIS-CPMI-IOSCO-PFMI-L3-GENERAL-BUSINESS-RISK-2025-Q003

Aggregate impact

The failures in this cell are not distributed across Principle 15 — they cluster entirely on the KC2/KC3 boundary and the internal structure of the LNAFE requirement. This is not a coincidence of sampling; it reflects how AI tools appear to have compressed or reorganised the Principle 15 Key Considerations in their internal representation of the PFMI.

AI tools appear to treat KC2 (scenario-analysis sizing) and KC3 (the six-month floor, the Basel carve-out, the segregation obligation) as a single merged provision. As a result, any question that probes the precise boundary between them (how much must be held, under what exact conditions, under which KC) elicits an answer that blends the two.

The systemic implication for a Risk function advising multiple FMI clients is that the contamination is not finding-specific: it is a structural property of how AI tools represent this part of the PFMI. Every AI-assisted Principle 15 LNAFE analysis produced without direct verification against the KC-level text carries this risk, whether the question is about the minimum floor, the Basel carve-out conditions, or the asset qualification criteria.

In a practice with multiple client engagements running in parallel — each with a Principle 15 gap item in scope — the aggregate exposure is proportional to how widely AI-generated first-draft content is used without a mandatory KC-text verification step.

The retraction behaviour documented across both findings adds a further operational risk: AI tools that reverse position under challenge appear to self-correct, but the reversal itself signals that the initial answer was not grounded. In a workflow where a junior analyst queries the AI, receives a confident answer, and does not press the question, the retraction never happens, and the wrong answer is the one that makes it into the deliverable.

What your team should do

The default position for Principle 15 LNAFE work is: AI tools are not a reliable source for KC-level attribution or for the precise conditions attached to any LNAFE qualifier. Treat AI-generated characterisations of KC2 and KC3 as first-pass research only, and require a direct check against the PFMI text before any KC-level claim enters a client deliverable. This applies regardless of whether the AI answer looks internally coherent — the errors documented here are structurally coherent wrong answers, not obviously garbled responses.

The practical safeguard for the Risk team is a mandatory KC-citation rule: any internal or client-facing document that makes a claim about what Principle 15 requires on LNAFE must include an explicit citation to the specific KC, and that citation must have been verified against the published PFMI text, not an AI summary of it. Embed this in the engagement quality-check template for any Principle 15 scope item, and flag it specifically in peer review for junior-authored sections.
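One way to picture the KC-citation rule as a quality-check step is the short Python sketch below. It is an assumption-laden illustration, not a description of any existing tool: the `LnafeClaim` record, its field names, and the `review_claims` helper are hypothetical, and the verification flag is assumed to be set by a human reviewer who has read the published PFMI text.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class LnafeClaim:
    """Hypothetical record for one Principle 15 LNAFE claim in a deliverable."""
    text: str
    kc_citation: Optional[str]          # e.g. "Principle 15, KC3"; None if absent
    verified_against_pfmi_text: bool    # set by a human reviewer, not by an AI summary


def review_claims(claims: List[LnafeClaim]) -> List[Tuple[int, str]]:
    """Return (claim index, issue) pairs for claims that fail the KC-citation rule."""
    issues = []
    for index, claim in enumerate(claims):
        if not claim.kc_citation:
            issues.append((index, "missing KC citation"))
        elif not claim.verified_against_pfmi_text:
            issues.append((index, "citation not verified against published PFMI text"))
    return issues


# Hypothetical draft content for illustration.
draft = [
    LnafeClaim("LNAFE must cover at least six months of current operating expenses.",
               "Principle 15, KC3", True),
    LnafeClaim("Regulatory capital can count toward LNAFE.", None, False),
    LnafeClaim("LNAFE is sized using scenario analysis.", "Principle 15, KC2", False),
]

for index, issue in review_claims(draft):
    print(f"Claim {index}: {issue}")
```

The design choice worth noting is that a present citation is not enough to pass: the claim is only cleared once the citation has been checked against the published text, which is the step the rule exists to enforce.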

Given that the L3 assessment is a live supervisory document, the bar for accuracy on KC-level claims is not "plausible on its face" — it is "verifiable against the text the regulator used to assess FMIs."

AI tools carry less risk in adjacent research tasks: identifying which FMIs were assessed in the November 2025 exercise, retrieving the timeline of CPMI-IOSCO L3 assessments across different Principles, or locating industry association responses such as the FIA/ISDA consultation submission. For contextual background (what the Level 3 process is, how assessments are structured, what kinds of findings typically emerge), AI provides reasonable first-pass framing. The ceiling drops sharply when the question becomes specific to KC-level content, quantitative thresholds, or qualifying conditions.

That is where the text of the PFMI and the November 2025 assessment report must be read directly.

How RLB Can Help

RegLeg's published Hallucination Research gives your Risk team a concrete pre-flight check before placing weight on AI output for regulatory questions. Before a deliverable goes to a client or an internal sign-off, cross-referencing it against the live findings catalogue tells you whether the specific regulation, jurisdiction, or instrument in scope has already produced documented failure patterns — wrong obligation scope, inverted thresholds, misattributed enforcement dates. That is not a theoretical risk register entry; it is an empirical map of where AI tools have already broken on the exact material your team touches.

Beyond the published research, we work with Risk functions directly to trace which AI-supported workflows in your practice carry the highest hallucination exposure. Not generically — specifically: where in your regulatory horizon-scanning, client risk-framework reviews, or cross-border gap analysis does AI output feed into a judgment that moves without further verification? That mapping exercise produces a ranked exposure list your team can act on, not a taxonomy exercise.

We then run the same logic against your firm's existing AI-use policy to identify where the stated guardrails do not match the failure modes we have documented, and we deliver prioritised remediation in terms your risk governance process can absorb — policy language, escalation triggers, and workflow checkpoints.

For teams that need to build internal capability, we produce training material and CPD-aligned content calibrated to the Risk function's actual day-to-day — not generic AI-literacy material, but technically grounded content anchored to real failure cases in the regulatory domains your practice covers. The goal is to leave your team able to interrogate AI output independently, not to install a dependency.