AI Labs · updated 2026-06-03 · methodology v2.1

PFMI Principle 15 Failures: Conditional-Structure Fabrication and Carve-Out Denial in Claude Opus 4.7 and Claude Sonnet 4.6

Executive summary

When queried about CPMI-IOSCO's Implementation Monitoring of the PFMI: Level 3 Assessment on General Business Risks (Bank for International Settlements, November 2025), both Claude Opus 4.7 and Claude Sonnet 4.6, each with web search enabled, produced failures that share a common shape. The models reconstructed rule conditions from internalized schema rather than from the regulator's published text, generating structurally plausible but materially wrong formulations of the standard's quantitative requirements. The six confirmed failures across the two models converge on the minimum structure for liquid net assets funded by equity (LNAFE) under PFMI Principle 15 Key Consideration 3 (KC3), the Basel/CRD capital-counting carve-out within the same provision, the institutional attribution of the assessment's co-chairs and, in one case, the assessment's review window. Although web search was enabled, neither model closed these gaps through retrieval; in several cases, sourcing made the output worse by introducing third-party paraphrases that diverged from the regulator's verbatim text. The pattern points to a systematic gap in how both configurations handle the intersection of technical regulatory numerics, conditional qualifications within formally structured standards, and recent official publications that fall at or beyond the retrieval pipeline's effective indexing boundary.

Findings — impact summary

This is the consolidated view of the findings; each finding is documented in full in the detailed case study.

Finding on 'Q001 Probe' for Claude Opus 4.7 with web search · ONRLB-H-INT-BIS-CPMI-IOSCO-PFMI-L3-GENERAL-BUSINESS-RISK-2025-Q001-Opus47
This failure implicates the retrieval pipeline's handling of institutional personnel records in recent CPMI-IOSCO publications: the model recognised that the co-chair information existed in a specific document but could not retrieve it, and then chose confident source-deflection over an explicit signal that retrieval had failed. The subsystem gap is citation generation. The model produced a Pretextual citation (third-party commentary) while simultaneously telling the user to consult the primary document, behaviour that masks the retrieval failure rather than surfacing it for the user to act on.
Finding on 'Q002 Probe' for Claude Opus 4.7 with web search · ONRLB-H-INT-BIS-CPMI-IOSCO-PFMI-L3-GENERAL-BUSINESS-RISK-2025-Q002-Opus47
This failure implicates training-data representation of the Key Consideration structure of PFMI Principle 15: the model generated a two-part compound condition, drawing on real concepts from adjacent Key Considerations (KC4 liquidity, cross-Principles non-duplication), and applied it to KC3 in a way the standard does not support. The subsystem gap is verbatim-constraint anchoring: the model's schema for how this provision works overrode the regulator's published language, producing a materially more restrictive rule that does not exist.
Finding on 'Q003 Probe' for Claude Opus 4.7 with web search · ONRLB-H-INT-BIS-CPMI-IOSCO-PFMI-L3-GENERAL-BUSINESS-RISK-2025-Q003-Opus47
This failure implicates training-data anchoring on the quantitative minimum in PFMI Principle 15 KC3: the model generated a 'greater of' compound floor where the regulator's published text states a single flat floor. The compound structure may have been synthesised from scenario-analysis language in adjacent Key Considerations. The subsystem gap is discrimination between single and compound floors: the model's generalised schema for regulatory capital minimums produced a more complex formulation than the rule requires and expressed it with high apparent confidence.
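The single-floor vs. compound-floor distinction can be made concrete with a minimal Python sketch. It assumes the KC3 floor of six months of current operating expenses set out in PFMI Principle 15; the 'greater of' function and its `scenario_estimate` input are hypothetical, illustrating the shape of the error rather than reproducing the model's actual wording, and all figures are arbitrary units.

```python
# Minimal sketch contrasting a single flat floor with a HYPOTHETICAL
# "greater of" compound floor. Figures are arbitrary illustrative units.

KC3_MIN_MONTHS = 6  # PFMI Principle 15 KC3: at least six months of current operating expenses


def kc3_single_floor(annual_opex: float) -> float:
    """Single flat floor: six months of current operating expenses."""
    return annual_opex * KC3_MIN_MONTHS / 12


def hypothetical_compound_floor(annual_opex: float, scenario_estimate: float) -> float:
    """HYPOTHETICAL 'greater of' floor; not a formulation found in the standard.

    scenario_estimate is an illustrative stand-in for whatever second leg
    a compound formulation might add.
    """
    return max(kc3_single_floor(annual_opex), scenario_estimate)


if __name__ == "__main__":
    opex = 120.0
    print(kc3_single_floor(opex))                   # 60.0
    print(hypothetical_compound_floor(opex, 90.0))  # 90.0: stricter than the published floor
    print(hypothetical_compound_floor(opex, 30.0))  # 60.0: collapses to the single floor
```

Because the compound form takes a maximum, it can never fall below the published floor; whenever the second leg exceeds it, the formulation is strictly more restrictive than the rule, which is the direction of error the finding describes.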
Finding on 'Q002 Probe' for Claude Sonnet 4.6 with web search · ONRLB-H-INT-BIS-CPMI-IOSCO-PFMI-L3-GENERAL-BUSINESS-RISK-2025-Q002-Sonnet46
This is a high-consequence failure for deployment in compliance contexts: in an answer framed as an authoritative policy note for a CCP capital management team, the model categorically denied a provision that appears verbatim in the published standard. The subsystem gap combines training-data representation of the KC3 carve-out with post-training calibration on categorical denials; a high-confidence 'does NOT include' assertion about a regulator-specific provision should carry a stronger uncertainty signal than the model produced here.
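The practical effect of the denial can be sketched the same way. This minimal Python sketch assumes the KC3 wording that equity held under international risk-based capital standards can be included where relevant and appropriate to avoid duplicate capital requirements; the function names, their inputs, and the boolean `carve_out_applies` (standing in for what is in practice a supervisory judgment) are all hypothetical.

```python
# Minimal sketch of how the KC3 capital-counting carve-out changes the
# equity that counts toward the LNAFE minimum. All names are HYPOTHETICAL.


def countable_lnafe(
    other_lnafe: float,
    risk_based_capital_equity: float,
    carve_out_applies: bool,
) -> float:
    """Liquid net assets funded by equity that count toward the KC3 floor.

    carve_out_applies models, as a simple flag, whether including equity held
    under international risk-based capital standards is "relevant and
    appropriate to avoid duplicate capital requirements".
    """
    if carve_out_applies:
        return other_lnafe + risk_based_capital_equity
    return other_lnafe


def understatement_from_denial(other_lnafe: float, risk_based_capital_equity: float) -> float:
    """Shortfall from answering as if the carve-out never applies, in a case where it does."""
    permitted = countable_lnafe(other_lnafe, risk_based_capital_equity, True)
    denied = countable_lnafe(other_lnafe, risk_based_capital_equity, False)
    return permitted - denied


if __name__ == "__main__":
    print(countable_lnafe(40.0, 25.0, True))       # 65.0
    print(countable_lnafe(40.0, 25.0, False))      # 40.0
    print(understatement_from_denial(40.0, 25.0))  # 25.0
```

Under a categorical 'does NOT include' answer, a capital management team would compute the second figure even where the first is permitted, understating countable resources by the full amount of the risk-based capital equity.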
Finding on 'Q003 Probe' for Claude Sonnet 4.6 with web search · ONRLB-H-INT-BIS-CPMI-IOSCO-PFMI-L3-GENERAL-BUSINESS-RISK-2025-Q003-Sonnet46
This failure implicates the model's cross-reference resolution within the Key Consideration list of PFMI Principle 15: it located the correct threshold but attributed it to KC2 instead of KC3. The subsystem gap is the linkage between KC numbers and provisions in structured-document training data; the model's representation of Annex A does not reliably bind specific quantitative requirements to their correct KC identifier. Using a Pretextual citation (third-party commentary) as the sourcing basis for this part of the response compounds the error.
Finding on 'Q005 Probe' for Claude Sonnet 4.6 with web search · ONRLB-H-INT-BIS-CPMI-IOSCO-PFMI-L3-GENERAL-BUSINESS-RISK-2025-Q005-Sonnet46
This failure implicates the retrieval pipeline's indexing boundary for CPMI-IOSCO assessment publications: the model gave a 2023–2024 window for the assessment process, whereas the published document specifies a 2023–2025 window with explicit April 2025 follow-up engagement dates. The subsystem gap is the completeness of indexed content for Q4 2025 BIS publications; the model returned the portion of the timeline available in its indexed content without flagging that the more recent period might be missing from its view.
