AI Labs · updated 2026-06-04 · methodology v2.1

Regulation 1.25 Amendment Failures: Schema Substitution and Procedural Confabulation on CFTC Customer Fund Investment Rules

Executive summary

Claude Opus 4.7 with web search and Claude Sonnet 4.6 with web search both failed on the same question: a size-tiered concentration limit in the 2024 CFTC amendments to Regulation 1.25. Each produced a flat, uniform-threshold answer where the rule specifies a two-condition tier keyed to fund asset size and management company AUM. Across the five confirmed failures on this regulation, the dominant pattern is confident reconstruction of rule parameters from a prior-version or generic regulatory schema instead of retrieval of the amended text's specific numeric structure. A secondary failure is procedural: Claude Opus 4.7 with web search fabricated an open Commission meeting with a named presiding chair, whereas the record shows the rule was approved by seriatim vote. These are not retrieval misses on obscure content. They are overconfident confabulations on the most decision-critical parameters a compliance team would ask a model to confirm.
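The structural difference between a flat threshold and a two-condition tier can be sketched as follows. This is a minimal illustration, not the rule text: the class and function names are hypothetical, every numeric value is a placeholder and not a figure from Regulation 1.25, and the sketch assumes the higher limit applies only when both size conditions are met.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TieredLimit:
    """Two-condition concentration tier (hypothetical shape, placeholder values)."""
    min_fund_assets: float    # fund asset-size threshold (placeholder units)
    min_manager_aum: float    # management company AUM threshold (placeholder units)
    limit_if_both_met: float  # limit when BOTH conditions hold (assumption)
    limit_otherwise: float    # limit when either condition fails

    def limit_for(self, fund_assets: float, manager_aum: float) -> float:
        both_met = (fund_assets >= self.min_fund_assets
                    and manager_aum >= self.min_manager_aum)
        return self.limit_if_both_met if both_met else self.limit_otherwise


def flat_answer_is_wrong(rule: TieredLimit, flat: float,
                         fund_assets: float, manager_aum: float) -> bool:
    """True when a single uniform threshold disagrees with the tiered rule."""
    return flat != rule.limit_for(fund_assets, manager_aum)


# PLACEHOLDER numbers chosen only to exercise the logic.
rule = TieredLimit(min_fund_assets=10.0, min_manager_aum=100.0,
                   limit_if_both_met=0.40, limit_otherwise=0.20)
large_fund_small_manager = flat_answer_is_wrong(rule, 0.40, 12.0, 50.0)
```

A flat answer can match the tiered rule for some funds and still be wrong for others, which is why a uniform-threshold response fails as an answer to a question about the tier itself.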

Findings — impact summary

This is the consolidated view of findings.

  1. Finding on 'Q001 Probe' for Claude Opus 4.7 with web search (ONRLB-H-US-CFTC-FCM-DCO-CUSTOMER-FUNDS-INVESTMENTS-REG-1-25-2024-Q001-Opus47)

    This finding implicates the training corpus's representation of the final rule text relative to pre-final secondary commentary. The model asserted that the final rule rejected size-based tiering — a claim that reflects early rulemaking commentary, not the published final rule. The retrieval pipeline failed to surface the primary rule text at sufficient specificity to override the trained-schema prior, suggesting the Federal Register notice is underweighted relative to secondary sources in retrieval ranking for this query type.

  2. Finding on 'Q002 Probe' for Claude Opus 4.7 with web search (ONRLB-H-US-CFTC-FCM-DCO-CUSTOMER-FUNDS-INVESTMENTS-REG-1-25-2024-Q002-Opus47)

    This finding implicates cross-regulator concept conflation in the training data. The model correctly retrieved the 24-month WAM ceiling but appended SEC Rule 2a-7's computation methodology as an elaboration — a cross-contamination that occurs when two regulators' frameworks share vocabulary and are trained in close proximity without independence annotations. The error is in the generation layer: the model added a plausible-sounding methodology reference not present in the retrieved CFTC rule text.

  3. Finding on 'Q004 Probe' for Claude Opus 4.7 with web search (ONRLB-H-US-CFTC-FCM-DCO-CUSTOMER-FUNDS-INVESTMENTS-REG-1-25-2024-Q004-Opus47)

    This finding implicates calibration on compliance-deadline specificity. The rule publishes a fixed calendar date (March 31, 2025) for the SIDR update deadline; the model substituted a relative range drawn from pre-final-rule commentary. Either the retrieval pipeline did not surface the compliance calendar from the final rule text, or the model discounted it in favour of a higher-frequency secondary framing. Post-training calibration should penalise relative-range answers to compliance-deadline questions where a fixed date is retrievable from the primary source.

  4. Finding on 'Q005 Probe' for Claude Opus 4.7 with web search (ONRLB-H-US-CFTC-FCM-DCO-CUSTOMER-FUNDS-INVESTMENTS-REG-1-25-2024-Q005-Opus47)

    This finding implicates confidence calibration on procedural claims where retrieval signal is thin. The model fabricated a specific open Commission meeting with a named presiding chair on a precise date: all plausible, none accurate. When web search does not surface explicit procedural-record documentation (meeting minutes, Commission vote transcript, Federal Register preamble on the approval mechanism), the model should fall back to stated uncertainty instead of constructing an institutional narrative from learned templates. This is a calibration gap in the generation layer, not a retrieval gap.

  5. Finding on 'Q001 Probe' for Claude Sonnet 4.6 with web search (ONRLB-H-US-CFTC-FCM-DCO-CUSTOMER-FUNDS-INVESTMENTS-REG-1-25-2024-Q001-Sonnet46)

    The convergence with Claude Opus 4.7 on the same concentration-tier question makes this finding structurally significant: two different models, same configuration, same wrong schema, same explicit denial of the tier's existence. The internal contradiction in the response (denying a size-based tier while using tier-like vocabulary) suggests the model assembled its answer from a regulatory schema template and not from retrieved primary text. This points to a retrieval-ranking issue shared by both models: the Federal Register final rule text is not being surfaced with enough authority weight to override the trained prior on this query.

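The deadline-calibration point in the Q004 finding can be sketched as a simple answer check. This is a minimal sketch under stated assumptions: the function names, the label strings, and the regular expressions are hypothetical and only cover English month-name dates and a few relative-range phrasings, and the sample answers in the usage lines are invented for illustration. The March 31, 2025 date is the one quoted in the finding.

```python
import re
from datetime import date

MONTHS = ["january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"]

# Matches dates such as "March 31, 2025" or "March 31 2025".
DATE_RE = re.compile(
    r"\b(" + "|".join(MONTHS) + r")\s+(\d{1,2}),?\s+(\d{4})\b", re.IGNORECASE)

# Matches relative ranges such as "6 to 12 months" or "within 90 days".
RELATIVE_RE = re.compile(
    r"\b\d+\s*(?:-|to)\s*\d+\s+(?:days|weeks|months)\b"
    r"|\bwithin\s+\d+\s+(?:days|weeks|months)\b", re.IGNORECASE)


def explicit_dates(answer: str) -> list:
    """Return every valid calendar date written out in the answer."""
    found = []
    for month, day, year in DATE_RE.findall(answer):
        try:
            found.append(date(int(year), MONTHS.index(month.lower()) + 1, int(day)))
        except ValueError:
            continue  # skip impossible dates such as "February 31, 2025"
    return found


def classify_deadline_answer(answer: str, expected: date) -> str:
    """Label an answer so relative ranges can be penalised separately."""
    dates = explicit_dates(answer)
    if expected in dates:
        return "fixed_date_correct"
    if dates:
        return "fixed_date_wrong"
    if RELATIVE_RE.search(answer):
        return "relative_range"
    return "no_deadline"


SIDR_DEADLINE = date(2025, 3, 31)  # fixed date quoted in the Q004 finding
good = classify_deadline_answer("The deadline is March 31, 2025.", SIDR_DEADLINE)
vague = classify_deadline_answer("Firms have 6 to 12 months to comply.", SIDR_DEADLINE)
```

Separating "relative_range" from "fixed_date_wrong" lets a grader apply the penalty the finding recommends only where a fixed date was available from the primary source.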