AI Labs · updated 2026-05-31 · methodology v2.1

BBNJ High Seas Biodiversity Agreement: Model Hallucination Findings

This paper presents findings from RegLeg's hallucination research on the Agreement under the United Nations Convention on the Law of the Sea on the Conservation and Sustainable Use of Marine Biological Diversity of Areas Beyond National Jurisdiction (2023), administered by the United Nations Treaty Collection (Office of Legal Affairs, Treaty Section). Two models were tested — Claude Opus 4.7 with web search and Claude Sonnet 4.6 with web search — across a range of detailed questions about the Agreement's operative provisions, geographic scope, and procedural records.

Across 13 findings, the dominant error pattern was misstatement of the Agreement's transitional provisions: both models independently inverted the default rule on retroactivity, describing a regime that applies to pre-entry-into-force collections when the Agreement explicitly does not. Secondary patterns include incorrect article attribution for key provisions and, in one case, confusion about whether the Agreement's non-undermining clause originates in Article 8 or Article 22.

These errors are material because the Agreement is new law: it reached the 60-ratification threshold in September 2025 and entered into force in January 2026. Practitioners and institutions relying on model-generated summaries of its provisions are unlikely to cross-check them against the primary text.

Why this affects AI Labs

The BBNJ Agreement is immediately relevant to a widening set of users who will query frontier models for legal, compliance, and strategic guidance: biotech and pharmaceutical companies with marine research programmes, shipping operators navigating new area-based management designations, environmental law practices, multilateral-development-bank teams financing high-seas infrastructure, and government delegations preparing for the Agreement's first Conference of the Parties.

Each of these users is likely to treat model output on the Agreement's provisions as a starting point for real decisions — whether to obtain benefit-sharing licences, whether a planned transit route is affected, or whether a collection programme needs to be restructured. When a model confidently inverts the Agreement's retroactivity default — telling users that pre-entry-into-force collections are covered when the text says the opposite — the downstream consequence is mispriced compliance risk for practitioners who act on that answer.

For an AI lab, the exposure runs in two directions. First, if a corporate compliance team structures a benefit-sharing strategy around a model's wrong statement of Article 10 scope and later faces a regulatory challenge, the lab's terms of service will not insulate it from reputational fallout or, in some jurisdictions, from civil claims framing the model output as negligent advice.

Second, the errors observed here — including citation of URLs that resolve to third-party law-firm summaries rather than primary treaty text, and at least one fabricated Brill chapter URL — represent a class of failure that red-team and eval coverage typically under-indexes on: newly-in-force multilateral instruments where the primary text is accessible but secondary commentary is voluminous, uneven in quality, and well-indexed by web search.

This Agreement's structure makes it particularly susceptible. It is a dense 76-article instrument with two annexes, concluded as an implementing agreement under UNCLOS; its operative provisions contain multiple carefully negotiated qualifications and opt-out mechanisms that are easy to paraphrase away. The benefit-sharing framework for marine genetic resources distinguishes between collection activity, utilisation, and digital sequence information across separate articles with non-obvious cross-references.

The Agreement also entered into force recently enough that the model training corpus is likely to include extensive pre-ratification negotiating commentary and early academic interpretation, some of which was written before final text was agreed — making content-drift between "draft text as described" and "final text as adopted" a live failure surface.

Aggregate impact

Model	Configuration	Findings	Dominant failure pattern
Claude Opus 4.7	Web search enabled	3	Two of three findings show the model attributing provisions to wrong articles or inverting a treaty default, while the third finding shows appropriate calibrated uncertainty about a specific procedural reference number — but then citing only third-party secondary sources rather than the primary UN depositary record.
Claude Sonnet 4.6	Web search enabled	10	Five of ten findings were substantively correct; the remaining five show the model dropping critical qualifiers on treaty thresholds, inverting retroactivity defaults, misattributing the DSI benefit-sharing article, understating the ocean-coverage characterisation used in official statements, and citing a fabricated URL in one instance.

Claude Opus 4.7 with web search produced two findings of material concern: in both cases the model misstated which article of the Agreement governs a key provision, and in the retroactivity finding it stated the opposite of the rule the Agreement actually sets out. The third Opus 4.7 finding is notable in a different way — the model correctly expressed uncertainty about a specific depositary notification number but still cited two third-party secondary sources rather than attempting to retrieve the primary UN Treaty Collection record.

That pattern — calibrated hedging on the conclusion, yet citation of secondary sources as if they were primary authority — is a signal that retrieval routing for treaty-registry queries is not reliably directing the model toward the authoritative source.

Claude Sonnet 4.6 with web search shows a more mixed picture. Half of its findings were substantively correct, but the errors that did occur are concentrated in the same territory as Opus 4.7's failures: retroactivity of the marine genetic resources framework, and article attribution. Both models independently inverted the retroactivity rule, with Sonnet 4.6 going further by describing an explicit opt-out mechanism that reverses the actual structure of the Agreement.

The joint pattern suggests the error is not model-specific: it is likely being driven by commentary written before the final text was settled, which described an earlier draft regime that was subsequently reversed during negotiations. An alignment team investigating this error should focus on the relationship between the model's training corpus and pre-adoption negotiating documents, rather than treating it as a calibration problem unique to one configuration.

Findings

13 findings in this case study.

Finding on 'Q003 Probe' for Claude Opus 4.7 with web search ON
Finding on 'Q005 Probe' for Claude Opus 4.7 with web search ON
Finding on 'Q011 Probe' for Claude Opus 4.7 with web search ON
Finding on 'Q001 Probe' for Claude Sonnet 4.6 with web search ON
Finding on 'Q002 Probe' for Claude Sonnet 4.6 with web search ON
Finding on 'Q003 Probe' for Claude Sonnet 4.6 with web search ON
Finding on 'Q004 Probe' for Claude Sonnet 4.6 with web search ON
Finding on 'Q005 Probe' for Claude Sonnet 4.6 with web search ON
Finding on 'Q008 Probe' for Claude Sonnet 4.6 with web search ON
Finding on 'Q009 Probe' for Claude Sonnet 4.6 with web search ON
Finding on 'Q010 Probe' for Claude Sonnet 4.6 with web search ON
Finding on 'Q011 Probe' for Claude Sonnet 4.6 with web search ON
Finding on 'Q012 Probe' for Claude Sonnet 4.6 with web search ON

What your team should do

Implications for your training data

The most material training-data finding is the retroactivity inversion: both Claude Opus 4.7 with web search and Claude Sonnet 4.6 with web search independently described the Agreement's default as retroactive with a prospective opt-out, when the Agreement is prospective by default with an optional retroactive extension. This error is most likely driven by pre-adoption commentary: legal blog posts, academic articles, and NGO briefings written during the 2022–2023 negotiating rounds, which described earlier draft proposals that did include a retroactive default.

That commentary is well-indexed and voluminous; the final adopted text that reversed the default is narrower in footprint and more recent. Your training corpus for this instrument likely contains far more text describing the draft rule than text accurately describing the final adopted rule. The fix is not just including more post-adoption text — it is weighting the corpus to de-prioritise pre-adoption commentary that describes provisions as they were debated rather than as they were adopted, or explicitly labelling such commentary as draft-stage.
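The draft-stage labelling described above can be sketched in Python as follows. This is a minimal illustration rather than any lab's actual pipeline: the `CorpusDoc` record, the `label_and_weight` helper, the 0.2 down-weight, and the choice of the 19 June 2023 adoption date as the cutoff are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date

# Assumed cutoff: the date the final text was adopted. Adjust as needed.
ADOPTION_CUTOFF = date(2023, 6, 19)
# Hypothetical sampling weight for commentary that predates the final text.
DRAFT_STAGE_WEIGHT = 0.2


@dataclass(frozen=True)
class CorpusDoc:
    doc_id: str
    published: date
    is_primary_text: bool = False  # True for the treaty text itself


def label_and_weight(doc, cutoff=ADOPTION_CUTOFF):
    """Return (label, sampling_weight) for one corpus document."""
    if doc.is_primary_text:
        return ("primary", 1.0)
    if doc.published < cutoff:
        # Likely describes provisions as debated, not as adopted.
        return ("draft-stage", DRAFT_STAGE_WEIGHT)
    return ("post-adoption", 1.0)
```

Publication date is only a proxy: commentary published after the cutoff can still describe draft proposals, so a production pipeline would probably need content-level checks as well.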

The article-attribution errors (Article 8 vs Article 22(2) for the non-undermining clause; Article 15.5 vs Article 14(1) for digital sequence information benefit-sharing) suggest that structured extraction of the treaty's article-by-article provision map has not been applied to this instrument. The Agreement is organised in a way that does not match how legal commentary typically summarises it — secondary sources often group provisions by topic rather than by article, and models trained on those summaries inherit the topic grouping without the article-level precision.

Definitions-table and article-map extraction for multilateral instruments should be a structured part of corpus ingestion, not an inference from free-text summaries.
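As a rough illustration of article-level checking, the Python sketch below normalises article references written in different styles (for example "Article 14(1)" and "Art. 14.1") so they can be compared against an article map built from the primary text. The helper names `parse_article_ref` and `cites_expected_article` are hypothetical, and the sketch does not itself assert which article governs any provision; the expected reference must come from the treaty text.

```python
import re
from typing import Optional, Tuple

# Matches "Article 22(2)", "Art. 14.1", "article 8", and similar forms.
_ARTICLE_RE = re.compile(
    r"\bart(?:icle|\.)?\s*(\d+)(?:\s*[.(]\s*(\d+)\s*\)?)?",
    re.IGNORECASE,
)

ArticleRef = Tuple[int, Optional[int]]  # (article, paragraph or None)


def parse_article_ref(text: str) -> Optional[ArticleRef]:
    """Return the first article reference found in `text`, or None."""
    match = _ARTICLE_RE.search(text)
    if match is None:
        return None
    article = int(match.group(1))
    paragraph = int(match.group(2)) if match.group(2) else None
    return (article, paragraph)


def cites_expected_article(answer: str, expected: str) -> bool:
    """True when the first article cited in `answer` matches `expected`."""
    cited = parse_article_ref(answer)
    wanted = parse_article_ref(expected)
    if cited is None or wanted is None:
        return False
    # If the expected reference names no paragraph, compare articles only.
    if wanted[1] is None:
        return cited[0] == wanted[0]
    return cited == wanted
```

Only the first reference in an answer is inspected here; an answer that cites several articles would need a fuller comparison.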

Implications for your post-training logic

Both models with web search enabled consistently cited third-party law-firm summaries and academic journal abstracts rather than the primary treaty text at treaties.un.org. The UN Treaty Collection is machine-accessible and provides the full agreement text in multiple languages; for queries that include treaty or agreement names, the retrieval ranker should weight results heavily toward official UN portals over third-party summaries. Currently the ranker appears to treat a Global Policy Watch summary or a Taylor & Francis Online (tandfonline.com) abstract as no less authoritative than the primary text for a question about specific article text; that default needs adjustment for treaty-law query patterns.
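A re-ranking adjustment of this kind might look like the Python sketch below. It is an assumption-laden illustration, not a description of any real ranker: the `rerank` function, the keyword list used to detect treaty queries, the boost factor of 2.0, and the example law-firm URL are all hypothetical. Only the host `treaties.un.org` comes from the text above.

```python
from urllib.parse import urlparse

# Official hosts to prefer for treaty-text queries. Extend as needed.
OFFICIAL_HOSTS = ("treaties.un.org",)
# Crude, hypothetical signal that a query is about a treaty instrument.
TREATY_TERMS = ("treaty", "agreement", "convention", "protocol")
# Hypothetical multiplier applied to official sources on treaty queries.
OFFICIAL_BOOST = 2.0


def is_official(url):
    """True when the URL's host is an official host or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == h or host.endswith("." + h) for h in OFFICIAL_HOSTS)


def rerank(query, results):
    """Sort (url, score) pairs, boosting official hosts on treaty queries."""
    treaty_query = any(term in query.lower() for term in TREATY_TERMS)

    def adjusted(item):
        url, score = item
        if treaty_query and is_official(url):
            return score * OFFICIAL_BOOST
        return score

    return sorted(results, key=adjusted, reverse=True)


results = [
    ("https://www.example-lawfirm.com/bbnj-summary", 0.9),
    ("https://treaties.un.org/", 0.6),
]
top_url = rerank("BBNJ Agreement Article 10 text", results)[0][0]
```

With the assumed boost, the official portal outranks the higher-scored summary for the treaty query, while a query with no treaty term leaves the original ordering untouched.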

The fabricated Brill chapter URL (finding 5 of Claude Sonnet 4.6) occurred in a response whose substantive content was correct. This is a common pattern where citation generation is operating semi-independently of content generation: the model produces an accurate answer and then generates a plausible-looking citation that is not real. For regulator-domain queries, add a post-generation validation pass that checks whether cited URLs resolve and whether the content at the resolved URL is substantively related to the claim being cited.

Where a citation fails either check, the model should either drop the citation or flag it as unverified rather than presenting it as a source.
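The two checks can be sketched in Python as below. This is a minimal illustration under stated assumptions: the `check_citation` helper, the injected `fetch` callable (which returns page text, or `None` when the URL does not resolve), the keyword-overlap relatedness measure, and the 0.3 threshold are all hypothetical. The mapping of an unresolved URL to "drop" and an unrelated page to "unverified" is one possible policy, not a prescribed one.

```python
import re
from typing import Callable, Optional

# A fetcher returns the page text, or None if the URL does not resolve.
Fetcher = Callable[[str], Optional[str]]


def _terms(text):
    """Lower-cased words of four or more letters, as a crude topic signal."""
    return set(re.findall(r"[a-z]{4,}", text.lower()))


def check_citation(claim, url, fetch, min_overlap=0.3):
    """Return 'keep', 'unverified', or 'drop' for one cited URL."""
    page = fetch(url)
    if page is None:
        return "drop"  # the URL does not resolve
    claim_terms = _terms(claim)
    if not claim_terms:
        return "unverified"
    overlap = len(claim_terms & _terms(page)) / len(claim_terms)
    return "keep" if overlap >= min_overlap else "unverified"


# Stand-in for a real HTTP fetcher, so the example runs offline.
fake_pages = {
    "https://example.org/related": "Text of the Agreement on marine genetic "
    "resources and entry into force provisions",
    "https://example.org/unrelated": "Quarterly shipping rates and port fees",
}
claim = ("The Agreement applies to marine genetic resources "
         "collected after entry into force")
```

Keyword overlap is a weak proxy for "substantively related"; a real validation pass would likely use a stronger relevance model, but the control flow would be similar.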

Specific eval / red-team probes RegLeg suggests

Retroactivity and transitional provisions: probe whether the model correctly states the Agreement's prospective default and the structure of the retroactivity opt-in — specifically whether it can distinguish the final adopted text from pre-adoption draft descriptions.
Article-level precision: probe whether the model can correctly attribute specific obligations (non-undermining clause, DSI benefit-sharing, EIA screening threshold) to the correct articles of the Agreement, rather than providing correct substantive summaries with wrong article citations.
Retrieval sourcing for treaty text: probe whether, with web search enabled, the model directs treaty-text queries to the official UN Treaty Collection portal rather than secondary aggregators and law-firm summaries.
Citation verifiability: probe whether the model generates verifiable citations when answering questions about specific treaty provisions — particularly for questions where the answer is correct but confidence is high enough to invite hallucinated sourcing.
Official characterisation fidelity: probe whether the model reproduces official UN characterisations of the Agreement's scope (e.g. "more than two-thirds of the ocean") accurately rather than substituting adjacent but directionally different formulations.
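The probe categories above could be organised into a small harness along the following lines. This Python sketch is illustrative only: the `Probe` record, the substring-based `grade` function, and the example prompt are assumptions, and substring matching is a deliberately crude grader that a real eval would replace with expert-annotated scoring.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    category: str
    prompt: str
    must_contain: tuple  # phrases a correct answer should include
    must_not_contain: tuple = ()  # phrases that signal a known error


def grade(probe, answer):
    """True when the answer has every required phrase and no banned one."""
    text = answer.lower()
    has_required = all(p.lower() in text for p in probe.must_contain)
    has_banned = any(p.lower() in text for p in probe.must_not_contain)
    return has_required and not has_banned


def pass_rate_by_category(probes, answers):
    """Map each probe category to the fraction of its probes that passed."""
    totals = defaultdict(lambda: [0, 0])  # category -> [passed, total]
    for probe, answer in zip(probes, answers):
        totals[probe.category][1] += 1
        totals[probe.category][0] += grade(probe, answer)
    return {cat: passed / n for cat, (passed, n) in totals.items()}


# Hypothetical probe using the official phrasing quoted in the list above.
scope_probe = Probe(
    category="official-characterisation",
    prompt="How much of the ocean does the Agreement cover?",
    must_contain=("more than two-thirds",),
)
rates = pass_rate_by_category(
    [scope_probe, scope_probe],
    ["It covers more than two-thirds of the ocean.",
     "It covers about half of the ocean."],
)
```

Reporting pass rates per category rather than one aggregate score keeps a regression in, say, article-level precision from being masked by strong results elsewhere.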

How RegLeg can help

RegLeg's research on the BBNJ Agreement is part of a broader programme covering multilateral instruments, financial regulators, and sector-specific regulatory bodies across multiple jurisdictions. For this Agreement specifically, the failure surfaces we have identified — draft-vs-adopted text divergence, article-attribution gaps, and retrieval-source ranking for treaty queries — are tractable problems that benefit from structured regulatory domain knowledge. We can work with your evals and post-training teams to close those gaps in a systematic way.

The primary partnership track we offer AI labs is licensed access to the full question bank under a mutual NDA. The questions have been designed to surface hallucination-prone areas in each instrument's operative provisions — they are not general-comprehension questions but targeted probes at the specific failure surfaces we have empirically identified. Paired with our regulatory specialists' annotation of correct answers against the primary treaty text, this gives your team a ready-made eval dataset for this regulation.

We can also generate synthetic correction pairs — question + wrong model answer + correct authoritative answer with article citation — derived directly from the regulator's text, for use in fine-tuning or RLHF datasets. These are built from the primary instrument, not from secondary commentary, which means they are anchored to the same standard we hold models to in our research.

For labs that want ongoing coverage, we offer an embedded eval track: quarterly refresh of question banks and correction pairs as the regulatory landscape evolves — new depositary notifications, Conference of the Parties decisions, implementing agreements, and amendment records. The BBNJ Agreement will be a living instrument for years; its implementing rules, benefit-sharing mechanism, and area-based management designations will generate new compliance-relevant text on a regular cadence. Quarterly refresh means your models' eval coverage tracks the regulation as it develops, not just the base text at adoption.

We are also available for direct red-team consultation on regulator-specific failure surfaces ahead of model releases, if your team wants a domain-expert perspective on coverage gaps before they surface in production use.
