AI Labs · updated 2026-05-31 · methodology v2.1

BBNJ High Seas Biodiversity Agreement: Model Hallucination Findings

Executive summary

This paper presents findings from RegLeg's hallucination research on the Agreement under the United Nations Convention on the Law of the Sea on the Conservation and Sustainable Use of Marine Biological Diversity of Areas Beyond National Jurisdiction (2023), administered by the United Nations Treaty Collection. Two models were tested, Claude Opus 4.7 with web search and Claude Sonnet 4.6 with web search, across a range of detailed questions about the Agreement's operative provisions, geographic scope, and procedural records. Across 13 findings (3 for Opus 4.7 and 10 for Sonnet 4.6), the dominant error pattern was misstatement of the Agreement's transitional provisions: both models independently inverted the default rule on retroactivity, describing a regime that applies to pre-entry-into-force collections when the Agreement explicitly does not. These errors are material because the Agreement reached its ratification threshold in September 2025 and entered into force in January 2026, and practitioners relying on model-generated summaries are unlikely to have ready access to the primary text for cross-checking.

Findings — impact summary

This is the consolidated view of findings.

Finding on 'Q003 Probe' for Claude Opus 4.7 with web search ONRLB-H-INT-UNTC-BBNJ-HIGH-SEAS-BIODIVERSITY-AGREEMENT-2023-Q003-Opus47
This finding implicates the training data layer: the model appears to have learned the retroactivity rule from pre-adoption negotiating commentary that described an earlier draft rather than the final adopted text. The retrieval step did not correct this because the cited secondary source itself may contain the same error. Both training corpus curation and retrieval-source ranking need to weight post-adoption primary text over pre-adoption commentary for recently adopted instruments.
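The source-weighting recommendation above can be illustrated with a minimal sketch. It assumes each candidate source carries a publication date and a flag marking it as the adopted treaty text; the `Source` class, `rank_sources` function, and sample titles are hypothetical names introduced here, and the adoption date is treated as an input rather than something the ranker knows.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Source:
    title: str
    published: date
    is_primary_text: bool  # True only for the adopted treaty text itself


def rank_sources(sources, adopted_on):
    """Order sources so primary text comes first, then post-adoption material."""
    def key(s):
        post_adoption = s.published >= adopted_on
        # Tuples sort ascending, so negate the properties we want ranked first.
        return (not s.is_primary_text, not post_adoption, -s.published.toordinal())
    return sorted(sources, key=key)


# Assumption: the adoption date is supplied by the caller (19 June 2023 here).
ADOPTED_ON = date(2023, 6, 19)

# Hypothetical sample sources, for illustration only.
sources = [
    Source("Negotiating commentary on an earlier draft", date(2022, 8, 1), False),
    Source("Post-adoption journal summary", date(2024, 2, 1), False),
    Source("Final adopted Agreement text", date(2023, 6, 19), True),
]
ranked = rank_sources(sources, ADOPTED_ON)
```

In this sketch the pre-adoption commentary always ranks last, so a stale description of an earlier draft cannot outrank the adopted text.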
Finding on 'Q005 Probe' for Claude Opus 4.7 with web search ONRLB-H-INT-UNTC-BBNJ-HIGH-SEAS-BIODIVERSITY-AGREEMENT-2023-Q005-Opus47
This finding implicates article-level provision mapping in the training data. The model correctly identified the obligation but misattributed it to Article 8 rather than Article 22(2). Structured extraction of article-by-article provision maps — rather than topical summaries — would prevent this class of error. It also reflects over-reliance on a secondary ABA source rather than the treaty text for the citation.
Finding on 'Q011 Probe' for Claude Opus 4.7 with web search ONRLB-H-INT-UNTC-BBNJ-HIGH-SEAS-BIODIVERSITY-AGREEMENT-2023-Q011-Opus47
This finding is a retrieval-routing signal rather than a content error. The model correctly expressed uncertainty rather than fabricating a reference number — a good outcome — but its web search returned only secondary aggregator sources rather than the UN Treaty Collection's primary depositary record. The retrieval ranker should de-weight aggregators for treaty-registry queries in favour of official UN portals.
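One way to picture the de-weighting described above is a re-ranking step that boosts official portals only when the query is classified as a treaty-registry lookup. This is a sketch under stated assumptions: the domain allow-list, the boost value, the `rerank` function, and the example URLs are all hypothetical, and the query classifier is assumed to exist upstream.

```python
from urllib.parse import urlparse

# Assumptions: an allow-list of official domains and a fixed score boost.
OFFICIAL_DOMAINS = ("treaties.un.org", "un.org")
OFFICIAL_BOOST = 1.0


def is_official(url):
    """True if the URL's host is an official domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in OFFICIAL_DOMAINS)


def rerank(results, treaty_registry_query):
    """results is a list of (url, base_score) pairs; returns them best-first."""
    def adjusted(result):
        url, score = result
        if treaty_registry_query and is_official(url):
            return score + OFFICIAL_BOOST
        return score
    return sorted(results, key=adjusted, reverse=True)


# Hypothetical search results with base relevance scores.
results = [
    ("https://aggregator.example/bbnj-status", 0.9),
    ("https://treaties.un.org/hypothetical-record", 0.6),
]
top_for_registry_query = rerank(results, treaty_registry_query=True)[0][0]
top_for_other_query = rerank(results, treaty_registry_query=False)[0][0]
```

The boost is applied only to registry-style queries, so aggregators keep their normal ranking for general questions.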
Finding on 'Q001 Probe' for Claude Sonnet 4.6 with web search ONRLB-H-INT-UNTC-BBNJ-HIGH-SEAS-BIODIVERSITY-AGREEMENT-2023-Q001-Sonnet46
This finding implicates the precision of defined-term extraction in the training corpus. The Agreement uses 'unknown or poorly understood' as the EIA screening qualifier; the model substituted 'uncertain or not well understood.' This is a near-synonym substitution that would not be caught by a general accuracy check but changes the legal standard. Structured extraction of defined terms and threshold language needs to be applied to treaty text, not only to domestic regulatory instruments.
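A near-synonym substitution of this kind can be caught by a verbatim check on quoted phrases. The sketch below assumes the answer marks quotations with double quotes and that a primary-text excerpt is available; `unverified_quotes` is a hypothetical helper, and `PRIMARY_EXCERPT` is a placeholder containing only the phrase cited in this finding, not the Agreement's full wording.

```python
import re


def normalise(text):
    """Lower-case, unify curly double quotes, and collapse whitespace."""
    text = text.replace("\u201c", '"').replace("\u201d", '"')
    return re.sub(r"\s+", " ", text).strip().lower()


def unverified_quotes(answer, primary_text):
    """Return quoted phrases in the answer that are absent from the primary text."""
    quoted = re.findall(r'"([^"]+)"', normalise(answer))
    source = normalise(primary_text)
    return [phrase for phrase in quoted if phrase not in source]


# Placeholder excerpt: only the qualifier named in this finding.
PRIMARY_EXCERPT = "... unknown or poorly understood ..."

substituted = unverified_quotes(
    'The EIA screening qualifier is "uncertain or not well understood".',
    PRIMARY_EXCERPT,
)
verbatim = unverified_quotes(
    'The EIA screening qualifier is "unknown or poorly understood".',
    PRIMARY_EXCERPT,
)
```

An exact-match check is deliberately strict: it flags any paraphrase presented as a quotation, which is the behaviour wanted for threshold language.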
Finding on 'Q002 Probe' for Claude Sonnet 4.6 with web search ONRLB-H-INT-UNTC-BBNJ-HIGH-SEAS-BIODIVERSITY-AGREEMENT-2023-Q002-Sonnet46
The model's answer was substantively correct, but the finding illustrates a sourcing pattern: the model retrieved from a secondary journal abstract rather than the primary treaty text, and the framing in the retrieved source added a qualifier ('regulated under relevant international frameworks') that does not appear in the Agreement. Even correct answers sourced from secondary commentary risk quietly importing unverified elaborations into model responses.
Finding on 'Q003 Probe' for Claude Sonnet 4.6 with web search ONRLB-H-INT-UNTC-BBNJ-HIGH-SEAS-BIODIVERSITY-AGREEMENT-2023-Q003-Sonnet46
This finding, alongside the Opus 4.7 retroactivity finding, strongly suggests a shared training-data origin: both models inverted the same rule in the same direction. The implication for your team is that the error is not a model-specific calibration problem — it is a corpus-level issue that will persist across model versions until the training data for this instrument is corrected. Targeted correction pairs anchored to the final adopted Article 10(1) text are likely the most efficient fix.
Finding on 'Q004 Probe' for Claude Sonnet 4.6 with web search ONRLB-H-INT-UNTC-BBNJ-HIGH-SEAS-BIODIVERSITY-AGREEMENT-2023-Q004-Sonnet46
This finding implicates article-level precision in the training data. The model correctly identified that DSI is covered by the benefit-sharing framework but placed the obligation in Article 15.5 rather than Article 14(1). The error is consistent with models that have learned topical summaries of the Agreement — 'DSI is covered' — without the article-level mapping that practitioners need. Structured article-map extraction for this instrument would address both this finding and the Article 8 vs 22(2) error in the Opus 4.7 findings.
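A structured article map of the kind recommended here could be as simple as a lookup from provision topic to article number, checked against each citation the model emits. The sketch below is illustrative only: the topic keys and the `check_citation` helper are hypothetical, and the two article numbers are the ones reported in these findings rather than an independent reading of the Agreement.

```python
# Hypothetical topic keys; article numbers are those stated in the findings.
ARTICLE_MAP = {
    "dsi_benefit_sharing": "14(1)",
    "q005_obligation": "22(2)",
}


def check_citation(topic, cited):
    """Compare a cited article against the map for the given topic."""
    expected = ARTICLE_MAP.get(topic)
    if expected is None:
        return "unmapped"
    cited_number = cited.replace("Article", "").strip()
    if cited_number == expected:
        return "ok"
    return f"expected Article {expected}"


wrong = check_citation("dsi_benefit_sharing", "Article 15.5")
right = check_citation("dsi_benefit_sharing", "Article 14(1)")
unknown = check_citation("mpa_designation", "Article 19")
```

Returning "unmapped" for topics outside the map keeps the check from silently approving citations it cannot verify.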
Finding on 'Q005 Probe' for Claude Sonnet 4.6 with web search ONRLB-H-INT-UNTC-BBNJ-HIGH-SEAS-BIODIVERSITY-AGREEMENT-2023-Q005-Sonnet46
This finding implicates the citation-generation layer specifically. The substantive answer was correct, but the model generated a Brill chapter URL that does not exist — a fabricated citation appended to an accurate response. The citation and content generation subsystems appear to be operating with insufficient coupling: the model is producing real content and then confabulating a source for it. A post-generation citation-validation pass would catch this before it reaches the user.
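A post-generation citation-validation pass might, as a first step, confirm that every URL in the answer was actually returned by the retrieval step in the same session. The sketch below assumes the list of retrieved URLs is available to the validator; `unsupported_citations` and the example URLs are hypothetical. It does not test whether a URL resolves, only whether the model could have seen it.

```python
import re

# Matches http(s) URLs up to whitespace or a closing bracket or quote.
URL_PATTERN = re.compile(r"https?://[^\s)\]>\"']+")


def unsupported_citations(answer, retrieved_urls):
    """Return URLs cited in the answer that were never retrieved."""
    retrieved = {u.rstrip("/.,;") for u in retrieved_urls}
    cited = [u.rstrip("/.,;") for u in URL_PATTERN.findall(answer)]
    return [u for u in cited if u not in retrieved]


# Hypothetical answer and retrieval log.
answer = (
    "See https://publisher.example/chapter-9 and "
    "https://treaties.un.org/hypothetical-record."
)
retrieved_urls = ["https://treaties.un.org/hypothetical-record"]
flagged = unsupported_citations(answer, retrieved_urls)
```

Any flagged URL can then be dropped or sent for a live check before the response reaches the user.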
Finding on 'Q008 Probe' for Claude Sonnet 4.6 with web search ONRLB-H-INT-UNTC-BBNJ-HIGH-SEAS-BIODIVERSITY-AGREEMENT-2023-Q008-Sonnet46
The model's answer was substantively correct, but the finding illustrates how models synthesise beyond the text: the claim that hydrothermal vents are 'explicitly anticipated' as MPA candidates comes from academic commentary, not treaty text. For legal and compliance use cases, the gap between 'correct answer' and 'answer plus unverified elaboration' is material. A calibration signal that distinguishes text-derived claims from inference-extended claims would help users know when to verify.
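One possible shape for such a signal is a provenance label attached to each claim and surfaced when the answer is rendered. This is a sketch under the assumption that claims can be separated and labelled upstream; the `Provenance` enum, `Claim` class, `render` function, and sample claims are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Provenance(Enum):
    TEXT_DERIVED = "traces to the primary text"
    INFERENCE_EXTENDED = "extends beyond the primary text"


@dataclass(frozen=True)
class Claim:
    text: str
    provenance: Provenance


def render(claims):
    """Join claims into an answer, flagging those not traced to the text."""
    lines = []
    for claim in claims:
        flag = ""
        if claim.provenance is Provenance.INFERENCE_EXTENDED:
            flag = " [verify: not in treaty text]"
        lines.append(claim.text + flag)
    return "\n".join(lines)


# Hypothetical claims for illustration.
claims = [
    Claim("The Agreement provides for area-based management tools.",
          Provenance.TEXT_DERIVED),
    Claim("Hydrothermal vents are anticipated as MPA candidates.",
          Provenance.INFERENCE_EXTENDED),
]
rendered = render(claims)
```

Only the inference-extended claim carries a flag, so users can see at a glance which statements to check against the primary text.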
Finding on 'Q009 Probe' for Claude Sonnet 4.6 with web search ONRLB-H-INT-UNTC-BBNJ-HIGH-SEAS-BIODIVERSITY-AGREEMENT-2023-Q009-Sonnet46
The model's answer was substantively correct, and its characterisation of the ICC as 'facilitative, non-adversarial, and non-punitive' is accurate. The implication is narrower: where a model describes a treaty body's mandate in language it attributes to the treaty itself, that language should trace directly to the treaty. The ICC framing is well established in secondary commentary; the model should be able to confirm it against the primary text rather than treating the secondary framing as definitive.
Finding on 'Q010 Probe' for Claude Sonnet 4.6 with web search ONRLB-H-INT-UNTC-BBNJ-HIGH-SEAS-BIODIVERSITY-AGREEMENT-2023-Q010-Sonnet46
This finding implicates the model's handling of official characterisations sourced from primary UN statements. The Secretary-General's statements from September 2025 and January 2026 consistently use 'more than two-thirds'; the model substituted 'nearly two-thirds.' The training data likely includes GEF and other secondary sources that use the 'nearly' formulation; the model should weight direct statements from the UN Secretary-General over those secondary characterisations for queries about official positions.
Finding on 'Q011 Probe' for Claude Sonnet 4.6 with web search ONRLB-H-INT-UNTC-BBNJ-HIGH-SEAS-BIODIVERSITY-AGREEMENT-2023-Q011-Sonnet46
The model's answer was correct, but the finding illustrates the same retrieval-routing gap as the Opus 4.7 depositary notification finding (Q011): the model arrived at the right answer via secondary aggregators rather than the UN Treaty Collection primary record. A model that consistently routes treaty-registry queries through secondary sources will produce correct answers when those sources happen to be accurate and wrong answers when they lag or diverge. Retrieval routing for treaty-registry queries should prioritise official UN portals.
Finding on 'Q012 Probe' for Claude Sonnet 4.6 with web search ONRLB-H-INT-UNTC-BBNJ-HIGH-SEAS-BIODIVERSITY-AGREEMENT-2023-Q012-Sonnet46
The model's answer was substantively correct, but the finding illustrates a pattern of procedural reconstruction: the model described the EP Environment Committee vote date (18 April 2024) as confirmed by its search when the cited Council source does not directly confirm that date. The model appears to have inferred it from general knowledge of EP legislative procedure. For questions about specific legislative procedural steps, models should distinguish between retrieved confirmation and procedural inference, and flag where the answer is reconstructed rather than directly sourced.
