Nate B Jones argues that the biggest source of AI hallucinations in 2026 isn't the model — it's the messy working environment around it. Using the Sullivan & Cromwell legal scandal as a case study, he presents a 'data room' workflow: prepare your sources before asking the agent to produce anything.
With Opus 4.7 and GPT 5.5 now capable of long-running file manipulation, building a clean workspace first is the structural fix for hallucinations.
Sullivan & Cromwell, one of the most prestigious law firms, had to write an apology to a federal bankruptcy judge. Their emergency motion contained dozens of fabricated or misquoted citations — AI hallucinations. The partner who signed the apology co-heads the firm's restructuring practice.
This isn't a 2024-style hallucination from a solo practitioner using ChatGPT. This is an organizational failure at the top of an AI-assisted workflow. The motion looked legitimate — correct structure, professional formatting — but dozens of citations pointed at the wrong things. Nobody on the team caught it before filing.
Nate also addresses the Marc Andreessen screenshot circulating online ("just tell the model not to hallucinate"). His response:
There is no separate truth-check pass inside the model that an instruction can hook into. Sullivan & Cromwell had access to the best AI tooling money can buy. The wrong detail still made it into court. The fix is not a sharper prompt.
Models like Opus 4.7 and GPT 5.5 can run long agentic tasks on your file system. They walk folder trees, open files, compare dates across documents, and inspect metadata. This capability flips the anti-hallucination workflow, because the agent can now organize the sources before it writes anything, but most people haven't caught on yet.
Nate's personal experience: with Codex, he drafted 8 documents simultaneously — only possible because he prepared the data room first.
Since ChatGPT launched in 2022, most people have assumed an AI workflow starts with asking for the job itself: write the memo, make the spreadsheet. But serious projects almost never start with organized source material.
The reality: strategy docs, meeting transcripts, spreadsheets, half-finished notes, follow-up emails, old decks, forgotten PDFs, Slack threads where actual decisions were made. Some current, some stale, some contradictory.
When you ask AI to write from this mess, you're asking it to do two jobs at once: (1) figure out what all this material is, and (2) produce a polished artifact. That's a recipe for mediocre results and hallucinations.
A project room is a bounded workspace for one serious job — a project, a deliverable, a source set. Smaller than a second brain, more specific than a knowledge management system.
Key distinction: local files over cloud solutions. Nate finds local file systems more flexible than Claude Projects, ChatGPT Projects, etc. — no file type limitations, and LLMs are trained to work with computers at their most primitive level.
The inventory is the first thing to ask the agent to produce: a table that records, for every file, its path, type, date, apparent authority, whether it is current or superseded, what claims it supports, its limitations, and how it should be used.
Why it matters: it tells you what the agent thinks the project consists of. You get a chance to correct the working set before the final draft inherits mistakes. It also makes verification by another LLM much easier.
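To make the inventory columns concrete, here is a minimal Python sketch of the mechanical half of that table. It is an illustration, not Nate's tooling: the names `InventoryRow`, `build_inventory`, and `to_csv` are hypothetical, and the file's modification time stands in for "date". The judgment columns (authority, status, supported claims, limitations, usage) are left blank for the agent or a human reviewer to fill in.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from datetime import datetime, timezone
from pathlib import Path


@dataclass
class InventoryRow:
    # Mechanical columns: derivable from the file system alone.
    path: str
    file_type: str
    modified: str
    # Judgment columns: left empty here; the agent or reviewer fills them in.
    authority: str = ""
    status: str = ""       # e.g. "current" or "superseded"
    supports: str = ""
    limitations: str = ""
    usage: str = ""


def build_inventory(room: Path) -> list[InventoryRow]:
    """Walk the project room and record one row per file."""
    rows = []
    for p in sorted(room.rglob("*")):
        if p.is_file():
            mtime = datetime.fromtimestamp(p.stat().st_mtime, tz=timezone.utc)
            rows.append(InventoryRow(
                path=p.relative_to(room).as_posix(),
                file_type=p.suffix.lower().lstrip(".") or "unknown",
                modified=mtime.date().isoformat(),
            ))
    return rows


def to_csv(rows: list[InventoryRow]) -> str:
    """Render the inventory as CSV so a person or another LLM can review it."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(InventoryRow)])
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buf.getvalue()
```

Keeping the mechanical and judgment columns separate makes it obvious which cells were observed from the file system and which are the agent's interpretation, which is the part you most need to review.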
When the agent reads a serious source set, it finds disagreements: old PDF vs current plan, transcript using different names for stakeholders, spreadsheet numbers with no visible assumptions, documents that look adjacent but are months apart.
❌ Weak workflow: agent synthesizes and smooths conflicts over. Output reads confidently but you don't know what to trust.
✅ Strong workflow: agent surfaces disagreements in a conflict log with recommended responses. You review and decide before building the final document.
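One way to picture the conflict log is as a list of records that must each carry a human decision before drafting starts. The sketch below is an assumption about how such a log could be structured; the `Conflict` fields, the `ready_to_draft` gate, and the example file names are all hypothetical and do not come from the talk.

```python
from dataclasses import dataclass


@dataclass
class Conflict:
    topic: str
    sources: list[str]     # the files that disagree
    description: str       # what the disagreement is
    recommendation: str    # the agent's suggested response
    decision: str = ""     # filled in by the human reviewer


def unresolved(log: list[Conflict]) -> list[Conflict]:
    """Return the conflicts that no human has decided yet."""
    return [c for c in log if not c.decision.strip()]


def ready_to_draft(log: list[Conflict]) -> bool:
    """Drafting may start only once every conflict carries a decision."""
    return not unresolved(log)


# Hypothetical entry for illustration only.
log = [
    Conflict(
        topic="launch date",
        sources=["old_plan.pdf", "current_plan.md"],
        description="The PDF and the current plan give different launch dates.",
        recommendation="Treat current_plan.md as authoritative.",
    ),
]
```

The gate is the point of the design: conflicts are resolved by the reviewer, not silently smoothed over during synthesis.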
One of the best signs an agent is helping properly: it tells you what it doesn't have. Missing decisions, numbers with no source, absent data files referenced in only one document.
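Part of the missing-context list can be checked mechanically. The sketch below covers only one case from the paragraph above, files that are mentioned in a document but absent from the room. It is a heuristic under stated assumptions: the `FILE_REF` pattern, the extension list, and the `missing_references` name are hypothetical, and only `.md` and `.txt` files are scanned. Missing decisions and unsourced numbers still need the agent's reading.

```python
import re
from pathlib import Path

# Heuristic: anything that looks like "name.ext" for a few common extensions.
FILE_REF = re.compile(r"\b[\w-]+\.(?:csv|xlsx|pdf|docx|pptx|md|txt)\b", re.IGNORECASE)


def missing_references(room: Path) -> dict[str, list[str]]:
    """Map each referenced-but-absent file name to the documents mentioning it."""
    present = {p.name.lower() for p in room.rglob("*") if p.is_file()}
    missing: dict[str, set[str]] = {}
    for p in sorted(room.rglob("*")):
        if p.is_file() and p.suffix.lower() in {".md", ".txt"}:
            text = p.read_text(encoding="utf-8", errors="ignore")
            for ref in FILE_REF.findall(text):
                if ref.lower() not in present:
                    missing.setdefault(ref, set()).add(p.relative_to(room).as_posix())
    return {ref: sorted(docs) for ref, docs in missing.items()}
```

A reference that appears in only one document is worth a second look: it may point to a source that was never added to the room.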
Duplicate detection isn't housekeeping; it's a reasoning problem. Given three versions of a plan, the agent might blend them. The same transcript exported twice gets overweighted in synthesis. An old and a new deck with similar titles become a source of wrong claims.
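Two of these cases can be flagged before the agent reads anything. The sketch below is one possible approach, not the method from the talk: byte-identical files (the transcript exported twice) are grouped by SHA-256 hash, and look-alike file names (old and new decks or plan versions) are paired using `difflib.SequenceMatcher`. The function names and the 0.8 threshold are assumptions chosen for illustration. Deciding which version is current remains a judgment call for the duplicates report.

```python
import hashlib
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations
from pathlib import Path


def exact_duplicates(room: Path) -> list[list[str]]:
    """Group files whose bytes are identical, e.g. the same export saved twice."""
    by_hash = defaultdict(list)
    for p in sorted(room.rglob("*")):
        if p.is_file():
            digest = hashlib.sha256(p.read_bytes()).hexdigest()
            by_hash[digest].append(p.relative_to(room).as_posix())
    return [paths for paths in by_hash.values() if len(paths) > 1]


def similar_titles(room: Path, threshold: float = 0.8) -> list[tuple[str, str]]:
    """Pair files whose names (without extension) look alike.

    The threshold is an arbitrary starting point; tune it for your room.
    """
    files = [p for p in sorted(room.rglob("*")) if p.is_file()]
    pairs = []
    for a, b in combinations(files, 2):
        ratio = SequenceMatcher(None, a.stem.lower(), b.stem.lower()).ratio()
        if ratio >= threshold:
            pairs.append((a.relative_to(room).as_posix(),
                          b.relative_to(room).as_posix()))
    return pairs
```

Hashing catches only exact copies. Files that differ by a single edit will not match, which is why the name comparison runs as a separate, fuzzier pass.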
Once the room is prepared (inventory, conflict log, missing context list, duplicates report), the writing prompt becomes very short.
This makes the AI's work inspectable. It's the difference between using AI as a colleague and using AI as a gofer.
This workflow is specifically for serious knowledge work — 30–50 hour Codex runs, heavy reports, complex projects. NOT for casual AI interactions (overkill). NOT for back-office agentic pipelines (different problem).
Mental model shift: The old AI question was 'can the model do the thing?' The new question is 'can the agent help prepare the conditions under which good work happens?'