The One AI Writing Hack Nobody Talks About

Deep Dive

📺 Nate B Jones ⏱ ~21:50

Overview

Nate B Jones argues that the biggest source of AI hallucinations in 2026 isn't the model — it's the messy working environment around it. Using the Sullivan & Cromwell legal scandal as a case study, he presents a 'data room' workflow: prepare your sources before asking the agent to produce anything.

With Opus 4.7 and GPT 5.5 now capable of long-running file manipulation, building a clean workspace first is the structural fix for hallucinations.

1

The Sullivan & Cromwell Scandal

0:00

Sullivan & Cromwell, one of the most prestigious law firms, had to write an apology to a federal bankruptcy judge. Their emergency motion contained dozens of fabricated or misquoted citations — AI hallucinations. The partner who signed the apology co-heads the firm's restructuring practice.

This isn't a 2024-style hallucination from a solo practitioner using ChatGPT. This is an organizational failure at the top of an AI-assisted workflow. The motion looked legitimate — correct structure, professional formatting — but dozens of citations pointed at the wrong things. Nobody on the team caught it before filing.

'The model is not the problem. The working environment around the model is the problem.'

2

Why Better Prompts Don't Fix This

1:04

Nate addresses the Marc Andreessen screenshot circulating online ("just tell the model not to hallucinate"). His response:

'You cannot tell a language model not to hallucinate any more than you can tell autocomplete not to autocomplete.'

There is no separate truth-check pass inside the model that an instruction can hook into. Sullivan & Cromwell had access to the best AI tooling money can buy. The wrong detail still made it into court. The fix is not a sharper prompt.

3

What's New with Opus 4.7 and GPT 5.5

1:59

These models do long-running agentic tasks on your file system. They walk folder trees, open files, compare dates across documents, and inspect metadata. This capability has flipped the hallucination workflow, but most people haven't caught on yet.

Nate's personal experience: with Codex, he drafted 8 documents simultaneously — only possible because he prepared the data room first.

'It felt like the hair was blowing back on my face and I was living in the future.'

4

Your First Prompt Is Never 'Do The Thing'

4:54

Because of ChatGPT (2022), most people think AI workflow starts with doing a job: write the memo, make the spreadsheet. But serious projects almost never have organized source material.

The reality: strategy docs, meeting transcripts, spreadsheets, half-finished notes, follow-up emails, old decks, forgotten PDFs, Slack threads where actual decisions were made. Some current, some stale, some contradictory.

When you ask AI to write from this mess, you're asking for two jobs at once: (1) figure out what this material is, and (2) produce a beautiful artifact. That's a recipe for mediocre results and hallucinations.

The correct first instruction:
'Find the relevant materials. Preserve the originals. Build me a data inventory. Tell me which files are authoritative, duplicates, old, or missing. Summarize every source before you synthesize anything. Do NOT write the deliverable yet.'

5

The Project Room / Data Room

7:22

A project room is a bounded workspace for one serious job — a project, a deliverable, a source set. Smaller than a second brain, more specific than a knowledge management system.

Key distinction: local files over cloud solutions. Nate finds local file systems more flexible than Claude Projects, ChatGPT Projects, etc. — no file type limitations, and LLMs are trained to work with computers at their most primitive level.
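A local project room can be as simple as a folder with a fixed layout. The sketch below is one possible arrangement, not the layout from the video: the folder names `originals`, `working`, and `reports` and the function `build_project_room` are assumptions made for illustration. It copies source files into the room so the agent never works on the only copy.

```python
import shutil
from pathlib import Path

# Hypothetical folder names; the video does not prescribe a layout.
ROOM_DIRS = ("originals", "working", "reports")


def build_project_room(room: Path, source_files: list[Path]) -> Path:
    """Create the room and copy sources into originals/ without altering them."""
    for name in ROOM_DIRS:
        (room / name).mkdir(parents=True, exist_ok=True)
    for src in source_files:
        dest = room / "originals" / src.name
        # Never overwrite a preserved original. Note that two sources with
        # the same filename would collide here; the second one is skipped.
        if not dest.exists():
            shutil.copy2(src, dest)  # copy2 also keeps modification times
    return room
```

Keeping `originals/` separate from `working/` means anything the agent writes later (inventory, logs, drafts) lands outside the preserved source set.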

Examples

6

The Source Inventory — Most Important Artifact

10:30

This is the first thing to ask the agent to produce: a table that records, for every file, its path, type, date, apparent authority, whether it is current or superseded, what claims it supports, its limitations, and how it should be used.

Why it matters: it tells you what the agent thinks the project consists of. You get a chance to correct the working set before the final draft inherits mistakes. It also makes verification by another LLM much easier.
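The mechanical half of such an inventory can be sketched in a few lines. This is a minimal illustration, not the agent's actual procedure. The column names, the `sha256` column, and the functions `inventory_rows` and `write_inventory` are assumptions. The judgment columns (authority, status, claims, limitations, usage) are left empty on purpose, because those are what the agent proposes and you review. File modification time is only a rough proxy for a document's real date.

```python
import csv
import hashlib
from datetime import datetime, timezone
from pathlib import Path

# Columns mirror the fields listed above, plus a content hash (an addition).
FIELDS = ["path", "type", "modified_utc", "sha256",
          "authority", "status", "claims_supported", "limitations", "usage"]


def inventory_rows(room: Path) -> list[dict]:
    """Collect one row of mechanical facts per file under the room."""
    rows = []
    for path in sorted(p for p in room.rglob("*") if p.is_file()):
        modified = datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)
        row = dict.fromkeys(FIELDS, "")  # judgment columns start empty
        row.update({
            "path": path.relative_to(room).as_posix(),
            "type": path.suffix.lower().lstrip(".") or "unknown",
            "modified_utc": modified.isoformat(timespec="seconds"),
            "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
        })
        rows.append(row)
    return rows


def write_inventory(rows: list[dict], out_file: Path) -> None:
    """Write the inventory as CSV so a person or another LLM can review it."""
    with out_file.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
```

A plain CSV is a deliberate choice here: it opens in any spreadsheet, diffs cleanly, and is easy to hand to a second model for verification.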

7

The Conflict Log

12:46

When the agent reads a serious source set, it finds disagreements: old PDF vs current plan, transcript using different names for stakeholders, spreadsheet numbers with no visible assumptions, documents that look adjacent but are months apart.

❌ Weak workflow: agent synthesizes and smooths conflicts over. Output reads confidently but you don't know what to trust.

✅ Strong workflow: agent surfaces disagreements in a conflict log with recommended responses. You review and decide before building the final document.
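As a rough sketch of what a conflict log could look like as data, assume the agent has already extracted claims from each source as (source, topic, value) records. The `Claim` type, the topic labels, and `build_conflict_log` are hypothetical names, and real conflicts are rarely this tidy. The point is only that disagreements are listed, not resolved.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    source: str  # file the claim was read from
    topic: str   # normalized label, e.g. "launch_date" (hypothetical)
    value: str   # what that source says about the topic


def build_conflict_log(claims: list[Claim]) -> list[dict]:
    """Return one entry per topic where sources disagree; resolve nothing."""
    by_topic: dict[str, list[Claim]] = defaultdict(list)
    for claim in claims:
        by_topic[claim.topic].append(claim)

    log = []
    for topic, group in sorted(by_topic.items()):
        if len({c.value for c in group}) > 1:
            log.append({
                "topic": topic,
                "positions": sorted((c.source, c.value) for c in group),
                "recommended_response": "",  # filled in by the agent
                "decision": None,            # filled in by the human reviewer
            })
    return log
```

Topics where every source agrees never reach the log, so the review list stays short and contains only the items that need a decision.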

8

The Missing Context List

13:50

One of the best signs an agent is helping properly: it tells you what it doesn't have. Missing decisions, numbers with no source, absent data files referenced in only one document.
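One narrow slice of this, files that are mentioned somewhere but absent from the room, can be checked mechanically. The sketch below is illustrative only: the `FILE_MENTION` pattern, its extension list, and the function `missing_references` are assumptions, and it only scans `.md` and `.txt` files. Missing decisions and unsourced numbers still need the agent's reading, not a regex.

```python
import re
from pathlib import Path

# Rough pattern for file-like mentions such as "budget_v3.xlsx". The extension
# list is an assumption and will miss or over-match in real projects.
FILE_MENTION = re.compile(
    r"\b[\w\-]+\.(?:pdf|docx|xlsx|csv|pptx|md|txt)\b", re.IGNORECASE
)


def missing_references(room: Path) -> dict[str, list[str]]:
    """Map each mentioned-but-absent filename to the text files that mention it."""
    files = sorted(p for p in room.rglob("*") if p.is_file())
    present = {p.name.lower() for p in files}
    missing: dict[str, list[str]] = {}
    for doc in files:
        if doc.suffix.lower() not in {".md", ".txt"}:
            continue  # only plain-text sources are scanned in this sketch
        text = doc.read_text(encoding="utf-8", errors="ignore")
        mentions = sorted({m.group(0).lower() for m in FILE_MENTION.finditer(text)})
        for name in mentions:
            if name not in present:
                missing.setdefault(name, []).append(doc.relative_to(room).as_posix())
    return missing
```

A filename that shows up in only one document's list is exactly the "referenced in only one document" case described above, and a good candidate for a question back to the project owner.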

The missing material is often more important than what you have. Without this step, gaps become 'hallucination traps': the model invents around them, the prose looks fine, and you ship something with a soft spot underneath.

9

Duplicates — A Reasoning Problem

15:07

Duplicate detection isn't housekeeping; it's a reasoning problem. Given three versions of a plan, the agent might blend them. The same transcript exported twice gets overweighted in synthesis. An old and a new deck with similar titles become a source of wrong claims.
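A report-only approach can be sketched with two passes: identical bytes (high confidence) and similar filenames (lower confidence, a hint at a version family). Both functions and the suffix pattern are assumptions for illustration. Nothing here deletes or moves a file.

```python
import hashlib
import re
from collections import defaultdict
from pathlib import Path

# Version-like name endings such as "_v2", "-final", " (1)"; an assumed list.
VERSION_SUFFIX = re.compile(r"[\s_\-]+(?:v\d+|final|draft|copy|\(\d+\))$", re.IGNORECASE)


def exact_duplicates(room: Path) -> list[list[str]]:
    """Group files with identical bytes. Report only; nothing is deleted."""
    by_hash: dict[str, list[str]] = defaultdict(list)
    for path in sorted(p for p in room.rglob("*") if p.is_file()):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        by_hash[digest].append(path.relative_to(room).as_posix())
    return [paths for paths in by_hash.values() if len(paths) > 1]


def version_families(room: Path) -> dict[str, list[str]]:
    """Group files whose names match once version-like suffixes are stripped."""
    families: dict[str, list[str]] = defaultdict(list)
    for path in sorted(p for p in room.rglob("*") if p.is_file()):
        stem = path.stem
        while VERSION_SUFFIX.search(stem):  # strip stacked suffixes like _v2_final
            stem = VERSION_SUFFIX.sub("", stem)
        key = stem.lower() + path.suffix.lower()
        families[key].append(path.relative_to(room).as_posix())
    return {key: paths for key, paths in families.items() if len(paths) > 1}
```

Name matching only suggests that files belong together. Which member of a family is current is still a judgment call, which is why the output is a report for review.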

Rule: Do NOT let the agent delete duplicates. Let it produce a duplicates report with confidence levels and version families. 'The agent finds, you decide.'

10

The Prompt Gets Short

18:10

Once the room is prepared (inventory, conflict log, missing context list, duplicates report), the writing prompt becomes very short:

'Use the reviewed source inventory. Treat the current operating plan as authoritative for numbers, the transcript as source material for decision context, the older deck as background only. Draft the memo, cite claims, flag anything not supported.'

This makes the AI's work inspectable. It's the difference between using AI as a colleague and using AI as a gofer.
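Because the roles were decided during review, a prompt like the one quoted above can be assembled from the reviewed inventory instead of being written from memory. The sketch below assumes each reviewed source carries a `role` and an optional `scope`. These field names, the role labels, and `build_writing_prompt` are hypothetical.

```python
# Sentence templates per role; the role labels are assumptions.
ROLE_PHRASES = {
    "authoritative": "Treat {name} as authoritative for {scope}.",
    "context": "Use {name} as source material for {scope}.",
    "background": "Use {name} as background only.",
}


def build_writing_prompt(reviewed_sources: list[dict], deliverable: str) -> str:
    """Turn reviewed source roles into a short, explicit writing prompt."""
    lines = ["Use the reviewed source inventory."]
    for src in reviewed_sources:
        template = ROLE_PHRASES[src["role"]]  # unknown roles raise KeyError
        lines.append(template.format(name=src["name"], scope=src.get("scope", "")))
    lines.append(f"Draft the {deliverable}, cite claims, flag anything not supported.")
    return "\n".join(lines)
```

Failing loudly on an unknown role is intentional in this sketch: a source without a reviewed role should not slip into the writing prompt.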

11

Calibration and Closing

19:48

This workflow is specifically for serious knowledge work — 30–50 hour Codex runs, heavy reports, complex projects. NOT for casual AI interactions (overkill). NOT for back-office agentic pipelines (different problem).

Mental model shift: The old AI question was 'can the model do the thing?' The new question is 'can the agent help prepare the conditions under which good work happens?'

'An agent can walk into a messy room, turn on the lights, label everything, and get the desk organized for serious work. That is an AI worth using.'

🔑 Key Takeaways

1. 2026 hallucinations are structural, not prompting failures: the working environment is the problem
2. You cannot prompt away hallucinations: "tell it not to hallucinate" doesn't work
3. Your first AI prompt should never be "do the thing"; it should be "build the room"
4. The Project Room / Data Room: a bounded workspace for one serious job
5. Source Inventory is the most important artifact: it makes the agent's judgment visible
6. Conflict Log surfaces disagreements before they become hallucination traps
7. Missing Context List reveals gaps the model would otherwise invent around
8. Duplicates are a reasoning problem, not housekeeping: "agent finds, you decide"
9. Once the room is prepared, the writing prompt gets very short and the output gets much better
10. Opus 4.7 and GPT 5.5 are specifically capable of the long-running file manipulation this requires

⏱ Timestamp Index

0:00 Sullivan & Cromwell scandal
1:04 Why better prompts don't fix hallucinations
1:59 What's new with Opus 4.7 & GPT 5.5
3:35 Three key takeaways overview
4:54 Your first prompt is never 'do the thing'
6:27 The correct first instruction
7:22 The Project Room / Data Room concept
8:28 Local files vs cloud solutions
9:12 Project room examples
10:30 The Source Inventory artifact
12:23 Why this matters now
12:46 The Conflict Log
13:50 The Missing Context List
15:07 Duplicates as a reasoning problem
16:14 Why this matters with current agents
18:10 The prompt gets short after preparation
19:48 Calibration: when to use this workflow
20:43 Mental model shift: old vs new question