A 1.5-million-token context window is a design problem, not a feature

OpenAI's GPT-5.6, expected late June 2026, reportedly extends the context window to 1.5M tokens and trims token usage another 10–15% over GPT-5.5, with the emphasis squarely on agentic workflows. The headline writes itself: "you can just put everything in the prompt now."

You can't. Or rather — you can, and you'll regret the bill, the latency, and the quietly degraded answers.

What a bigger window actually changes

A larger window doesn't repeal the rules; it moves them:

Relevance still beats volume. A model handed 1.5M tokens of mostly irrelevant context tends to answer worse, not better, than one handed the right 8K. "Lost in the middle" doesn't go away because the middle got longer.
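The cost is real even when it's cheaper. A 10–15% cut in token usage against a prompt you've made 10× larger is not a saving: even if the full cut applied to every token, you'd still be paying roughly 8.5–9× what you paid before. Efficiency gains get eaten by capacity the moment you stop rationing.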
Retrieval becomes curation. The job shifts from "fit it in the window" to "decide what deserves to be there." That's still retrieval. It's still my problem.
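
To put numbers on the cost point, here is a back-of-the-envelope sketch. It assumes the reported 10–15% reduction applies uniformly to every token in the prompt, which the reports don't confirm, and it treats cost as strictly proportional to token count. The function names are mine, purely for illustration.

```python
def relative_cost(prompt_growth: float, token_saving: float) -> float:
    """Cost of the new prompt relative to the old one (1.0 means unchanged).

    prompt_growth: how many times larger the prompt became (10.0 = ten times).
    token_saving:  fractional efficiency gain (0.15 = 15% fewer tokens billed).
    Assumes cost scales linearly with tokens, which is a simplification.
    """
    if prompt_growth <= 0:
        raise ValueError("prompt_growth must be positive")
    if not 0.0 <= token_saving < 1.0:
        raise ValueError("token_saving must be in [0, 1)")
    return prompt_growth * (1.0 - token_saving)


def break_even_growth(token_saving: float) -> float:
    """Largest prompt growth that the saving fully pays for."""
    if not 0.0 <= token_saving < 1.0:
        raise ValueError("token_saving must be in [0, 1)")
    return 1.0 / (1.0 - token_saving)


for saving in (0.10, 0.15):
    print(
        f"saving={saving:.0%}: "
        f"10x prompt costs {relative_cost(10, saving):.2f}x, "
        f"break-even growth is {break_even_growth(saving):.2f}x"
    )
# saving=10%: 10x prompt costs 9.00x, break-even growth is 1.11x
# saving=15%: 10x prompt costs 8.50x, break-even growth is 1.18x
```

Under those assumptions, the efficiency gain pays for a prompt about 11–18% larger. Anything past that is new spend.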

How I build around it

The bigger window earns its keep in specific places — a long agent trajectory that needs its own history, a whole file under review, a multi-document synthesis where chunking was lossy. For those, it's genuinely better. For everything else, I still retrieve narrowly and pass the minimum.
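
"Retrieve narrowly and pass the minimum" can be sketched as a budgeted selection step. This is a minimal illustration, not a production retriever: `Chunk`, `estimate_tokens`, and `select_context` are hypothetical names, the relevance scores are assumed to come from whatever retriever you already run, and the whitespace word count is only a crude stand-in for a real tokenizer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    text: str
    score: float  # relevance from your retriever; assumed higher is better


def estimate_tokens(text: str) -> int:
    # Crude stand-in: counts whitespace-separated words.
    # A real tokenizer will report different (usually higher) numbers.
    return len(text.split())


def select_context(
    chunks: list[Chunk], budget_tokens: int, min_score: float = 0.5
) -> list[Chunk]:
    """Greedily keep the most relevant chunks that fit the token budget.

    Chunks scoring below min_score are never included, even if there is
    room left: spare capacity is not a reason to add context.
    """
    selected: list[Chunk] = []
    used = 0
    for chunk in sorted(chunks, key=lambda c: c.score, reverse=True):
        if chunk.score < min_score:
            break  # everything after this is even less relevant
        cost = estimate_tokens(chunk.text)
        if used + cost > budget_tokens:
            continue  # too big for what's left; a smaller chunk may still fit
        selected.append(chunk)
        used += cost
    return selected


candidates = [
    Chunk("def parse(config): ...", 0.92),
    Chunk("a long changelog entry about an unrelated module", 0.81),
    Chunk("parse() raises on missing keys", 0.74),
    Chunk("company holiday schedule", 0.05),
]
for chunk in select_context(candidates, budget_tokens=8):
    print(f"{chunk.score:.2f}  {chunk.text}")
```

The design choice that matters is the `min_score` floor: the budget caps how much goes in, but the floor is what stops a 1.5M-token budget from being filled just because it's there.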

The trap with every capability jump is treating it as permission to stop designing. A 1.5M-token window is a sharper tool, not a license to skip the thinking. I'll reach for it where the task is genuinely long-horizon, and keep my context lean everywhere else — because the model that reads everything also pays for everything, and so do I.

Sources: Introducing GPT-5.5 (OpenAI), OpenAI plans June GPT-5.6 (AI Weekly).