AI’s Token Subsidization Problem
Berber Jin, reporting for The Wall Street Journal (Apple News+):
OpenAI recently missed its own targets for new users and revenue, stumbles that have raised concern among some company leaders about whether it will be able to support its massive spending on data centers.
Chief Financial Officer Sarah Friar has told other company leaders that she is worried the company might not be able to pay for future computing contracts if revenue doesn’t grow fast enough, according to people familiar with the matter.
OpenAI is burning through tokens and compute to compete with Anthropic, which has had a strong hold on the coding and business markets for many months now. OpenAI's push started with a complete revamp of Codex around the new GPT-5.3, GPT-5.4, and GPT-5.5 models, which aim to compete with Anthropic's Claude Opus models. It launched the new Codex desktop app for macOS and Windows and has been aggressively adding features, including a best-in-class native computer-use tool, a system-wide dictation hotkey, and hundreds of minor tweaks based on user feedback. The Codex team, to put it slangily, has been cooking.
But they’ve also been burning tokens. As the new GPT-5.5 model continues to give Claude Opus 4.7 a run for its money, more people are switching over to Codex thanks to its support for third-party harnesses and tools like OpenClaw, great developer support and active communication, and frequent rate-limit resets. The $200 ChatGPT Pro plan, even when OpenAI employees don’t reset limits for fun midway through the week, provides practically unlimited inference (that is, usage of the models). Even the Plus plan, which I’ve subscribed to since the day GPT-4 launched three years ago, provides generous inference every five hours. And just when people think they’ll hit the usage cap, someone at OpenAI resets the week’s limits.
Anthropic has done none of this; in fact, the company has drawn ire from developers for actively cracking down on token usage. Claude Opus 4.7’s new tokenizer (the part of inference that converts the prompt’s words into small chunks the model can understand) produces more tokens to encode the same text than previous versions did. Anthropic has even tested removing access to Claude Code from the $20 Pro plan. It has also banned the use of third-party harnesses, most notably OpenClaw, the agent tool created by Peter Steinberger that went viral a few months ago. (OpenClaw was originally named Clawdbot and worked best with Claude models.)
Anthropic is doing this for one clear reason: it can no longer afford to subsidize as many tokens. The $20, $100, and $200 subscription plans for these artificial intelligence services are subsidized plans, giving subscribers more inference than they pay for. Cursor, the company that made the original AI-based Visual Studio Code fork, estimated that the $200 Claude Max plan offers roughly $5,000 of inference every month. In other words, a person paying only $200 could use $5,000 worth of tokens billed at the standard rates: $5 per million input tokens and $25 per million output tokens. (These are the rates charged when accessing Claude through the application programming interface.)
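To make the subsidy concrete, here is a back-of-the-envelope sketch using the API rates quoted above. The monthly token volumes are illustrative assumptions of my own, not figures from Cursor or Anthropic:

```python
# Subsidy math at the standard API rates cited above:
# $5 per million input tokens, $25 per million output tokens.
INPUT_RATE = 5.00    # dollars per million input tokens
OUTPUT_RATE = 25.00  # dollars per million output tokens

def api_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a workload if billed at standard API rates."""
    return (input_tokens / 1_000_000) * INPUT_RATE + \
           (output_tokens / 1_000_000) * OUTPUT_RATE

# Hypothetical heavy agentic-coding user: 800M input and 40M output
# tokens per month (assumed numbers, chosen to land near Cursor's
# ~$5,000 estimate).
heavy = api_cost(800_000_000, 40_000_000)
print(f"API-rate cost: ${heavy:,.0f}")            # $5,000
print(f"Subsidy on a $200 plan: ${heavy - 200:,.0f}")  # $4,800
```

Run the same function with a light user's numbers (say, a few million tokens a month) and the cost comes out to pocket change, which is exactly why the averages can work in the labs' favor.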
Anthropic (and OpenAI) can do this because not everyone who subscribes to the $200 plan uses even $200 of inference every month. I would be surprised if most $20 subscribers use $5 of inference every month. Most ChatGPT Plus subscribers don’t even know there is a model better than GPT-5.3 Instant, the base model, and they’re only subscribing for higher usage limits at that tier of service. This is advantageous to OpenAI because it pockets $15 from each such user. Heavy users, such as those who run Codex all day, also gain, as they can use thousands of dollars’ worth of tokens for only a few hundred dollars.
The problem is that this model requires many people to subscribe to ChatGPT, when the vast majority don’t even know what’s included in the ChatGPT Plus plan. And understandably so: $20 a month is an unfathomable subscription cost for many people, let alone $200. They’re not using advanced models, and they don’t use ChatGPT enough to hit the context window or token limit every five hours. But OpenAI’s employees, in a desperate attempt to steal users from Anthropic (which is already burning goodwill by messing with the subsidized tokens that once made it famous), aren’t helping matters by resetting the usage limits and offering generous multipliers on token usage every month.
The natural solution is to sign enterprise contracts that subsidize heavy Codex users, but that takes years of sales and marketing. In the meantime, OpenAI needs money fast to build the compute to run all of this subsidized inference. Anthropic realized early that this would be a precarious position and immediately stopped people from burning tokens through third-party harnesses and tools. It invested early in signing enterprise contracts, right as Claude Opus 4.5 was taking off with hobbyists. And now it’s playing games to keep turning a profit. The economics of subsidized tokens caught up to Anthropic, and it weaseled its way out of that situation (or is trying to). OpenAI has not, and now it’s sweating internally.
What we should make of all this is that subsidized tokens will soon be a thing of the past. The $20 inference plans will be completely neutered, and everyone will be expected to pay hundreds of dollars for inference. The free tiers of the chat apps will be full of chumbox advertisements, like every other internet service. When Uber launched with a ton of venture capital funding, you could sometimes hail one for half of what a taxicab would cost. Nowadays, an Uber from Newark Liberty Airport to Manhattan costs over $100 (anecdotally). The era of subsidized Ubers passed, Uber turned a profit, and now we’re stuck with expensive transportation. The same thing will happen to AI tokens, and when it does, prepare for the bubble to finally burst.