Hayden Field, reporting for The Verge:

The company on Thursday debuted ChatGPT Agent, which it bills as a tool that can complete work on your behalf using its own “virtual computer.”

In a briefing and demo with The Verge, Yash Kumar and Isa Fulford — product lead and research lead on ChatGPT Agent, respectively — said it’s powered by a new model that OpenAI developed specifically for the product. The company said the new tool can perform tasks like looking at a user’s calendar to brief them on upcoming client meetings, planning and purchasing ingredients to make a family breakfast, and creating a slide deck based on its analysis of competing companies.

The model behind ChatGPT Agent, which has no specific name, was trained on complex tasks that require multiple tools — like a text browser, visual browser, and terminal where users can import their own data — via reinforcement learning, the same technique used for all of OpenAI’s reasoning models. OpenAI said that ChatGPT Agent combines the capabilities of both Operator and Deep Research, two of its existing AI tools.

To develop the new tool, the company combined the teams behind both Operator and Deep Research into one unified team.

In all honesty, I haven’t tried it yet — OpenAI seems to be doing a slow rollout to Plus subscribers through Thursday — but it appears to be pretty close to Operator, just powered by a new model more competitive with OpenAI’s text-based reasoning models. Operator was announced in January, and when I wrote about it, I said it wasn’t the future of artificial intelligence because it relied on looking at graphical user interfaces inherently designed for human use. I still believe Agent and Operator are bridges between the human-centric internet and the (presumably coming) AI-focused internet, at a time when humans are, to an extent, suffering from the fast pace of large language model-powered chatbots. (Publishers get fewer clicks, the internet is filled with AI-generated slop, etc. — these are short-term harms created by AI.)

Agent, by OpenAI’s own admission, is slow, and it asks for permission to do its job because OpenAI’s confidence in the model is so low. Theoretically, the act of asking Agent to do something should itself be the permission — there should be no need for another confirmation prompt. But, alas, Agent is a computer living in a human-centric internet, and no matter how good OpenAI is at making models, there’s always a possibility the model makes an irreversible mistake. OpenAI is giving a computer access to its own computer, and that exposes vulnerabilities inherent in AI as it stands today. The goal of most AI companies these days is to develop “agents” (lowercase-a) that go out and do some work on the internet. Google’s strategy, for example, is to use the vast swath of application programming interface access it has earned through delicate partnerships with the dozens of independent companies on the web that rely on Google for traffic. Apple’s is to leverage its relationship with developers to build App Intents.

OpenAI has none of those relationships. It briefly tried to make “apps” happen in ChatGPT through third-party “GPTs,” but that effort never went anywhere. It could try to cut deals with companies for API access, but I think its engineers surmised that the best way (for them) to conquer the problem is to put their all into the technology. To me, there are two ways of dealing with the AI problem: play nice with everyone (Apple, Google), or build the technology to do it all yourself (OpenAI, Perplexity). OpenAI doesn’t want to depend on any other company on the web for its core product’s functionality. The only exception I can think of is Codex, which requires a GitHub account to push code commits, and that exception is exactly why I think Agent is destined to fail. Codex is a perfect agentic AI because it integrates with a product people use and love, and it integrates well. Agent, by comparison, integrates poorly, because the lone-wolf “build it yourself” strategy seldom works.

The solution to Agent’s pitfalls is obvious: APIs. Google’s Project Mariner uses them, Apple’s yet-to-come “more personalized Siri” should use them, and Anthropic’s Model Context Protocol aims to create a marketplace of tools for AI models to integrate with. MCP is an API of APIs built for chatbots and other LLM-based tools, and I think it’s the best solution to this issue. That’s why every AI company (Google, OpenAI, etc.) has announced support for it — they know APIs are the inevitable answer. If every website on the internet had an MCP integration, chatbots and AI agents wouldn’t have to go through the human-centric internet at all. Computers talk to each other via APIs, not websites, and Agent ignores the segmentation built into the internet decades ago: machine-readable interfaces for computers, rendered pages for people. That’s why it’s so bad — it’s a computer that’s trying not to be a computer. It’s great for demonstrations, but terrible for any actual work.
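To make that concrete, here’s a minimal sketch of what an MCP server could look like, written against Anthropic’s official @modelcontextprotocol/sdk TypeScript package. The server name, the lookup_order tool, and its toy logic are all hypothetical, invented for illustration; the point is that a site exposes typed, self-describing tools that any MCP-capable model can call directly, no GUI required.

```typescript
// A hypothetical MCP server for an online store. The server name, the
// tool, and the fake order lookup are all invented for illustration.
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

const server = new McpServer({ name: "example-storefront", version: "1.0.0" });

// A typed, self-describing tool: the model reads the name and input schema,
// then calls it like any other API, no pixels or clicking required.
server.tool(
  "lookup_order",
  { orderId: z.string() },
  async ({ orderId }) => {
    // Stand-in for a real database or internal API call.
    const status = orderId.startsWith("A") ? "shipped" : "processing";
    return {
      content: [{ type: "text", text: `Order ${orderId} is ${status}.` }],
    };
  },
);

// The host app (Claude Desktop, ChatGPT, etc.) spawns this process and
// exchanges JSON-RPC messages with it over stdin/stdout.
await server.connect(new StdioServerTransport());
```

When a host connects, it asks the server for its tool list over JSON-RPC and hands those schemas to the model; that discovery step is what makes MCP feel like an API of APIs rather than one more bespoke integration.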