OpenAI has today, 17 July 2025, initiated the launch of the ChatGPT Agent, a new capability that fundamentally transforms its popular chatbot from a passive assistant into an active, autonomous system. This agent is designed to manage complex, multi-step tasks on a user’s computer from start to finish, representing what the company calls its “boldest attempt yet” to create a truly agentic product.
The launch signals a significant strategic shift in the artificial intelligence landscape, moving beyond mere text generation to offload entire workflows from the user. This new functionality is rolling out immediately to subscribers of OpenAI’s Pro, Plus, and Team plans, with Enterprise and Education users slated to receive access in the coming weeks. However, the new agent’s powerful capabilities also introduce a new class of security and safety risks, prompting OpenAI to deploy its most robust biosecurity protocols to date.
The Agentic Leap: From Answering to Acting
The core difference in the ChatGPT Agent is its ability to perform actions, not just provide answers. Users can now direct the agent using natural language to handle complex requests that previously required significant human effort. The company provides examples such as, “analyse three competitors and create a slide deck,” or, “plan a Japanese-style breakfast for four and buy the necessary ingredients.”
To accomplish this, the agent intelligently navigates web pages, filters results, applies filters, and can even prompt the user to log in securely when authentication is required. It can execute code, perform analysis, and generate polished, editable deliverables like presentations and spreadsheets.
Crucially, the agent performs these tasks using its own virtual computer, allowing it to rapidly switch between reasoning and execution. This integrated system maintains the necessary context for each task, even when using multiple tools, enabling it to manage intricate workflows autonomously while following the user’s instructions.
Under the Bonnet: A Unified System of Tools
This new capability is described as a natural evolution and integration of two previously distinct OpenAI tools: Operator and In-Depth Research. Until now, Operator excelled at web interaction—scrolling, clicking, and typing—whilst In-Depth Research specialised in analysing and synthesising information. Neither, however, could perform the other’s function.
By integrating these complementary capabilities, OpenAI has unlocked entirely new features within a single model. The ChatGPT Agent is equipped with a comprehensive set of tools, including:
- A visual browser to interact with graphical web interfaces.
- A text-based browser for simple, reasoning-based web queries.
- A terminal for code execution and file manipulation.
- Direct API access and ChatGPT connectors.
These connectors allow the agent to link with applications like Gmail or GitHub, finding relevant information and integrating it into its responses. This multi-tool approach allows the model to choose the optimal path for a task; for example, accessing a calendar via an API whilst simultaneously processing large text files with the text-based browser.
Furthermore, the agent is designed for collaborative and iterative workflows. A user can interrupt the agent at any time to clarify instructions, redirect the task, or change it entirely. The agent can pick up where it left off without losing progress and can also proactively request more details to ensure its actions remain aligned with the user’s goals.
A New Benchmark for Performance: The Agent vs. The World
OpenAI substantiates its claims of the agent’s advanced capabilities with a suite of new state-of-the-art (SOTA) benchmark scores, demonstrating a significant leap in performance across diverse and complex domains.
In Humanity’s Last Exam (HLE), an assessment measuring expert-level knowledge across a wide range of subjects, the agent achieved a new SOTA pass@1 score of 41.6%. This score, which roughly doubles that of previous OpenAI models, rises to 44.4% when using a simple parallel execution strategy.
The agent’s mathematical reasoning also shows profound improvement. On FrontierMath, the most demanding mathematical benchmark to date, the agent achieves an accuracy of 27.4% when using its tools—a dramatic increase over the 19.3% scored by o4-mini.
Perhaps most telling are the benchmarks for real-world professional tasks. In tests designed to evaluate performance on complex workplace knowledge, the ChatGPT Agent delivered results comparable to or better than humans in approximately half of the cases. In DSBench, a set of tests for realistic data science tasks, the agent far surpasses human performance, scoring 89.9% on data analysis (vs. 64.1% for humans) and 85.5% on data modelling (vs. 65.0% for humans).
This professional-grade performance extends to financial and office software. On SpreadsheetBench, the agent achieves a score of 45.5% when granted direct .xlsx access, more than doubling the 20% score of Copilot in Excel. In an internal test of investment banking tasks, such as building a leveraged buyout model, the agent achieved an accuracy of 71.3%, significantly outperforming the 55.9% from In-Depth Research.
Unprecedented Capability, Unprecedented Risk
This expansion in capability, however, introduces a new and serious class of risks. As the agent can perform actions on the web and directly access user data—both from connectors and logged-in websites—it creates a novel attack surface.
OpenAI places particular emphasis on the danger of malicious manipulation through prompt injection. This is a scenario where a third party hides malicious instructions within a webpage (e.g., in invisible text or metadata). An unsuspecting agent, encountering this data while performing a task, could be tricked into performing unwanted actions, such as exfiltrating private data from a connected app or taking harmful actions on a page where the user is logged in. Because the agent can operate autonomously, a successful attack could have “serious consequences.”
The most significant safety declaration, however, relates to biosecurity. Given the model’s enhanced capabilities, OpenAI has chosen to precautionarily treat the ChatGPT agent as a high-capability tool in the fields of biology and chemistry.
Whilst the company states it lacks “conclusive evidence” that the model could help an untrained individual cause serious biological harm, it is activating its “most comprehensive security measures to date” out of caution. This includes training the model to reject dual-use requests, implementing permanently active classifiers to monitor for biology-related content, and establishing well-defined control protocols.
Mitigation, Control, and the ‘Observer Mode’
In response to these substantial risks, OpenAI has implemented several layers of user control and safety mitigations. The agent is trained to explicitly request user permission before taking critical actions, such as making a purchase.
For certain fundamental tasks, such as sending emails, the agent requires active user supervision in an “Observer mode.” The system has also been trained to proactively refuse high-risk tasks, such as making bank transfers.
On the privacy front, the agent features a “secure browser control mode.” When a user takes direct control of the browser to log in, ChatGPT does not collect or store any data entered, such as passwords. Users can also clear all browsing data and log out of all active websites with a single click in the settings.
The New Agentic Arms Race and Current Limitations
OpenAI’s launch does not occur in a vacuum. It is the most significant move yet in what has become the “most-hyped trend in AI,” with industry executives at Google, Meta, and Amazon all pursuing the goal of an “agentic” product. The ideal, often compared to “Iron Man’s J.A.R.V.I.S.,” is an AI that can perform complex job functions and automate digital life.
The trend has already shown commercial viability. Fintech company Klarna, for example, announced its own AI agent now handles two-thirds of its customer service chats—the equivalent workload of 700 full-time human employees. Competitors like Anthropic have also demonstrated similar “Computer Use” tools.
Despite the impressive benchmarks, OpenAI is clear that the ChatGPT Agent is still in an early stage and “can still make mistakes.” The new feature for generating presentations, for example, is currently in beta, with results described as “somewhat basic” in termss of formatting. The company notes there are still “some discrepancies” between the slides displayed in the agent’s viewer and the final exported PowerPoint file, an issue it is working to resolve.

Conclusion: The Dawn of the Digital Colleague
Ultimately, the arrival of the ChatGPT Agent marks a definitive inflection point. It is a technically impressive system that, according to extensive benchmarking, can outperform expert humans in a growing number of complex, professional domains. It fires the starting pistol on a new “agentic arms race” in Silicon Valley, moving the entire industry’s focus from passive information retrieval to active task execution.
The launch is balanced, however, by a sober acknowledgement of profound new risks, particularly the potential for malicious prompt injection and the precautionary “high capability” designation for biosecurity. The success of this new paradigm will hinge not only on its raw power but on the robustness of these new safety mitigations.
This marks the end of the AI as a simple tool. In my view, we are now entering the era of the digital colleague, an entity whose value will be measured not in the quality of its answers, but in the real-world impact of its actions. The central question is no longer ‘What does it know?’ but ‘What can it do?’—and, perhaps more importantly, ‘What should we allow it to do?’
FAQ
It is an autonomous AI system within ChatGPT that can perform complex, multi-step tasks on your computer, such as analysing data, creating presentations, or planning and purchasing items.
The regular ChatGPT primarily answers questions. The ChatGPT Agent takes actions. It can navigate websites, use tools, and manage entire workflows from start to finish.
It can handle tasks like comparing competitors and creating a slide deck, planning a meal and ordering the ingredients, or checking your calendar to summarise upcoming meetings.
It combines the web-interaction skills of “Operator” with the analytical skills of “In-Depth Research.” It runs on its own “virtual computer” and uses a set of tools including browsers, a terminal, and API connectors.
The agent must ask for user permission before critical actions (like purchases), uses an “Observer mode” for tasks like sending email, and will refuse high-risk requests like bank transfers.