Codex Is Moving From Code Generation to Delegated Knowledge Work
Codex is moving from a coding assistant toward an agent for delegated knowledge work, according to Thibault Sottiaux, OpenAI’s head of Codex. In an OpenAI Forum conversation with Chris Nicholson of OpenAI Global Affairs, Sottiaux argues that as models have become more reliable and better connected to workplace context, Codex is being used to research, organize information, create files and presentations, coordinate across tools, and run background tasks. That shift, he says, makes delegation, trust and access controls central as agents act across files, communications tools and company systems.

Codex is becoming a system for delegated work, not just code generation
Thibault Sottiaux described Codex as beginning with a narrower challenge: helping software engineers by inspecting a code repository, inferring the changes needed for a task, and opening a GitHub pull request. The first public version, now referred to as Codex Web, ran in the cloud behind a web interface. A user stated an intent, and Codex returned a proposed code change.
That design exposed two constraints. Developers already had customized local environments, and reproducing those environments inside OpenAI’s cloud made setup too difficult. The models also were not yet reliable enough for long-horizon tasks, which made iteration harder when Codex missed the mark. Sottiaux said the team concluded that asking users to configure a remote agent was “just too difficult” before the models could consistently complete longer tasks.
The shift beyond software engineering came later. Sottiaux tied it to increased generality and reliability after GPT-5, and especially around GPT-5.2 for longer-horizon work. The important observation was that even software engineers do not spend most of their day writing code. They triage tickets, prioritize work, discuss architecture, investigate bug reports, determine whether a reported problem is real, respond to outages, stay on call, and gather context across systems.
I would say, like, maybe, you know, software engineers spend, like, you know, 20, 30% of their time actually coding.
Codex was being adopted by technical users not only to modify code, but to handle the surrounding knowledge work. As the agent gained access to more context — Notion, documents, internal information, and other work surfaces — it became more useful both for coding and for a wider class of non-coding tasks.
Now, like, the majority of the tasks that are being performed in Codex are actually non-coding tasks.
Chris Nicholson framed the change as a move from searching code to searching documents, organizing information, and returning usable outputs. Sottiaux’s stronger claim was that Codex is no longer best understood as a code generator. It still uses code, but code is becoming an implementation detail: a tool the agent uses to manipulate files, create spreadsheets, generate websites, build slide decks, and perform general-purpose tasks on a user’s behalf.
The internal proof point was coordination work that kept a launch moving
The moment Sottiaux said changed his view involved an internal Codex launch. He described Alexander using Codex to track the state of many changes that needed to land before release, with Maricruz, the lead product manager on Codex, also involved in the launch. Sottiaux recalled seeing Alexander in a meeting while “many little Codex agents” worked in the background: tracking user feedback, monitoring developer updates, chasing people for status, and keeping a launch plan current.
The delegated work was coordination: searching Slack, documents, GitHub pull requests, and feedback channels; turning scattered inputs into a plan; identifying what still needed polish; and asking people for updates. Sottiaux said he had “never seen someone be as productive” as Alexander in that moment.
Codex was connected to Slack, so the agent could send messages asking for the latest status on a specific item. Nicholson focused on the importance of that mundane function: chasing people consumes time. A model good at gathering context and summarizing becomes more useful when it can also act inside the communication systems where work is coordinated.
That pattern of use shifted bottlenecks around the team. Engineers were “greatly accelerated” and building faster than before. Adjacent roles began changing too: designers and product managers gained more ability to act directly rather than waiting for engineering prioritization. As product and engineering output increased, bottlenecks shifted toward communications and marketing, where the company had to explain and coordinate a larger volume of work while keeping the story coherent.
Asked whether OpenAI could ship at its current pace without Codex, Sottiaux answered directly.
At this point, Codex is critical for us.
He added that the alternative might be “10 times more engineers.” The claim was not limited to OpenAI. Sottiaux argued that the technology has reached a point where agents can handle very general work: preparing presentations, gathering context on public perception, doing marketing research, organizing information, and supporting finance work. He said Sarah Friar talks about organizing OpenAI’s latest fundraise with the help of Codex, and described Codex as “a great accelerant” in that context.
Personal software lowers the cost of solving small, specific problems
Nicholson described the older pattern of work as a separation between people who had problems and people who could build solutions. The people with the problem had to explain their needs to another group, wait for prioritization, and accept something that was only “kind of okay” because nobody had enough time to keep iterating.
Sottiaux said he sees that separation narrowing. Designers on the Codex team are pushing code and making product changes directly, working alongside developers to refine the experience. In his telling, changes that may appear minor to an engineering team can be central to a designer’s craft: the details that elevate the experience for users. Codex lets designers make more of those changes themselves instead of first convincing an engineering team to prioritize them.
Nicholson called the pattern “home-cooked, personalized software.” Sottiaux agreed that this is the wave he sees coming: people able to create and maintain personal software that does exactly what they need.
The demonstration was deliberately ordinary. Sottiaux, who lives in San Francisco, said the price of bread in the city is “outrageous” after moving from Europe. He asked Codex to find the best loaves in San Francisco and create a spreadsheet showing where to buy them and their prices. Codex worked for about five minutes and produced a spreadsheet with bakeries, descriptions, locations, and prices.
He then asked Codex to turn the spreadsheet into a web page. The interface shown on screen displayed a “San Francisco Bread Map”: an interactive map with pins for 10 bakeries, a ranked bakery list, filters, selected-place details, directions and source links, and a link back to the spreadsheet. Another view showed the underlying table with San Francisco bakeries and neighborhoods, including Arsicault Bakery in Inner Richmond, Tartine Bakery in the Mission District, Neighbor Bakehouse in Dogpatch, and Jane the Bakery in Western Addition.
The next iteration required only a short instruction. Sottiaux typed, “Hey, do the same analysis for coffee, not bread.” He said Codex would work for about eight minutes and likely produce a similar coffee website. Nicholson noted that the request was not simply “give me cheap bread”; Sottiaux was expressing preferences around price and quality, and Codex was taking those preferences into account.
Sottiaux emphasized that he did not need to understand the code underneath. Codex was using code, spreadsheets, and a website as implementation details. The user interface was a spoken or typed request. He said he often engages with Codex by voice: he can ask for a map of loaves in San Francisco, with prices and explanations, and then do something else while the agent works.
Nicholson described the loop as one common to much of life and work: gather data, structure or visualize it, reach an insight, and make a decision according to personal goals. Sottiaux added that the same general pattern can apply to small personal decisions and complicated institutional work.
The biggest gain may be work that previously never cleared the threshold
Sottiaux said he uses Codex constantly, handing off perhaps more than 100 tasks a day. The range includes organizing desktop files, managing compute fleet information, understanding on-call rotations, reviewing how engineers are doing, checking launch schedules, and flagging risks that need his attention.
One recurring automation functions like a personal chief of staff. Sottiaux showed a Codex automation prompt: “Go through @Gmail @Notion @Google Calendar and give me a summary of my day and flag anything that is at risk,” scheduled daily at 9:00 AM. The interface described automations as scheduled chats and showed templates for status reports, incidents, and triage, including summarizing git activity for standup, synthesizing PRs and rollouts into a weekly update, and grouping CI failures by likely root cause.
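The shape of that daily-summary automation can be sketched as a scheduled digest over a list of work items. Everything below — the item schema, the field names, and the risk rule — is a hypothetical illustration, not the Codex automation itself, which runs as a scheduled chat rather than user code.

```python
from datetime import date, timedelta

def daily_digest(items, today, risk_window_days=2):
    """Summarize the day's items and flag anything at risk.

    Each item is a dict with 'title', 'due' (a date), and 'status'.
    An item counts as 'at risk' if it is not done and is due within
    the risk window. (Hypothetical schema, for illustration only.)
    """
    horizon = today + timedelta(days=risk_window_days)
    at_risk = [i["title"] for i in items
               if i["status"] != "done" and i["due"] <= horizon]
    summary = f"{len(items)} items today, {len(at_risk)} at risk"
    return {"summary": summary, "at_risk": at_risk}

items = [
    {"title": "Launch checklist review", "due": date(2025, 6, 2), "status": "open"},
    {"title": "Weekly metrics email",    "due": date(2025, 6, 10), "status": "open"},
    {"title": "Docs update",             "due": date(2025, 6, 1), "status": "done"},
]
digest = daily_digest(items, today=date(2025, 6, 1))
print(digest["summary"])  # 3 items today, 1 at risk
```

The point of the sketch is the loop, not the logic: the value comes from running it on a schedule against live sources rather than on demand.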
The value is not only that tasks take less time. It is that many tasks previously never cleared the threshold for action. Sottiaux said the most annoying pre-Codex pattern was needing information but not wanting to “annoy someone” by asking for research on a question that might not be important enough. Now he can request the report, build the small personal tool, or investigate the low-priority issue through Codex.
Nicholson argued that some “weeks to seconds” comparisons understate the change. For many tasks, the old timeline was not weeks; it was “infinity time,” because the work would never have happened. Sottiaux agreed that “there’s a lot of that.”
This changes the emotional texture of work as well. Sottiaux said he enjoys his job more because small tedious tasks are removed from his attention and because he feels fewer things are falling through the cracks. He gave the example of seeing a user bug report on Twitter that might previously have been deprioritized because it seemed to affect only a few users. Now he can put it into Codex and let the agent investigate. That reduces cognitive load.
Nicholson connected this to burnout and information overload: workers are often surrounded by tools meant to help them, yet feel trapped inside those tools through manual entry, searching, and coordination. Sottiaux described the promise as a “trustworthy partner” that can do work on a user’s behalf, flag when it cannot meet the desired standard, shield the user from noise, and surface important information in time.
His imagined endpoint is not reading email more efficiently, but not reading most email at all. A personal agent would read the inbox, flag only what matters, ask for input when needed, and otherwise do the work. Nicholson put it as avoiding the search for “needles in a dozen different haystacks.” Sottiaux’s version was goal-based: “Here are my goals for today, help me out, manage everything else.”
Delegation works better when the agent can judge whether it succeeded
Sottiaux described a newer advanced feature, /goal, as a way to move Codex from discrete task execution toward persistent pursuit of an objective. In Codex, slash commands allow users to enter different modes. /goal lets a user give Codex a long-term goal and have it work “relentless[ly]” until it decides the goal is satisfied.
The examples were at the frontier of difficulty: solving a hard mathematical problem, improving program performance, rewriting entire programs from one language to another, and working on science problems. Sottiaux said he had seen “really cool breakthroughs” in mathematics and physics with this kind of use. The notable change is the time horizon. A few months earlier, he said, the team was excited when agents could work for 10 minutes. Now they are talking about agents working for weeks on hard tasks.
Nicholson asked whether /goal could be used in the Codex app shown during the session. Sottiaux said it was coming but not yet launched there. The described behavior is that the agent runs in the background for days and eventually reports that it is done or stuck.
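The described /goal behavior reduces to a persistent loop: attempt, self-evaluate, and stop only when the agent judges the goal satisfied or its budget is exhausted, reporting done or stuck. The sketch below is a toy under that assumption — the function names and the counting objective are invented, not the Codex implementation.

```python
def pursue_goal(attempt, is_satisfied, max_attempts=1000):
    """Toy persistent-goal loop: keep working until the goal is met
    or the attempt budget runs out, then report done or stuck.
    (Illustrative only; stands in for a model's self-evaluation.)"""
    state = None
    for n in range(1, max_attempts + 1):
        state = attempt(state)        # do one unit of work
        if is_satisfied(state):       # self-evaluation step
            return {"status": "done", "attempts": n, "state": state}
    return {"status": "stuck", "attempts": max_attempts, "state": state}

# Toy objective: grow a counter until it reaches a target of 10.
result = pursue_goal(
    attempt=lambda s: (s or 0) + 1,
    is_satisfied=lambda s: s >= 10,
)
print(result["status"], result["attempts"])  # done 10
```

The real difficulty Sottiaux describes is the `is_satisfied` step — the agent judging its own success over days or weeks rather than a fixed numeric target.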
Sottiaux then pointed beyond even goal-oriented task solving. Today, he said, the interaction remains largely turn-based: a user provides a task, the agent performs it, and the user responds. The future he expects is an agent that runs continuously, 24/7, doing useful things and being steered along the way. It might at times decide it has done all useful work and “sleep” until the user engages again, but it would not operate only when given a clear instruction.
That raises a practical management problem. Successful use depends heavily on helping the agent evaluate its own success. Users should describe what “good” looks like, what “solved” looks like, and what they want to see at the end of a task. For a slide deck, for example, a user might specify 10 slides: the first two with a certain type of context, the next six with a technical breakdown, and the final two with open questions and Q&A. The more specific the desired output, the more likely Codex is to succeed.
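The slide-deck example amounts to making “done” checkable. A minimal sketch, with a hypothetical spec format — the point is only that the desired output is stated explicitly enough to verify:

```python
def meets_spec(deck, spec):
    """Check a produced deck outline against a success spec.

    deck: list of slide-section labels, in order.
    spec: list of (section_label, expected_count) pairs.
    (Hypothetical format; illustrates specifying what 'good' looks like.)
    """
    expected = [label for label, count in spec for _ in range(count)]
    return deck == expected

# Ten slides: two context, six technical breakdown, two open questions / Q&A.
spec = [("context", 2), ("technical breakdown", 6), ("open questions", 2)]
deck = ["context"] * 2 + ["technical breakdown"] * 6 + ["open questions"] * 2
print(meets_spec(deck, spec))  # True
```

A spec this concrete gives the agent — and the user — an unambiguous finish line, which is exactly the management habit Sottiaux recommends.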
Nicholson compared this to managing an assistant or lieutenant. Sottiaux agreed. When asked what separates non-developers who get results from those who feel stuck, he returned to the same point: treat the agent like someone new who has joined the company and lacks context. Give it information about where to find things, what matters, which documents are relevant, and what outcome is expected.
Sottiaux said he has adopted the habit of writing things down — his thoughts, goals, and preferences — in files on his computer that Codex can access. Since the agent cannot read his mind, he said, the user has to verbalize some of that context.
Codex is for doing work across tools; ChatGPT remains useful for answers
A community question asked what would make non-coders move from ChatGPT to Codex. Sottiaux did not frame Codex as a replacement for ChatGPT. He described it as complementary. ChatGPT remains his go-to for quick answers. Codex is for doing things.
The distinction is operational. Codex can manipulate files on a computer, run automations, work in the background every few hours, and act across connected tools. Nicholson described the older pattern of copying code from ChatGPT into a file or terminal and said “the copy and paste era is over.” If a user has a file or pictures on their computer, Sottiaux said, they can tell Codex to read and use them without manually opening, copying, or transferring content.
For non-developers, his advice was to connect more sources. Codex has more than 100 plugins, he said, including calendar, Docs, Notion, and other tools. The more access it has to a user’s information and tools, the more useful it becomes. Nicholson summarized the principle as more and better context producing more and better results, while noting that some of the needed context remains in the user’s head: goals, preferences, and experiences not captured in documents.
Sottiaux also gave examples of computer-use applications outside traditional coding. He uses Codex for shopping: meal planning on his personal computer, then having it order the ingredients. He has seen people use it to navigate operating-system settings, where finding the right Windows or macOS panel can be difficult. A user can ask Codex to show how to change something, and it can click through the relevant settings. He also mentioned adjusting slides, integrating images, and, for technical users, quality assurance: opening an app, clicking around, and testing whether it functions.
One caution emerged near the end. Sottiaux said he sees a mistake in “too much delegation” and not enough use of Codex to understand things. Agents can gather information, explain concepts, draw diagrams, and use image generation to render text-heavy explanatory visuals. He sees people use Codex to read launch plans, marketing materials, or parts of a codebase and then create images or diagrams that help them learn. Nicholson agreed, saying, “Whoever does the work does the learning,” and described building himself a tutor skill that asks him questions about what it has tried to teach, forcing active recall.
Enterprise adoption turns on trust, not only capability
Asked what the biggest bottleneck is for enterprise adoption — model capability, human trust, or organizational process design — Sottiaux said he does not think capability is the main constraint.
I don't think it's a capability thing. The capability is there; that's not what's holding back adoption in enterprise. I think it's primarily trust, and for trust it's about: is it safe and is it secure?
The concern is straightforward: an agent running around a company could delete sensitive files, upload information somewhere it should not, or send an email that exfiltrates confidential information. If people believe that is possible, they will not use it.
Sottiaux said OpenAI has thought heavily about this. By default, agents run inside a sandbox with tight controls. They can be limited to a particular portion of the file system, constrained to a folder, and denied network access. Enterprise controls can restrict what information an agent can access and what actions it can take. Nicholson translated this for non-engineers as analogous to giving a person access only to one silo of organizational information. Sottiaux agreed: a sandbox is a way to restrict the agent’s access and allowed actions.
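The folder-scoping idea can be sketched as a path policy: resolve the requested path and allow it only if it stays inside the permitted root, which defeats `..` escapes. This is an illustration of the concept, not OpenAI's sandbox implementation.

```python
from pathlib import Path

def allowed(requested: str, root: str) -> bool:
    """Allow file access only to paths inside the sandbox root.

    Resolving the joined path first prevents '../' traversal out of
    the root. (Illustrative policy check, not the Codex sandbox.)
    """
    root_path = Path(root).resolve()
    target = (root_path / requested).resolve()
    return target == root_path or root_path in target.parents

print(allowed("notes/todo.txt", "/workspace"))  # inside the root -> True
print(allowed("../etc/passwd", "/workspace"))   # escapes the root -> False
```

Network access and write permissions would be separate toggles in the same spirit: each control narrows what the agent can reach or change.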
Nicholson also gave the example of read-only access: an agent can read data authorized by IT, but cannot write back, delete it, or modify it. Sottiaux confirmed that read-only access is one possible control.
Sottiaux also pointed to a feature he said OpenAI calls Auto Review on its alignment blog. In his description, a second agent reviews the actions of the primary Codex agent. The primary agent is trying to complete the work for the user, and at times it might take an action that is “a little bit risky.” The reviewing agent monitors those actions, flags high-risk behavior, and stops it. Nicholson compared it to a referee or umpire saying an action is against the rules. Sottiaux said he expects more innovations of that kind will be needed.
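The Auto Review pattern — a second agent gating the first — reduces to a filter over proposed actions. Below is a toy sketch in which a rule-based reviewer stands in for the reviewing model; the action types and risk list are hypothetical.

```python
HIGH_RISK = {"delete", "send_email", "upload"}  # hypothetical risky action types

def review(action):
    """Toy reviewer: block high-risk actions, allow the rest.
    In the described system, a second model plays this role."""
    return "block" if action["type"] in HIGH_RISK else "allow"

def run_with_review(actions):
    """Route each proposed action through the reviewer before execution."""
    executed, blocked = [], []
    for action in actions:
        (executed if review(action) == "allow" else blocked).append(action)
    return executed, blocked

executed, blocked = run_with_review([
    {"type": "read",   "target": "launch_plan.md"},
    {"type": "delete", "target": "archive/"},
    {"type": "write",  "target": "status.md"},
])
print(len(executed), len(blocked))  # 2 1
```

The referee analogy holds: the primary agent optimizes for completing the task, while the reviewer optimizes for keeping its actions inside the rules.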
The trust issue sits in tension with the broader vision. Sottiaux’s future agent is more autonomous, more persistent, more deeply connected to information, and more capable of acting in the world. The enterprise adoption problem is that each of those strengths also increases the need for boundaries, monitoring, and organizational confidence.
The practical lesson is to pick one bounded workflow
Sottiaux’s broadest claim was that Codex is becoming “a very general and very powerful agent” which, when connected to the right sources of information and given permitted ways to act, can do much of the work a user allows it to do. The promise, in his framing, is value creation: work that was too time-consuming, too tedious, or too hard to justify becomes possible.
Nicholson said he has heard a similar pattern from “prize-winning physicists” and industry project managers: people had stacks of good ideas and are now beginning work on them because the cost of trying has fallen. Sottiaux said OpenAI’s intention is to distribute these capabilities broadly, giving more people access to agents that raise the level of what they can “dream about accomplishing.”
The near-term instruction was more concrete: try one workflow. Nicholson named research briefs, plans, onboarding processes, reports, and decision memos as candidates. The question for non-coders is not whether Codex has “code” in the name, but whether they work with information, search for context, make sense of data, produce documents, coordinate with others, or need complex tasks handled in the background.



