Banks Can Use AI Agents to Turn Requirements Into Reviewed Features

Conor Spicer · OpenAI · Monday, June 8, 2026 · 7 min read

OpenAI solutions engineer Conor Spicer argues that financial institutions can use Codex to shorten the path from customer demand to production-ready digital features, not by replacing developers but by delegating larger units of software work to an AI agent. Using a fictional bank’s predictive-budgeting feature, he presents Codex as a system that can read approved requirements, modify code, run tests, prepare compliance evidence, draft legacy portal submissions, and review pull requests while leaving humans to inspect and approve the work.

Codex changes the unit of software work

Conor Spicer frames Codex not as an autocomplete tool but as an AI coding agent that can be assigned a job, inspect a codebase, and run for hours, “perhaps even a full day,” until the work is done. The shift is not simply that developers type less. It is that software work can be delegated in larger units that span code, documentation, tooling, tests, review surfaces, and operational context.

Spicer says OpenAI launched the Codex desktop app in February, giving technical and semi-technical users a natural-language surface for generating code and completing development tasks. He describes it as the “ideal surface” for financial-services teams because the same interface can be used for code work, reporting, review, and integration-heavy operational tasks.

OpenAI’s own adoption is used as the proof point. Spicer says Codex passed one million downloads in its first week after launch, has grown to more than four million weekly active users across Codex tools, and is now used by OpenAI engineers “by default.” Internally, he says, OpenAI now ships more in a week than it previously shipped in a month, with 50% more pull requests per engineer.

4M+ weekly active users across Codex tools, according to Spicer

He is careful to distinguish this from engineer replacement. The reported change is workflow-level: engineers are still there, but their work shifts from directly writing each line to prompting, inspecting, steering, and reviewing agent-produced changes. The output claim is tied to headcount leverage: more code and product shipped without increasing staff at the same rate.

For banks, the competitive pressure is to anticipate customer needs faster

Financial institutions are under pressure to move customer-facing products from retrospective dashboards toward anticipatory experiences. Spicer uses a fictional bank, Blossom Bank, to make that pressure concrete: the bank has a strong consumer product, but customers are demanding features that help them plan ahead rather than merely understand past behavior.

The requested feature is predictive budgeting. Customers want more than a record of what they have spent; they want a warning about what they are likely to spend and whether that will put them over budget. A social post shown on screen captures the shift in expectations: “Blossom Bank: ‘Here’s what you spent.’ Cool. Now tell me what I’m about to mess up before I do it.” Another displayed complaint says, “I don’t need another spending dashboard. I need Blossom Bank to warn me that my spending this week is about to blow up my month.”


That kind of feature normally requires coordination across different teams. Spicer’s claim is that Codex can compress part of that coordination by working across the codebase and the surrounding context while keeping the developer inside one workflow.

The starting mobile app contains a “Weekly Spend” card with a historical view. Spicer identifies that as the target for replacement: the goal is to turn it into a forecasted month-end spending experience that can be shown to users and tested quickly. The change is not treated as a toy UI tweak; it is used to illustrate how a customer-facing financial product feature moves from request to implementation to review.

The agent’s value is partly in all the work around the code

Before writing the feature, Spicer shows Codex handling an operational interruption: summarizing recent incidents. The prompt asks the agent to “take a look at our recent incidents and give me a nicely formatted summary.” Codex responds that it will pull incident data and surface the useful parts rather than dumping raw screens.

The agent can use context from codebases, documentation, and observability tools to answer a question that might otherwise require coordination across teams. Spicer’s example is a live meeting: if an incident topic comes up unexpectedly, Codex can pull the relevant operational summary immediately.

He then moves from ad hoc queries to scheduled work. Codex is shown with automation templates such as status reports, a staff weekly mobile brief, and release prep. Spicer creates a weekly engineering summary automation with sections for summary, shipped or merged work, blockers, incidents or operational issues, reviews and quality signals, and risk follow-ups. The goal is to make useful engineering routines repeatable across a team rather than dependent on one person remembering to ask.
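The automation Spicer sets up is, in effect, a fixed report template run on a schedule. The TypeScript sketch below is illustrative only: `SummaryAutomation` and `renderSummary` are hypothetical names, not part of Codex. It takes the section list from the demo and shows why a fixed template makes the routine repeatable: every run emits the same headings in the same order, and empty sections are stated explicitly instead of silently disappearing.

```typescript
// Hypothetical shape for a recurring report; not the Codex automation format.
type SummaryAutomation = {
  title: string;
  cadence: "weekly" | "daily";
  sections: readonly string[];
};

// Section list taken from the demo's weekly engineering summary.
const weeklyEngineeringSummary: SummaryAutomation = {
  title: "Weekly engineering summary",
  cadence: "weekly",
  sections: [
    "Summary",
    "Shipped or merged work",
    "Blockers",
    "Incidents or operational issues",
    "Reviews and quality signals",
    "Risk follow-ups",
  ],
};

// Render one run of the report. Items are keyed by section heading;
// sections with no items still appear and are marked as empty.
function renderSummary(
  automation: SummaryAutomation,
  items: Record<string, string[]>,
): string {
  const lines: string[] = [automation.title];
  for (const section of automation.sections) {
    lines.push("", section);
    const entries = items[section] ?? [];
    if (entries.length === 0) {
      lines.push("- Nothing to report");
    } else {
      for (const entry of entries) lines.push(`- ${entry}`);
    }
  }
  return lines.join("\n");
}
```

For example, `renderSummary(weeklyEngineeringSummary, { Blockers: ["Awaiting security sign-off"] })` lists that one (hypothetical) blocker and marks the other five sections as having nothing to report.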

That surrounding work matters to the larger argument. Codex is not only positioned as a coding assistant for implementation, but as a system that can turn scattered engineering context into recurring artifacts: incident briefs, release preparation, weekly summaries, audit evidence, and review signals.

Implementation starts from approved context in SharePoint

For the predictive-spend feature, Spicer asks Codex to turn a feature request in SharePoint, “PMO-Monthly_Forecast,” into an implementation spec for the mobile app. He notes that in practice the scope may already have been approved by management and worked through with design and product teams, so the relevant context lives outside the code editor. Codex is shown using a SharePoint connector to locate the referenced file.

The visible spec identifies the feature as: “Customer Request: Show forecasted month end spend against the user’s monthly budget in the mobile app.” The developer can inspect what Codex found and validate the plan before any code is changed. The agent’s workstream is visible: it retrieves the source document, inspects the application architecture, and proposes an approach.

Only after that does Spicer approve implementation. He also asks Codex to run tests so the resulting code can be checked against standards before completion. The agent then works on the “approved predictive spend UI,” saying it will wire the change into the existing app, add or update deterministic forecast fixtures, swap the dashboard card UI, and run typecheck.
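Spicer does not say how Blossom Bank computes its forecast, so the following is a minimal sketch under an assumed method: a linear run-rate projection (spend to date divided by days elapsed, multiplied by days in the month). `forecastMonthEndSpend` and the fixture values are hypothetical. The point is what “deterministic forecast fixtures” buy: fixed inputs with known outputs, so a test run never depends on the current date or live account data.

```typescript
// Assumed forecasting method: a simple linear run-rate projection.
// The source does not describe Blossom Bank's actual model.
function forecastMonthEndSpend(
  spentToDate: number,
  dayOfMonth: number,
  daysInMonth: number,
): number {
  if (dayOfMonth < 1 || dayOfMonth > daysInMonth) {
    throw new RangeError("dayOfMonth must fall within the month");
  }
  const dailyRate = spentToDate / dayOfMonth;
  // Round to cents so the displayed figure is stable across runs.
  return Math.round(dailyRate * daysInMonth * 100) / 100;
}

// Deterministic fixtures: fixed inputs paired with expected outputs.
type ForecastFixture = {
  label: string;
  spentToDate: number;
  dayOfMonth: number;
  daysInMonth: number;
  expected: number;
};

const forecastFixtures: ForecastFixture[] = [
  { label: "mid-month", spentToDate: 1500, dayOfMonth: 15, daysInMonth: 30, expected: 3000 },
  { label: "first day", spentToDate: 40, dayOfMonth: 1, daysInMonth: 31, expected: 1240 },
  { label: "last day", spentToDate: 2000, dayOfMonth: 28, daysInMonth: 28, expected: 2000 },
];

// Returns a description of every fixture that fails; empty means all pass.
function runForecastFixtures(): string[] {
  const failures: string[] = [];
  for (const f of forecastFixtures) {
    const actual = forecastMonthEndSpend(f.spentToDate, f.dayOfMonth, f.daysInMonth);
    if (actual !== f.expected) {
      failures.push(`${f.label}: expected ${f.expected}, got ${actual}`);
    }
  }
  return failures;
}
```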

The workflow is supervisory rather than passive. The developer prompts the agent, watches what it is doing, can steer it before completion, and then inspects the diffs.

Instead of typing the code out myself, I'm prompting this, I'm inspecting how the agent is building it.

Conor Spicer
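The supervisory loop can be read as a task lifecycle with two human gates: plan approval before any code changes, and review of the diffs before the work counts as done. The sketch below encodes that as an explicit transition table. The state names, `advance`, and the table itself are hypothetical illustrations, not how Codex represents tasks, and mid-run steering is not modeled.

```typescript
// Hypothetical lifecycle for a supervised agent task.
type TaskState =
  | "planning"
  | "awaiting_plan_approval"
  | "implementing"
  | "testing"
  | "awaiting_review"
  | "done";

type Actor = "agent" | "human";

// For each state: which actor may move the task on, and to which states.
const transitions: Record<TaskState, { by: Actor; to: TaskState[] }> = {
  planning: { by: "agent", to: ["awaiting_plan_approval"] },
  awaiting_plan_approval: { by: "human", to: ["implementing", "planning"] },
  implementing: { by: "agent", to: ["testing"] },
  testing: { by: "agent", to: ["implementing", "awaiting_review"] },
  awaiting_review: { by: "human", to: ["done", "implementing"] },
  done: { by: "human", to: [] },
};

// Move a task to `next`, rejecting wrong actors and undeclared transitions.
function advance(current: TaskState, next: TaskState, actor: Actor): TaskState {
  const rule = transitions[current];
  if (rule.by !== actor) {
    throw new Error(`${actor} cannot advance a task in state "${current}"`);
  }
  if (!rule.to.includes(next)) {
    throw new Error(`invalid transition ${current} -> ${next}`);
  }
  return next;
}
```

Keeping the gates in data rather than scattered through control flow makes it easy to audit exactly where a human must act.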

A code view shows a changed file, src/components/BlossomCard.tsx. In the mobile emulator, the historical weekly-spend card has been replaced by “Month-end forecast,” with the message “Projected to finish safely under budget” and a displayed figure of $2,884.60.
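The card's copy depends on where the forecast lands relative to the user's budget. The demo shows only the “safely under budget” state and does not reveal the budget or any thresholds, so the bands in this sketch (over budget, within 10% of budget, otherwise safely under) and the two other messages are assumptions. Currency formatting uses the standard `Intl.NumberFormat` API.

```typescript
// Assumed bands; only the "safely under" state appears in the demo.
type ForecastStatus = "safely_under" | "near_budget" | "over_budget";

function classifyForecast(forecast: number, budget: number, nearBand = 0.1): ForecastStatus {
  if (budget <= 0) throw new RangeError("budget must be positive");
  if (forecast > budget) return "over_budget";
  if (forecast >= budget * (1 - nearBand)) return "near_budget";
  return "safely_under";
}

const statusMessages: Record<ForecastStatus, string> = {
  safely_under: "Projected to finish safely under budget", // copy shown in the demo
  near_budget: "Projected to finish close to budget", // hypothetical copy
  over_budget: "Projected to go over budget", // hypothetical copy
};

const usd = new Intl.NumberFormat("en-US", { style: "currency", currency: "USD" });

// Text content for a hypothetical month-end forecast card.
function forecastCardText(forecast: number, budget: number): { headline: string; amount: string } {
  return {
    headline: statusMessages[classifyForecast(forecast, budget)],
    amount: usd.format(forecast),
  };
}
```

With a hypothetical $3,500 budget, `forecastCardText(2884.6, 3500)` yields the demo's headline and the amount “$2,884.60”.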

The larger product claim is that a feature can be prototyped quickly enough to collect feedback and share with others, while still keeping the developer in the loop for inspection and approval.

Legacy portals and regulatory paperwork are treated as automatable workflow, not exceptions

Spicer anticipates the objection that banks do not only build product features; they also deal with regulators, legacy portals, and institutional friction. He describes a “Blossom Bank Change Portal,” a legacy-looking web form with fields such as customer segment and risk rating, as the kind of system that can slow changes to consumer-facing apps, especially when the portal does not expose APIs.

Codex can use a skill and browser automation tooling to handle the draft-submission task. The agent is shown searching across code, documentation, and evidence to populate the form. In one status message, it says it found four compact-feature validation run evidence artifacts, “which provide enough trail data to meet the standard documentation requirements.”
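The agent's status message implies a coverage check: does the evidence it found satisfy what the change record must cite? The portal's actual requirements are not described, so the required evidence kinds in this sketch, along with `EvidenceArtifact` and `missingEvidence`, are hypothetical. Returning the list of gaps, rather than a bare yes or no, gives the human reviewer something concrete to check.

```typescript
// Hypothetical evidence record gathered from code, docs, and test runs.
type EvidenceArtifact = { id: string; kind: string };

// Assumed evidence kinds a change record must cite.
const requiredKinds = ["test_run", "typecheck", "code_review", "design_approval"] as const;

// Return the required kinds that no artifact covers, in declaration order.
function missingEvidence(artifacts: EvidenceArtifact[]): string[] {
  const present = new Set(artifacts.map((a) => a.kind));
  return requiredKinds.filter((kind) => !present.has(kind));
}
```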

Spicer sets a clear boundary around submission control. Codex can validate the form and save the draft, but “it won’t just hit submit.” It produces a summary of the changes it made and leaves the human to review final points or submit. He says this can reduce work that previously took hours to “a couple minutes or less,” while preserving a human checkpoint at the final decision.
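That boundary can be expressed as an interface split: the agent-facing functions validate and save a draft, while submission is a separate call that takes a human reviewer. This is a hypothetical sketch: `ChangeDraft`, `saveDraft`, and `submitDraft` are invented names, the fields loosely echo the demo form (customer segment, risk rating), and the `kind: "human"` marker is only a type-level convention. In a real deployment the boundary would need to be enforced by the portal's own permissions, for example by not granting the agent's credentials the right to submit.

```typescript
// Hypothetical change-portal draft; fields loosely echo the demo form.
type ChangeDraft = {
  title: string;
  customerSegment: string;
  riskRating: "low" | "medium" | "high";
  evidenceIds: string[];
};

type DraftRecord = { draft: ChangeDraft; status: "draft" | "submitted"; submittedBy?: string };

// Collect every validation problem instead of stopping at the first.
function validateDraft(d: ChangeDraft): string[] {
  const problems: string[] = [];
  if (d.title.trim() === "") problems.push("title is required");
  if (d.customerSegment.trim() === "") problems.push("customer segment is required");
  if (d.evidenceIds.length === 0) problems.push("at least one evidence artifact is required");
  return problems;
}

// Agent-facing: validate and save, never submit.
function saveDraft(d: ChangeDraft): DraftRecord {
  const problems = validateDraft(d);
  if (problems.length > 0) throw new Error(`draft invalid: ${problems.join("; ")}`);
  return { draft: d, status: "draft" };
}

// Human-facing: submission requires an explicit human reviewer.
function submitDraft(record: DraftRecord, reviewer: { kind: "human"; name: string }): DraftRecord {
  if (record.status !== "draft") throw new Error("already submitted");
  return { ...record, status: "submitted", submittedBy: reviewer.name };
}
```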

The same division of labor appears here as in product-building: the agent gathers context, drafts, populates, and validates; the human inspects and approves.

Higher code volume makes review and operating model the constraint

Increasing development speed creates a second problem: more code has to be reviewed, secured, integrated, and deployed without lowering the bar for production changes. Spicer makes that constraint part of the Codex case rather than treating it as an afterthought.

In GitHub, a pull request titled “Weekly spend ux updates” has already passed automated tests and human review. Codex is embedded in the pull-request process as an automated reviewer. In the example, it flags a security issue: a potential mishandling of sensitive fields that the human reviewers missed. The finding can then be handed back to Codex to implement the fix.
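Spicer does not detail the sensitive-field issue Codex flagged. One common remediation for this class of problem is masking sensitive values before an object is logged or sent to analytics, and the sketch below shows that pattern. The key names in `SENSITIVE_KEYS` and the `redact` helper are hypothetical; a bank would derive the key list from its own data-classification policy.

```typescript
// Hypothetical key list; derive from the bank's data-classification policy in practice.
const SENSITIVE_KEYS = new Set(["accountNumber", "routingNumber", "ssn", "cardNumber"]);

// Return a copy of `value` with sensitive fields masked, safe to log.
// Arrays and objects are copied recursively; primitives are returned as-is.
function redact(value: unknown): unknown {
  if (Array.isArray(value)) return value.map((item) => redact(item));
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, v] of Object.entries(value)) {
      out[key] = SENSITIVE_KEYS.has(key) ? "[REDACTED]" : redact(v);
    }
    return out;
  }
  return value;
}
```

Masking at the point of logging, rather than trusting each call site to omit fields, means a new field only needs to be added to one list to be protected everywhere.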

The point is not simply that Codex can generate more code. Spicer argues it can also provide “production ready contextual code review” for the increased volume of work it helps create. The closing slide lists the relevant surface area: code, code review, integration, security, and deployment.

Spicer also names the organizational tension directly. More code and more tooling create strain. His team’s role, he says, is to help customer engineering organizations scaffold new processes so that increased code volume can be matched by security, integration, deployment, and review practices.

The competitive argument for financial institutions is two-sided: AI-assisted development can help teams ship customer-facing product changes faster, but the operating model has to absorb that speed. Codex is most valuable in Spicer’s account when it spans specification, implementation, audit trail, code review, and controlled release.
