UK Government Tests an Insurgent Model for In-House AI Delivery

Eoin Mulgrew · AI Engineer · Monday, May 18, 2026 · 14 min read

Eoin Mulgrew of the Number 10 data science team argues that the UK state’s AI problem is less a shortage of use cases than a shortage of technical people with the access, mandate, and proximity to build inside government workflows. In a talk on the No. 10 Innovation Fellowship, he presents the model as a deliberate hack around normal civil-service constraints: market-rate pay, outside recruitment, a highly selective technical process, and authority to enter departments and ship tools that remain with the teams using them.

The intervention is proximity, mandate, and repeatable capability

The Cabinet Office was preparing to spend £1.5 million on an outside law firm to analyze the UK statute book. Eoin Mulgrew used that as a typical case, not a showpiece exception.

The statute book, he said, is “the height of four African elephants” of legalese, but the task was still “a pretty obvious AI use case.” Instead of commissioning the external analysis, one engineer from the Number 10 data science team embedded with the in-house legal team for a couple of weeks. The result was not just a cheaper report. It was a tool the legal team could keep using.

The system, labeled “Consultation Duty Discovery” and “Attorney Generals Office — AI powered statutory analysis,” ingested legislation, parsed sections, extracted keywords, used GPT-4 classifications, extracted entities, and formatted results for review. It could run against sample Acts, current Act sets, all Acts, or custom slices of legislation. A review screen showed consultation obligations by Act, actor, target, and department, with flags for further review.
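The stages listed on the demo screen can be sketched in a few lines of Python. This is an illustration only, not the Number 10 team's code: the keyword list, the `Duty` record, the assumed section-numbering format, and the sample Act are all hypothetical, and a regular expression stands in for the GPT-4 classification and entity-extraction steps so that the sketch runs offline.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical trigger words; the real tool's keyword list was not shown.
CONSULT_KEYWORDS = ("consult",)


@dataclass
class Duty:
    act: str
    section: str
    actor: str
    target: str
    needs_review: bool


def parse_sections(act_text: str) -> dict[str, str]:
    """Split an Act into sections, assuming each starts a line with 'N. '."""
    parts = re.split(r"(?m)^(\d+)\.\s+", act_text)
    # With one capture group, re.split yields [preamble, num, body, num, body, ...].
    return {parts[i]: parts[i + 1].strip() for i in range(1, len(parts) - 1, 2)}


def has_keyword(section_text: str) -> bool:
    """Cheap pre-filter so only candidate sections reach the classifier."""
    lowered = section_text.lower()
    return any(keyword in lowered for keyword in CONSULT_KEYWORDS)


def classify_and_extract(section_text: str) -> tuple[str, str] | None:
    """Stand-in for the model call: matches 'the X must/shall consult Y'."""
    match = re.search(
        r"(?i)the (.+?) (?:must|shall) consult (.+?)(?:\.|,| before)", section_text
    )
    return (match.group(1), match.group(2)) if match else None


def scan_act(name: str, text: str) -> list[Duty]:
    """Run parse -> keyword filter -> classify/extract -> review-ready records."""
    duties = []
    for number, body in parse_sections(text).items():
        if not has_keyword(body):
            continue
        found = classify_and_extract(body)
        if found is None:
            # Keyword hit without a clean extraction: flag it for a lawyer.
            duties.append(Duty(name, number, "unknown", "unknown", True))
        else:
            duties.append(Duty(name, number, found[0], found[1], False))
    return duties


# Invented sample text, not real legislation.
SAMPLE_ACT = """1. This Act may be cited as the Example Act.
2. The Secretary of State must consult the Welsh Ministers before making regulations.
3. Further consultation may be arranged as appropriate.
"""

for duty in scan_act("Example Act", SAMPLE_ACT):
    print(duty)
```

On the sample text this produces one clean extraction for section 2 and one keyword hit in section 3 that is flagged for human review, which is the role the review screen's flags appeared to play.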

Mulgrew’s point was that the outside analysis would have been slower than the pace at which new laws and regulations are made. By the time the work was complete, the result could already be going stale, requiring the same exercise again later. The in-house tool changes the unit of value: from a one-off legal analysis to a repeatable capability sitting with the users who need it.

£1.5M: outside legal-analysis spend Mulgrew said was avoided by embedding one engineer with the in-house team

The scanner demo gave the scale of the terrain: 3,748 currently enacted Public General Acts, about 12 million words of legislation, and hundreds of estimated statutory consultation duties. Another view showed 928 Acts processed, 74 obligations, and 33 Acts containing at least one duty. Those numbers mattered less as a scoreboard than as evidence for the operating model. The team did not centralize expertise away from the lawyers; it put technical capability close enough to the legal workflow to leave behind a tool that could be rerun “at the drop of a hat” and potentially shared with other government teams.
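The two views can be related with some simple arithmetic. The sketch below assumes that both views come from the same scan and that the 928 processed Acts are representative of the full statute book; the talk stated neither, so the projection is a rough consistency check rather than a result.

```python
# Figures quoted from the scanner demo.
TOTAL_ACTS = 3_748       # currently enacted Public General Acts
ACTS_PROCESSED = 928
OBLIGATIONS_FOUND = 74
ACTS_WITH_DUTY = 33

share_processed = ACTS_PROCESSED / TOTAL_ACTS
share_with_duty = ACTS_WITH_DUTY / ACTS_PROCESSED
duties_per_flagged_act = OBLIGATIONS_FOUND / ACTS_WITH_DUTY

# Assumption: duties occur at the same rate in the Acts not yet processed.
projected_duties = OBLIGATIONS_FOUND * TOTAL_ACTS / ACTS_PROCESSED

print(f"Share of Acts processed:        {share_processed:.1%}")
print(f"Processed Acts with a duty:     {share_with_duty:.1%}")
print(f"Duties per Act that has any:    {duties_per_flagged_act:.2f}")
print(f"Naive projection over all Acts: {projected_duties:.0f}")
```

Under those assumptions the projection lands just under 300 duties, which is in line with the demo's "hundreds" estimate.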

That was the through-line in Mulgrew’s talk. The UK government does not lack AI use cases. Its harder problem is getting capable technical people close enough to consequential work, with enough permission to ship, and leaving behind systems that internal teams can actually use.

The fellowship is designed as an insurgency inside the normal civil-service model

Eoin Mulgrew described 10DS, the Number 10 data science team, as a unit created during the pandemic, partly in response to it. Its core purpose, in his words, is to make sure “the most important decisions in the country are informed by the best possible evidence.” It is now scaling AI engineering and development capability inside Number 10 and across “strategically important parts of the state.”

The pressure behind that work is a public-service delivery and productivity crisis. Mulgrew pointed to 7.25 million people on NHS waiting lists, about 350,000 court cases stuck in backlog, and only one in five planning applications being decided on time. A chart attributed to ONS and TBI / EY / ONS showed whole-economy productivity rising from 1997 to 2022 while public-sector productivity stayed comparatively flat or declined slightly. The same visual put possible annual productivity gains from AI in government at £40 billion and the annual cost of the productivity gap at £80 billion.

Mulgrew called government an “industry” rather than an organization: a large, complex system of about 400,000 people. That framing matters because the barriers are not just technical. Government has rigid pay structures, hierarchy, slow approvals, recruitment processes not built for specialist technical talent, and real safeguards because it is accountable to Parliament and the public.

His answer is the No. 10 Innovation Fellowship, a small unit operating with different rules. Mulgrew described it as “a small insurgent unit” at the center of government: backed by a Number 10 mandate, able to deploy across departments, given unusual political air cover, and allowed to move with more autonomy than normal government teams.

Design choice	Mulgrew’s description
Mandate	Operate from Number 10 and deploy across government
Pay	Market rates within reason, not Meta-level compensation
Autonomy	Freedom to move quickly, choose opportunities, and ship
Recruitment	A technical selection process with roughly a 0.7% to 0.8% success rate
Talent source	Exclusively recruited from outside government
Political backing	Senior sponsorship and air cover to enter departments and get work done

The operating model Mulgrew described for the No. 10 Innovation Fellowship

Pay is part of the design, but Mulgrew did not present it as the main attraction. The fellowship can pay market rates “within reason,” which makes it economically viable for people to take the roles. His claim was that many high-performing technical people will accept a pay cut if the work is consequential and if they believe they will be able to do their best work.

The more distinctive choices are recruitment and talent source. Mulgrew said the standard civil-service process is optimized for many things, “but not necessarily recruiting exceptional technical talent.” The fellowship has been allowed to recruit its own way, through a demanding process focused on technical skill. It recruits exclusively from outside government.

The slide listing where fellows come from included DeepMind, NASA Jet Propulsion Laboratory, Caltech, CERN, Monzo, Y Combinator, MIT, Google, J.P. Morgan, Microsoft, Harvard, Amazon, Oxford, Imperial, Helsing, Faculty, and QuantumBlack. Mulgrew said the team has hired from labs, big tech, top research institutes, YC founders, and serial entrepreneurs — people who likely did not expect to be working in the civil service a year earlier.

He was careful not to make the theory sound too easy. Many people from the technology industry, he said, assume that if a team has a large enough ministerial mandate it can simply break through data silos and force change. “In practice, it’s a lot harder than that, otherwise everybody would be doing it.” The fellowship is not just a ministerial stick. It is a way to embed outsiders inside the machinery, with enough legitimacy and practical proximity to build.

Forward-deployed engineers sit inside the workflows they are changing

Mulgrew distinguished between two modes of work. For simpler opportunities, 10DS deploys engineers directly into teams and moves quickly. For harder operational problems, it tends to partner with departments or specialist units over longer periods.

The direct-deployment model is deliberately close to the work. Mulgrew said Number 10 now has forward-deployed engineers embedding with policy advisers, lawyers, communications teams, pollsters, and operational teams. They observe workflows, identify pain points, co-design tools with users, and try to move from idea to implementation in a couple of weeks.

One example was a policy simulation tool used inside Number 10. An interface labeled “10DS Microsimulation” showed previous chats about income tax and Universal Credit. Mulgrew said the tool lets policy teams test the effects of policy choices before decisions are made. In the Universal Credit demo, a “typical London family” — a couple with two children, paying £18,000 in annual rent and earning £35,000 in combined employment income — received different annual Universal Credit amounts under different taper rates.

Universal Credit taper rate	Annual UC amount	Difference from current	Monthly increase
55% current law	£11,288	—	—
50%	£12,384	+£1,097	+£91
45%	£14,013	+£2,714	+£226
40%	£15,678	+£4,391	+£366

The Universal Credit scenario shown in the 10DS microsimulation demo
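The monthly column in the table follows from the annual differences. A minimal check, assuming the demo simply divided each annual difference by 12 and rounded to the nearest pound:

```python
# Annual differences from current law, as shown in the demo table (GBP).
annual_difference = {"50%": 1_097, "45%": 2_714, "40%": 4_391}

# Assumption: monthly figure = annual difference / 12, rounded to whole pounds.
monthly_increase = {
    rate: round(difference / 12) for rate, difference in annual_difference.items()
}

for rate, monthly in monthly_increase.items():
    print(f"Taper {rate}: +£{monthly} per month")
```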

Mulgrew did not claim the tool replaces human analysis. His claim was narrower: more decisions inside the building can be informed by high-quality modeling, and at a much faster rate than would otherwise be possible.

A questioner later raised the risk of sycophancy: if a user wants to hear that a preferred policy is brilliant, could the model be steered into agreement? Mulgrew accepted that as “a very real risk.” He said 10DS had not encountered it much because models are red-teamed before release to users and because the team provides upskilling to the lawyers, sociologists, professors, and other non-AI specialists who use the tools.

Another Number 10 tool was aimed less at modeling policy than interrogating delivery. Mulgrew said Number 10 is responsible for delivery of every major project and manifesto commitment in government, which means it receives a constant stream of progress reports. The team built what he called a “delivery red teaming” tool, “essentially a PMO” in the pockets of Number 10 delivery teams.

The HS2 triage demo mapped dependencies across design, construction, stations, rolling stock, systems testing, and initial services. Mulgrew said the system is used not only to interrogate delivery reports, but to offer a second judgment on the teams producing them. It can flag whether a department or delivery team tends toward optimism bias, whether it disproportionately marks risks as amber, and whether its mitigations usually work.
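The three signals Mulgrew listed can each be expressed as a rate over a team's reporting history. The sketch below is a guess at the shape of such a check, not a description of the Number 10 tool: the `RiskRecord` fields, the thresholds, the team names, and the history are all invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class RiskRecord:
    team: str
    rating: str              # "red", "amber" or "green", as reported by the team
    forecast_slip_days: int  # delay the team predicted at the time
    actual_slip_days: int    # delay that later materialised
    mitigation_worked: bool


def team_profiles(records):
    """Summarise each team's reporting history as three simple rates."""
    by_team = defaultdict(list)
    for record in records:
        by_team[record.team].append(record)

    profiles = {}
    for team, rows in by_team.items():
        n = len(rows)
        profiles[team] = {
            "amber_share": sum(r.rating == "amber" for r in rows) / n,
            "optimism_rate": sum(r.actual_slip_days > r.forecast_slip_days for r in rows) / n,
            "mitigation_success": sum(r.mitigation_worked for r in rows) / n,
        }
    return profiles


def flags(profile, amber_max=0.6, optimism_max=0.5, mitigation_min=0.5):
    """Hypothetical thresholds; real ones would need calibrating on real reports."""
    found = []
    if profile["amber_share"] > amber_max:
        found.append("leans on amber ratings")
    if profile["optimism_rate"] > optimism_max:
        found.append("forecasts tend to be optimistic")
    if profile["mitigation_success"] < mitigation_min:
        found.append("mitigations rarely work")
    return found


# Invented history for two made-up teams.
HISTORY = [
    RiskRecord("Team A", "amber", 0, 30, False),
    RiskRecord("Team A", "amber", 10, 45, False),
    RiskRecord("Team A", "amber", 0, 0, True),
    RiskRecord("Team A", "green", 0, 20, False),
    RiskRecord("Team B", "green", 0, 0, True),
    RiskRecord("Team B", "amber", 14, 10, True),
    RiskRecord("Team B", "red", 60, 60, False),
    RiskRecord("Team B", "green", 0, 5, True),
]

for team, profile in team_profiles(HISTORY).items():
    print(team, profile, flags(profile))
```

In this invented data, Team A trips all three flags and Team B trips none. Keeping the profile separate from the thresholds means a reviewer can see the underlying rates and not only the verdict.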

In this account, AI was not only a way to automate work. It was also a way to make institutional memory and adversarial review available to decision makers at the center of government.

In-house capability changes what the public can see

Mulgrew connected in-house engineering capacity to transparency. Until a couple of months before the talk, he said, the government had not published a public-facing dashboard allowing people to see how it was doing on delivery. It had since published two in as many months.

One dashboard covered the AI Opportunities Action Plan drafted by Matt Clifford. Mulgrew said it lets people see how the UK is doing on rolling out compute and setting itself up for AI adoption. The visible dashboard said that in January 2025 the government committed to take forward 50 recommendations; it showed 38 completed in full and 12 in progress.

The courts dashboard made public some of the strain in the criminal justice system. The demo included 79,959 cases, an average Crown Court wait time of 255 days, cases in the backlog that had been waiting more than two years, and rape cases waiting one year or more. One panel quoted Sir Brian Leveson: “More money and efficiency measures alone will not be sufficient to allow the system to operate as it should,” followed by the statement that structural court reform is also needed.

Mulgrew also mentioned a public service that a minister was due to launch two and a half weeks later. He did not describe it, saying he did not want to steal the minister’s thunder, but said it was the kind of service people would find hard to believe did not already exist. The idea had emerged two months earlier and was about to be live and used by the public. In normal government, he said, a project like that might remain in discovery for a year or more.

The dashboard and public-service examples were about cycle time as much as interface design. Mulgrew’s claim was that once the state has technical builders inside the right teams, it can publish, iterate, and expose operational information in ways that had not been routine.

Partner teams turn fellowships into institutional capacity

For larger or more specialized problems, Mulgrew described 10DS as a partner rather than the sole builder. The pattern is not that Number 10 owns every AI project. It is that fellows enter important institutions, help create technical capability, and keep working with those teams as they mature.

He focused on three partners: the AI Safety Institute, the Incubator for AI, and Justice AI.

Mulgrew referred to the “AI Safety Institute,” while the slide displayed the “AI SECURITY INSTITUTE” logo under the Department for Science, Innovation & Technology. The slide described AISI as a world-leading government body for evaluating advanced AI risks, said it had tested more than 30 frontier models pre-deployment, and said it had pioneered novel evaluation frameworks. Mulgrew called it a “massive win for the UK” and said 10DS supported it from day one by placing fellows into the organization, including to help set up its cybersecurity workstream.

One early fellow, Dr. Harry Coppock, led work on Inspect, among other things. The terminal demo showed an inspect eval command running against arc_challenge using GPT-4, labeled “AI Security Institute — Agentic sandbox tool.” Mulgrew described it as a safe, isolated environment for testing what AI agents do when they are given autonomy and tools.

The Incubator for AI, now in the Department for Science, Innovation and Technology, was more directly tied to the fellowship. Mulgrew called it a spin-out of the program: a team that incubates AI solutions for use across the public sector. Most of its original technical founding team, he said, were fellows. 10DS now collaborates with the incubator on scaling some of that work.

The clearest i.AI example was Extract, a planning tool built in collaboration with DeepMind and based on Gemini. Mulgrew said several 10DS people had worked on it. It digitizes parts of the planning application process that remain handwritten or hand-drawn, including maps. The tool was unveiled by the Prime Minister at London Tech Week the previous year and is being rolled out to every local authority in England.

Planning was one of the bottlenecks Mulgrew had identified earlier: only one in five planning applications are decided on time. He linked planning delays to economic growth, which he described as “basically the biggest challenge this country faces right now.” Aspirationally, he said, the work could help move toward more planning applications being decided automatically by AI.

His education example was more guarded. Mulgrew spoke about the promise of AI tutors to help narrow education gaps by putting high-quality tutoring in front of children regardless of socioeconomic background. But he stressed that it has to be done carefully. The current work, as he described it, is about safeguards and evaluating frontier models against benchmarks, including classroom safety and teaching quality, rather than mainly building a government tutoring product.

The benchmark shown tested whether a tutor manages cognitive load: clear language, appropriate chunking, scaffolding, concrete explanations, and specific feedback. In Q&A, an EdTech questioner argued that motivation may be the harder problem: a 12-year-old put in front of a computer may simply try not to learn. Mulgrew said student uptake had not yet been a major focus. The initial testing shown had involved about 70 teachers role-playing pupils, and the government’s plan at that stage was largely to set benchmarks and guardrails so schools can adopt products, not to compete with companies building them.
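One way to picture a rubric like this is as a set of yes/no judgments per tutoring transcript, aggregated per criterion. The sketch below assumes that shape. The criterion names come from the benchmark as described, but the binary scoring, the function name, and the ratings are invented and do not reflect the government's actual evaluation method.

```python
from collections import Counter

# Criteria named in the cognitive-load benchmark shown in the talk.
CRITERIA = (
    "clear_language",
    "appropriate_chunking",
    "scaffolding",
    "concrete_explanations",
    "specific_feedback",
)


def criterion_pass_rates(rated_transcripts):
    """Share of transcripts meeting each criterion (assumes yes/no ratings)."""
    passes = Counter()
    for ratings in rated_transcripts:
        for criterion in CRITERIA:
            passes[criterion] += bool(ratings[criterion])
    total = len(rated_transcripts)
    return {criterion: passes[criterion] / total for criterion in CRITERIA}


# Invented ratings for four role-played sessions with one hypothetical tutor model.
RATED = [
    dict.fromkeys(CRITERIA, True),
    {**dict.fromkeys(CRITERIA, True), "scaffolding": False},
    {**dict.fromkeys(CRITERIA, True), "appropriate_chunking": False, "scaffolding": False},
    {**dict.fromkeys(CRITERIA, True), "specific_feedback": False},
]

rates = criterion_pass_rates(RATED)
weakest = min(rates, key=rates.get)
print(rates)
print("Weakest criterion:", weakest)
```

Reporting a rate per criterion, not a single overall score, shows where a tutor falls short. In this made-up sample, scaffolding is the weakest criterion.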

The prison example is the clearest statement of what access is supposed to mean

Justice AI took the forward-deployed model furthest from the central policy environment. Eoin Mulgrew described it as a new team in the Ministry of Justice. He said he would not call it a fellowship spin-out because that would give 10DS too much credit, but its founder, Dan James, is a former fellow.

Justice AI is placing forward-deployed engineers in prisons and other parts of the criminal justice system. Mulgrew compared the model to what 10DS does inside Number 10 with policy teams, communications teams, and lawyers, except that these engineers embed with parole officers and prison wardens.

He did not give operational details. He said much of the work is around using AI to stop the flow of drugs into prisons, find efficiencies in manual processes involving many people, and improve security and safety in the prison system.

The closing image was of Will, a current fellow, outside HMP Wandsworth. Mulgrew said Will had recently been in California, had dropped out of Harvard, founded a company, taken it into Y Combinator, made some money, and then chosen to work for 10DS. In his second week on the job, Mulgrew said, Will was standing outside a prison with the keys to that prison, about to go in.

“You’ve maybe done good stuff in industry. That’s brilliant. Come join us and we’ll give you the keys to the state and see what you can do.”

Eoin Mulgrew

That line was the most compressed version of the fellowship pitch. Mulgrew was not promising abstract influence. He was arguing that the state can attract unusual technical outsiders if it gives them direct access to real institutions and real operational problems. In the prison example, “the keys to the state” were also literal keys.

Mulgrew called the model a proof point, not the final operating system

Asked how the forward-deployed engineer model could scale across central government, local government, and party lines, given institutional realities, Eoin Mulgrew acknowledged the limits of the insurgency approach. Small elite teams can achieve a lot, and some fellows may go on to create new teams. But he said that is not enough to “turn the oil tanker,” at least not quickly enough.

The fellowship, in his account, is partly a bargain with ministers: let the team operate by different rules, use it as a proof point, and then use the proof to change how the rest of government works. “This at the moment is basically a hack to get around the system,” he said. He wants much of what the fellowship does to become business as usual.

Scaling also means shifting from targeted projects to horizontal processes repeated across the state. Mulgrew warned against thinking of the civil service as mainly policy officials in central London. That is “a very small sliver” of a roughly 400,000-person workforce. Many civil servants are call-center operators, prison wardens, nurses, and other operational workers.

The horizontal use cases he named included transcription, which he said police officers often describe as the bane of their existence, and large call centers in departments such as DWP and HMRC. If the team wants to “dial up the ambition,” he said, it should go after processes that can be applied en masse across the system.

A final question asked about collaboration with other countries. Mulgrew said 10DS does “a bit,” naming similar but different initiatives in the US government, including Tech Force and parts of the US Digital Service, and saying the team talks quite a bit with Singapore. He said it could do more.

The work remains early and experimental by Mulgrew’s own description. The proof points he claimed were specific: saving money, shipping public services at unusual speed, reforming frontline services, and putting AI capabilities into the hands of teams at the top of government. The long-term ambition is larger than the fellowship itself: to make the hack unnecessary by changing how government hires, deploys, and trusts technical people.
