Intercom Doubled Engineering Throughput by Standardizing on Claude Code
Brian Scanlan, a senior principal engineer at Intercom, argues that the company doubled engineering throughput by treating AI coding as an internal platform strategy rather than an individual productivity tool. In his account, Intercom standardized on Claude Code, encoded recurring engineering work into agent-usable skills, connected agents to internal systems under existing controls, and made AI adoption an explicit expectation across R&D. The reported result was a doubling of pull-request throughput, including 17.6% of merged PRs approved by Claude, alongside new bottlenecks in review and CI.

Intercom says it doubled PR throughput by standardizing on Claude Code
Brian Scanlan said Intercom set a simple internal goal in the middle of last year: double engineering throughput in a year. The company called the project “2x,” staffed it with a dedicated team, and chose code changes per R&D person as the primary measure.
Scanlan said Intercom reached the goal of doubling pull-request throughput in less than a year. The inflection, according to the chart he showed, followed the decision to go all in on one tool in December and begin rolling it out in January. The chart tracked merged pull requests per R&D head against a 2x target and showed a sharp increase after that rollout.
The metric was not presented as perfect. Intercom also uses developer surveys and tools such as DX, and Scanlan acknowledged the familiar problem that “every measure is bad” once it becomes a target. But the company wanted a throughput measure that would move if AI was genuinely changing work across engineering rather than making individual tasks feel marginally faster or more enjoyable.
The first wave of AI coding tools had not satisfied that ambition. Intercom used GitHub Copilot, saw engineers adopt Cursor, evaluated Augment and other options, and found some gains. Scanlan described those gains as marginal: some tasks improved, some work became more fun, but the overall system had not changed enough.
The larger change came when Intercom stopped treating AI coding as an individual preference and began treating it as a platform strategy. Scanlan tied that timing to a broader jump in coding-agent capability around the previous Christmas break, showing a post from a principal engineer reacting to Tobi Lütke’s claim that he had shipped more code in three weeks than in the previous decade. That capability shift, Scanlan said, contributed materially to Intercom’s 2x results.
Intercom’s broader company context shaped the urgency. Scanlan described Intercom as a 15-year-old, privately held Irish-American B2B SaaS company that pivoted to become an AI company the weekend ChatGPT came out. Its customer-support agent Fin, he said, has more than 8,000 customers, revenue approaching $100 million, and about two million resolutions a week. Intercom has also moved Fin’s English text-based conversations onto its own model, which he said is cheaper, faster, and better than frontier alternatives such as Sonnet. Intercom’s displayed production comparison credited Fin Apex 1.0 with a 2.8% higher resolution rate, 0.6 seconds faster time to first token, and a 65% reduction in hallucinations versus the frontier model shown on the chart.
But Scanlan’s subject was not Fin. As a senior principal engineer on Intercom’s platform group, he focused on the company’s internal software-development system: uptime, performance, security, cost management, observability, internal developer productivity, and the 15-year-old Rails monoliths that Intercom still runs. Intercom, he said, has long been “obsessed with shipping,” with the belief that fast, iterative shipping is the best way to build high-quality products customers want.
AI use became an explicit expectation
Intercom framed the adoption effort as organizational change before engineering technique. Brian Scanlan said the company updated job descriptions and expectations. His formulation was direct: if an engineer, product manager, designer, or other R&D employee was not adopting AI, they were not meeting expectations.
That required repetition from leadership, not a one-off announcement. Scanlan said the company had to communicate “the same message over and over and over,” across forums, while constantly emphasizing the urgency of AI adoption. It also had to reward the behavior: financially, socially, and publicly. Intercom created Slack channels where updates to skills and other AI-related work could be surfaced, celebrated, and copied by others. It ran hackathons and AI immersion days, and gave people enablement, support, and tool access rather than simply telling them to “AI everything.”
The staffing choice was also part of the system. Intercom created a full-time Team 2x, which Scanlan said kept growing. For a medium or large organization, his view was that this cannot be delegated to spare time or left to scattered enthusiasts. The best people have to work on the transition full-time, because hundreds of engineers and R&D employees have to change how they work.
The company also had to be specific about what it wanted. Intercom wrote down principles, maturity levels, technical guidance, and operational expectations. The strategy was to make the preferred way of working concrete enough that engineers could see where they were, what the next level looked like, and what the organization would support.
Standardizing on Claude Code was a bet against tool sprawl
Intercom standardized on Claude Code as its AI engineering platform. Before that, Brian Scanlan said, the company was “omnivorous”: engineers could choose their favorite editor or coding agent, and adoption was spread across Claude Code, Cursor, Augment, and other tools.
He argued that choosing one platform mattered more than which platform was chosen. The analogy he used was multi-cloud: spreading work across several platforms prevents the compounding benefits of a well-designed internal platform. Unless there is a specific, high-impact reason to distribute work across multiple agents, he argued, an organization is better off going all in on one, optimizing it, proving it works, and building institutional knowledge around it.
Intercom’s ambition was to make Claude able to act like a senior engineer on any technical task across the company. That meant connecting it to everything an engineer could use on a laptop, while relying on existing enterprise controls, permissions, audits, and guardrails rather than leaving the agent unconstrained. Scanlan said Intercom was not trying to let AI “delete all of our databases,” but as a mature company with controls in place, it could give Claude access in the same way it gives engineers access.
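The talk did not spell out the wiring, but Claude Code’s standard mechanism for this kind of connectivity is MCP (Model Context Protocol) servers registered with the agent. A minimal sketch of what granting Slack access might look like; the server name, package, and token variable are hypothetical placeholders, not Intercom’s real setup:

```sh
# Hypothetical example: register an internal Slack connector with Claude Code.
# "intercom-slack" and "intercom-slack-mcp" are illustrative names; the bot
# token is issued under the same enterprise controls an engineer would use.
claude mcp add intercom-slack \
  --env SLACK_BOT_TOKEN="$SLACK_BOT_TOKEN" \
  -- npx -y intercom-slack-mcp
```

Because the connector authenticates with credentials issued under existing permission systems, the agent’s reach stays bounded by the same audits and guardrails that bound a human engineer.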
The onboarding problem was central. Intercom has 15 years of internal software, conventions, Rails architecture, React patterns, testing standards, security rules, and operational practices. Claude could not be expected to perform well in that environment by relying on generic model knowledge. Scanlan said it had to be taught the same Intercom-specific knowledge new hires learn.
That knowledge was packaged into engineering guidance, skills, hooks, plugins, and workflows. The company pushed internal Claude plugins to employee laptops and even bypassed some normal Claude Code update mechanisms, because managing hundreds of local installations became its own operational burden. Scanlan compared the experience to managing Python installs across many machines.
The platform strategy had a second side: Intercom wanted to avoid building too much AI infrastructure of its own. Scanlan described the company as technically conservative, preferring single tools used very well. Its Rails monoliths are an example of that bias. In AI engineering, the equivalent principle was “run less AI software”: focus internal work on evergreen, Intercom-specific capabilities, and aggressively deprecate internal tools when first-class vendor replacements become available.
The durable asset, in Scanlan’s view, is not a custom orchestrator or a fashionable workflow. It is the written, tested, high-quality description of how work should be done at Intercom. Tool implementations may change. The internal knowledge of how to perform Intercom-specific engineering work remains valuable.
The unit of leverage is the skill, not the prompt
Brian Scanlan repeatedly distinguished Intercom’s approach from “prompting better.” The company’s operating model is skill-driven: recurring technical work is captured in durable, testable skills that agents can invoke. Those skills are improved through data, backtesting, human labeling, session mining, and continuous feedback from real usage.
Intercom instruments the system. Scanlan showed org-wide skill invocations trending upward, with the displayed chart reaching around 20,000 weekly invocations. He said the company feeds that telemetry into Honeycomb to see which skills are being invoked, and that the data is available internally without exposing private information. Intercom also pulls session transcripts into S3 for data mining, reporting, and analysis of whether skills are effective.
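Claude Code has built-in OpenTelemetry support that can be pointed at an OTLP backend such as Honeycomb. Intercom’s actual configuration was not shown; what follows is a minimal sketch using the documented environment variables, with the team key as a placeholder:

```sh
# Opt in to Claude Code's OpenTelemetry export.
export CLAUDE_CODE_ENABLE_TELEMETRY=1
export OTEL_METRICS_EXPORTER=otlp
export OTEL_LOGS_EXPORTER=otlp
export OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf

# Point the exporter at Honeycomb. The API key below is a placeholder,
# not a real credential.
export OTEL_EXPORTER_OTLP_ENDPOINT=https://api.honeycomb.io
export OTEL_EXPORTER_OTLP_HEADERS="x-honeycomb-team=PLACEHOLDER_API_KEY"
```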
The feedback loop is meant to identify gaps in skill coverage and improve the skills themselves. Intercom’s displayed maturity model made the progression explicit:

1. Write code, respond to incidents, and handle bug reports by hand.
2. Use AI autocomplete or ask Claude questions.
3. Have agents write the code.
4. Rarely read code or open an IDE.
5. Write skills for others.
6. Write self-improving, high-quality skills with evals that make others more productive.
7. Optimize work practices, software architecture, and infrastructure for agents.
Scanlan’s instruction to engineers is to use Claude Code for everything, automate the work, move that automation into a skill, improve at writing skills, then write skills that improve skills. At the highest level, the engineer’s job becomes changing the environment so agents can do better work.
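Concretely, a Claude Code skill is a directory whose SKILL.md opens with YAML frontmatter telling the agent when the skill applies, followed by the procedure itself. A much-simplified hypothetical sketch, not one of Intercom’s actual skills:

```markdown
---
name: triage-public-repo-exposure
description: >
  Use when files or metadata may have been published to a public GitHub
  repository. Applies the internal data-breach criteria and returns
  next steps.
---

# Triage a possible public-repo exposure

Follow the steps in order. Do not skip steps or jump to conclusions.

1. Identify the repository, the commit range, and the files involved.
2. Download the exposed files and classify their contents against the
   internal data-classification policy.
3. Apply the breach criteria: credentials, personal data, or customer
   content means escalate; internal metadata alone usually does not.
4. Report a conclusion (innocuous or escalate) with the evidence and
   recommended next steps.
```

The description field does the heavy lifting: it is what lets an agent select the skill from a bare problem statement rather than an explicit command, which is the subject of the next section.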
Agents get problems, not tasks
Scanlan urged engineers to give agents problems, not tasks. Engineers still often tell an agent to run a specific skill, and he said that remains useful and sometimes necessary. But the direction of travel is to describe the situation or problem and let the agent infer which skills, tools, and steps are needed.
His example came from a security incident. Intercom had accidentally published Snowflake table metadata to a public GitHub repository. Scanlan opened Claude Code, told it to join the relevant Slack channel and take a look, and did not specify the internal procedure. Claude found and invoked a skill Scanlan said he did not know existed, one that encapsulated Intercom’s data-breach policies, criteria, and analysis steps. It downloaded the files, analyzed the issue, concluded it was innocuous, and returned next steps in about two minutes.
Scanlan estimated that the same work would have taken him about 20 minutes by hand: finding the policy, checking the criteria, inspecting the files, and deciding what to do. A small, policy-bound task became agent-executable because the company had encoded the relevant internal procedure and connected the agent to the right systems.
The reported throughput gain showed up in pull requests, review, and defect work
The “Claude PR Flow” dashboard made the review system legible as a new bottleneck. Brian Scanlan showed the following displayed values:
| Metric | Displayed value |
|---|---|
| Total merged PRs | 968 |
| Has Claude label | 95.9% |
| Claude approved rate | 17.6% |
| Bypass approval rate | 2.5% |
| Approved in under 30 minutes | 25.9% |
| Total time under one hour | 24.0% |
Scanlan said code review had become the current bottleneck. Intercom’s response was to move some approvals to automatic agents, but he emphasized that this was not simply “hey Claude, can you approve this.” The company used prior data, backtesting, and human labeling to understand confidence levels and shape the system toward simple, safe pull requests that could be approved automatically.
Scanlan said Intercom worked with its auditors to ensure the automated approval process was fully SOC 2, ISO 27001, and HIPAA compliant. His claim was that those certifications do not require humans in the loop, but they do require knowing exactly what you are doing and having auditing and related controls in place. He argued that a well-organized, tested suite of agents can reduce risk compared with human review in tightly defined situations, because humans are not always as consistent.
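Intercom did not publish its implementation, but the approach Scanlan described, backtesting an agent reviewer against historically human-reviewed pull requests and enabling auto-approval only where measured precision is high, has a simple shape. The sketch below is hypothetical; every name and threshold is illustrative rather than Intercom’s:

```python
from dataclasses import dataclass

@dataclass
class ReviewedPR:
    category: str        # e.g. "dependency-bump", "copy-change", "schema-migration"
    agent_verdict: bool  # True if the agent reviewer would have approved
    human_outcome: bool  # True if humans approved it and it caused no revert/defect

def safe_to_auto_approve(history: list[ReviewedPR],
                         min_precision: float = 0.99,
                         min_samples: int = 200) -> set[str]:
    """Return PR categories eligible for automatic approval.

    A category qualifies only if the agent approved enough historical PRs
    in it (min_samples) and nearly all of those approvals matched the human
    outcome (min_precision). Everything else stays with human reviewers.
    """
    eligible: set[str] = set()
    for category in {pr.category for pr in history}:
        approved = [pr for pr in history
                    if pr.category == category and pr.agent_verdict]
        if len(approved) < min_samples:
            continue  # not enough evidence to trust this category yet
        precision = sum(pr.human_outcome for pr in approved) / len(approved)
        if precision >= min_precision:
            eligible.add(category)
    return eligible
```

Gating on category-level precision rather than a global approval rate is what shapes such a system toward simple, safe pull requests: risky categories simply never clear the bar.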
The defect trend was another reported benefit, though not an original goal. Scanlan showed open defects rising and then dropping, broken out by priority. He said he was not proud that defects had been increasing until recently, but that teams had begun closing defects faster than ever. Some teams, inspired by the AI shift, began pursuing backlog-zero-style efforts and crunching through hundreds or thousands of defects. He described this as partly planned and partly a natural deflation caused by the lower cost of getting through routine work.
Scanlan also briefly said Intercom has been working with a Stanford research group, giving them access to Intercom’s code, and that according to that group’s metrics, code quality has been increasing.
The throughput increase also created infrastructure pressure. Scanlan said Intercom’s CI “melted” and had to be fixed. Doubling engineering throughput is not only a code-generation problem: existing review, CI, test, deployment, and operational systems can become bottlenecks once agent-assisted development increases the volume of work.
A flaky-test skill shows how agent work becomes institutional knowledge
A “Fix Flaky Spec” skill showed what Intercom means by skills. Brian Scanlan said Intercom has hundreds of thousands of tests, and because the company ships a lot, flaky specs accumulate. The skill he showed was not a loose prompt. It was a structured workflow for investigating and fixing flaky RSpec specs in the Intercom monolith through data-driven root-cause analysis.
The visible documentation described it as a rigid skill: “follow the steps in order,” “do not skip steps or jump to conclusions.” It required a GitHub issue link or spec file path as input and applied when a spec was reported as flaky, failing intermittently in CI, or described by a user as a flaky test or intermittent failure.
The first step was a hard gate: ensure CI log access. The documentation warned never to guess the failure from code alone, because code-only analysis produces plausible but wrong hypotheses. The example on the slide contrasted a false conclusion (a missing case caused by a timeout) with the actual failure (a UUID collision), a different root cause requiring a different fix.
The workflow then gathered context, ruled out a broken spec, checked whether a fix had already landed, retrieved failure data from S3 and Buildkite, and classified the failure. The classification table included categories such as actually broken rather than flaky, global state poisoning, shared test state bleed, shared singleton collision, missing test setup or data, timing or race condition, and external service flake.
Scanlan said he did not build the skill by sitting down and enumerating every step himself. He worked in a feedback loop with the agent: giving it a goal, guiding it, using it to fix many flaky specs, and letting that experience turn into a better workflow. The resulting skill included lookup tables, progressive disclosure, and organized procedures.
His assessment was that it was fixing issues in a way that, if done by Intercom’s most senior Rails engineers, would make him think “wow, they’re amazing.” The value was not just that an agent could fix a flaky test. The procedure for fixing flaky tests became reusable institutional knowledge, executable by an agent and improvable over time.
Scanlan showed two overlapping summaries of Intercom’s Claude Code plugin work rather than a single consolidated dashboard. One terminal summary described the plugin repository:
| Repository metric | Displayed value |
|---|---|
| Plugins | 42 |
| Skills | 332 |
| Unit test files | 165 |
A later summary described the Claude Code plugin platform:
| Plugin-platform metric | Displayed value |
|---|---|
| Plugins | 42 |
| Total lines of code | 12,185 |
| Existing Fast files | 751 |
| Words automated read/write | 7.9M |
| Contributors | 252 |
| Active contributors, last 30 days | 119 |
The base plugin provides shared safety hooks, session setup, install detection, permission telemetry, and universal skills. Visible examples included Snyk/npm URL validation, AWS guardrails, merged PR protection, settings configuration, and auto-update checks.
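In Claude Code’s documented hook model, hooks run shell commands at lifecycle events such as PreToolUse and can veto a tool call before it executes. A minimal sketch of how an AWS guardrail of the kind listed above could be wired; the script path is hypothetical, and the comments are annotations only (real settings files are plain JSON):

```jsonc
{
  // Hypothetical wiring: inspect shell commands before they run.
  // "scripts/aws-guardrail.sh" is a placeholder, not Intercom's plugin code.
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [{ "type": "command", "command": "scripts/aws-guardrail.sh" }]
      }
    ]
  }
}
```

The hook receives the proposed tool call as JSON on stdin and blocks it by exiting with code 2, with its stderr fed back to the agent as the reason; merged PR protection and URL validation can take the same general shape.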
Engineering work is moving up the stack
Brian Scanlan argued that all technical work is becoming agent-first. Agents must be able to perform anything a human can do, even if connecting them to production systems initially feels strange. Documentation, in this model, becomes infrastructure: not a passive reference, but the substrate agents use to act.
He compared the shift to his earlier career as a Unix sysadmin. Before cloud infrastructure, the work involved data centers, racks, cables, servers, and network configuration. As cloud computing matured, sysadmins moved up the stack into SRE and automation-oriented work that he described as more impactful and higher paid. He sees AI engineering as a faster, industry-wide version of that transition.
The practical consequence is that engineers spend less time manually writing, testing, and reviewing code, and more time writing specifications, validating outcomes, improving agents, and changing the environment so agents can succeed. That includes work practices, architecture, infrastructure, documentation, and tool permissions.
Intercom is already seeing Claude Code usage spread beyond software engineering. Scanlan showed internal notes listing: 1,000 weekly Claude Code users across all of Intercom; replacement of runbooks; remote agents in progress; automatic approvals targeted above 50%, in progress; single-person team or product experiments; and a possible shift from separate product engineers, designers, and product managers toward “product builders.” Scanlan said people outside software were “banging down our doors” to use consoles.
He also described using Intercom’s own skills to act as a product manager while building CLI interfaces that let users sign up for Intercom, configure and install Fin, and install the messenger from inside an agent. Scanlan showed one of his posts describing an `npm install -g @intercom/cli` workflow and saying it was “wild how fast things are changing.”
Scanlan ended with a prediction rather than a narrow case study: if an organization is not doing pretty much all of this now, it will be doing it in the very near future. In his account, the differentiator is not whether engineers have access to an AI coding assistant. It is whether the company has turned its internal knowledge, controls, review processes, incident procedures, test practices, and product workflows into an agent-usable platform that improves as people use it.


