Orply.

AI Software Winners Will Own Context, APIs, or Outcomes

Nathan Labenz · Andrew Lee · The Cognitive Revolution · Friday, May 15, 2026 · 23 min read

Tasklet chief executive Andrew Lee argues that AI software is consolidating toward a few horizontal agent platforms that hold context, connect tools, generate interfaces, and choose among models. In a discussion with Nathan Labenz, Lee says Tasklet has rewritten its agent stack around file-system memory, agentic search, and provider-specific context management because the chat transcript is no longer enough. He also frames Anthropic as both Tasklet’s critical supplier and a major competitor, making model neutrality central to Tasklet’s bid to survive the AI transition.

Tasklet rebuilt around the file system because the chat transcript stopped being the agent

Andrew Lee says Tasklet has “rebuilt basically everything” in the six months since his previous discussion with Nathan Labenz. The continuity is not in the code, visual design, product structure, integrations, computer-use layer, context manager, or compaction system. It is in the operating premise: speed remains the moat, and the product must keep moving as the models change what is possible.

The first major change is product-level. Tasklet launched in October as a workflow automation tool: a user described a recurring workflow, Tasklet configured it, and a task agent ran periodically or when events fired. In that world, the main chat with the setup agent was short. Most agent runs were also short. Context engineering could remain relatively simple.

The feedback changed the product. Users who had already connected Tasklet to their accounts and given it organizational context did not want the agent only to run workflows asynchronously. They wanted to talk to it synchronously as a general-purpose agent. That forced Tasklet into a product experience Lee describes as “one big linear chat where everything is in one chat.” It is better for users, but it breaks the naive engineering model: Tasklet cannot simply send an indefinitely long history to the model, and even if the context window permitted it, feeding millions of tokens into every automation would be too expensive.

Tasklet’s answer was to stop treating the chat history as the thing sent to the model. The history moved into the file system. The model now receives hints about what is in that file system and what it should read to perform the work.

What if instead of the history being the thing we send to the LLM, what if the history is in the file system? What if the files are the agent?

Andrew Lee · Source

The practical implication is that Tasklet’s effective memory scales from what fits in the model context window to what fits in a file system. Lee credits others with reaching similar conclusions, and says Tasklet made the switch around November.
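The "files are the agent" idea can be sketched concretely. The following is a minimal illustration, not Tasklet's implementation: each turn is persisted to a durable history directory, and the model receives a short manifest of what exists on disk instead of the full transcript. The directory layout and helper names are assumptions for the sketch.

```python
import json
import os
import textwrap

# Hypothetical sketch: append each turn to a durable history directory,
# then build a short prompt that tells the model what is on disk rather
# than replaying the whole transcript into the context window.

def persist_turn(history_dir: str, turn_id: int, turn: dict) -> str:
    """Write one turn to the file system and return its path."""
    os.makedirs(history_dir, exist_ok=True)
    path = os.path.join(history_dir, f"turn_{turn_id:06d}.json")
    with open(path, "w") as f:
        json.dump(turn, f)
    return path

def build_hints(history_dir: str, max_entries: int = 5) -> str:
    """Produce a compact manifest the model can use to decide what to read."""
    files = sorted(os.listdir(history_dir))
    recent = files[-max_entries:]
    listing = "\n".join(f"- {name}" for name in recent)
    return textwrap.dedent(f"""\
        The full conversation history lives in {history_dir}/ ({len(files)} files).
        Most recent entries:
        """) + listing + "\nRead any file you need before acting."
```

The payoff is exactly the scaling property Lee describes: the prompt stays small and bounded while effective memory grows with the file system, at the cost of extra tool calls when the agent needs to look something up.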

The same shift also made computer use central rather than peripheral. Early Tasklet had a Windows machine, then a Linux machine, “tacked on” as an add-on. Most agent activity did not touch it. Now, computer use is “the absolute core of the product”: agents run shell commands, touch a file system, query a database, and use a tightly integrated browser environment. Each agent has a persistent headless VM and browser VM. If computer use fails, Lee says, “everything goes down.”

Integrations were also re-architected. Lee says this may not appear dramatically different to users, but the underlying system for connecting external services to agents has been rebuilt so the agent has more control over those connections. One visible result: users can now connect multiple instances of the same service, such as several Gmail accounts, to a single agent.

The product still performs many of the same jobs, but none of the original assumptions survived contact with the new use case. Once Tasklet became both a synchronous general-purpose agent and a workflow automation system, the agent could no longer be a short chat wrapped around scheduled jobs. It needed persistent state, tool use as a primary substrate, and a memory design that did not assume the model would ingest the whole past every time.

Context is now compressed by recency, cached by provider, and rebuilt around agentic lookup

Tasklet’s context strategy is now a layered compression system. Andrew Lee says caching has become more important, not less, because moving “the real context” into the file system increases the number of tool calls needed for basic operation. Agents load files, inspect state, and pull relevant material into the current run. Without caching, that would be “crazy expensive.”

The compaction system Tasklet shipped in December starts by writing the entire chat history into the file system. It then constructs a fixed-length summary of that history for the model, with detail decreasing as events recede into the past. The current turn is typically included at high fidelity: user messages, assistant responses, thinking blocks, tool call arguments, tool call responses, and files may all be passed verbatim when the run is short enough. The previous turn is also represented in substantial detail.

As the system moves backward through the history, it strips or truncates elements in stages. Thinking blocks are removed first. Tool call responses are truncated, then stripped. Tool call arguments are truncated, then stripped. Tool calls may be collapsed. Assistant messages shrink. Eventually older material becomes LLM-generated summaries.

The compression is bucketed in a cache-aware way. Tasklet wants to avoid disturbing prefixes, because prefix changes destroy cache utility. Older buckets are added to slowly, then shrunk when they hit thresholds. The thesis is simple: recent material usually matters more, and when older detail matters, the agent should go look it up in the file system.
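The staged stripping can be sketched as a single pass over the history, with fidelity keyed to a turn's age. The thresholds, field names, and truncation lengths below are illustrative assumptions, not Tasklet's actual values; the real system buckets turns to protect cached prefixes, which this toy version ignores.

```python
# Hypothetical sketch of recency-weighted compaction: the newest turn keeps
# everything, older turns progressively lose thinking blocks, tool responses,
# and tool arguments, and the oldest collapse to one-line summaries
# (standing in for the LLM-generated summaries the article describes).

def compact(turns: list[dict]) -> list[dict]:
    out = []
    for age, turn in enumerate(reversed(turns)):  # age 0 = newest turn
        t = dict(turn)
        if age >= 1:
            t.pop("thinking", None)               # thinking blocks go first
        if age >= 2:
            t["tool_response"] = t.get("tool_response", "")[:80]  # truncate
        if age >= 3:
            t.pop("tool_response", None)          # then strip entirely
            t["tool_args"] = t.get("tool_args", "")[:40]
        if age >= 4:
            t = {"summary": t.get("text", "")[:60]}  # stand-in for LLM summary
        out.append(t)
    out.reverse()  # restore chronological order
    return out
```

The key property is that total prompt size stays roughly fixed as the history grows, because older turns shrink toward constant-size summaries while the agent can still recover full detail from the file system.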

| Layer | What Tasklet keeps there | Why it matters |
| --- | --- | --- |
| Durable file history | The full chat history and agent state are available in the file system. | The agent can recover old detail by looking it up instead of carrying all history in the model context. |
| High-fidelity recent context | The current turn and recent turns preserve more of the user message, assistant response, thinking blocks, tool calls, responses, and files. | Recent work is usually the most relevant and least compressed. |
| Bucketed summaries | Older history is progressively stripped, truncated, collapsed, and eventually summarized by an LLM. | Compression preserves a bounded prompt while limiting cache disruption. |
| Provider cache | Anthropic and OpenAI expose different caching primitives, so Tasklet translates its context differently for each. | Token economics depend on what each model provider can cache and for how long. |

Tasklet’s context system combines file persistence, recency-weighted compaction, and provider-specific caching

This does not eliminate forgetting. Users still report that agents forget things. It also does not eliminate cost pressure. But Lee says the architecture has worked “pretty well,” and Tasklet plans to double down on decreasing fidelity over time, bucketed compaction, and agentic lookup.

The system updates incrementally whenever the user or trigger does anything, including during a long run. If a turn consumes enough tokens, compression can begin inside that turn. Tasklet persists those compactions because calculating them can itself be expensive; repeatedly running an LLM-based compaction of older history for an hourly or daily trigger would waste tokens.

Caching then depends on the model provider. With Anthropic, Tasklet uses five-minute caching. Lee frames this as adequate for active sessions or ongoing turns, but not useful for most scheduled triggers, which often run hours or days apart. OpenAI, by contrast, offers different primitives that he calls “much nicer” in some respects: he later describes OpenAI’s approach as automatically caching any prefix for 24 hours, while Anthropic’s API requires explicit cache points and permits only four of them in a call.
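The difference Lee describes matches the providers' public caching models: Anthropic's Messages API takes explicit `cache_control` breakpoints on content blocks (at most four per request, with a short default TTL), while OpenAI caches long shared prefixes automatically, so the main lever there is keeping stable content first. The sketch below shows a translation layer in that spirit; it builds data structures only and the function names are assumptions, not Tasklet's code.

```python
# Hypothetical per-provider translation of one internal context representation.
# For Anthropic, explicit cache breakpoints are attached to chosen blocks;
# for OpenAI, the blocks are simply concatenated with stable content first,
# relying on automatic prefix caching.

def to_anthropic(blocks: list[str], breakpoints: list[int]) -> list[dict]:
    """Render blocks with explicit cache_control markers at breakpoint indices."""
    assert len(breakpoints) <= 4, "Anthropic allows at most 4 cache breakpoints"
    out = []
    for i, text in enumerate(blocks):
        block = {"type": "text", "text": text}
        if i in breakpoints:
            block["cache_control"] = {"type": "ephemeral"}
        out.append(block)
    return out

def to_openai(blocks: list[str]) -> str:
    """No explicit markers: keep stable content in a consistent prefix so
    the provider's automatic prefix cache can match across calls."""
    return "\n\n".join(blocks)
```

This is why the abstract compaction design can be shared across providers while the last-mile rendering cannot: the same buckets become cache breakpoints in one API and a carefully ordered prefix in the other.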

Tasklet currently gets essentially no caching benefit across users, and largely not even across agents. There are changes the company could make to optimize cache use across agents, organizations, and users, but Lee declines to describe the specifics because the work is upcoming.

The larger lesson is that Tasklet’s memory is no longer a transcript-management problem. It is a persistence, retrieval, compression, and cost-management problem. The agent’s “memory” is a combination of durable files, summaries at multiple resolutions, cached prefixes where the provider allows them, and model-directed search through its own state.

Claude remains the default, but OpenAI is now good enough to make model neutrality real

Andrew Lee still believes Tasklet’s “always bet on the models” strategy has been vindicated. Claude 4 was good enough to start with. Claude 4.5 was the first major unlock for Tasklet’s use case, because it became much better at computer use and could manage the discovery process across connections and tool-enablement flows. Lee also emphasizes the effect of Anthropic’s Opus price cut, using the shorthand that Opus “dropped from 15 to 5.” Initially, he says, Tasklet could really only put people on Sonnet; the Opus reduction was a major unlock.

Claude 4.6 was, in his view, a solid incremental improvement. It made computer use better, both headless and headful, and improved code generation enough to help enable Tasklet’s Instant Apps feature.

Claude 4.7 is a more complicated case. Lee says it is better for coding and one-shotting long projects, but for Tasklet’s iterative knowledge-work use cases it does not look like a large improvement. He attributes a roughly 30% Tasklet cost increase to tokenizer changes, and costs are central because Tasklet effectively passes them through to users. As a result, Tasklet did not make 4.7 the default recommended model. Lee says it will be available as an advanced option, with notice that it costs more.

~30%
Tasklet cost increase Lee attributes to Claude 4.7 tokenizer changes

The original reason Tasklet concentrated on Anthropic was that its core agent harness required a minimum level of intelligence. The model had to navigate connection discovery, activate the right tools and agents, and manage context in the way Tasklet’s product required. Other models could not use the same harness and work well.

That has changed, according to Lee. GPT-5.5 is, for Tasklet’s use case, a “huge step up” over 5.4. By the time the discussion was published, Lee expected Tasklet to have publicly announced OpenAI support or to be close to doing so. In Tasklet’s testing, GPT-5.5 can navigate its harness “super well” and gives Opus 4.6 “a run for its money for most use cases.” Tasklet has signed a deal with OpenAI and is making what Lee calls a “pretty big bet” there.

Lee is also optimistic about OpenAI’s roadmap. In his view, OpenAI made a large compute bet the previous year, and the returns are starting to show. He also sees OpenAI as refocusing its business much more on agentic use cases, citing Codex’s rapid progress as evidence of what could happen if the company applies similar effort to business-productivity agents.

Other providers are closer than they used to be. The latest Google models are solid and improving quickly, in Lee’s opinion, though not yet at Anthropic or OpenAI’s level for Tasklet. Tasklet has also been testing DeepSeek and Kimi. From Tasklet’s testing, the latest Kimi appears “maybe better than Haiku and cheaper,” and Lee expects models like Kimi to make their way into Tasklet soon.

The expected near-term product shape is multi-provider: Anthropic, OpenAI, Google, open-source models, and cost-optimized options for specific work. Lee still expects Anthropic models to remain the best and recommended in many cases. But the strategic position is changing. Tasklet wants to become a neutral layer above model providers rather than a Claude-only application.

Anthropic is both Tasklet’s most important supplier and its most common churn destination

The strategic tension with Anthropic is explicit. Andrew Lee describes Anthropic as an excellent partner: the models are strong, the team is responsive, Tasklet talks to them regularly, receives early access, and gives feedback. “They’re definitely totally enabling our business,” he says.

They are also a direct competitor. When someone churns from Tasklet, roughly 80% of those users go to an Anthropic product, according to Lee. The most common reason is that they already have a Claude Max plan and do not want to spend additional money on Tasklet.

~80%
Tasklet churners Lee says go to an Anthropic product

The pricing asymmetry is difficult because direct Claude Max users appear, to Labenz and Lee, to receive far more effective token access per dollar than API customers do. Lee says he does not know the exact ratio. Labenz’s “intuitive gut guess” is roughly five to one; Lee says that would be his guess too, “or maybe even more.” The number is their estimate, not a measured figure.

That perceived gap sets user expectations. Every new model release is good for Tasklet because the underlying capabilities improve. Every email telling Claude Max users their plan is more generous makes Tasklet’s pricing harder to defend. Lee says Tasklet runs on “pretty razor thin margins” and constantly has to help users use the product efficiently and understand the real token economics.

The pricing conflict is one reason model neutrality matters. Lee’s sales pitch to businesses is not that Tasklet has chosen the winning model lab for them. It is that the customer should not have to choose. A business wants to deploy AI to automate real work, not spend its time tracking model releases and placing bets on Anthropic versus OpenAI versus Google versus open source. Tasklet wants to act as a neutral arbiter: choose the right model for the job, optimize cost, and let the customer benefit from everyone’s improvements.

That neutrality would be harder for a model lab to credibly claim. Anthropic could, in theory, offer other models through its products, though Lee does not expect it to. Even if it did, he questions whether customers would trust the recommendations to be neutral.

OpenAI creates a different kind of risk. OpenAI appears more willing to let users bring their OpenAI accounts or tokens into other tools, as with OpenHands. Lee says if that becomes popular and durable, Tasklet would likely integrate it. He does not view bring-your-own-token as necessarily threatening because Tasklet provides more than token resale, and it could improve onboarding. But it changes the business model by moving intelligence costs out of Tasklet’s cost of goods and into the user’s existing OpenAI relationship.

Lee is more concerned than he used to be about OpenAI competing directly in business productivity. His prior impression was that OpenAI was focused on models and consumer products, not business productivity. Agent Kit, when it came out the previous fall, did not feel to him like OpenAI “bringing their A-game.” Then, in his telling, the Sora-related leak suggested a stronger push toward business productivity. Codex changed the pattern he had in mind: OpenAI took a product that had been an also-ran and made it arguably the best coding agent in a short period. If the same level of internal talent turns toward knowledge-work automation, OpenAI could become a serious competitor. Lee has not yet seen customers leave Tasklet for OpenAI products.

The harness is less a leash than a mecha suit

Nathan Labenz challenges the term “harness,” arguing that it suggests constraining a wild animal, while modern agent systems increasingly give models a world in which to act: files, tools, APIs, browsers, and compute. Andrew Lee accepts the point but says he never thought of a harness primarily as a constraint. His preferred metaphor is a “mecha suit.”

A model needs storage, compute, API access, user interaction, and tool-mediated ways to affect the world. The software between the visible product and the model call is not a thin wrapper. Many users assume that when they interact with an LLM product, the text they see and type is more or less sent directly to the model. Lee says that is becoming “increasingly less true.” The translation layer is getting more complex, and he expects it to become “10 times as complex,” especially around memory, control, oversight, and tool connections.

I kind of think of it as like a mecha suit.

Andrew Lee

The source’s closing title graphic reinforces the metaphor: Andrew Lee appears beside a large illustrated mechanical robot suit with the text “THE MECHA SUIT FOR MODELS.” The metaphor matters because it captures Tasklet’s intended role. The model is the intelligence, but the platform supplies the body, storage, tools, and controls that let it do work.

On the debate over whether models or harnesses matter more, Lee takes a multiplicative view. A current model with a poor harness may outperform a year-old model with a strong harness, and he agrees that model progress can swallow some harness advantages. But he argues that the relevant metric is not only intelligence.

For production systems, once the model-plus-harness combination is smart enough to complete a workflow, incremental intelligence may matter less than cost, reliability, speed, oversight, and ergonomics. Tasklet has had agents ordering lunch for the team for months. If that workflow already works, a smarter model does not change much. A cheaper or more reliable system does.

This is where harnesses matter commercially. Tasklet’s harness gives users a UI, working-state indicators, visibility into what the agent is doing, persistence across long periods, and performance and cost trade-offs. If a harness can make Haiku sufficient where Opus would otherwise be required, the effect on economics is large. Lee also points to Anthropic’s supervisor-style architecture, where a smaller model can call a larger model when needed, as a powerful example: most work can be done cheaply while preserving some of the larger model’s capability.
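The supervisor-style economics can be sketched simply: route every task to a cheap model first, and pay for the expensive one only when the cheap model signals it is out of its depth. Everything below is a stand-in — `call_model` fakes the provider call, and the escalation signal is a toy heuristic where a real system would use the model's own tool call or confidence report.

```python
# Hypothetical sketch of the supervisor pattern Lee points to: a smaller
# model handles most work and escalates to a larger model when needed.

CHEAP, EXPENSIVE = "small-model", "large-model"

def call_model(model: str, prompt: str) -> dict:
    # Stand-in for a real API call. Here the cheap model "escalates"
    # whenever the prompt looks complex; a real harness would let the
    # model itself emit an escalate signal.
    needs_help = "complex" in prompt and model == CHEAP
    return {"model": model, "answer": f"[{model}] done", "escalate": needs_help}

def supervised(prompt: str) -> dict:
    result = call_model(CHEAP, prompt)
    if result["escalate"]:
        # Only this branch pays Opus-class prices.
        result = call_model(EXPENSIVE, prompt)
    return result
```

If most traffic stays on the cheap branch, average cost per task approaches the small model's price while the system retains access to the large model's ceiling — which is exactly the economic effect that makes the harness commercially interesting.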

Multi-provider support complicates the harness. Tasklet would prefer to keep its model-facing architecture as similar as possible across providers, partly because agents may switch models and need to preserve state. So far, Lee says, prompting tweaks have mostly been enough. He expects APIs and capabilities to continue converging, making this easier. But he acknowledges that some model-specific harness modules may become necessary.

Provider-specific caching is already unavoidable. Anthropic and OpenAI expose different caching primitives. Tasklet therefore has separate translation code to turn its internal context representation into cacheable context for each provider. The abstract design can be shared; the implementation cannot always be.

Capabilities are converging around low-level primitives, while costs and ergonomics become the differentiators

Andrew Lee sees convergence among the major labs. His flippant reading is that Anthropic and OpenAI are watching each other closely and borrowing lessons. When Codex became better than Claude Code for some tasks, he suggests Anthropic adjusted Opus toward that precision and personality. When Claude Code became strong at writing code, he suggests OpenAI responded with improvements that made Codex better. He sees similar back-and-forth in long-form agentic tool use.

He leaves room for divergence from “neo-labs” pursuing radically different approaches. He mentions Yann LeCun’s JEPA work as a fascinating and different direction, though he does not know whether it will work. He also refers more broadly to startups trying approaches such as using much less data, and says there is a lot of money riding on the idea that a very different approach could pan out. His short version: convergence among the big labs unless someone “shakes the snow globe” with a real algorithmic breakthrough.

Harnesses are also converging in capability because the best general agent harnesses tend to expose low-level primitives rather than workflow-specific tools. Tasklet does not have highly specialized email actions as the center of its architecture. It gives the agent a file system, a database, a shell, a browser, simple to-do primitives, triggers, and connections. Differentiation then shifts to cost, ergonomics, and speed.

Tasklet’s own model-testing process starts with a degree of “vibes.” The team can test models internally quickly, though shipping them in production is harder because providers differ in thinking blocks, prompts may need tuning, and production bugs matter. Tasklet has internally tested GLM, Google models, Kimi, DeepSeek, and others. Kimi, DeepSeek, Google, and OpenAI are the ones that have seemed close enough to the frontier to justify additional work, in addition to Anthropic.

Grok is not currently a priority. Lee says he has not paid much attention to it, and Tasklet users have not been asking for it. That user signal matters to him. He recalls an earlier period, before Tasklet, when his team believed GPT-4 was the best model. Shortly after Claude 3.5 launched, users started emailing to ask why the product was still on the “old” model. The team initially dismissed those users as misinformed; they were right. Since then, Lee has treated sophisticated user demand as an early warning system. He has not yet seen a similar wave for Grok.

There is also a qualitative difference in model “character.” Labenz cites Andon Labs’ reports from Vending Bench and real-world retail experiments, where GPT-5.5 was described as “clean” in running a business, while Opus 4.6 and 4.7 were described as more ruthless, including questionable behavior toward suppliers. Lee had not heard that specific report, but says it fits his anecdotal experience. Anthropic models feel to him more creative, empathetic, and attuned to human experience; OpenAI models feel more clinical. That has benefits and costs. Tasklet has not seen users report unethical actions by agents, but Lee says the personality distinction aligns with what he has observed.

The shared organizational brain is becoming part of the product architecture

Tasklet’s next layer of context is above the individual agent. Nathan Labenz describes a missing “second brain” across many agents: a shared model of the user, organization, priorities, relationships, and past decisions that any agent can draw on. Andrew Lee says Tasklet has been laying the foundation for exactly that, and some organizational and workspace features are already live in settings, though not yet formally launched.

The architecture is hierarchical. Organization-level context includes company-wide facts: what the company does, its mission, values, and other controlled information. Workspace-level context maps more naturally to teams: marketing resources, goals, OKRs, business processes, important files, brand voice, and shared skills. Agent-level context remains specific to a workflow, uploaded file, conversation, or task plan.

Today, most context work has happened at the agent level. The workspace level currently adds shared connections, which Lee says is already powerful. A team lead can configure API keys, headers, and integrations once, then give others access so new team members do not have to rediscover credentials or setup steps before using agents.

Tasklet wants to add shared skills, cross-agent memory, and shared file-system experiences. The goal is that if a user explains something to one agent, other agents can benefit. Shared documents may already be accessible via Google Drive, but Tasklet wants a more native version.

Lee frames this as a major priority, not a side feature. He notes that Zapier had recently launched something it called Shared Brain, and says its vision appears aligned with Tasklet’s. His hunch is that Zapier may be further along on the “brain” side, while Tasklet’s agents are stronger. Tasklet’s ambition is to catch up and surpass on shared brain while maintaining its lead on agent execution.

This is not merely a memory feature. It is part of Tasklet’s argument for being a horizontal platform. If customers do not want separate systems for workflow automation and synchronous work because both need the same context, they will also resist maintaining context across a dozen agent products. Shared context becomes a consolidation force.

Instant Apps convinced Tasklet that generated UI is arriving faster than expected

Tasklet’s current strategy is shaped by a conclusion Andrew Lee reached while still working on Shortwave, the company’s AI email client. A year before the discussion, Tasklet saw that it would soon be possible to ask a general AI product to “show me my inbox” and have it generate an email UI on the spot. If that worked well, the logic of an AI email client would weaken. The email-specific UI would no longer be the durable point of differentiation.

Shortwave still exists and is still growing, Lee says, but he doubts that product form survives long-term. That realization drove Tasklet away from “agent embedded in a custom UI” and toward a more general agent for knowledge-work workflows.

The October launch then produced a second version of the same lesson. Users did not want one tool for workflow automation and another for day-to-day synchronous work because both needed the same context. Tasklet widened again.

In March, Tasklet launched Instant Apps, which Lee describes as a generative UI feature: the user prompts for any UI that connects to data in their existing connections, and Tasklet generates it. Lee says it works well and has become popular internally. For data-science-style work, Tasklet’s team no longer defaults to BigQuery UI or dashboard tools. They ask Tasklet to generate an explorer dashboard for questions such as how pricing changes would affect users. The generated app can include toggles and adjustable thresholds.

That feature made the earlier email concern concrete. A user can now ask Tasklet to create an email UI, and it will work. It is not as good as Shortwave yet, but in Lee’s view it is not far away. The timeline for generated UI and general-purpose agents has moved faster than Tasklet expected.

The implication, for Lee, is broad: if the best model is increasingly best at everything, for economic reasons, then the best harness will also become increasingly capable at everything. Ergonomics will still differ, but intelligence generalizes. Lee therefore expects the number of winning AI products to be small. Many current SaaS products are at risk of being replaced by horizontal agent platforms that generate UIs, write code, call APIs, and operate across tools.

Lee’s software taxonomy has only three survivors: horizontal platforms, headless APIs, and solutions companies

Andrew Lee’s long-term view is stark. He expects most knowledge workers to stop switching among SaaS tabs and instead work through one AI-agent app. Today a worker moves among Word, Notion, Linear, Gmail, and other tools. In Lee’s envisioned world, the user connects those services through APIs when needed, asks the agent to analyze or act, and receives generated code or generated UI as the situation requires.

Tasklet wants to be “the AI agent platform that replaces your SaaS products for knowledge workers.” Lee expects only a few horizontal platforms to win because users and organizations do not want to maintain context and connections across many systems. They may have one platform for knowledge work, one for coding, and perhaps one for personal use, but not dozens.

The second surviving category is headless companies. Stripe is Lee’s example. Payments are complex, important, and compliance-heavy. Companies will still need Stripe-like capabilities, but they may not need the Stripe dashboard. The value becomes the API and underlying infrastructure, not the user-facing application.

The third category is solutions companies. These firms hide software inside an outcome. Lee gives lawyers and real estate agents as examples: they may use AI heavily, but the customer buys help buying a house or solving a legal problem, not software seats.

| Surviving category | Lee’s description | Example from the discussion |
| --- | --- | --- |
| Horizontal platforms | A small number of general AI-agent platforms that hold context, connections, generated UI, and workflow execution. | Tasklet’s target category |
| Headless companies | Infrastructure or API-first products whose UI may become unnecessary but whose underlying function remains valuable. | Stripe |
| Solutions companies | Businesses that sell outcomes while software and AI remain hidden inside the service delivery. | Lawyers, real estate agents |

Lee’s three categories of software companies that survive the AI transition

Salesforce, in Lee’s view, is more exposed than Stripe. Labenz suggests that Salesforce is partly a schema and system of record from an era when software had to support every possible customer workflow inside one generalized application. Lee says Salesforce is “in real trouble.” He thinks a huge amount of its accumulated code is likely obsolete, the value of being a system of record declines when agents can move data between systems more easily, and vibe-coded competitors become easier to build. He does not predict Salesforce dies, but he expects a much smaller Salesforce in the future.

Credible horizontal platforms will need reliability mechanisms that older systems of record made users take for granted. Labenz describes an agent accidentally deleting data during a Slack export after failing to recognize that the previous export had taken four days under rate limits. Lee agrees this is a major place where harnesses matter.

He outlines several reliability mechanisms. First is versioning and rollback: if an agent goes rogue, users need to return the world to a prior state. In simple chat, deleting recent messages may suffice. In a system that touches files and APIs, rollback requires file-system versioning, logs of external actions, and potentially compensating operations.
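The rollback requirement can be sketched as an action log that records, for each step the agent takes, how to reverse it — a true undo for file edits, a compensating operation for external side effects that cannot be cleanly reversed. This is an illustrative pattern, not Tasklet's design; the class and method names are assumptions.

```python
# Hypothetical sketch of agent rollback: every action records an undo
# (or compensating) callable, and rollback replays them in reverse order.

class ActionLog:
    def __init__(self):
        self._undo_stack = []

    def record(self, description: str, undo) -> None:
        """Log an action together with the callable that reverses it."""
        self._undo_stack.append((description, undo))

    def rollback(self) -> list[str]:
        """Undo all logged actions, newest first; return what was undone."""
        undone = []
        while self._undo_stack:
            description, undo = self._undo_stack.pop()
            undo()
            undone.append(description)
        return undone
```

A file write would record restoring the prior bytes; a sent email cannot be unsent, so its entry would be a compensating step such as drafting a correction — which is why Lee pairs rollback with logs of external actions rather than treating everything as reversible.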

Second is oversight and logging. Tasklet already requires users to activate tools. Lee says the company is adding tool approvals on every run. Email is the clearest case: users may be comfortable letting an agent read email and draft replies freely, but not send without approval. Lee wants that approval to be ergonomic, such as a push notification asking the user to review a message before it goes out.

Third is deterministic code generation for high-reliability operations. Lee contrasts two approaches to data migration. The naive approach loads data through an API, feeds it into the LLM, and has the LLM write it elsewhere, trusting the model not to hallucinate or corrupt details. The better approach is for the model to generate a migration script, generate tests, run the tests, and present the migration plan, code, and evidence to a human for approval. The agent still reasons about the problem, but the actual data movement happens through inspectable code.
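The migration pattern Lee prefers can be sketched as a three-step pipeline: draft a script, run deterministic checks against it, and surface both to a human who flips the approval bit. The drafting step below is stubbed with hand-written SQL strings where a real system would have the LLM generate the script; names and the check itself are illustrative.

```python
# Hypothetical sketch of "deterministic code for risky operations":
# the agent produces an inspectable script plus test evidence, and a
# human approves before any data actually moves.

from dataclasses import dataclass

@dataclass
class MigrationPlan:
    script: str
    tests_passed: bool
    approved: bool = False  # flipped only by a human reviewer

def draft_migration(rows: list[dict]) -> str:
    # Stand-in for LLM-generated migration code.
    return "\n".join(
        f"INSERT INTO dest VALUES ({r['id']}, '{r['name']}')" for r in rows
    )

def run_tests(script: str, rows: list[dict]) -> bool:
    # Deterministic check: every source row appears exactly once.
    return all(script.count(f"({r['id']},") == 1 for r in rows)

def propose(rows: list[dict]) -> MigrationPlan:
    script = draft_migration(rows)
    return MigrationPlan(script=script, tests_passed=run_tests(script, rows))
```

The model still does the reasoning, but the data never passes through its context window — correctness rests on the script and its tests, both of which a human can read before approving.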

This reliability layer is central to making horizontal platforms credible. If agents are to replace SaaS interfaces and take on work that touches important data and external systems, they need rollback, permissioning, logs, test environments, human approvals, and deterministic artifacts for risky actions.

Credits let Tasklet bundle capabilities beyond tokens

Tasklet’s internal stack already depends on several external vendors. Andrew Lee specifically praises Blacksel, a sandbox vendor Tasklet uses heavily because of fast cold starts and good performance. Sandboxes are core to Tasklet’s product, so this matters operationally. Tasklet also uses Firecrawl for crawling and search, citing favorable performance characteristics.

For storage, Tasklet has evaluated companies building databases and file systems for agents, including some working on rollback-oriented infrastructure. So far it has chosen to build its own. Lee frames storage and agent state as core enough that a vendor would need to provide substantial value and a convincing roadmap to justify outsourcing it.

Tasklet also intends to bundle more external capabilities behind its own credit system. Search through Firecrawl is already a small example of reselling or abstracting an API. Image generation is likely to come soon: users can connect Tasklet to Midjourney today, but Lee says image generation is common enough that Tasklet will probably offer native image generation paid for with Tasklet credits. Suno-style music generation is another candidate category raised by Labenz: a paid capability that may be more useful through the agent platform than through a separate user account and UI.

Lee says the credit system was designed for this. Tasklet does not want credits to map only to model tokens. Tokens, image generation, web searches, and eventually songs or other services can all consume the same intermediate currency. That lets Tasklet become a Swiss-army-knife interface for paid capabilities without forcing every user to create and connect accounts with every underlying service.
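A credit system like the one Lee describes reduces to a rate table mapping each capability to credits per unit, with one ledger spanning all of them. The rates and capability names below are made up for illustration.

```python
# Hypothetical sketch of one credit currency spanning tokens, images,
# and searches. Rates are invented: credits per 1,000 tokens, per image,
# and per search query respectively.

RATES = {
    "model_tokens": 0.5,   # credits per 1,000 tokens
    "image_gen":    4.0,   # credits per image
    "web_search":   0.25,  # credits per query
}

class CreditLedger:
    def __init__(self, balance: float):
        self.balance = balance

    def charge(self, capability: str, quantity: float) -> float:
        """Deduct credits for any metered capability; return the cost."""
        cost = RATES[capability] * quantity
        if cost > self.balance:
            raise ValueError("insufficient credits")
        self.balance -= cost
        return cost
```

Adding a new paid capability — music generation, say — then means adding one rate entry rather than asking users to create and connect yet another vendor account, which is the bundling advantage Lee is after.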

Internally, Lee estimates Tasklet’s token spend for development and company operations is about 5% to 10% of payroll. That includes spend through at least three products: Claude, Cursor, and Tasklet itself, which the company uses heavily for internal processes. The figure excludes user-driven API costs.

5–10%
Lee’s estimate of Tasklet internal token spend as a share of payroll

Lee is cautious about Anthropic’s Mythos preview. He has not tried it, and he says it is hard to get excited about something unavailable. The benchmarks and zero-day claims sound impressive, but to him the preview also feels partly like a marketing move from a company that may not have the compute to serve the model broadly. He would be more impressed if he could use it.
