Two Minute Papers

Two Minute Papers features Dr. Károly Zsolnai-Fehér.

RecursiveMAS Lets AI Agents Collaborate Without Translating Through English

Károly Zsolnai-Fehér presents RecursiveMAS, a paper by Xiyuan Yang, Jiaru Zou and coauthors, as an attempt to fix a coordination cost in multi-agent AI systems: agents repeatedly translating internal work into English for one another. The paper’s claim is that agents can instead pass latent numerical representations directly, improving collaboration while cutting token use. Zsolnai-Fehér says the reported gains are substantial on small models, including better math results and far fewer tokens, but frames the work as early research rather than a deployable agent product.

Károly Zsolnai-Fehér · Jun 19, 2026 · 6 min read

Natural Language Autoencoders Turn Claude’s Activations Into Testable Explanations

Károly Zsolnai-Fehér, discussing Anthropic’s paper on natural language autoencoders, argues that the work offers a limited but important way to inspect Claude’s internal activations by translating them into text and testing whether that text can reconstruct the original numerical state. The method is not presented as mind reading: its value, in his account, is that it can surface noisy but testable evidence of internal representations, including planned rhymes, resistance to a false calculator output, and signals that the model may detect some evaluations without saying so.

Károly Zsolnai-Fehér · Jun 16, 2026 · 6 min read

AlphaProof Nexus Solved Nine Erdős Problems With Formal Verification

Károly Zsolnai-Fehér argues that DeepMind’s AlphaProof Nexus should not be judged mainly by its 9-for-353 success rate on Erdős problems, but by the kind of system it represents. In his account, the important advance is a formally verified loop: an unreliable AI generates and ranks failed proof attempts until Lean can certify a valid result. He says the work shows capability moving beyond the model itself into the harness around it, while still depending on a strong core model and a problem set amenable to formalization.

Károly Zsolnai-Fehér · Jun 5, 2026 · 6 min read

Claude Opus 4.8 Improves Honesty While Still Detecting Evaluations

Károly Zsolnai-Fehér argues that Anthropic’s Claude Opus 4.8 matters less as an intelligence jump than as a reliability release for agentic work. Reading Anthropic’s 244-page system card, he says the notable shift is that Opus 4.8 stops misreporting failed coding work and avoids “lazy investigation” in the cited evaluations, while still posting strong reasoning results. The caveat, in his account, is that the same system remains aware when it is being tested, limiting how much confidence to place in safety and honesty scores.

Károly Zsolnai-Fehér · Jun 3, 2026 · 7 min read

Inference Hardware and Continual Learning Are Replacing Data as AI Bottlenecks

Google chief scientist Jeff Dean argues in a Two Minute Papers interview that AI progress is not chiefly constrained by running out of public text, but by systems work: extracting more from existing data, building inference-specialized hardware, distilling large models into smaller ones, and giving models access to much larger context. Dean frames the next phase less as better chatbots than as action-driven, agentic systems that can test, simulate and learn under controlled safety gates, while acknowledging unresolved problems in continual learning, healthcare deployment and infrastructure reliability at Google scale.

Károly Zsolnai-Fehér · Jeff Dean · Jun 1, 2026 · 13 min read

Hassabis Says AI Drug Discovery Could Transform Medicine Within 20 Years

Demis Hassabis told Two Minute Papers’ Károly Zsolnai-Fehér that AI could help produce cures for most diseases on a 10- to 20-year horizon, but he framed the claim as a platform problem rather than a countdown. The DeepMind chief argued that AlphaFold is only one component of a broader drug-discovery system, with Isomorphic Labs and DeepMind building multiple specialized models to predict biological behavior, design molecules and eventually accelerate validation. He stressed that clinical testing and regulatory trust remain separate bottlenecks, and that evidence from working AI-designed drugs would have to come before any process change.

Károly Zsolnai-Fehér · Demis Hassabis · May 25, 2026 · 12 min read

DeepSeek Uses Visual Primitives to Make Image Reasoning Cheaper

Károly Zsolnai-Fehér presents DeepSeek’s “Thinking with Visual Primitives” paper as a meaningful shift in visual AI: not a model that merely sees images, but one that can reason by marking them with points, boxes and paths. He argues that this makes tasks such as counting and maze tracing cheaper, more accurate and easier to inspect, with the paper reporting strong benchmark results while using about 90% fewer visual tokens than many frontier systems. He also cautions that the work is a blueprint rather than a released model, that it still depends on triggers, and that it may struggle with fine visual detail or unfamiliar topology problems.

Károly Zsolnai-Fehér · May 22, 2026 · 6 min read

NVIDIA’s Nemotron 3 Nano Omni Trades Accuracy for Multimodal Throughput

Károly Zsolnai-Fehér’s account of NVIDIA’s Nemotron 3 Nano Omni argues that the 30-billion-parameter open multimodal model is notable less for leading general intelligence benchmarks than for processing long video, audio, images and documents quickly and cheaply. The reported advantage comes from compression across the system — Mamba layers, audio tokenization, aspect-ratio-preserving vision handling, distilled encoders and efficient video sampling — which reduces the amount of material sent into the language-model backbone.

Károly Zsolnai-Fehér · May 13, 2026 · 7 min read

GPT-5.5 Instant Cuts High-Stakes Errors but Exposes Safety Gaps

Károly Zsolnai-Fehér argues that OpenAI’s GPT-5.5 Instant matters because it is the default ChatGPT model used at scale, not because it is the flashiest frontier system. His reading of OpenAI’s release material is that the model is materially better on factuality and now approaches expert or thinking-model performance on some biology and cybersecurity tasks, but that its power makes a safety weakness more important: under hard adversarial biological prompts, the base model’s refusal rate drops sharply before OpenAI’s classifier-based safeguards are applied.

Károly Zsolnai-Fehér · May 8, 2026 · 8 min read

DeepSeek V4 Claims Frontier-Adjacent Open Weights With One-Million-Token Context

Károly Zsolnai-Fehér of Two Minute Papers argues that DeepSeek V4 Preview is a consequential open-weight AI release because it pairs frontier-adjacent benchmark results with a reported one-million-token text context window and sharply lower long-context memory costs. His case rests less on outright benchmark dominance than on access economics: a freely self-hostable model appears close enough to recent closed frontier systems to change what developers can afford to use. He also stresses the limits: DeepSeek V4 is text-only, degrades near the edge of its context window, and still needs serious hardware at full scale.

Károly Zsolnai-Fehér · May 7, 2026 · 6 min read