Brilliant’s Koji Uses AI to Make Students Solve Problems Themselves

Sue Khim · This Week in Startups · Monday, June 8, 2026 · 17 min read

Brilliant founder Sue Khim tells This Week in Startups that the company’s new AI tutor, Koji, is built to counter the education use case parents fear most: software that gives students answers while eroding their ability to think. Khim argues the opportunity is not generic AI in the classroom, but a constrained tutor embedded in Brilliant’s lessons that uses Socratic prompting, visual scaffolding, and assessment to help students solve problems themselves. Jason Calacanis frames the same idea more broadly, saying AI is useful when it strengthens the person doing the work rather than replacing the work.

Brilliant’s AI wager is that tutoring should make students do more work, not less

Sue Khim describes Brilliant’s new AI tutor, Koji, as a direct response to the version of AI that many parents fear in education: tools that complete the work while weakening the learner. Her argument is not that consumers have rejected AI. It is that they have rejected a particular use of it.

“People aren’t anti-AI,” Khim says. “They’re anti-idiocracy, they’re anti-IP theft, they’re anti-being replaced in their jobs, they’re anti-slop.” The launch signal she takes from Koji’s viral debut is narrower and more useful: “AI that makes you think” is popular with parents because it promises help without outsourcing cognition.

Brilliant launched Koji into what Khim describes as an unfriendly consumer environment for AI in education. She says commencement speakers talking about AI were being booed, AI had become “immensely uncool and unpopular with consumers,” and parents were worried both about declining academic basics and about AI making the problem worse. In Khim’s telling, American parents are seeing evidence that children “can’t read or do math anymore,” and they no longer assume school will reliably teach reading, writing, and arithmetic.

Koji is meant to meet that anxiety by refusing to act like an answer machine. Brilliant’s promise is a tutor that guides, probes, and eventually disappears. Khim frames the product around a distinction she has spent years building toward: school math often trains formulas and procedures; Brilliant tries to train problem solving. Procedure, in her view, can carry a student through familiar test questions and then fail when the situation changes. Problem solving is more transferable because it teaches students to see the structure of a problem.

“Knowledge versus problem solving” is the recurring distinction. Khim says computation by memorized procedure is “brittle,” and that math and coding are useful not only because they teach content, but because they train “systems brains.” Students who learn that way develop intuition about what kinds of problems can be solved, what is easy, what is impossible, and how to get started.

That is also why Khim rejects the idea that AI makes Brilliant obsolete. She argues AI creates a fork: it can make students “passive and dumb,” or it can make them stronger thinkers.

AI clearly is going to either make you smarter or it's going to make you passive and dumb. And everyone's gonna have to make this choice.

Sue Khim

Jason Calacanis puts the same point in cultural terms. He argues that recent student hostility to AI may come from resentment at a school environment where students use large language models to get through assignments, professors use AI to generate lessons, and everyone feels trapped in “university slop” before moving into “job slop.” What he likes about Brilliant’s framing is that it tells students they can become stronger systems thinkers with AI as a tool, rather than treating AI as a replacement for their effort.

Khim’s concise version is that Brilliant wants “a world-class tutor in every home.” Later she expands it: a tutor in every home, in every language, in every subject.

Koji is constrained by lesson design, not trusted to invent the pedagogy

Koji’s product design is not a generic chatbot attached to a learning app. Khim repeatedly emphasizes that the large language model has “a very constrained role.” The mathematical correctness, pedagogy, and interactive lesson structure live in Brilliant’s own lesson infrastructure, which she says is more deterministic and has been developed over years.

The demo makes that architecture visible. Koji appears as a chat panel on the left side of a Brilliant lesson. The right side contains an interactive algebra problem: an area model with rectangular tiles labeled y², y, and 1. The prompt asks the learner to complete the expression “(3y+2)(y+1) = 3y² + __ + __.” Beneath the expression are selectable answer tiles including 2y, 3y, 5y, 2, and 5. Koji’s panel tells the student: “Ask me for clarification, concept reviews, help with this problem, and more.”

The interface matters because Koji is operating in relation to a structured, visual problem state. The tutor is not responding to a standalone text prompt; the tile model, the algebraic expression, the answer choices, and the learner’s partial reasoning are all part of the environment Khim says Koji can observe and act on. In the demo, Koji sits beside the lesson rather than over it, so tutoring is embedded directly in the problem space.

Khim role-plays as a learner. She notices the y² tiles and asks whether seeing “three Ys” is correct. Koji does not simply give the answer. Instead it points to the three y-tiles in the bottom-left and asks: “If you look at the whole rectangle, how many separate regions contain y-tiles?” When Khim says she is not sure what “regions” means, Koji backs up: “Let’s start with what we know. The 3y² term comes from multiplying 3y and y. Can you fill in the two linear terms?”

The interaction pattern is the product. Koji can respond to confusion, redirect the student toward the relevant structure of the visual model, and keep the student doing the work. When Khim later says she does not understand how the graphic corresponds to the equations below it, Koji explains that the area model and the algebra represent the same thing, and that two separate regions with y-tiles become a single combined term in algebra.
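The demo problem works out as (3y+2)(y+1) = 3y² + 3y + 2y + 2 = 3y² + 5y + 2. That makes the two blanks 5y and 2, and both appear among the answer tiles. The short sketch below is a hypothetical illustration of that arithmetic, not Brilliant's code. It stores a polynomial as a map from exponent to coefficient and multiplies every pair of terms to get the cells of the area model. It then merges cells that share an exponent. The merge is the step where the two y-regions (3y and 2y) collapse into 5y.

```python
from collections import defaultdict

Poly = dict[int, int]  # exponent of y -> coefficient, e.g. 3y + 2 -> {1: 3, 0: 2}


def area_model(a: Poly, b: Poly) -> list[tuple[int, int]]:
    """One cell per pair of terms: (exponent, coefficient) of their product."""
    return [(ea + eb, ca * cb) for ea, ca in a.items() for eb, cb in b.items()]


def combine_like_terms(cells: list[tuple[int, int]]) -> Poly:
    """Merge cells that share an exponent, highest power first."""
    totals = defaultdict(int)
    for exp, coeff in cells:
        totals[exp] += coeff
    return dict(sorted(totals.items(), reverse=True))


def show(p: Poly) -> str:
    """Render a polynomial in y as text, e.g. {2: 3, 1: 5, 0: 2} -> '3y² + 5y + 2'."""
    parts = []
    for exp, coeff in p.items():
        if exp == 0:
            parts.append(str(coeff))
        elif exp == 1:
            parts.append(f"{coeff}y")
        elif exp == 2:
            parts.append(f"{coeff}y²")
        else:
            parts.append(f"{coeff}y^{exp}")
    return " + ".join(parts)


cells = area_model({1: 3, 0: 2}, {1: 1, 0: 1})  # (3y + 2)(y + 1)
print(cells)                            # [(2, 3), (1, 3), (1, 2), (0, 2)]: two separate y cells
print(show(combine_like_terms(cells)))  # 3y² + 5y + 2
```

The sketch keeps the separate cells visible before combining them, mirroring how the tiles stay distinct on the canvas until the learner groups them.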

Khim highlights four design choices from the demo. First, Koji guides the student “to understanding” rather than dumping long explanations. Second, the conversations are meant to be interactive, with students doing “a lot more of the work.” Third, Koji can see, draw, and annotate on the screen, approximating a tutor sitting next to the learner and sketching directly on the page. Fourth, visual scaffolding fades over the course of the lesson.

That last point matters to the pedagogy. Brilliant wants Koji present while a concept is being learned, then absent when the student must demonstrate mastery. By the end, Khim says, the student is in a “test-like environment” with just the problem, no help, and no Koji.

We want to get you to the point where the tutor is there for teaching you the concept, but then disappears when it's time for you to demonstrate that you can do it by yourself.

Sue Khim
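One way to express the fading Khim describes is as a per-step lesson configuration. The sketch below is hypothetical because Brilliant has not said how its fade is scheduled. The linear decline, the 0-to-1 scaffolding scale, and the `StepConfig` and `fading_schedule` names are all assumptions. It does reflect the lesson shape given in the source: help is heaviest early, and the final step is a test-like environment with no hints and no tutor.

```python
from dataclasses import dataclass


@dataclass
class StepConfig:
    index: int
    scaffolding: float    # hypothetical scale: 1.0 = full visual hints, 0.0 = bare problem
    tutor_enabled: bool   # whether the chat tutor is available on this step


def fading_schedule(num_steps: int) -> list[StepConfig]:
    """Fade scaffolding linearly across teaching steps; the final step is an unassisted check."""
    if num_steps < 2:
        raise ValueError("need at least one teaching step and one assessment step")
    teaching_steps = num_steps - 1
    schedule = [
        StepConfig(i, round(1.0 - i / teaching_steps, 2), tutor_enabled=True)
        for i in range(teaching_steps)
    ]
    # Test-like final step: just the problem, no hints, no tutor.
    schedule.append(StepConfig(teaching_steps, 0.0, tutor_enabled=False))
    return schedule


for step in fading_schedule(5):
    print(step)
```

A production lesson would more plausibly fade in response to how the learner is doing rather than on a fixed count. The fixed linear schedule just keeps the example small.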

Asked whether Koji is simply Claude, an OpenAI model, Grok, Gemini, DeepSeek, or another model placed inside Brilliant, Khim says Brilliant is “fairly model agnostic.” The company benefits from model advances in natural conversation, latency, general intelligence, action use, and localization. But she argues frontier models do not have the reward signals needed to become excellent tutors on their own.

Her claim is specific: in Brilliant’s benchmarking, frontier models’ ability to tutor well has not improved much since OpenAI’s o1. They have improved at following instructions and using actions, but the “core job of tutoring,” diagnosing and fixing student misunderstandings, has plateaued. Khim’s explanation is that tutoring requires real, verifiable reward signals for reinforcement learning. Brilliant’s advantage, she says, is that it has actual learning loops: it can observe whether a student comes to understand.
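A toy reward function can make the idea of a verifiable reward signal concrete. This is a hypothetical sketch, not Brilliant's training setup. It assumes each tutoring episode is followed by a fresh problem the student attempts without help, and it scores the episode on that observed result rather than on how good the explanation sounded. The `Episode` fields, the per-hint cost, and the zero reward for revealing the answer are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    hints_given: int
    tutor_revealed_answer: bool         # did the tutor hand over the solution?
    solved_fresh_problem_unaided: bool  # observed later, with no tutor present


def tutoring_reward(ep: Episode, hint_cost: float = 0.05) -> float:
    """Score an episode from an observed learning outcome (hypothetical reward design)."""
    if ep.tutor_revealed_answer:
        return 0.0  # answer-dumping earns nothing, whatever happens next
    outcome = 1.0 if ep.solved_fresh_problem_unaided else 0.0
    # Assumed small per-hint cost, nudging the tutor to let the student do more of the work
    return max(0.0, outcome - hint_cost * ep.hints_given)
```

The design choice that matters is that the signal comes from the student's later unaided performance, which a lesson platform can check automatically.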

That is why Koji was not produced by asking a model to infer a tutoring method from raw examples. Khim says Brilliant had to teach the methodology concept by concept. Expert teachers instructed the model on which tool calls to make, what misconceptions to expect, and how to help students in particular situations. The work included not only model behavior, but the entire UI and lesson experience.

“It was a huge schlep,” she says. “There’s very little of this that we get for free from LLMs.”

The learning architecture, in her view, has to be vertically purpose-built. Khim says Brilliant has been preparing for this kind of product since 2019, “literally since GPT-2,” by building an interactive canvas infrastructure where each canvas has an API that LLMs can read from and write to. That lets the tutor interact graphically with the page, observe what the student is doing, infer intent, and provide pedagogy anchored in the lesson.
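Khim does not describe the canvas API itself, so the code below is only a hypothetical sketch of the general idea. The lesson exposes a read path and a narrow write path that a model can call as tools. The read path is a serialized snapshot of the problem state and the learner's partial answer. The write path is annotations. The class names, tool names, and JSON shape are all invented for illustration. The point the sketch preserves is that the model only acts through a small, fixed set of tools, while the lesson logic stays deterministic.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Tile:
    id: str
    kind: str  # "y2", "y", or "1" in the area-model lesson


@dataclass
class Canvas:
    """Hypothetical lesson canvas that a model can read from and annotate."""
    tiles: list[Tile]
    learner_answer: list[str] = field(default_factory=list)
    annotations: list[dict] = field(default_factory=list)

    def read_state(self) -> str:
        # Read path: a JSON snapshot the model receives as context.
        return json.dumps({
            "tiles": [asdict(t) for t in self.tiles],
            "learner_answer": self.learner_answer,
        })

    def highlight(self, kind: str, note: str) -> int:
        # Write path: mark every tile of one kind and attach a short note.
        ids = [t.id for t in self.tiles if t.kind == kind]
        self.annotations.append({"action": "highlight", "tile_ids": ids, "note": note})
        return len(ids)


# Only these methods are exposed as tools to the model.
TOOLS = {"read_state": Canvas.read_state, "highlight": Canvas.highlight}


def dispatch(canvas: Canvas, call: dict):
    """Execute a model-issued call such as {"name": "highlight", "args": {...}}."""
    if call["name"] not in TOOLS:
        raise KeyError(f"unknown tool: {call['name']}")
    return TOOLS[call["name"]](canvas, **call.get("args", {}))


# Canvas for (3y+2)(y+1): three y² tiles, five y tiles, two unit tiles.
lesson = Canvas(
    tiles=[Tile(f"sq{i}", "y2") for i in range(3)]
    + [Tile(f"lin{i}", "y") for i in range(5)]
    + [Tile(f"u{i}", "1") for i in range(2)],
    learner_answer=["3y"],
)
count = dispatch(lesson, {"name": "highlight",
                          "args": {"kind": "y", "note": "How many regions hold these?"}})
```

Restricting the model to named tools keeps it from silently changing the math or the lesson state. That is consistent with Khim's description of the LLM having "a very constrained role."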

This is the difference between AI as a generic layer and AI as a controlled participant in a designed learning environment. Khim does not claim frontier labs cannot improve models. She says the part Brilliant needs — real-time personalization based on a dense user model and learning graph, tied to verifiable learning outcomes — is unlikely to “come out of a model company anytime soon.”

The business is priced against tutoring because that is the problem parents already pay to solve

Brilliant’s pricing only seems expensive if compared with consumer apps. Khim says the company does not benchmark itself to games. It benchmarks itself to tutors.

Calacanis presses on the point because Brilliant’s price range — he cites roughly $20, $30, or $40 per month depending on plan — is high against typical app-store subscriptions that he characterizes as often costing $60 to $100 per year. Khim responds that the main pricing question from customers is not why Brilliant is expensive, but “why is this so cheap?”

The comparison set is human tutoring. Calacanis says that in Silicon Valley and Austin, he has paid for tutors in areas such as chess, math, and music, with sessions costing roughly $150 to $300 for about 90 minutes. Khim says a middle-America tutor might be about $80 an hour if not “super high-end,” but that effective tutoring often requires high dosage — perhaps three times a week. For families with more than one child, she says, the math becomes impossible.

Reference point	Amount described	Context
Brilliant / Koji plan	$20–$40 per month	Calacanis’s description of plan pricing
Middle-America tutor	About $80 per hour	Khim’s estimate for a non-high-end tutor
High-cost tutoring session	$150–$300 per 90 minutes	Calacanis’s experience in Silicon Valley and Austin
Annual tutor benchmark	$10,000 per year	Khim’s framing of what parents understand a tutor can cost

Brilliant’s pricing discussion compares the product with human tutoring, not casual app subscriptions.

Khim says parents understand that a tutor can cost $10,000 a year. Brilliant’s aim was to reduce that cost by 95% and offer, for around $30 a month, a product that can do “most of the things” a parent would hire an in-home tutor to do, with less cost and less hassle.
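The comparison can be checked with simple arithmetic. The sketch below uses Khim's figures: about $80 an hour, three sessions a week, and around $30 a month for Brilliant. It adds two assumptions that are not in the source: sessions last one hour, and tutoring runs about 40 weeks a year, roughly a school year. The function names are illustrative.

```python
def annual_tutor_cost(hourly_rate: float, sessions_per_week: int,
                      weeks_per_year: int, children: int = 1) -> float:
    """Yearly in-home tutoring spend, assuming one-hour sessions."""
    return hourly_rate * sessions_per_week * weeks_per_year * children


def saving(baseline: float, alternative: float) -> float:
    """Fraction of `baseline` saved by paying `alternative` instead."""
    return 1 - alternative / baseline


tutor_one_child = annual_tutor_cost(80, 3, 40)               # ~$80/hr, three times a week
tutor_two_children = annual_tutor_cost(80, 3, 40, children=2)
app_per_year = 30 * 12                                        # "around $30 a month"

print(f"One child with a tutor: ${tutor_one_child:,.0f}/year")
print(f"Two children with a tutor: ${tutor_two_children:,.0f}/year")
print(f"$30/month vs a $10,000 tutor budget: {saving(10_000, app_per_year):.1%} less")
```

Under these assumptions one child comes to $9,600 a year, close to Khim's $10,000 framing. A 52-week year would give $12,480. Two children double the figure, which is the arithmetic behind her point about multi-child families. At $360 a year, the app is roughly 96% cheaper than a $10,000 tutor, in line with the stated 95% target.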

The scheduling difference is part of the value proposition. Calacanis notes that children are not always ready to learn when a tutor is scheduled to arrive: they may be tired, distracted, in a bad mood, or missing something with friends. Khim agrees that scheduling is a real issue. An always-available tutor can meet the student when the student has energy and attention.

That same logic helps explain why Brilliant stayed consumer-first rather than selling primarily into schools. Khim says the company has always wanted to be “as close to the metal” of the user’s problem as possible. Selling to districts puts multiple layers between the company and the learner: district, teacher, classroom assignment, student. In a classroom, students may be forced to use software whether it is inspiring or not. That can give companies permission to build products students hate.

Khim argues a scaled consumer app produces a better feedback loop because people solve millions of problems on Brilliant every day, and the company can observe whether it is improving at teaching topics in a way students voluntarily continue using. The App Store reviews, she says, are “gold.” Multiple people at Brilliant still read every review. Parents send emails, feature requests, photos, and videos explaining what is and is not working for their children.

The consumer model aligns product pressure with the learner rather than the buyer in a procurement process. Khim says parents are “very happy” to tell Brilliant exactly what they want, and the volume of feedback is difficult to match in a business-to-business education model.

Calacanis frames this as avoiding a B2B2C trap: putting an institutional buyer between the company and the consumer. His version of the lesson is that app-store ratings, refund requests, churn, trials, engagement, and support complaints create a granular signal that helps produce not just product-market fit, but “market pull.”

Brilliant’s origin story explains why Khim keeps returning to the underlying problem

Before Brilliant, Khim and her co-founders built Alltuition, which she describes as “three nerdy kids out of Chicago trying to fix student loans.” The original idea was a LendingTree- or Bankrate-like comparison product for private student loans. Student loans, she says, are the second-largest loan many people take out in their lifetimes, yet students had no easy way to shop for them. Alltuition read through thousands of pages of lender documents, rationalized hidden terms and fees, and built a system for comparing rates.

The product found some traction. Khim says that at its peak, Alltuition was processing about $100 million in student loan consolidations and helping borrowers get cheaper rates. It also attracted cease-and-desist letters from major lenders who did not want that comparison layer to exist.

$100M in student loan consolidations processed by Alltuition at its peak, according to Khim

The pivot to Brilliant came from a deeper objection to the business model. Khim credits Chamath — described by Calacanis as an early judge and investor connected to the company’s history — with pushing the team away from a model that could become profitable while preserving the system it was meant to simplify for consumers.

According to Khim, Chamath looked at Alltuition and warned that the company could end up like Intuit or H&R Block: financially incentivized to keep an underlying system complicated so it could profit from helping consumers navigate it. He believed Alltuition could become a good business and make money, but would not actually solve student debt. His view, as Khim recounts it, was that student debt “needs to disappear,” and helping people get cheaper rates would not make the problem go away.

The question that led to Brilliant was broader: if the team could build anything in the world to make the biggest difference possible, what would it be? Brilliant emerged from that first-principles exercise, while Alltuition continued to support the loans it was already managing.

That origin explains part of Khim’s emphasis on problem solving rather than math content alone. The company began, in her phrase, as “the world’s largest online math club,” treating math as a conduit for becoming a stronger thinker and problem solver.

She cites parent stories as evidence of the product’s effect: children who were not interested in math becoming absorbed, skipping a grade, or working beyond their grade level. One example involves Vineeta and Parag; Parag Agrawal was Twitter’s CEO before Elon Musk’s takeover. Khim says Parag reached out because his seven-year-old was doing fifth-grade math on Brilliant and was “obsessed” with the app. After joining the Koji beta, the child reportedly told his parents he now had a personal tutor and could ask Koji questions when they were busy.

Khim’s principle behind these anecdotes is that children can do hard things if adults “call them to a high place and set high standards.”

The expansion plan is broad, but math and coding mastery remain the proof

Brilliant’s stated vision is expansive: a tutor in every home, in every language, in every subject. But Khim’s near-term roadmap remains anchored in math and coding before moving outward.

She says Brilliant currently covers math and coding, is getting close to curriculum-complete in middle- and high-school math, is tackling college-level material, and is also going younger. In coding, the company is working toward completion of foundational coding this year. Next year, she says, Brilliant plans to expand into science and more technical topics.

Some parts of the global vision benefit directly from model progress. Khim says only about 40% of Brilliant’s users are in the U.S.; the remaining 60% are around the world. Koji’s name was chosen partly for global accessibility: short, easy to say, easy to remember. Localization, she says, is “astonishingly good,” and the product’s tutor voices include employee voices. She describes hearing people she works with teach algebra in “perfect Korean” as uncanny.

But the breadth of the subject vision depends on making the tutor work reliably enough that expansion does not degrade into generic explanations. Khim says the company has a “really unique dataset”: tutoring sessions at scale with real students. She characterizes the current moment as “the ChatGPT moment for AI and learning” and expects that training and fine-tuning models specifically for tutoring is near.

Her argument is that Brilliant’s data may become valuable because it contains learning interactions that reveal whether tutoring actually worked. Calacanis raises the possibility that vendors training frontier models, or the model companies themselves, might want to plug Brilliant’s tutoring data or methods into their own services. Khim answers more generally that there is “deep recognition” of the dataset’s uniqueness.

The company itself discloses few operating details. Khim says Brilliant has about 70 employees. She does not disclose revenue or membership numbers, though she suggests the company may soon reach milestones it could share publicly. Calacanis, as a long-time investor, says he would like to see a number with “two commas and almost a third,” meaning a figure approaching one billion, but Khim does not specify whether such a milestone would be revenue, users, or something else.

The strategic lesson Calacanis draws for founders is that AI can be “jet fuel” for companies that already understand their customers and have built real infrastructure. In his framing, Brilliant is not being disrupted by AI because it can integrate AI into an already differentiated product and become the disruptor itself. He speculates that if new subject verticals once took ten years to build, AI might compress that to two years or ten months. Khim agrees with the direction, while keeping the specifics of expansion tied to the current roadmap.

Calacanis’s VC rant reduces to process design around respect

The second major thread is Jason Calacanis’s reaction to a wave of “VCs behaving badly” stories circulating on X. He treats the viral posts as a familiar release valve for founder grievances about fundraising: the process can be weird, the power dynamic can be weird, and the founder often bears the cost of pretending otherwise.

The immediate example is a Greg Isenberg post about pitching a $15 million Series A to a top-three venture firm while one general partner slept through the meeting for more than 30 minutes. The post says no one acknowledged it, everyone kept going, and the founder kept presenting “to an unconscious man in a Herman Miller chair.” Isenberg’s conclusion is that founders are not crazy for thinking the fundraising process is weird.

Calacanis uses the story to distinguish rude investor behavior from unavoidable human circumstances. He recounts raising for Mahalo after selling Weblogs Inc. to AOL for $30 million roughly 18 months after starting it. He emailed Mark Cuban, Michael Moritz, and John Doerr with a short pitch for a $3 million raise for a human-powered search engine. Moritz, he says, called both phone numbers in his email signature within an hour and had him in the office two days later. Moritz introduced him to Roelof Botha, and Sequoia funded the company.

John Doerr also took the meeting, but Calacanis says Doerr nodded off. The explanation came afterward: Doerr had flipped his bicycle in Woodside that morning, gone to the emergency room, come directly to the meeting with a sling and injuries, and needed to return for an X-ray. Calacanis treats that as a compliment rather than an insult. Doerr should not have taken the meeting, he says, but the fact that he came anyway signaled commitment.

The counterexample is a meeting Calacanis says was canceled while he was already traveling from Los Angeles to Sand Hill Road. He names Mohr Davidow Ventures and says a partner had pushed for the meeting through a mutual contact, then left messages canceling it because the partners were not aligned on valuation. Calacanis describes getting up at 5 a.m., flying to San Francisco, renting a car, and hearing the cancellation only while driving down Highway 101. He went to the office anyway and confronted the partner. The insult, in his telling, was not passing on the company; it was wasting the founder’s time after requesting the meeting.

The useful part of Calacanis’s monologue is that he says LAUNCH tries to design against those behaviors. Every first meeting triggers an automated feedback email from him asking the founder to rate the interaction. Low scores go into a “founder feedback tomatoes” channel; high scores go into a kudos channel. The firm stack-ranks team members by founder feedback, and only the three highest-scored people do first calls.
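The feedback process Calacanis describes amounts to a small routing and ranking rule. The sketch below is a hypothetical rendering, not LAUNCH's actual system. The 1-to-5 rating scale, the thresholds for "low" and "high," ranking by mean score, and the sample names are all assumptions. The two channels and the top-three cutoff come from his account.

```python
from collections import defaultdict
from statistics import mean
from typing import Optional

TOMATOES = "founder feedback tomatoes"
KUDOS = "kudos"


def route(score: int, low: int = 2, high: int = 4) -> Optional[str]:
    """Pick a channel for one founder rating (1-5 scale and thresholds are assumptions)."""
    if score <= low:
        return TOMATOES
    if score >= high:
        return KUDOS
    return None  # middling scores are recorded but not posted to either channel


def first_call_roster(ratings: list[tuple[str, int]], k: int = 3) -> list[str]:
    """Stack-rank team members by mean founder score and keep the top k for first calls."""
    scores = defaultdict(list)
    for person, score in ratings:
        scores[person].append(score)
    ranked = sorted(scores, key=lambda p: mean(scores[p]), reverse=True)
    return ranked[:k]


# Hypothetical ratings from founders after first meetings
ratings = [("ana", 5), ("ben", 2), ("cy", 4), ("dee", 5), ("ana", 4), ("ben", 3), ("eve", 3)]
print(first_call_roster(ratings))  # ['dee', 'ana', 'cy']
```

Ranking on the mean treats a team member with one glowing review the same as one with many. A real system might require a minimum number of ratings before someone is eligible for first calls.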

One repeated complaint was that investors did not understand the founder’s business. Calacanis says founders made that complaint about him as well. His response was to add a closing question to meetings: “May I repeat your vision back to you so that I can make sure I understand it perfectly?” He says the question proves attention, gives the founder a chance to add something they missed, and surfaces misunderstandings before the meeting ends. He then required everyone doing first meetings at LAUNCH to do the same.

His preferred first meeting format is explicit: 20 minutes on Zoom, with 10 minutes for the founder to present, five minutes for two or three investor questions, and five minutes for the founder to ask about the firm. Calacanis says the structure is meant to show respect for founders’ time, ensure the firm has the necessary information, and make clear how LAUNCH invests.

The same respect logic carries into deal structure. Venture capital, as Calacanis describes it, depends on mutual leaps of faith: investors give founders large amounts of capital and trust them to report honestly; founders trust investors not to block future financings or sales. That is why he says he is “a huge fan of standards.” He urges founders to avoid non-traditional terms that give investors control out of proportion to ownership, such as excessive board control. His rough rule is that a 5% owner might receive a board observer seat, while an investor above 10% might receive a board seat.

This connects back to the founder lesson he extracts from Brilliant. Investors may struggle with companies that do not fit an existing total addressable market category. He says an “online tutor” may not look legible to investors who want a pre-existing TAM. His answer is to listen to customers and build something that delights them. For Brilliant, Khim’s account is that this meant staying close to learners and parents, pricing against the problem parents actually pay to solve, and using AI only where it strengthens the learning loop. For LAUNCH, Calacanis says it means treating founders’ time and attention as scarce, not merely the investor’s capital.
