Orply.

Claude Code Reverse Engineers Viking VoIP Phone’s Undocumented Configuration Protocol

Boris Starkov · AI Engineer · Friday, May 29, 2026 · 11 min read

Boris Starkov of ElevenLabs presents the Viking K-1900D-IP phone as a reverse-engineering case study in which Claude Code turned an unusable, undocumented VoIP handset into a working AI demo. Starkov argues that Claude did the investigative work: discovering a two-letter command protocol, brute-forcing valid registers, intercepting the manufacturer’s Windows XP-era software through a TCP proxy, and deriving the one-byte checksum needed to write persistent configuration. His account is also a claim about agency in hardware work: he says he acted largely as Claude’s hands while Claude orchestrated the protocol break.

The hard part was not making the phone call. It was making the phone remember how

Boris Starkov wanted a red industrial hotline phone to behave like a simple demo object: pick up the handset, reach an AI voice agent, answer five questions about British AI history, and claim swag at the ElevenLabs booth. The finished installation was a red British-style phone booth on the third floor, with a Viking K-1900D-IP handset inside. Lifting the receiver placed a call to a Michael Caine voice agent — Starkov emphasized the voice was approved and legal — which then quizzed the caller.

The telephony path was not the difficult part. The target architecture was straightforward: handset lift on the Viking phone, SIP to a Twilio SIP domain, Twilio routing via HTTP, and then an ElevenLabs Conversational AI agent that answered and spoke. Starkov described that portion as “super easy thanks to ElevenLabs” and Twilio.

The obstacle was older industrial hardware with almost no modern configuration surface. The Viking K-1900D-IP had no screen, no buttons, no web UI, no REST API, no CLI, and, in Starkov’s telling, no protocol documentation he could find. Its supported configuration route was proprietary Windows software, compatible with Windows XP. Starkov had a Mac. At ElevenLabs, he said, nobody had a Windows laptop readily available, and attempts to use a virtual environment ran into driver issues.

The phone had already defeated a prior attempt. Starkov said the ElevenLabs San Francisco team bought it a year earlier for another event, and that three senior software engineers, using ChatGPT at the time, could not get it working. It sat unused until the team sent it to London for the summit demo.

Starkov repeatedly framed Claude Code as the driver of the reverse-engineering process and himself as the operator: connecting cables, rebooting the phone, listening for beeps, running commands, and reporting observations back to Claude.

Without Claude Code it wouldn't be possible to do that demo. It's not just it made it 10 times faster, it just made it possible because I'm not a security engineer whatsoever, I'm actually like just a normal software engineer.

Boris Starkov · Source

Claude first found an undocumented text protocol, then exhausted it

The physical setup was simple enough to begin probing. Starkov connected the phone over Power over Ethernet through a router, connected his laptop to the same router over Wi-Fi, and let Claude inspect the network. An Nmap scan first found port 10001 open, labelled scp-config, but that turned out to be a red herring: a Lantronix tunnel that accepted connections and reset HTTP requests. A later scan of port 107 found the real target, shown as open rtelnet.

Claude then tested whether the port would speak. Sending VIKING\r\n to port 107 with nc produced a response from what identified itself as Arcturus Socket Daemon 2.0 BUILD R8.44.2236 USE CASE VIK02, followed by status-looking fields and ER[VIKING]. Starkov said Claude interpreted ER[...] as an error response echoing back the invalid input: evidence that there was a protocol, even if it was not “googlable,” as he put it.
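
The same probe can be sketched in Python rather than nc. This is a minimal illustration, not the commands Claude actually ran: the function names and the timeout are hypothetical, the address is the one the phone reports later in the article, and the check assumes an error reply contains the input echoed inside ER[...].

```python
import socket


def send_line(host: str, port: int, text: str, timeout: float = 2.0) -> str:
    """Send one CRLF-terminated line and return whatever the device sends back."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(text.encode("ascii") + b"\r\n")
        try:
            while True:
                data = sock.recv(4096)
                if not data:
                    break  # peer closed the connection
                chunks.append(data)
        except socket.timeout:
            pass  # device went quiet: treat what arrived as the full reply
    return b"".join(chunks).decode("ascii", errors="replace")


def is_error_echo(reply: str, sent: str) -> bool:
    """True if the reply echoes the input inside ER[...] (assumed error format)."""
    return f"ER[{sent}]" in reply


# Example, only meaningful with the phone on the network:
#   reply = send_line("192.168.8.235", 107, "VIKING")
#   print(is_error_echo(reply, "VIKING"))
```

Separating the network call from the classification keeps the interesting part, recognizing an echoed error, testable without hardware.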

The first layer turned out to be a simple text protocol built around two-letter command registers. Reads used a two-letter command followed by carriage return and newline; writes used the same two-letter prefix followed by a value. Some examples shown during the demonstration included SC -> SC[NOT_CONNECTED], WA -> WA[192.168.8.235], UCibob -> UC[bob], and MBI -> MBI[A8.64.2216].
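
A small parser makes the reply shape concrete. This is a sketch under the assumption that every reply line has the form NAME[value], as in the examples above; the function names are hypothetical, and only read commands are built because the article shows more than one write syntax.

```python
import re

# One or more capital letters, then the value in square brackets.
RESPONSE_RE = re.compile(r"^([A-Z]+)\[(.*)\]$")


def read_command(register: str) -> bytes:
    """Build a read request: the register name followed by CR LF."""
    return register.encode("ascii") + b"\r\n"


def parse_response(line: str):
    """Split 'WA[192.168.8.235]' into ('WA', '192.168.8.235'); None if no match."""
    match = RESPONSE_RE.match(line.strip())
    return (match.group(1), match.group(2)) if match else None


for sample in ["SC[NOT_CONNECTED]", "WA[192.168.8.235]", "ER[VIKING]"]:
    print(parse_response(sample))
```

An empty value such as UC[] parses to an empty string, which is how a cleared SIP field would show up after a reboot.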

Once Claude inferred two-letter command codes, brute force was cheap. It wrote a shell loop over AA through ZZ, sent each candidate to the phone, and printed responses that were not errors. Out of 676 possible two-letter combinations, Starkov said more than 80 returned meaningful responses.
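
The talk describes a shell loop; the same sweep can be sketched in Python. The `query` callable is a hypothetical stand-in for whatever sends a code to the phone and returns its reply, and the filter assumes errors echo the code inside ER[...].

```python
from itertools import product
from string import ascii_uppercase


def candidates():
    """All two-letter codes from AA through ZZ."""
    return ["".join(pair) for pair in product(ascii_uppercase, repeat=2)]


def sweep(query, codes=None):
    """Call query(code) for each code and keep the replies that are not errors."""
    hits = {}
    for code in codes if codes is not None else candidates():
        reply = query(code).strip()
        if reply and f"ER[{code}]" not in reply:
            hits[code] = reply
    return hits


# Offline stand-in for the phone: only SC is a valid register here.
def fake_phone(code):
    return "SC[NOT_CONNECTED]" if code == "SC" else f"ER[{code}]"


print(len(candidates()))   # 676
print(sweep(fake_phone))   # {'SC': 'SC[NOT_CONNECTED]'}
```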

80+ valid two-letter registers found by brute force

The discovered register map covered recognizable categories: SIP settings, status, network configuration, audio, device information, and control commands. The SIP registers included UC, UD, UP, UR, UU, and UX for fields including username, domain, password, and registrar; status registers included SA, SC, SR, and SP; network registers included WA through WF and WM; and control-looking registers included CE, GB, ME, and MR.

| Category | Registers shown | Examples shown |
| --- | --- | --- |
| SIP | UC, UD, UP, UR, UU, UX | username, domain, password, registrar |
| Status | SA, SC, SR, SP | NOT_CONNECTED, NOT_REGISTERED |
| Network | WA, WB, WC, WD, WE, WF, WM | IP, subnet, gateway, MAC |
| Audio | VA, VH | speaker volume, handset volume in hex |
| Device | MA, MB, MC, MN, MX | date, firmware, codec, name |
| Control | CE, GB, ME, MR | config commit, save, reboot |

The two-letter command space exposed status, configuration, device, and control registers.

The next step appeared to work. Claude used the text protocol to set SIP credentials: username, domain, password, and registrar. The phone accepted the writes, returning values such as UC[vikingphone], UD[sip.twilio.com], UP[s3cret], and UR[sip.twilio.com].

Then the core failure emerged. After rebooting with MR=1, the values were gone. Reads returned empty SIP fields, and registration status remained NOT_REGISTERED. Starkov summarized the conclusion as a split between volatile and persistent state: the text protocol wrote to RAM, while the SIP stack read from flash. The question was how to bridge them.

Claude tried to find a save path through the discovered control commands. Attempts included WR=1, CE, CS, CU, GB=40, and ME=1. Some returned OK, but none made the SIP settings survive a reboot when used with the text protocol alone. Starkov said Claude also tried three-letter commands and “all reasonable words,” but found nothing.

This was the point at which the earlier San Francisco effort had stopped. Starkov presented the dead end as a mismatch: the text protocol could read and write RAM; the SIP stack read from flash; the missing bridge was unknown.

The official Windows software became the oracle

Claude’s next plan was not to keep guessing commands. It was to observe the proprietary software that already knew how to configure the phone.

That required running the Viking Windows software in a Windows VM on macOS. But the VM could not reach the phone directly because, as Starkov described it, Wi-Fi could not be bridged into the VM on macOS in the setup he was using. Claude’s workaround was a TCP proxy running on the Mac. The Windows VM connected to the proxy on port 10107; the proxy connected onward to the Viking phone on port 107; and the proxy logged every byte in both directions.

| Component | Role in the capture |
| --- | --- |
| Windows VM | Ran the proprietary Viking configuration software |
| Mac TCP proxy on :10107 | Received traffic from the VM, relayed it, and logged every byte |
| Viking phone on :107 | Received the same protocol traffic the official software would normally send directly |

The proxy arrangement let Starkov and Claude observe the official software without letting it talk to the phone invisibly.

The proxy was small — Starkov described it as 15 lines of Python. It created a connection to 192.168.8.235:107, bound a server socket to 0.0.0.0:10107, relayed both directions, and produced hexdumps.
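
A logging relay of this kind might look like the sketch below. It is not Starkov's script: the function names are hypothetical, the threading layout is one of several ways to relay both directions, and only the addresses and ports come from the article.

```python
import socket
import threading


def hexdump(data: bytes) -> str:
    """Render bytes as space-separated hex pairs."""
    return " ".join(f"{b:02x}" for b in data)


def pump(src, dst, label, log=print):
    """Copy bytes from src to dst until src closes, logging every chunk."""
    while True:
        data = src.recv(4096)
        if not data:
            break
        log(f"{label} {hexdump(data)}")
        dst.sendall(data)
    try:
        dst.shutdown(socket.SHUT_WR)  # pass the end-of-stream along
    except OSError:
        pass  # other side already gone


def serve(listen_port=10107, target=("192.168.8.235", 107)):
    """Accept connections from the VM and relay each one to the phone."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind(("0.0.0.0", listen_port))
    server.listen(1)
    while True:
        client, _ = server.accept()
        phone = socket.create_connection(target)
        for src, dst, label in ((client, phone, "VM -> phone"),
                                (phone, client, "phone -> VM")):
            threading.Thread(target=pump, args=(src, dst, label),
                             daemon=True).start()


# To run the relay (blocks forever): serve()
```

One thread per direction keeps each copy loop trivial, at the cost of interleaved log lines when both sides talk at once.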

This turned the proprietary software into a protocol trace generator. When the Viking software wrote a SIP server such as sip.example.com, the proxy captured repeated commands beginning with TS, followed by binary-looking payloads. The phone responded with lines including TS[OK], TE[OK], and NDIS 1 plus a byte.

Claude decoded the TS structure as a flash protocol distinct from the earlier text registers. The write form was:

TS A <prefix> <addr> <data> <checksum> 0x20

The fields were interpreted as the TS command, a write operation (0xA), a seven-byte device authentication prefix, a flash memory address, the byte to write, a checksum or validation byte, and a trailing 0x20. Reads used a similar command beginning TS 9, with a response of NDIS 1 <byte>.
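
The field order can be made concrete with a small builder. The byte-level encoding here is an assumption: the article gives the field sequence but not the exact wire layout, so this sketch treats the operation as a raw 0x0A byte with no separators between fields. The function name is hypothetical, and the prefix must be supplied by the caller because its value is not given.

```python
def build_ts_write(prefix: bytes, addr: int, data: int, checksum: int) -> bytes:
    """Assemble a TS write frame in the field order described above.

    ASSUMPTION: fields are packed as raw bytes with no separators.
    """
    if len(prefix) != 7:
        raise ValueError("prefix must be seven bytes")
    for name, value in (("addr", addr), ("data", data), ("checksum", checksum)):
        if not 0 <= value <= 0xFF:
            raise ValueError(f"{name} must fit in one byte")
    return (b"TS" + bytes([0x0A]) + prefix
            + bytes([addr, data, checksum, 0x20]))


# Placeholder prefix of zeros, for illustration only.
frame = build_ts_write(bytes(7), 0x3C, 0x00, 0x59)
print(frame.hex(" "))
```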

That capture explained why the text protocol had been insufficient. There were two layers. The human-readable two-letter registers could alter runtime state, but the official software used the TS binary path to write flash byte by byte. The slide’s summary was blunt: persistence was the real puzzle, not access.

The checksum was a one-byte subtraction, not encryption

The remaining unknown in the flash protocol was the checksum. Starkov called the slide “Encryption,” but the discovery was that it was a simple byte-level validation formula.

Claude had known input-output pairs from the captured Windows software traffic. It knew the software was writing the string sip, and it could see the corresponding address, data byte, and checksum values in the payload. The visible bytes looked unintuitive at first: addresses such as 0x3c, 0x3d, and 0x3e; data bytes such as 0x2a, 0x22, and 0x21; checksum bytes such as 0x95. The ASCII bytes for s, i, and p did not appear directly in the way a naïve plaintext interpretation would suggest.

Claude derived a write-checksum formula:

checksum = (0x195 - addr - byte) & 0xFF

The read checksum was simpler.

Starkov emphasized that Claude did not merely guess the checksum format. It derived a candidate from observed pairs and then confirmed the pattern by running additional values through the phone in a closed loop.
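
The closed-loop check can be illustrated with the write-checksum formula the article gives for the final programming flow, (0x195 - addr - byte) & 0xFF. Because 0x195 is 0x100 + 0x95, the formula is equivalent to saying that address, data, and checksum always sum to 0x95 modulo 256. The triples below are synthetic, generated from the formula, and are not captured traffic; the function names are hypothetical.

```python
def write_checksum(addr: int, data: int) -> int:
    """One-byte write checksum as stated in the article."""
    return (0x195 - addr - data) & 0xFF


def fits_constant_sum(triples, constant=0x95):
    """True if addr + data + checksum == constant (mod 256) for every triple."""
    return all((a + d + c) & 0xFF == constant for a, d, c in triples)


# SYNTHETIC examples built from the formula, not bytes seen on the wire.
synthetic = [(addr, data, write_checksum(addr, data))
             for addr, data in [(0x3C, 0x10), (0x3D, 0x7F), (0x3E, 0xFF)]]

print(fits_constant_sum(synthetic))            # True
print(fits_constant_sum([(0x3C, 0x73, 0x00)])) # False
```

A hypothesis of this shape is cheap to test against every captured triple, which is what makes a one-byte checksum tractable from a handful of observations.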

At that point, the earlier mystery control commands also made sense. After the flash writes, the official software sent a save sequence:

| Command | Observed role |
| --- | --- |
| CE | commit config |
| GB=40 | save flash block |
| ME=1 | memory enable |
| MR=1 | reboot |

The control commands did not persist text-protocol writes by themselves; after binary flash writes, they formed the save sequence.

The same commands had appeared during the failed brute-force phase, where several returned OK but produced no lasting configuration. The missing dependency was the binary flash-write layer. The slide explained that these commands did not work with the text protocol alone, but after binary flash writes they committed the configuration.

Claude then read all 256 bytes of the flash region with TS 9 and mapped addresses. The map included a six-byte security code at 0x0e, a 32-byte SIP server/registrar region beginning at 0x3c, four 20-byte auto-dial phone-number slots beginning at 0x7c, 0x90, 0xa4, and 0xb8, and a 48-byte read-only firmware ID at 0xcf. The source also noted that SIP credentials were stored in a separate region managed by the CE/GB/ME/MR save sequence.
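
The address map lends itself to a small lookup table. The region names and helper functions below are hypothetical; the start addresses and lengths are the ones listed above.

```python
FLASH_SIZE = 256

# name -> (start address, length in bytes), as listed in the article
FLASH_MAP = {
    "security_code": (0x0E, 6),
    "sip_server": (0x3C, 32),
    "autodial_1": (0x7C, 20),
    "autodial_2": (0x90, 20),
    "autodial_3": (0xA4, 20),
    "autodial_4": (0xB8, 20),
    "firmware_id": (0xCF, 48),
}


def region(dump: bytes, name: str) -> bytes:
    """Slice one named region out of a full 256-byte flash dump."""
    start, length = FLASH_MAP[name]
    return dump[start:start + length]


def overlaps():
    """Return pairs of adjacent regions whose byte ranges collide."""
    spans = sorted(FLASH_MAP.values())
    return [(a, b) for a, b in zip(spans, spans[1:]) if a[0] + a[1] > b[0]]


print(overlaps())  # []
```

The four auto-dial slots are exactly contiguous (each start is the previous start plus 20), and the firmware ID ends at 0xFE, inside the 256-byte region.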

Once the protocol was known, Windows disappeared from the workflow

The final programming flow no longer required the proprietary Windows application. Starkov could factory-reset the phone and configure it directly from a Mac with Python and netcat-style commands.

The direct programming path wrote the SIP server one byte at a time into flash. For each byte of vikingphone.sip.us1.twilio.com, the code computed the flash address as 0x3c + i, calculated the checksum as (0x195 - addr - byte) & 0xFF, and sent the corresponding TS A write command. It then set SIP credentials through the text protocol with commands such as UCvikingphone, UPs3cret, and URvikingphone.sip.us1.twilio.com, followed by the commit-and-reboot sequence: CE, GB=40, ME=1, MR=1.
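
The flow above can be sketched as a plan generator that computes what would be sent without touching the network. Function names are hypothetical. The sketch treats each hostname byte as the data byte, as the description implies, and does not model any further encoding or the TS framing on the wire.

```python
SIP_SERVER_BASE = 0x3C
SAVE_SEQUENCE = ["CE", "GB=40", "ME=1", "MR=1"]


def flash_write_plan(hostname: str, base: int = SIP_SERVER_BASE):
    """Return (addr, byte, checksum) for each byte of the hostname."""
    plan = []
    for i, byte in enumerate(hostname.encode("ascii")):
        addr = base + i
        plan.append((addr, byte, (0x195 - addr - byte) & 0xFF))
    return plan


def text_commands(username: str, password: str, registrar: str):
    """Text-protocol writes followed by the commit-and-reboot sequence."""
    return [f"UC{username}", f"UP{password}", f"UR{registrar}"] + SAVE_SEQUENCE


host = "vikingphone.sip.us1.twilio.com"
plan = flash_write_plan(host)
print(len(plan), [hex(x) for x in plan[0]])
print(text_commands("vikingphone", "s3cret", host))
```

The 30-byte hostname fits within the 32-byte SIP server region that begins at 0x3c.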

The end state was the original architecture: the Viking phone auto-dialed on handset lift, emitted a SIP INVITE to Twilio, Twilio routed to an HTTP webhook, and the ElevenLabs Conversational AI agent answered.

| Stage | Path |
| --- | --- |
| Handset lift | Viking K-1900D-IP auto-dials |
| Voice call transport | SIP INVITE to Twilio SIP Domain |
| Routing | Twilio routes through an HTTP webhook |
| Response | ElevenLabs Conversational AI Agent answers and talks |

The final working architecture connected the physical handset to the ElevenLabs agent through Twilio.

Starkov packaged the discovered protocol as a Claude Code skill and open-sourced it. The skill file, as shown in the source, can be installed by copying viking-phone.md into .claude/commands/viking-phone.md, after which the user can ask Claude to “configure SIP with these credentials.” The packaged reference includes the register map, flash addresses, checksum formulas, save sequences, and bootstrap procedures.

His broader recommendation came from this case: if someone wants to work with another piece of hardware, they should consider connecting to it directly rather than treating the manufacturer’s interface as the only possible route. A year earlier, in his telling, a user trying to configure this phone needed the proprietary software interface supplied by the manufacturer. In this case, once Claude could connect to the device, probe its ports, infer a command surface, and observe the official software through a proxy, the proprietary UI was no longer the controlling interface.

The conditions that made this work were all visible in the Viking phone case. The device exposed a reachable network daemon. Its command space was small enough for brute force at the text layer. Its flash checksum was one byte. Its official software could be run in a VM and coerced through a proxy. Starkov explicitly acknowledged the luck in the checksum: if the “encryption” had been harder, the break would probably have been harder too.

Starkov describes Claude as the orchestrator and himself as the hands

The strongest part of Starkov’s account was not only the protocol sequence, but the division of labor he described between himself and Claude Code.

When asked whether Claude controlled the Windows VM or whether a human had to operate it manually, Boris Starkov said his participation was “basically following Claude's commands.” Claude would ask him to perform physical actions: take the handset, listen for beeps, reboot the phone, operate the VM. He joked that when he reported hearing three beeps, Claude told him it had to be either two or four, and he corrected himself.

I was actually like the agent for Claude. Like Claude was orchestrating the whole thing. And answering your... basically I was the hands for Claude.

Boris Starkov · Source

He said he could potentially have closed more of the loop with a computer-use tool controlling the VM, but he did not. Instead, he manually executed the steps Claude requested.

When asked where Claude got stuck and he had to help, Starkov distinguished physical unblocking from intellectual unblocking. He said he could reboot the phone or control the virtual machine, but he could not have unblocked the algorithmic reverse engineering. “I had no idea what was going on,” he said. The “intellectual part,” including cracking the checksum algorithm, was something he said he would not have been able to do himself.

The cost was modest in his account. Asked how many tokens he burned, he said he used ElevenLabs corporate tokens and estimated the cost on the order of ten to a hundred dollars.

The physical connection was also ordinary. The phone used PoE: an Ethernet cable and a power adapter together supplied both power and network connectivity. Starkov put a router between the phone and his laptop, connected the phone to the router, connected his laptop to the same router over Wi-Fi, and let Claude scan connected devices until it found the phone.
