Hermes Uses a Minimal Agent Loop to Preserve State Across Channels

Alejandro AOHugging FaceWednesday, June 17, 202611 min read

Alejandro AO’s walkthrough of Hermes presents the agent as a deliberately small always-on system rather than a complex orchestration stack. He argues that Hermes’ usefulness comes from a simple loop that builds context from Markdown files, message history, tools, skills and memory, then preserves state through compression, SQLite transcripts, optional external memory providers, gateway integrations and scheduled cron jobs. The architecture’s central concern is continuity: keeping enough context across channels and time for the agent to behave like a persistent assistant.

Hermes is built around a small loop, not a large orchestration system

Alejandro AO presents Hermes as a deliberately simple always-on agent architecture. The core is an AI agent that can be reached through several interfaces: a command-line interface, a gateway for messaging services, and potentially an API. Around that core sit tools, skills, and memory.

The high-level architecture is a small hub-and-spoke system. CLI, Gateway, and API point into AI Agent. Beneath the agent are tools, skills, and memory. Memory is split into external providers, such as Mem0 and SuperMemory, and internal memory, including session transcripts and files such as SOUL.md and User.md.

The agent loop itself is the familiar minimalist pattern used by other lightweight agents: the user sends a message; Hermes builds context; the assembled context and message history are sent to the language model; the model may call tools; tool results are returned to the model; the loop continues until the model produces a final response; and then Hermes performs a memory update.

That last step is important to the architecture. After responding, Hermes analyzes the interaction to decide whether anything should be remembered. If so, it writes that information into memory so that future interactions can use it. AO frames this as what makes Hermes “continuously learning and improving” as it is used: the agent accumulates and retrieves explicit memory, transcripts, summaries, and user-specific files.

AO says every conversation is stored in the agent’s internal memory “in the form of a session transcript.” The architecture separates memory into two broad categories. Internal memory includes session transcripts and local Markdown files such as SOUL.md and User.md. External memory, when configured, can come from providers such as Mem0, SuperMemory, Honcho, and others. Hermes does not require those external systems by default, but it can use them to retrieve relevant material from past interactions.

The result is not a mysterious agent stack. It is a loop that repeatedly builds a prompt from files, prior messages, tool descriptions, skills, and memory, then hands that assembled state to an LLM.

The context is mostly Markdown files, message history, and optional retrieved memory

Context construction is one of the most important parts of the design. Alejandro AO describes Hermes’ context as “very straightforward and very minimalist”: a set of Markdown files, system instructions, message history, skills, tools, and, when available, relevant material from external memory.

The first major file is SOUL.md. AO describes it as the agent’s personality file: the place to define tone, goals, inspiration, behavioral approach, and what the agent is meant to be like. He compares the role of this file to large, carefully written system prompts used by assistants such as Claude, but emphasizes that Hermes users are expected to personalize it for themselves.

On a fresh Hermes install, SOUL.md is usually empty. If the user does not manually fill it in or ask Hermes to write it, Hermes falls back to a default system prompt that identifies the agent as Hermes, a virtual assistant that is always on. AO’s recommendation is to create a custom SOUL.md, because it is the primary way to tailor the agent’s behavior.

The second file is memory/User.md. Unlike SOUL.md, Hermes updates this file automatically when it learns facts about the user. If a user says they are a software engineer working on a particular project, or a market analyst working on a specific area, Hermes can classify that as user information and store it in User.md.

The third file is memory/Memory.md. AO distinguishes it from User.md: it is not primarily biographical information about the user, but arbitrary memory. Hermes may use it to store facts about workflows, tool usage, useful things learned during conversations, or other information relevant to future interactions. What gets saved depends partly on whether the agent finds it useful or interesting and partly on the goals set in SOUL.md.

Context source	Role in Hermes
`SOUL.md`	Defines the agent’s personality, tone, goals, and behavioral approach.
`memory/User.md`	Stores information Hermes learns about the user.
`memory/Memory.md`	Stores arbitrary useful facts, workflows, tool knowledge, and other remembered material.
External memory	Optionally contributes relevant summaries or memories from past sessions.
Skills and tools	Adds descriptions of what the agent can do.
Messages or summary	Adds the recent conversation, or a compressed summary if the thread is too long.

The main pieces Hermes assembles into context before calling the model.

Past sessions can also appear in the context, but only if external memory is configured. In that case, Hermes can include summaries or retrieved information from older conversations that may be relevant to the current one. Without external memory setup, AO says this part does not appear by default.

Finally, Hermes adds skill descriptions, tool descriptions, and recent messages. If the conversation remains within the configured context threshold, the full relevant message history can be included. If it grows too large, Hermes replaces earlier messages with a summary.

Compression starts before the context window is exhausted

Context compression is Hermes’ practical response to finite context windows. Alejandro AO says large language models may accept hundreds of thousands of tokens, or in some cases around a million, but long-running agents still need a strategy for old messages. Hermes asks during setup when compression should trigger. The default is 50% of the context window.

When the configured threshold is crossed, Hermes summarizes previous messages and appends that summary to the context in place of the raw earlier messages. The trigger is customizable. AO notes that users working with smaller models, or models with less generous context windows, may prefer 70% or 80%, though the default remains 50%.

50%

default context-window threshold at which Hermes triggers message-history compression

Hermes checks compression at two moments. First, it checks before each turn, before calling the language model. Second, it checks on error: if the LLM returns a context-window error, Hermes can summarize and try to fit the interaction back into the available window.

The pre-call check uses an approximation. Before sending the request, Hermes has constructed the message list but does not necessarily have exact token usage for the target model. AO says Hermes uses a simple character-based estimate: total characters divided by four. If that estimated context exceeds the configured threshold, compression is triggered. He notes that a tokenizer could be used, but character division is cheaper and good enough for this purpose.

After the first model response, Hermes can use more accurate usage information returned by the provider. Depending on the LLM provider, the response may include input tokens, output tokens, or a usage parameter. That data reflects the model’s own tokenization, making it more accurate than the initial approximation.

AO also shows the compression prompt in context_compressor.py. The visible prompt instructs a summarization agent to create a “context checkpoint,” treat the conversation turns as source material, produce only the structured summary, avoid greetings or preambles, and write in the same language the user was using rather than translating to English.

The resulting summary is not just a short recap. AO says the prompt asks for multiple sections, including the overall goal, constraints, completed actions, active state, historical progress, blockers, key decisions, resolved questions, relevant files, critical context, previous summaries, next turns to incorporate, and turns to summarize. He contrasts this with the smaller, more minimalist summarization prompt used by the Pi agent architecture he had discussed previously. Hermes’ compressor is richer and gives the agent more explicit state about what is happening.

The gateway is what makes Hermes usable outside the terminal

The gateway is the part that lets users talk to Hermes through messaging platforms such as Telegram, email, Slack, Discord, SMS, WhatsApp, and similar services. Alejandro AO says it is not necessarily the most difficult or complex component, but he considers it the part that likely helped make Hermes more popular.

The gateway’s first responsibility is to listen for incoming messages from configured services. Its second responsibility is to normalize those messages into the format expected by the AI agent. Its third responsibility is to build the conversation context and message history before passing the user’s request into the agent loop.

AO describes the gateway as an asyncio loop that runs continuously and polls or waits on different integrations. Each provider has its own mechanics. Some use webhooks. Some use a loop that runs every second and calls the provider API to check for new messages; AO gives Telegram polling as an example. Others use websockets. The important point is that the gateway is not one generic connector that works with everything by default. Each third-party integration has to be configured independently.

For Telegram, setup involves running a Hermes gateway setup command, creating a bot ID, and configuring which user IDs are allowed to communicate with that Hermes agent. The same general idea applies across providers: each gateway must know how to authenticate, receive messages, identify users or sessions, and pass authorized messages onward.

The gateway also has to reconstruct the conversation. A direct CLI session naturally has local conversational state. A Telegram message, by contrast, arrives as a single incoming message, not as a full thread. Hermes therefore uses identifiers to recover the relevant history. AO describes the session identifier as beginning with the gateway name, such as Telegram, followed by the session ID returned by that service and other IDs needed to build the full identifier.

Hermes stores the associated messages in a local SQLite database. When a new Telegram message arrives, the gateway uses the Telegram prefix and session ID to query SQLite for the prior messages in that conversation, appends them to the context, and sends that assembled context to the AI agent. In AO’s framing, this is why the gateway is more than a transport layer: if it only received and forwarded messages, Hermes would lose message history.

It also includes session management. If the user sends a new message while the agent is already working on the previous one, the gateway decides how that message should affect the running task. AO says the session manager can interrupt, steer, or queue the message. In Telegram, /interrupt interrupts the agent, /steer steers it, and an ordinary additional message is queued.

Hermes keeps memory in files, SQLite, and optional external providers

Hermes memory has three practical layers: Markdown files, SQLite transcripts, and external memory. Alejandro AO uses the dedicated memory section to separate those layers from the way they appear during context construction.

The Markdown layer is the most visible. SOUL.md, memory/User.md, and memory/Memory.md are appended into the context window after the system prompt. They define personality, user facts, and arbitrary remembered knowledge.

The SQLite layer stores full transcripts of sessions. AO says Hermes has many tables, rows, and data models, but the essential function is storing the complete text of interactions. Every session is stored in SQLite, including sessions that come through gateways. Gateway sessions can be pulled back later using identifiers such as the provider name and session ID.

AO also notes a “bare text” table containing only the text of conversations, which makes similarity search easier. This local SQLite memory is useful both for continuity and for retrieval.

The third layer is external memory, which is not enabled by default. Hermes supports providers such as Mem0, SuperMemory, Honcho, and others. AO does not present these providers as interchangeable. He says they work differently: he believes Mem0 uses similarity search or something like it; Honcho works differently; and, if he remembers correctly, SuperMemory requires sending the entire conversation history after every turn and then uses a language model to extract the relevant memory.

AO says most people do not enable external memory, but he recommends doing so. Some supported providers are free, and he argues they can significantly improve the way Hermes learns from the user.

The retrieval pattern is also worth noting. External memory is not queried before the first response in a new thread. AO says Hermes queries external memory after the first message, once the agent has enough information to infer what the conversation is about and what the user may ask next. He compares this to a human answering a question and, at the same time, remembering previous related conversations.

The practical implication is that if Hermes does not remember something on the first attempt, and external memory is configured, the user can describe what they are trying to recall in the first message and then ask a follow-up. By the second turn, Hermes may have queried the external memory system and brought relevant material into context.

Cron jobs are stored as JSON, even though the documentation says SQLite

Hermes cron jobs are scheduled tasks the agent can run at recurring times: every morning, send an email with AI news; every day, post a Slack update to a community; every Friday, send a message to a boss. Alejandro AO describes the function as familiar from server cron systems, but says Hermes implements its own loop.

Hermes cron is not tied to the server’s cron process. Instead, Hermes runs its own loop every minute. Each minute it calls a function AO identifies as tick(), checks the scheduled jobs, and executes anything due for that minute.

AO highlights a discrepancy between documentation and implementation. He says the documentation describes cron jobs as being stored in SQLite. In his analysis, he did not find cron jobs in the SQLite database, and the code did not appear to pull them from SQLite. Instead, he found them stored as plain JSON.

The path he gives is .hermes/cron/jobs.json. That file contains the listed cron jobs, their prompts, and the instructions for what each job should do. On every tick, Hermes checks jobs.json to determine whether a job should run. When a cron job is updated, the JSON file is updated.

Cron outputs are stored under the cron directory as well. AO describes an output directory containing job ID directories, with individual run files such as run.md for each execution of a job.

Cron component	What AO says it does
Hermes cron loop	Runs independently of the server cron process.
`tick()`	Runs every minute and checks whether any job is due.
`.hermes/cron/jobs.json`	Stores scheduled jobs, prompts, and job instructions as JSON.
`.hermes/cron/output/`	Stores outputs by job ID and run file.

How Hermes cron jobs are stored and executed according to AO’s code inspection.

Cron delivery is also separate from ordinary agent tool use. AO says cron does not automatically send a message by calling a send-message tool through the agent. Instead, Hermes sends a notification through the user’s configured “home” messaging platform. During gateway setup, a platform such as Telegram, Discord, or Slack can ask whether a given user ID should be the home for that gateway. Cron notifications are then delivered through that home integration.

That distinction matters because cron is not simply another prompt inside an open chat thread. It is a scheduled system process that invokes Hermes work on a timer, writes run output, and notifies the user through a configured home channel.

AI Application Architecture RAG and Knowledge Systems Agents and Autonomy