Context Graphs Give AI Agents Rules, Precedent, and Decision Traces

Andreas Kollegger · AI Engineer · Thursday, May 28, 2026 · 11 min read

In a Neo4j talk, Zaid Zaim and Andreas Kollegger argue that AI agents need more than language models, tools, and retrieval if they are to make consequential decisions. Zaim frames context graphs as a way to store the policies, prior decisions, causal links, and reasoning traces behind an action; Kollegger extends that into a five-stage decision workflow in which agents frame the case, check rules and precedent, assess risk, act only within authority, and write the outcome back to the graph as future precedent.

Decision-aware agents need a record of rules, precedent, authority, and reasoning

Zaid Zaim frames context graphs as an answer to a specific gap in current agent systems: agents may be strong at language, reasoning, and creativity, and they may have access to knowledge and tools, but they still often lack the explicit “why” behind an action. Knowledge graphs supply knowledge, context, and enrichment; LLMs supply language, reasoning, and creativity. The goal is to combine them so agents can retrieve relevant information and act within the rules, policies, and prior decisions that make a response appropriate.

The core shift is from an agent that can answer questions to an agent that can explain and constrain its decisions. A context graph is presented as a knowledge graph designed to capture “decision traces” — the full context, reasoning, and causal relationships behind significant decisions. That is different from a traditional audit log, which may record that a transaction was rejected at 14:32 but not preserve relationships between events, causal chains, or reasoning. The context graph, by contrast, is meant to preserve the surrounding “why”: entities, relationships, events, tribal knowledge, and causal traces in a connected, traversable structure.
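The contrast between an audit log and a traversable decision trace can be sketched in a few lines of Python. This is only an illustration: the node identifiers, relationship names, and the `why` helper are hypothetical and do not come from the talk or from Neo4j; a plain list of triples stands in for a graph database.

```python
# A flat audit log records that something happened, but not why.
audit_log = [{"time": "14:32", "event": "transaction_rejected", "id": "tx-1"}]

# Hypothetical decision trace stored as (subject, relation, object) edges.
edges = [
    ("tx-1", "REJECTED_BY", "decision-7"),
    ("decision-7", "APPLIED_RULE", "rule:daily-limit"),
    ("decision-7", "CAUSED_BY", "event:limit-exceeded"),
    ("event:limit-exceeded", "CAUSED_BY", "tx-0"),
]


def why(node, edges):
    """Follow outgoing edges from `node` and return every (relation, target) reached."""
    seen, stack, reached = {node}, [node], []
    while stack:
        current = stack.pop()
        for subject, relation, target in edges:
            if subject == current and target not in seen:
                seen.add(target)
                reached.append((relation, target))
                stack.append(target)
    return reached


explanation = why("tx-1", edges)
```

The log entry can only report the rejection at 14:32. The traversal returns the decision, the rule it applied, and the earlier events that caused it.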

That distinction matters because the missing element in many agent deployments is not simply an extra document, a better embedding, or a more capable model. Zaim says context graphs are meant to add rules and policies to the agent’s knowledge so the system can become more capable of driving decisions. In the talk’s “Missing ‘Why’” framing, attributed on screen to MIT, three failures are singled out: agents with no memory of prior interactions, systems with no audit trail when something goes wrong, and multiple agents that cannot share what they have learned. The same material states that “95% of AI pilots fail to deliver returns.” The example is an agent recommending approval of a $100,000 credit line increase. The question is not only whether the recommendation is correct; it is whether the system can explain why that recommendation was made, under which rules, and with what prior knowledge.

$100k: example credit line increase used to illustrate why an agent’s recommendation needs memory, auditability, and shared learning

The memory model has three layers. Short-term memory captures the immediate conversation history and session context. Long-term memory captures a more generalized graph of entities and relationships — organizations, people, locations, preferences, and other durable context. Reasoning memory captures traces of decisions: the steps taken, tools called, comparable traces, and provenance. In Neo4j’s model, all three feed a context graph, with vector search and graph traversal available together.

Memory layer	What it captures	Role in a decision-aware agent
Short-term	Conversation history and session context	Keeps the immediate request and recent interaction available
Long-term	Entities and relationships such as people, organizations, locations, and preferences	Provides durable context about the environment and actors involved
Reasoning	Decision traces, reasoning steps, tool calls, similar traces, and provenance	Preserves why earlier decisions were made and makes them reusable as precedent

The three-part memory model for agents using a Neo4j context graph

This structure matters because a decision-aware agent has to move between the immediate request, the broader institutional context, and the reasoning traces left by earlier decisions. The immediate conversation tells the agent what the user is asking for. The long-term graph tells it who the user is, what organization they belong to, what relationships are relevant, and what factual context is already known. The reasoning graph is where the agent can inspect what was done previously, which tools were used, and how prior outcomes were justified.
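A minimal sketch of the three layers, assuming simple in-memory Python structures in place of a real graph database. The class names, the `context_for` method, and the sample data are all hypothetical; they show only how one request might draw on each layer.

```python
from dataclasses import dataclass, field


@dataclass
class ShortTermMemory:
    """Conversation history and session context for the current request."""
    messages: list[str] = field(default_factory=list)


@dataclass
class LongTermMemory:
    """Durable entities and relationships as (subject, relation, object) triples."""
    triples: set[tuple[str, str, str]] = field(default_factory=set)

    def neighbours(self, entity: str) -> set[tuple[str, str]]:
        return {(rel, obj) for subj, rel, obj in self.triples if subj == entity}


@dataclass
class ReasoningMemory:
    """Decision traces: steps taken, tools called, and provenance."""
    traces: list[dict] = field(default_factory=list)


@dataclass
class ContextGraph:
    short_term: ShortTermMemory = field(default_factory=ShortTermMemory)
    long_term: LongTermMemory = field(default_factory=LongTermMemory)
    reasoning: ReasoningMemory = field(default_factory=ReasoningMemory)

    def context_for(self, user: str) -> dict:
        """Collect what each layer knows that is relevant to one user."""
        messages = self.short_term.messages
        return {
            "request": messages[-1] if messages else None,
            "facts": self.long_term.neighbours(user),
            "prior_traces": [t for t in self.reasoning.traces if t.get("subject") == user],
        }


graph = ContextGraph()
graph.short_term.messages.append("Can my credit line be increased?")
graph.long_term.triples.add(("alice", "CUSTOMER_OF", "acme_bank"))
graph.reasoning.traces.append(
    {"subject": "alice", "decision": "approved", "tools": ["credit_check"]}
)
context = graph.context_for("alice")
```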

Zaim places this inside an Agentic GraphRAG pattern. A user sends a query to an agent. The agent plans, retrieves, and responds, using tools such as vector-plus-graph retrieval, Text2Cypher, domain-specific Cypher templates, and connectors such as HRIS retrieval. If the answer is not already available from the agent’s immediate knowledge source, the agent can query the graph database, traverse relationships, and return content that Zaim characterizes as more reliable and of higher quality.

The point is not retrieval alone. Retrieval supplies knowledge. Context graphs are meant to supply the constraints and explanation that shape a decision. In Zaim’s terms, this means moving beyond “what an agent can do” toward “why an agent needs to do something.” For an eligibility decision in financial services, for example, the graph is not only a store of customer facts. It is also where policies, rules, prior decisions, and reasoning traces can be made available to future agents.

The hard case is not answering; it is choosing under missing instructions

Andreas Kollegger takes the decision problem to the point where an autonomous agent runs beyond its explicit setup. Decision-making, in his definition, appears when an agent reaches a point in a workflow where it has to do something that was not fully anticipated by its instructions. The agent has a goal, tools, and perhaps memory, but it encounters a circumstance where the prompt or task description does not specify the right action.

His household example is deliberately simple: give an agent a credit card and access to an Amazon account, then tell it to keep the fridge stocked with Red Bull. If the agent notices supplies are low, ordering more Red Bull seems aligned with the objective. But the user may also need rent money, and the agent may not have been told how to trade off beverage inventory against upcoming financial obligations. Prompt engineering can patch the immediate failure by adding more instructions. Kollegger’s claim is that the deeper problem is more general: agents need an explicit framework for making good decisions when the relevant constraint was not already encoded in the task.

That problem grows with multi-agent systems. One agent making local decisions is already exposed to missing instructions. Multiple specialized agents operating in parallel have to coordinate, be aware of one another, and understand their own boundaries. The scale of autonomy increases the demand for a decision process that can be represented, invoked, audited, and improved.

Kollegger presents the framework as something that can be implemented in systems such as LangGraph, ADK, or custom skills. He emphasizes that the workflow is not exotic. It resembles how humans make decisions when they are careful: frame the problem, consider prior context and rules, assess risk and value, decide whether one has authority to act, and then preserve the outcome. The engineering task is to make that implicit human practice explicit enough for agents to use.

“A lot of our practice as AI engineers is being explicit about the implicit knowledge that we carry with us.”

Andreas Kollegger

He also cautions that the workflow is hard to generalize in implementation. The framework is general, but the particulars of every step become domain-specific.

The first step is to frame the local decision, not to search for a general answer

Andreas Kollegger says the decision workflow starts with framing: objective, causality, and environment. The objective is what the agent is trying to resolve. The causality is how the system arrived at the decision point — the reasoning chain and actions that produced the current uncertainty. The environment is the situation that will be affected by the decision.

He notes that he would likely lead with causality, because the path to the decision point explains why a decision is needed at all. An agent has been pursuing a task, taking steps, and then reaches a moment where uncertainty appears. Rather than simply choosing the next action, it should enter a decision-making subprocess.

The environment changes the meaning of the same formal problem. Ordering Red Bull is a low-stakes purchasing decision. Medical guidance is not. A financial loan increase is another environment with its own consequences, procedures, and authorities. Humans often understand such differences through experience. Kollegger argues that agents have to be told these things explicitly, because the surrounding environment determines what kinds of decisions are acceptable and what kinds of risk matter.
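One way to make the frame explicit is to give it a fixed shape. The sketch below assumes a small Python record with causality listed first, as Kollegger suggests; the `DecisionFrame` and `Environment` names and the three environment values are illustrative choices, not part of his framework.

```python
from dataclasses import dataclass
from enum import Enum


class Environment(Enum):
    HOUSEHOLD_PURCHASING = "household_purchasing"
    FINANCIAL_SERVICES = "financial_services"
    MEDICAL_GUIDANCE = "medical_guidance"


@dataclass(frozen=True)
class DecisionFrame:
    causality: tuple[str, ...]  # the steps that led to the decision point
    objective: str              # what the agent is trying to resolve
    environment: Environment    # the situation the decision will affect

    def summary(self) -> str:
        path = " -> ".join(self.causality)
        return f"[{self.environment.value}] {self.objective} (reached via: {path})"


frame = DecisionFrame(
    causality=("checked fridge inventory", "found stock low"),
    objective="decide whether to reorder Red Bull",
    environment=Environment.HOUSEHOLD_PURCHASING,
)
```

Making the environment a required field means the same objective cannot be framed without stating where it applies.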

Once the local frame is established, the agent needs global guidance. Kollegger divides that into precedent and alignment. Precedent asks what was done before in a similar situation. Alignment asks which hard and soft rules apply now. Hard rules may be formally described in business-process language. Soft rules may live in Slack channels, Google Docs, or other informal institutional guidance.

The tension between precedent and current rules is important. Prior decisions are useful because consistency is valuable. But precedent is not always binding. Rules may have changed; the earlier decision may need to be overruled; or a new context may make the old answer inappropriate. Kollegger’s framework keeps both inputs in view rather than reducing decision-making to either “copy the past” or “follow the latest retrieved policy.”
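A sketch of how both inputs might be kept in view, assuming each precedent records which rule versions it relied on. That bookkeeping, along with the `Rule`, `Precedent`, and `gather_guidance` names, is an assumption made for illustration. Precedent decided under an outdated rule is flagged rather than discarded, so a later stage can decide whether to overrule it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    rule_id: str
    version: int
    hard: bool  # hard rules are formally binding; soft rules are informal guidance
    text: str


@dataclass(frozen=True)
class Precedent:
    decision: str
    relied_on: tuple[tuple[str, int], ...]  # (rule_id, version) pairs in force at the time


def gather_guidance(precedents: list[Precedent], current_rules: list[Rule]) -> dict:
    """Split precedent into cases decided under today's rules and cases that predate a change."""
    current = {rule.rule_id: rule.version for rule in current_rules}
    consistent, stale = [], []
    for precedent in precedents:
        up_to_date = all(current.get(rid) == version for rid, version in precedent.relied_on)
        (consistent if up_to_date else stale).append(precedent)
    return {"rules": list(current_rules), "precedent": consistent, "stale_precedent": stale}


rules = [Rule("credit-limit", 2, hard=True, text="Increases above the limit need review.")]
precedents = [
    Precedent("approved", relied_on=(("credit-limit", 1),)),
    Precedent("declined", relied_on=(("credit-limit", 2),)),
]
guidance = gather_guidance(precedents, rules)
```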

Risk analysis begins by asking which reference class the case belongs to

Andreas Kollegger spends the most time on the assessment stage: risk, value, and proposal. Risk analysis asks what matters, whether the decision is reversible, and what the cost of being wrong would be. Value analysis asks what the system is maximizing or minimizing. The proposal stage generates choices with pros and cons, rather than immediately selecting one.

The central risk concept is “reference class validation.” Kollegger uses it to mean that before acting, the system must determine which group or situation the case belongs to, because the relevant risk may differ sharply across groups. The medical example makes the point: prescribing drug X for symptom Y may be correct 99% of the time. But for the remaining 1%, the same prescription may be fatal. In that case, aggregate statistical behavior does not answer the decision. The critical question is whether the patient is in the 99% or the 1%.

99%: share of cases in Kollegger’s medical example where drug X may be the right prescription for symptom Y

The point of the example is not that every decision is medical or fatal. Kollegger says there are “flavors” of the same structure in many decision-making settings. The agent has to classify what matters, what does not matter, and what risk is actually involved for the players in the decision. That classification cannot be left to general model behavior when the particulars determine the outcome.
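The reference class check can be expressed as a guard that runs before any recommendation. This is a minimal sketch: the class labels and the `recommend` function are hypothetical, and escalating when the class is unknown is an assumption about how such a guard might behave, not something stated in the talk.

```python
from typing import Optional

# Hypothetical reference classes for the drug X example.
SAFE = "no_known_contraindication"
AT_RISK = "known_contraindication"


def recommend(reference_class: Optional[str]) -> str:
    """Decide per reference class instead of relying on the aggregate success rate."""
    if reference_class == AT_RISK:
        return "do_not_prescribe"
    if reference_class == SAFE:
        return "prescribe"
    # Unknown class: the 99% aggregate figure does not say which group this patient is in.
    return "escalate"
```

An agent that skipped the check and always prescribed would be right in most cases and still make the one error that matters.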

Reversibility changes the analysis. A decision that can be undone carries a different risk profile from one that cannot. If an agent makes a bad purchase and the order can be cancelled, the system may tolerate more autonomy. If the decision concerns a life, money, legal authority, or another high-impact outcome, the threshold changes. Kollegger’s contrast is direct: failing to stock Red Bull is not the end of the world; a fatal prescription is.

Value also has to be explicit. An agent instructed to keep a refrigerator stocked might infer that the value to maximize is availability of Red Bull. But the user may be trying to minimize spending, save for a vacation, preserve rent money, or meet some other objective. Without a clear account of what the system is maximizing or minimizing, the agent may optimize the wrong thing while appearing to satisfy the task.

Kollegger does not make the assessment stage responsible for the final decision. In his preferred multi-agent framing, a focused assessment agent produces alternatives and their pros and cons. Another actor or agent decides what to do with them. This separation prevents proposal generation from silently becoming action.
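The separation can be made structural by having the assessment step return data and nothing else. The sketch below uses the Red Bull example; the `Proposal` fields, the stock threshold of four cans, and the cost figure are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proposal:
    action: str
    pros: tuple[str, ...]
    cons: tuple[str, ...]
    reversible: bool      # can the action be undone, for example by cancelling an order?
    cost_if_wrong: float  # rough cost of a mistake (hypothetical units)


def assess_restock(cans_left: int, rent_due_soon: bool) -> list[Proposal]:
    """Assessment only: return alternatives with pros and cons, never place an order."""
    proposals = [
        Proposal("do_nothing", pros=("no spending",), cons=("fridge may run out",),
                 reversible=True, cost_if_wrong=0.0),
    ]
    if cans_left < 4:  # assumed threshold for "supplies are low"
        cons = ("spends money",) + (("competes with rent",) if rent_due_soon else ())
        proposals.append(
            Proposal("order_one_case", pros=("fridge stays stocked",), cons=cons,
                     reversible=True, cost_if_wrong=30.0)
        )
    return proposals


proposals = assess_restock(cans_left=2, rent_due_soon=True)
```

Because `assess_restock` has no access to a purchasing tool, generating a proposal cannot turn into an action on its own.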

Authority determines whether the agent acts, escalates, or defers

Andreas Kollegger moves from assessment to action: authority, enactment, escalation, and oversight. The agent must determine whether it has the authority to act. If it does, it can rank the available options, select one, and enact it. If it lacks authority or certainty, it should escalate to another agent or human actor with the necessary privileges.

Kollegger treats this as an essential boundary for autonomous systems. The decision is not only “what is the best option?” It is also “who is allowed to make this decision?” A low-risk, reversible, authorized action may be handled by the agent. A higher-risk decision, or one outside the agent’s mandate, should trigger oversight. That oversight may be a human-in-the-loop process or a more privileged agent.
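An authority check of this kind might look like the following sketch. The `Authority` fields and the two escalation conditions are assumptions chosen to mirror the risk and reversibility distinctions above; a real deployment would define its own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Authority:
    max_cost_if_wrong: float             # largest downside this agent may accept alone
    may_take_irreversible_actions: bool  # whether it may act when the action cannot be undone


def route(cost_if_wrong: float, reversible: bool, authority: Authority) -> str:
    """Return 'enact' if the action is within the agent's mandate, else 'escalate'."""
    if not reversible and not authority.may_take_irreversible_actions:
        return "escalate"
    if cost_if_wrong > authority.max_cost_if_wrong:
        return "escalate"
    return "enact"


household_agent = Authority(max_cost_if_wrong=50.0, may_take_irreversible_actions=False)
```

Under these assumed limits, a cancellable 30-unit order is enacted, while a 100,000-unit credit decision or any irreversible action goes to a human or a more privileged agent.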

The outcome stage is where the context graph becomes self-improving. Once the decision has been made — or once the system determines that it cannot decide — the process must be recorded. Kollegger’s outcome states include remember, resolve, and defer. “Resolve” marks the decision as done. “Defer” marks it as pending for later. “Remember” records the full reasoning process for future reference.

That record includes what was considered, what was not considered, the decision itself, and the actions taken. Kollegger ties this to tracing and accountability: the graph stores the reasoning chain alongside the outcome, making it available as precedent for future agents. The system is not merely logging that an action occurred. It is preserving the contextual explanation that future decisions can traverse.
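A sketch of such a record, assuming an in-memory list in place of the graph. The field names follow the paragraph above; the `TraceStore` class is hypothetical, and treating only resolved traces as precedent is an assumption made here for simplicity.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    RESOLVED = "resolved"  # the decision is done
    DEFERRED = "deferred"  # the decision is pending for later


@dataclass
class DecisionTrace:
    objective: str
    considered: list[str]
    not_considered: list[str]
    decision: str
    actions: list[str]
    status: Status


@dataclass
class TraceStore:
    traces: list[DecisionTrace] = field(default_factory=list)

    def remember(self, trace: DecisionTrace) -> None:
        """Record the full reasoning process, whatever the status."""
        self.traces.append(trace)

    def pending(self) -> list[DecisionTrace]:
        return [t for t in self.traces if t.status is Status.DEFERRED]

    def precedent_for(self, objective: str) -> list[DecisionTrace]:
        # Assumption: only resolved decisions count as precedent.
        return [t for t in self.traces
                if t.objective == objective and t.status is Status.RESOLVED]


store = TraceStore()
store.remember(DecisionTrace("restock Red Bull", ["order_one_case", "do_nothing"],
                             ["switch brand"], "order_one_case", ["placed order"],
                             Status.RESOLVED))
store.remember(DecisionTrace("increase credit line", ["approve", "decline"], [],
                             "escalated", [], Status.DEFERRED))
```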

Stage	Core question	Output
Framing	What is the objective, how did we get here, and what environment will be affected?	A local context for the decision
Guidance	What was done before, and which hard or soft rules apply now?	Precedent and alignment constraints
Assessment	What are the risks, what is being maximized or minimized, and what choices are available?	Alternatives with pros and cons
Action	Who has authority to decide and act?	Enactment, escalation, or oversight
Outcome	What should be remembered, resolved, or deferred?	A recorded decision trace available as future precedent

Kollegger’s agentic decision-making framework

The complete workflow diagram ties those stages together as a loop rather than a one-time checklist. Framing feeds guidance; guidance feeds assessment; assessment produces options; action enacts, escalates, or routes to oversight; oversight can evaluate and modify the response; the outcome is then remembered, resolved, or deferred. The important design choice is that the trace of this flow is written back into the graph, so a later agent can treat the decision as precedent rather than rediscovering the same context from scratch.
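The loop can be sketched end to end in one function, with a Python list standing in for the graph. Everything here is a simplifying assumption: the dictionary shapes, the rule with an `allows` predicate, the single numeric authority limit, and the choice of the highest-value authorized option are illustrative, and the oversight step is reduced to a deferred escalation.

```python
def run_decision(case: dict, graph: list, rules: list, authority_limit: float) -> dict:
    # 1. Framing: objective, causality, environment.
    frame = {key: case[key] for key in ("objective", "causality", "environment")}
    # 2. Guidance: prior traces for the same objective, plus rules for this environment.
    precedent = [t for t in graph if t["frame"]["objective"] == frame["objective"]]
    applicable = [r for r in rules if r["environment"] == frame["environment"]]
    # 3. Assessment: keep only the options that every applicable rule allows.
    options = [o for o in case["options"] if all(r["allows"](o) for r in applicable)]
    # 4. Action: enact within authority, otherwise escalate.
    authorized = [o for o in options if o["cost_if_wrong"] <= authority_limit]
    if authorized:
        decision = max(authorized, key=lambda o: o["value"])["action"]
        status = "resolved"
    else:
        decision, status = "escalate_to_oversight", "deferred"
    # 5. Outcome: write the trace back so later runs can see it as precedent.
    trace = {"frame": frame, "precedent_seen": len(precedent),
             "rules_applied": [r["name"] for r in applicable],
             "decision": decision, "status": status}
    graph.append(trace)
    return trace


rules = [{"name": "keep-rent-money", "environment": "household",
          "allows": lambda option: option["spend"] <= 50}]
case = {"objective": "restock Red Bull", "causality": ["stock low"],
        "environment": "household",
        "options": [
            {"action": "order_one_case", "spend": 30, "cost_if_wrong": 30, "value": 1},
            {"action": "order_pallet", "spend": 900, "cost_if_wrong": 900, "value": 5},
        ]}
graph = []
first = run_decision(case, graph, rules, authority_limit=100)
second = run_decision(case, graph, rules, authority_limit=100)
```

The second run finds the first run’s trace in the graph, which is the write-back behavior the loop depends on.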

The framework is portable, but the schema is not generic

Andreas Kollegger closes with the practical constraint that the workflow is general, but each step becomes domain-specific in practice. The same top-level stages can be applied to purchasing, finance, medical guidance, or internal enterprise workflows, but each domain has to define its own risk classes, authorities, policies, and precedent schema. A graph can hold the structure, but the contents must reflect the business or operational environment in which the agent is acting.

For a purchasing agent, the relevant schema might need spending limits, reversibility, budget priorities, supplier constraints, and household or organizational preferences. For a financial-services agent, it may need eligibility rules, prior credit decisions, oversight thresholds, and the policies that determine who can approve or decline an increase. For medical guidance, Kollegger’s example makes the domain specificity sharper: the system would need to know the reference classes that make a normally appropriate recommendation dangerous for a particular patient group. The framework can name “risk” and “authority” in the abstract, but the organization has to encode what those words mean in the domain.

That is also why the context graph is positioned as more than an audit mechanism. It is a shared memory substrate for agents that need to learn from prior decisions. If every meaningful decision records its objective, causal path, environment, applicable rules, risk analysis, authority check, action, and outcome, then future agents inherit more than facts. They inherit a decision history.

Zaim’s framing and Kollegger’s workflow converge on the same claim: explainable agents require a connected representation of the reasons behind action. Knowledge graphs help agents retrieve and relate facts. Context graphs add the policies, precedents, traces, and causal relationships that explain why a given action is appropriate in a given context. The practical test is whether the agent can identify when general behavior is insufficient — when it must classify the specific case, check authority, escalate, or write a new precedent back into the graph.
