Context Graphs Let Agents Retrieve Precedents, Not Just Policies

Zach BlumenfeldAI EngineerFriday, May 29, 202610 min read

Neo4j’s Zach Blumenfeld argues that agents built for operational decisions need context graphs rather than document retrieval alone. In his model, a standard knowledge base can tell an agent the relevant facts and policies, but a context graph adds prior decision traces, causal links, precedents and outcomes, allowing the agent to retrieve how similar cases were resolved. He presents `create-context-graph` and `neo4j-agent-memory` as open-source scaffolding for building that pattern with graph entities, short-term memory and embedded reasoning traces.

A knowledge base can answer; a context graph is meant to decide

Zach Blumenfeld framed context graphs as an extension of retrieval systems for agents that need more than correct factual answers. A knowledge base, in his formulation, gives an agent “information required to answer questions”: facts, entities, policies, and current state. A context graph adds the information required to make decisions: decision traces, precedents, causal chains, expert-validated outcomes, and the reasoning behind prior actions.

The difference matters because many agent tasks are not really question-answering tasks. Blumenfeld used a financial analyst agent as the concrete example: “Approve credit limit increase for Jessica Norris? Requesting £25K.” A standard retrieval setup can pull customer information, transactions, and policies, then produce a cautious response: moderate-to-high risk, elevated-risk jurisdiction, scrutiny under a high-value transaction policy, and a recommendation for manual review before proceeding.

That answer is useful, but it stops short of the operational decision. In Blumenfeld’s example, the context graph lets the agent retrieve not only the current facts but also past decision traces and precedents. The resulting recommendation becomes more direct: reject or escalate for senior review. The reasons are no longer just generic risk factors. They include a past fraud flag from April 2025 with a velocity-check failure, multiple compliance rejections, a high-risk tier margin account, and an employer associated with elevated country risk.

Context graphs are about precedents, causal chains, expert-validated outcomes and enabling the agent to act with subject matter expertise.

Zach Blumenfeld

Blumenfeld summarized the distinction this way: a knowledge base tells the agent what is true; a context graph tells the agent what to do and why. He did not present the graph database by itself as the decision-maker. He argued that decision-making agents need a memory layer that preserves how similar situations were resolved, not merely a document store that preserves policies and records.

The graph model separates what exists, what happened, and why it happened

The conceptual model Blumenfeld presented has three broad categories. First are entities: the things that exist, such as people, accounts, transactions, and organizations. Second are events: decisions, transactions, approvals, rejections, support tickets, and other things that happened. Third is context: the reasons attached to those events, including policies applied, risk factors considered, employee reasoning, and AI-generated reasoning when relevant.

In the financial services example, the graph connects a person to accounts, transactions, organizations, policies, and decisions. The decision is not just a row in a log. It is connected to the factors that caused or supported it. That structure is what lets the agent traverse from Jessica Norris to her account history, from the account history to prior risk events, from those events to past decisions, and from past decisions to similar precedents.

Blumenfeld showed a demo architecture built around that idea. The example financial services agent took data from systems such as a support ticket system, CRM, and internal business data. The backing services included a FastAPI application, Claude Agent SDK, a calculation agent, an agent API, a Neo4j context graph, Neo4j Graph Data Science, and vector search. The frontend was a Next.js UI with graph visualization.

The visible demo interface included an AI assistant chat panel, a graph visualization, and a decision trace panel. When asked about Jessica Norris, the assistant called tools such as search_customer and get_customer_info. It then used a “find precedents” step to locate related decision traces. Blumenfeld showed a Neo4j query over decision nodes connected by PRECEDENT_FOR and CAUSED relationships, emphasizing that the agent was not merely retrieving documents but traversing causal and precedent relationships among prior decisions.

The slide summarizing the agent’s process listed four steps. The agent queries the context graph for Jessica’s full profile: accounts, transactions, employer, and risk tier. It traces decision history: past fraud flags, compliance rejections, and velocity-check failures. It performs hybrid search across semantic similarity and structural similarity. Finally, it returns a recommendation grounded in the retrieved context; the example slide described “31 nodes” and “30 relationships” as Jessica’s complete context in a single graph traversal.

31 nodes / 30 relationships

retrieved for Jessica Norris’s complete decision context in the demo

Precedent search combines text meaning with graph structure

The technical core of Blumenfeld’s demo was hybrid search over both text embeddings and graph embeddings. Text embeddings capture semantic similarity: whether two descriptions, rationales, or risk factors mean similar things. His example was that “fraud rejection” can match “suspicious activity alert” even if the words differ.

Graph embeddings capture a different kind of similarity: how a node is connected. Blumenfeld described embedding the connected decision-trace nodes into vectors so that structurally similar situations can be retrieved by vector similarity. The goal is to find prior decisions involving customers with similar account structures, employer types, transaction patterns, and other relational features.

That distinction is central to the context-graph argument. If prior decisions live only in documents, an agent can search for language that sounds similar. If decision traces are represented as connected nodes and relationships, the system can also search for similarity in the shape of the situation, not only in the text attached to it.

Blumenfeld said Neo4j Graph Data Science provides the graph-embedding component, and named FastRP as the graph embedding technique shown in the slides. The hybrid-search slide described the combination this way: text embeddings find decisions with similar descriptions, rationale, and risk factors, while graph embeddings find decisions involving similar account structures, employer types, and transaction patterns. The combined retrieval target is precedent decisions from situations that are both semantically and structurally similar.

In practical terms, a customer’s case might not match a prior case because both mention the same risk phrase. It might match because both involve a high-risk account tier, a similar employer relationship, a prior compliance event, and transaction patterns that led to a related decision. The agent can then reason from a precedent that is similar in graph topology as well as language.

The scaffold makes the pattern inspectable in code

create-context-graph is meant to make context graphs less like an abstract architecture pattern and more like application boilerplate. Blumenfeld described it as a new open-source project, less than a month old, that scaffolds a complete domain-specific context graph application from the command line. The slide described it as “like create-next-app, but for AI agents backed by context graph memory.”

The fastest path shown was a one-line terminal command:

uvx create-context-graph

In the demo command, Blumenfeld generated a healthcare app:

uvx create-context-graph my-health-app-2 --domain healthcare --framework pydanticai --demo-data

The terminal output showed demo data being generated: 60 entities, 119 relationships, 55 documents, and 18 decision traces. Blumenfeld said the command creates the application folder, fixtures, JSON data, backend, frontend, and demo data needed to start exploring the system.

The generated healthcare app included a chat interface, a knowledge graph visualization, and node types such as provider, patient, treatment plan, symptom, medication, test, diagnosis, and procedure. Blumenfeld asked it about prescription medications and described the app retrieving relevant information through Cypher, Neo4j’s graph query language, while also exposing schema visualization and decision traces.

For decision-trace work, the scaffold matters because it gives developers a working place to test the pieces that otherwise tend to stay separate: a graph schema for a domain, an agent interface that can query the graph, and example traces that can be inspected as graph relationships rather than as flat records. Blumenfeld positioned the generated app as a starting point for coding and exploration, not as a finished enterprise system.

The tool’s feature slide listed eight agent frameworks, 22 built-in domains, and 12 SaaS data connectors. Blumenfeld said the Pydantic AI application shown in the demo could also be generated with OpenAI, LangGraph, Crew, Google ADK, and other supported options. He also said users can import from tools such as GitHub, Notion, Jira, and Slack rather than relying only on demo data.

Capability	What Blumenfeld described
Agent frameworks	The slide listed eight framework options; Blumenfeld named Pydantic AI, OpenAI, LangGraph, Crew, and Google ADK among supported options
Built-in domains	The slide listed 22 starting domains; Blumenfeld showed healthcare and referred to financial services as another option
SaaS connectors	The slide listed 12 connectors; Blumenfeld named GitHub, Notion, Jira, and Slack as examples
Generated app pieces	Backend, frontend, demo data, context graph features, chat, MCP server support, and interactive graph visualizations
Custom domains	If users can describe the ontology and what is in the data, the package can help create a graph schema to guide extraction and mapping
Agent interface	The slide listed graph-native AI agents, an MCP server for Claude Desktop, multi-turn conversations, streaming chat, and interactive graph visualizations

Create-Context-Graph capabilities as described in the talk and slides

Blumenfeld described the tool as “just getting started,” open source, and open to contributions. Its value, as he presented it, is that it gives developers a working graph-backed agent application quickly enough to inspect the pattern in code rather than only read about context graphs conceptually.

Agent memory needs short-term state, long-term entities, and reasoning traces

The scaffold depends in part on neo4j-agent-memory, which Blumenfeld described as a complete graph-native memory API for AI agents. In his model, context graphs need three kinds of memory working together: short-term, long-term, and reasoning memory.

Short-term memory is conversation history and session context. This is the immediate state an agent needs to maintain within an interaction. Long-term memory is the knowledge graph of extracted entities and relationships, especially entities that repeat over time and need to be resolved rather than duplicated. Reasoning memory is where context graphs enter more specifically: the stored traces of decisions, explanations, and precedents that let the agent learn from prior outcomes.

Blumenfeld spent particular time on entity extraction because it is a common practical problem: if the input is text, how does it become a knowledge graph? The package implements a staged pipeline over raw text. He described it as moving from spaCy to GLiNER to a more advanced LLM fallback, followed by separate merging, deduplication, and enrichment. The point of that pipeline is not simply named entity recognition. It is to turn short-term information into long-term entities that can be resolved over time.

The schema he showed connected conversations to extracted entities and then to reasoning traces. That matters because decision traces are only useful if they remain tied to the entities and events they are about. A trace that says “reject because of prior velocity failure” is more useful when the graph connects that reasoning to the customer, account, transaction pattern, policy, and prior decision.

Blumenfeld also noted that other ingestion approaches are possible. Structured data can be mapped directly, and users can bring their own language-model extraction or NER process. But the included extraction pipeline is meant to give developers a working path from raw text into a graph without designing the entire memory layer from scratch.

The unresolved product questions are time, ontology, and trace quality

The open edges in Blumenfeld’s account fell into three product questions: how time should affect relevance, how much ontology design can be automated, and when a new decision trace should be admitted into memory.

On temporal weighting, a questioner asked whether causal chains include a concept of temporality: whether the validity of an event changes based on time, and whether more recent events receive more weight. Blumenfeld said timestamps can be added, and events can be linked in temporal or causal order through relationships such as “caused” or “next.” But he did not claim the current system automatically weights recent events more heavily. He said he did not know if it does that yet and treated recency weighting as something that may emerge as the technology matures.

On ontology design, another question asked whether the ontology is fully general or prepared in advance. Blumenfeld said the ontology is prepared beforehand. For built-in domains, he said the system uses predefined entity and relationship types to guide extraction and mapping. For custom domains, he pointed back to the create-context-graph package: if a user can describe the ontology and the contents of the data, the tool can help create a graph schema.

The answer also depended on the source data. For structured data such as CSV files or tables, Blumenfeld said the problem becomes mapping Cypher statements over the data. For PDFs and other text-heavy sources, he suggested the custom-domain and extraction paths in the project, while noting that the project is still “rougher on the edges.”

On trace admission and quality control, a questioner pushed on whether the system can decide when a new trace is genuinely novel and should be added to the graph, versus already represented by existing traces. The questioner also asked whether this points toward a self-learning cognitive system or whether a user should remain involved.

Blumenfeld’s answer was cautious. He said Neo4j is still figuring that out. In the first demo, a user can ask the system to store decisions from a previous conversation, but he did not think it would do so unless prompted. In create-context-graph, he said the team is still working on how new decision traces should be written. He also acknowledged the possible need for some kind of sentiment or quality score on decisions.

Blumenfeld described the current pattern as a way to represent decisions, causes, entities, policies, and outcomes in a graph; embed both text and structure; and retrieve precedents for agents that need to act with prior institutional context. He did not present it as a fully autonomous institutional memory that already decides what is worth remembering, how much it should trust each trace, and how time should change relevance.

AI Application Architecture Data and Training RAG and Knowledge Systems Agents and Autonomy Open Models