Semantic context layer for agents
The agent-facing retrieval primitive. How an agent asks for context and what it gets back. Covers the unified MCP query surface, hybrid retrieval planning, agent-facing schema descriptions, and provenance edges through retrieval.
This spec is the deeper treatment of PRD Challenge 3. The storage mechanics live in 04-context-store.md § Indexes for agent consumption; policy enforcement is covered in 09-access-control.md.
The shape of the problem
Agents need to cross structured filters (account.tier = 'enterprise', created_at > '2026-04-01') and semantic retrieval (vector similarity over ticket.body, FTS over notes.text) in the same query. Current tooling forces the agent to:
- Issue a SQL query against the structured store.
- Separately issue a vector query against the vector store.
- Join the two result sets in its head.
- Re-issue a filtered query if the join is empty.
- Hope it remembers all of this.
This wastes tokens, produces unreliable results, and doesn’t compose with policy enforcement. Hakiri exposes one MCP tool — context.query — that takes both dimensions at once and plans the execution under the hood.
The MCP tool surface
Three tools the agent uses:
context.describe(table)
Returns enough information for the agent to write a working query without further help.
```js
// agent calls
context.describe({"table": "github_issues"})

// runtime returns
{
  "table": "github_issues",
  "rows": 482910,
  "schema_fingerprint": "sha256:abc123…",
  "agent_description": "GitHub issues across tracked repos. Filter by `repo`, `state`, `author`, `labels`. Use the vector index `vec-body-bge-large-en-v1.5` for semantic search over issue bodies; FTS index `fts-body` for keyword matches. Issues older than 90d live in the `archive` partition (slower).",
  "schema": [
    {"name": "id", "type": "int64", "agent_hint": "primary key, opaque"},
    {"name": "repo", "type": "string", "agent_hint": "org/name format"},
    {"name": "state", "type": "string", "agent_hint": "one of: open, closed"},
    {"name": "title", "type": "string", "agent_hint": "human-authored issue title"},
    {"name": "body", "type": "string", "agent_hint": "issue description; can be long"},
    {"name": "author", "type": "string", "agent_hint": "GitHub username"},
    {"name": "labels", "type": "list<struct{name:string}>", "agent_hint": "use `array_contains` for label filters"}
  ],
  "indexes": [
    {"id": "pk-id", "kind": "pk", "on": ["id"]},
    {"id": "vec-body-bge-large-en-v1.5", "kind": "vector", "column": "body", "model": "bge-large-en-v1.5", "dim": 1024},
    {"id": "fts-body", "kind": "fts", "columns": ["title", "body"]}
  ],
  "partitions": {"scheme": "recent_90d + archive", "current_partition_rows": 12480},
  "example_queries": [
    "context.query(table='github_issues', filter={state:'open', repo:'torvalds/linux'}, limit=20)",
    "context.query(table='github_issues', filter={state:'open'}, semantic={text:'kernel panic in driver', column:'body'}, limit=20)"
  ]
}
```

The `agent_description` and per-column `agent_hint` fields are authored by humans, can be regenerated from a connector’s schema spec (a task an LLM does well), and are surfaced at `tools/list` time so the agent knows what is queryable before it queries. They are the equivalent of a README for the table.
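As an illustration of how a client might consume the describe response, the sketch below derives which columns support vector or keyword search from the `indexes` array. The helper name `searchable_columns` is hypothetical and not part of Hakiri; the input is a trimmed copy of the response shown above.

```python
# Trimmed copy of the describe response; only the fields this sketch reads.
DESCRIPTION = {
    "table": "github_issues",
    "indexes": [
        {"id": "pk-id", "kind": "pk", "on": ["id"]},
        {"id": "vec-body-bge-large-en-v1.5", "kind": "vector",
         "column": "body", "model": "bge-large-en-v1.5", "dim": 1024},
        {"id": "fts-body", "kind": "fts", "columns": ["title", "body"]},
    ],
}


def searchable_columns(description: dict) -> dict:
    """Map each column to the kinds of search index that cover it.

    Hypothetical client-side helper, not a Hakiri API.
    """
    coverage: dict = {}
    for index in description.get("indexes", []):
        if index["kind"] == "vector":
            columns = [index["column"]]   # vector indexes cover one column
        elif index["kind"] == "fts":
            columns = index["columns"]    # FTS indexes may cover several
        else:
            continue                      # primary keys are not search indexes
        for column in columns:
            coverage.setdefault(column, []).append(index["kind"])
    return coverage


print(searchable_columns(DESCRIPTION))
# {'body': ['vector', 'fts'], 'title': ['fts']}
```

With this map an agent harness can tell that a `semantic` clause on `title` has no vector index behind it before it issues the query.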
context.query(table, filter, semantic, project, limit)
The unified retrieval tool.
```json
{
  "table": "github_issues",
  "filter": {
    "state": "open",
    "repo": ["torvalds/linux", "openhackersclub/gctrl"],
    "created_at": {">": "2026-04-01"}
  },
  "semantic": {
    "text": "kernel panic in network driver",
    "column": "body",
    "limit": 50
  },
  "project": ["id", "title", "url", "score", "_provenance"],
  "limit": 20,
  "with_provenance": true
}
```

Returns rows with provenance edges:
```json
[
  {
    "id": 481,
    "title": "panic on rmmod r8169 driver",
    "url": "https://github.com/torvalds/linux/issues/481",
    "score": 0.872,
    "_provenance": {
      "table": "github_issues",
      "run": "run_01HXYZ...",
      "connector": "github@0.4.2",
      "ingested_at": "2026-05-12T08:14:00Z",
      "authored_by": "agent://claude-connector-author"
    }
  },
  ...
]
```

The agent can cite the `_provenance` fields in its answer rather than hallucinate a source. Provenance survives replication: the lineage edges travel with the Parquet to every replica.
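To make the citation point concrete, here is a minimal sketch of how an agent harness might turn one returned row into a citation string. The function `format_citation` and the output format are assumptions for illustration; only the row fields come from the example response.

```python
# One row from the example response, copied as-is (the run id is elided in the source).
ROW = {
    "id": 481,
    "title": "panic on rmmod r8169 driver",
    "url": "https://github.com/torvalds/linux/issues/481",
    "score": 0.872,
    "_provenance": {
        "table": "github_issues",
        "run": "run_01HXYZ...",
        "connector": "github@0.4.2",
        "ingested_at": "2026-05-12T08:14:00Z",
        "authored_by": "agent://claude-connector-author",
    },
}


def format_citation(row: dict) -> str:
    """Render a row as a citation; hypothetical helper, not a Hakiri API."""
    prov = row.get("_provenance")
    if prov is None:
        # The query did not project provenance, so say so instead of guessing.
        return f"{row['title']} <{row['url']}> [provenance not returned]"
    return (
        f"{row['title']} <{row['url']}> "
        f"[via {prov['connector']}, ingested {prov['ingested_at']}]"
    )


print(format_citation(ROW))
```

The fallback branch matters: if `_provenance` was not projected, the harness should report that rather than invent a source.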
context.execute_query(id, params) — whitelisted
Used for clean-room patterns (09-access-control.md Pattern C), where the agent’s token grants only execution of pre-approved query templates. The agent supplies a query id and parameters; the runtime binds the parameters and runs the template. The agent never supplies SQL.
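The whitelisting idea can be sketched with Python's standard `sqlite3` module standing in for the real engine. Everything here is an assumption made for illustration: the registry shape, the template id `open_issue_count_by_repo`, and the function `execute_query` are hypothetical and do not describe Hakiri's implementation.

```python
import sqlite3

# Hypothetical registry: query id -> (SQL template, allowed parameter names).
TEMPLATES = {
    "open_issue_count_by_repo": (
        "SELECT COUNT(*) FROM github_issues WHERE repo = :repo AND state = 'open'",
        {"repo"},
    ),
}


def execute_query(conn: sqlite3.Connection, query_id: str, params: dict) -> list:
    """Run a pre-approved template; the caller never supplies SQL text."""
    if query_id not in TEMPLATES:
        raise PermissionError(f"query id not whitelisted: {query_id}")
    sql, allowed = TEMPLATES[query_id]
    if set(params) != allowed:
        raise ValueError(f"expected parameters {sorted(allowed)}, got {sorted(params)}")
    # Named placeholders are bound by the driver, never string-interpolated.
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE github_issues (id INTEGER, repo TEXT, state TEXT)")
conn.executemany(
    "INSERT INTO github_issues VALUES (?, ?, ?)",
    [
        (1, "torvalds/linux", "open"),
        (2, "torvalds/linux", "closed"),
        (3, "openhackersclub/gctrl", "open"),
    ],
)
print(execute_query(conn, "open_issue_count_by_repo", {"repo": "torvalds/linux"}))
# [(1,)]
```

Because parameters are bound rather than concatenated, a value such as `x' OR '1'='1` is compared as a literal repo name and matches nothing.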
Hybrid retrieval planning
Given a filter and a semantic component, the runtime picks one of three plans:
| Plan | When | How |
|---|---|---|
| Filter-then-ANN | Filter is highly selective (estimated < 1% of rows) | Apply WHERE first; run ANN over the filtered row-id set using the vector index’s id_map.parquet lookup |
| ANN-then-filter | Filter is loose (estimated > 50% of rows), semantic component is the primary dimension | Run ANN to get top-K-broad, then filter the K candidates with the predicate |
| Hybrid | Both mid-selectivity | Pull top-K-broad from ANN, intersect with filter result, re-rank by combined score |
In M2 the planner uses per-table statistics that are declared in the manifest and refreshed by compaction. A cost-based optimizer with collected statistics is the v2 candidate flagged in PRD Challenge 3.
The plan chosen is recorded in the OTel span (hakiri.query.plan = "filter-then-ann") so operators can debug retrieval quality from the trail.
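The selectivity thresholds in the table can be read as a small decision function. The sketch below is a simplification under stated assumptions: it looks only at the estimated filter selectivity (the table also weighs whether the semantic component is the primary dimension), and the labels `ann-then-filter` and `hybrid` are guessed by analogy with the `filter-then-ann` value shown for the span attribute.

```python
def choose_plan(estimated_selectivity: float) -> str:
    """Pick a retrieval plan from the estimated fraction of rows the filter keeps.

    Hypothetical sketch of the M2 heuristic; thresholds come from the plan table,
    plan labels other than "filter-then-ann" are assumed.
    """
    if not 0.0 <= estimated_selectivity <= 1.0:
        raise ValueError("selectivity must be a fraction between 0 and 1")
    if estimated_selectivity < 0.01:
        return "filter-then-ann"   # highly selective filter: WHERE first
    if estimated_selectivity > 0.50:
        return "ann-then-filter"   # loose filter: ANN first, then predicate
    return "hybrid"                # mid-selectivity: intersect and re-rank


for fraction in (0.005, 0.2, 0.8):
    print(fraction, choose_plan(fraction))
```

Values exactly at 1% or 50% fall into the hybrid plan in this sketch, since the table defines the other two plans with strict inequalities.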
Policy-aware retrieval
Every retrieved row is filtered through the capability-token policy (09-access-control.md) before reaching the agent:
- RLS: the row-level predicate applies after retrieval; rows that fail are dropped from the result before any token count is consumed.
- CLS: columns marked redacted/masked for this subject are projected accordingly (the agent sees `NULL` or the masked value, not the raw value).
- Vector matches against forbidden rows: dropped before the top-K is returned. An attacker with vector-search access but not row-read access cannot infer row presence by similarity.
- Inference-zone filtering: rows and columns whose `inference_zone_allowed` doesn’t include the subject’s asserted zone are dropped or masked, regardless of whether the token authorizes the read. This composes with RLS / CLS in the same pass; full mechanics are in 15-inference-placement.md. The response envelope reports `zone_filtered_rows` and `zone_masked_columns` alongside the RLS / CLS counters.
The MCP response includes a hakiri.policy.applied attribute the agent (and the auditor) can inspect:
```json
{
  "hakiri.policy.applied": {
    "rls_filtered_rows": 12,
    "cls_masked_columns": ["author.email", "patient_name"],
    "suppressed_for_k_anonymity": 0
  }
}
```

This is not information the agent uses to bypass policy; it is honest disclosure that the result has been filtered.
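The ordering that matters in this section is that forbidden rows are dropped before the top-K cut, so they can neither appear in nor displace visible results. A minimal sketch of that pass follows; the function `apply_policy`, its arguments, and the sample rows are hypothetical, and zone filtering and k-anonymity suppression are left out.

```python
def apply_policy(candidates, row_allowed, masked_columns, k):
    """Apply RLS, then the top-K cut, then CLS masking.

    candidates: rows sorted by descending score.
    row_allowed: predicate standing in for the row-level policy.
    masked_columns: set of column names the subject may not read.
    Hypothetical sketch, not Hakiri's implementation.
    """
    visible = [row for row in candidates if row_allowed(row)]
    top_k = visible[:k]  # cut only after forbidden rows are gone
    masked = [
        {col: (None if col in masked_columns else val) for col, val in row.items()}
        for row in top_k
    ]
    applied = {
        "rls_filtered_rows": len(candidates) - len(visible),
        "cls_masked_columns": sorted(masked_columns),
    }
    return masked, applied


candidates = [
    {"id": 1, "tenant": "a", "author": "x", "score": 0.9},
    {"id": 2, "tenant": "b", "author": "y", "score": 0.8},
    {"id": 3, "tenant": "a", "author": "z", "score": 0.7},
]
rows, applied = apply_policy(candidates, lambda r: r["tenant"] == "a", {"author"}, k=2)
print([r["id"] for r in rows], applied)
# [1, 3] {'rls_filtered_rows': 1, 'cls_masked_columns': ['author']}
```

Had the cut happened first, row 2 would have consumed a top-K slot and the caller would have received one row instead of two, which is itself a signal about hidden data.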
Embedding model in the catalog
Per PRD Pillar 3’s MCP-native commitment: the embedding model is the operator’s choice, recorded in the catalog next to each vector index. Switching is a one-command rebuild:
```sh
hakiri index rebuild github_issues vec-body --model bge-large-en-v1.5
```

The catalog records both old and new model identifiers during cutover so agents that haven’t switched keep working. Vector dimensions, distance metric, and HNSW parameters are part of the index identity, so `vec-body-bge-large-en-v1.5` and `vec-body-openai-text-embedding-3` can coexist on the same table.
This is the load-bearing piece of the provider-agnostic story for retrieval. A model swap is not a context migration.
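One way to picture "index identity" is as an immutable record whose fields together name the index. The sketch below is an assumption-heavy illustration: the class `VectorIndexIdentity`, the `metric` field encoding, and the id format `vec-<column>-<model>` (inferred from the example ids) are hypothetical, HNSW parameters are omitted, and the second model is chosen only because its dimension is well known.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VectorIndexIdentity:
    """Fields that together identify a vector index (hypothetical shape)."""
    column: str
    model: str
    dim: int
    metric: str  # assumed encoding of the distance metric

    @property
    def index_id(self) -> str:
        # Assumed naming scheme, inferred from ids like vec-body-bge-large-en-v1.5.
        return f"vec-{self.column}-{self.model}"


old = VectorIndexIdentity("body", "bge-large-en-v1.5", 1024, "cosine")
new = VectorIndexIdentity("body", "bge-small-en-v1.5", 384, "cosine")

# During cutover both entries live in the catalog side by side.
catalog = {ix.index_id: ix for ix in (old, new)}
print(sorted(catalog))
# ['vec-body-bge-large-en-v1.5', 'vec-body-bge-small-en-v1.5']
```

Because the model name is part of the key, rebuilding with a different model adds an entry instead of overwriting one, which is what lets agents pinned to the old index keep working.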
Agent-description authoring loop
The agent_description field is high-leverage: it is what a new agent reads to orient. Three authoring patterns:
- Human-authored, in the manifest. `agent_description = "..."` on the table or column. Lives in git, PR-reviewable.
- Agent-authored, then reviewed. The MCP server exposes `table.draft-description`: given a connector’s source spec and sample rows, an LLM drafts the description; a human reviews and commits.
- Regenerated on schema drift. When a connector’s schema evolves, `hakiri schema describe-changes` produces a diff against the current `agent_description`; the agent updates the affected hints.
The thesis: this is exactly the kind of “documentation that rots if humans alone write it” problem agents handle well, with humans in the review loop.
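The drift-driven pattern depends on knowing which hints a schema change has made stale. The sketch below shows one plausible way to compute that from two schema versions. It is not the output format of `hakiri schema describe-changes`, which the spec does not define; the function `hints_needing_review` and its result keys are hypothetical.

```python
def hints_needing_review(old_schema: list, new_schema: list) -> dict:
    """Compare two schema versions and list columns whose hints may be stale.

    Each schema is a list of {"name", "type", "agent_hint"} dicts, as in the
    describe response. Hypothetical helper for illustration only.
    """
    old = {col["name"]: col for col in old_schema}
    new = {col["name"]: col for col in new_schema}
    return {
        "added": sorted(new.keys() - old.keys()),      # need a new hint
        "removed": sorted(old.keys() - new.keys()),    # hint should be dropped
        "retyped": sorted(
            name for name in old.keys() & new.keys()
            if old[name]["type"] != new[name]["type"]  # hint may no longer apply
        ),
    }


before = [
    {"name": "id", "type": "int64", "agent_hint": "primary key, opaque"},
    {"name": "state", "type": "string", "agent_hint": "one of: open, closed"},
]
after = [
    {"name": "id", "type": "string", "agent_hint": "primary key, opaque"},
    {"name": "milestone", "type": "string", "agent_hint": ""},
]
print(hints_needing_review(before, after))
# {'added': ['milestone'], 'removed': ['state'], 'retyped': ['id']}
```

A result like this gives the drafting agent a bounded task (three columns to revisit) and gives the human reviewer a short diff to approve.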
What this spec leaves to others
- The agent runtime (Claude Code, Cursor, custom MCP clients). Hakiri exposes MCP tools; how the agent uses them is its problem. We do not prescribe a prompt format, agent loop, or harness.
- Reranking models. Some users want a cross-encoder reranker on top of ANN results. v0 supports it as a per-query option (`semantic.rerank_with = "..."`) but does not ship a reranker; the operator brings their own.
- Streaming retrieval. v0 returns the full result set; streaming partial results as the agent reads is a v2 concern tied to MCP’s streaming-response surface (still evolving in the MCP spec).
Open questions
- Default embedding model. The laptop walkthrough needs something. Leaning toward local `bge-small` via `fastembed-rs` (no API key, ~50 MB, runs CPU-only) for the out-of-the-box experience. Operators upgrade when they care.
- Hybrid plan cost model. M2 heuristic vs. v2 cost-based optimizer. Heuristic ships first; replace with CBO once usage data accumulates.
- Per-row score normalization across indexes. ANN returns cosine similarity, which is bounded (in [-1, 1]); FTS BM25 is unbounded. Combining the two for hybrid plans needs a normalization the M2 heuristic can’t get exactly right.
- Cross-table joins in `context.query`. Today `context.query` is single-table. Cross-table is a SQL view the operator defines plus a `context.describe(view)` that exposes the view’s agent-description. Whether an MCP-native join is worth the complexity is open.
- Streaming partial results. Tied to MCP spec evolution.
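On the score-normalization question above, one well-known way to sidestep the scale mismatch is reciprocal rank fusion (RRF), which combines result lists using ranks only and never compares a cosine score to a BM25 score. This is a swapped-in technique offered as a sketch, not something the spec commits to; the function name is hypothetical and `k=60` is the commonly used default constant.

```python
def reciprocal_rank_fusion(rankings, k: int = 60) -> list:
    """Fuse several ranked id lists into one, using ranks instead of raw scores.

    Each id scores sum(1 / (k + rank)) over the lists it appears in, with
    rank starting at 1. Ties are broken by id so the output is deterministic.
    """
    scores: dict = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


ann_ranking = [481, 17, 93]   # ids ordered by vector similarity
fts_ranking = [17, 250, 481]  # ids ordered by BM25
print(reciprocal_rank_fusion([ann_ranking, fts_ranking]))
# [17, 481, 250, 93]
```

Ids found by both indexes rise to the top (17 and 481 here), while the trade-off is that RRF discards score magnitude: a near-exact vector match and a marginal one at the same rank count equally.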