Semantic context layer for agents

The agent-facing retrieval primitive. How an agent asks for context and what it gets back. Covers the unified MCP query surface, hybrid retrieval planning, agent-facing schema descriptions, and provenance edges through retrieval.

This spec is the deeper treatment of PRD Challenge 3. The storage mechanics live in 04-context-store.md § Indexes for agent consumption; policy enforcement lives in 09-access-control.md.

The shape of the problem

Agents need to cross structured filters (account.tier = 'enterprise', created_at > '2026-04-01') and semantic retrieval (vector similarity over ticket.body, FTS over notes.text) in the same query. Current tooling forces the agent to:

Issue a SQL query against the structured store.
Separately issue a vector query against the vector store.
Join the two result sets in its head.
Re-issue a filtered query if the join is empty.
Hope it remembers all of this.

This wastes tokens, produces unreliable results, and doesn’t compose with policy enforcement. Hakiri exposes one MCP tool — context.query — that takes both dimensions at once and plans the execution under the hood.

The MCP tool surface

Three tools the agent uses:

`context.describe(table)`

Returns enough information for the agent to write a working query without further help.

// agent calls
context.describe({"table": "github_issues"})

// runtime returns
{
  "table": "github_issues",
  "rows": 482910,
  "schema_fingerprint": "sha256:abc123…",
  "agent_description": "GitHub issues across tracked repos. Filter by `repo`, `state`, `author`, `labels`. Use the vector index `vec-body-bge-large-en-v1.5` for semantic search over issue bodies; FTS index `fts-body` for keyword matches. Issues older than 90d live in the `archive` partition (slower).",
  "schema": [
    {"name": "id",         "type": "int64",                    "agent_hint": "primary key, opaque"},
    {"name": "repo",       "type": "string",                   "agent_hint": "org/name format"},
    {"name": "state",      "type": "string",                   "agent_hint": "one of: open, closed"},
    {"name": "title",      "type": "string",                   "agent_hint": "human-authored issue title"},
    {"name": "body",       "type": "string",                   "agent_hint": "issue description; can be long"},
    {"name": "author",     "type": "string",                   "agent_hint": "GitHub username"},
    {"name": "labels",     "type": "list<struct{name:string}>","agent_hint": "use `array_contains` for label filters"}
  ],
  "indexes": [
    {"id": "pk-id",                       "kind": "pk",     "on": ["id"]},
    {"id": "vec-body-bge-large-en-v1.5",  "kind": "vector", "column": "body", "model": "bge-large-en-v1.5", "dim": 1024},
    {"id": "fts-body",                    "kind": "fts",    "columns": ["title", "body"]}
  ],
  "partitions": {"scheme": "recent_90d + archive", "current_partition_rows": 12480},
  "example_queries": [
    "context.query(table='github_issues', filter={state:'open', repo:'torvalds/linux'}, limit=20)",
    "context.query(table='github_issues', filter={state:'open'}, semantic={text:'kernel panic in driver', column:'body'}, limit=20)"
  ]
}

The agent_description and per-column agent_hint fields are authored or reviewed by humans, can be regenerated from a connector’s schema spec (a task LLMs handle well), and are surfaced at tools/list time so the agent knows what’s queryable before it queries. Together they are the equivalent of a README for the table.
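As a rough illustration of how a client might consume this response, the sketch below groups the index ids that cover a given column, so an agent can tell whether semantic search, keyword search, or neither is available for it. The helper name `indexes_for_column` is hypothetical; the field names (`indexes`, `kind`, `column`, `columns`, `on`) are taken from the example response above.

```python
from typing import Any


def indexes_for_column(description: dict[str, Any], column: str) -> dict[str, list[str]]:
    """Group index ids by kind for every index that covers `column`."""
    found: dict[str, list[str]] = {}
    for index in description.get("indexes", []):
        # The example response names covered columns under "column", "columns", or "on".
        covered = set(index.get("columns", [])) | set(index.get("on", []))
        if "column" in index:
            covered.add(index["column"])
        if column in covered:
            found.setdefault(index["kind"], []).append(index["id"])
    return found


# Trimmed copy of the describe response shown above.
description = {
    "table": "github_issues",
    "indexes": [
        {"id": "pk-id", "kind": "pk", "on": ["id"]},
        {"id": "vec-body-bge-large-en-v1.5", "kind": "vector", "column": "body",
         "model": "bge-large-en-v1.5", "dim": 1024},
        {"id": "fts-body", "kind": "fts", "columns": ["title", "body"]},
    ],
}

body_indexes = indexes_for_column(description, "body")
```

Here `body_indexes` holds one vector index and one FTS index, while a column such as `repo` yields an empty mapping, which tells the agent to use a plain filter.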

`context.query(table, filter, semantic, project, limit, with_provenance)`

The unified retrieval tool.

{
  "table":    "github_issues",
  "filter":   { "state": "open", "repo": ["torvalds/linux", "openhackersclub/gctrl"], "created_at": {">": "2026-04-01"} },
  "semantic": { "text": "kernel panic in network driver", "column": "body", "limit": 50 },
  "project":  ["id", "title", "url", "score", "_provenance"],
  "limit":    20,
  "with_provenance": true
}

Returns rows with provenance edges:

[
  {
    "id": 481,
    "title": "panic on rmmod r8169 driver",
    "url": "https://github.com/torvalds/linux/issues/481",
    "score": 0.872,
    "_provenance": {
      "table":      "github_issues",
      "run":        "run_01HXYZ...",
      "connector":  "github@0.4.2",
      "ingested_at":"2026-05-12T08:14:00Z",
      "authored_by":"agent://claude-connector-author"
    }
  },
  ...
]

The agent can cite the source of each row in its answer rather than hallucinate one. Provenance survives replication: the lineage edges travel with the Parquet to every replica.
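A minimal sketch of the agent side, assuming rows shaped like the example above: the hypothetical helper `cite` turns a returned row into a citation string built only from fields the runtime supplied. The citation format itself is an illustration, not something this spec prescribes.

```python
def cite(row: dict) -> str:
    """Build a citation from a result row using only runtime-supplied fields."""
    source = row.get("url") or f"row {row['id']}"
    prov = row.get("_provenance")
    if not prov:
        # Provenance was not requested or not projected; say so rather than guess.
        return f"{source} (no provenance returned)"
    return (
        f"{source} (via {prov['connector']}, run {prov['run']}, "
        f"ingested {prov['ingested_at']})"
    )


# Row copied from the example result above; the run id is elided in the source.
row = {
    "id": 481,
    "title": "panic on rmmod r8169 driver",
    "url": "https://github.com/torvalds/linux/issues/481",
    "score": 0.872,
    "_provenance": {
        "table": "github_issues",
        "run": "run_01HXYZ...",
        "connector": "github@0.4.2",
        "ingested_at": "2026-05-12T08:14:00Z",
        "authored_by": "agent://claude-connector-author",
    },
}

citation = cite(row)
```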

`context.execute_query(id, params)` — whitelisted

Used for clean-room patterns (09-access-control.md Pattern C), where the agent’s token grants only execution of pre-approved query templates. The agent supplies a query id and parameters; the runtime binds the parameters and runs the template. The agent never supplies SQL.
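The binding step could look roughly like the sketch below, which uses an in-memory SQLite database as a stand-in for the real store. Everything here is an assumption made for illustration: the `TEMPLATES` registry, the template id `open_issues_by_repo`, its SQL, and the parameter type checks are hypothetical and not the Hakiri implementation.

```python
import sqlite3

# Hypothetical registry of pre-approved templates (ids, SQL, and params are illustrative).
TEMPLATES = {
    "open_issues_by_repo": {
        "sql": (
            "SELECT id, title FROM github_issues "
            "WHERE state = 'open' AND repo = :repo ORDER BY id LIMIT :limit"
        ),
        "params": {"repo": str, "limit": int},
    },
}


def execute_query(conn: sqlite3.Connection, query_id: str, params: dict) -> list:
    template = TEMPLATES.get(query_id)
    if template is None:
        raise PermissionError(f"query id not whitelisted: {query_id}")
    expected = template["params"]
    if set(params) != set(expected):
        raise ValueError(f"expected parameters {sorted(expected)}, got {sorted(params)}")
    for name, kind in expected.items():
        if not isinstance(params[name], kind):
            raise TypeError(f"parameter {name} must be {kind.__name__}")
    # Values are bound by the driver and never spliced into the SQL text.
    return conn.execute(template["sql"], params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE github_issues (id INTEGER, repo TEXT, state TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO github_issues VALUES (?, ?, ?, ?)",
    [
        (1, "torvalds/linux", "open", "panic on rmmod r8169 driver"),
        (2, "torvalds/linux", "closed", "typo in docs"),
        (3, "openhackersclub/gctrl", "open", "crash on start"),
    ],
)

rows = execute_query(conn, "open_issues_by_repo", {"repo": "torvalds/linux", "limit": 10})
```

Because the agent only ever names a template and supplies values, a hostile string in `repo` is compared as data and matches nothing.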

Hybrid retrieval planning

Given a filter and a semantic component, the runtime picks one of three plans:

Plan	When	How
Filter-then-ANN	Filter is highly selective (estimated < 1% of rows)	Apply `WHERE` first; run ANN over the filtered row-id set using the vector index’s `id_map.parquet` lookup
ANN-then-filter	Filter is loose (estimated > 50% of rows), semantic component is the primary dimension	Run ANN to get top-K-broad, then filter the K candidates with the predicate
Hybrid	Neither threshold applies (filter estimated between 1% and 50% of rows)	Pull top-K-broad from ANN, intersect with filter result, re-rank by combined score

In M2, the planner uses per-table statistics that are declared in the manifest and refreshed by compaction. A cost-based optimizer with collected statistics is the v2 candidate flagged in PRD Challenge 3.

The plan chosen is recorded in the OTel span (hakiri.query.plan = "filter-then-ann") so operators can debug retrieval quality from the trail.
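The threshold heuristic in the table can be sketched as follows. The 1% and 50% cut-offs come from the table; everything else is an assumption for illustration: the shape of the declared statistics (a distinct-value count per column), the equality-filter estimate of 1/NDV with independent columns, and the plan strings `ann-then-filter` and `hybrid`, which are spelled by analogy with the `filter-then-ann` value shown above.

```python
def estimate_selectivity(filter_columns: list[str], distinct_counts: dict[str, int]) -> float:
    """Estimate the fraction of rows passing equality filters on `filter_columns`.

    Assumes uniform values and independent columns; columns without
    declared statistics are treated as non-selective.
    """
    selectivity = 1.0
    for column in filter_columns:
        ndv = distinct_counts.get(column)
        if ndv:
            selectivity *= 1.0 / ndv
    return selectivity


def choose_plan(selectivity: float, low: float = 0.01, high: float = 0.50) -> str:
    """Map an estimated filter selectivity onto one of the three plans."""
    if not 0.0 <= selectivity <= 1.0:
        raise ValueError("selectivity must be a fraction between 0 and 1")
    if selectivity < low:
        return "filter-then-ann"
    if selectivity > high:
        return "ann-then-filter"
    return "hybrid"


# Hypothetical declared statistics for github_issues.
stats = {"state": 2, "repo": 400}

plan_narrow = choose_plan(estimate_selectivity(["state", "repo"], stats))
plan_mid = choose_plan(estimate_selectivity(["state"], stats))
plan_unfiltered = choose_plan(estimate_selectivity([], stats))
```

With these numbers, filtering on both `state` and `repo` is selective enough for filter-then-ANN, `state` alone lands in the hybrid band, and an unfiltered query falls through to ANN-then-filter.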

Policy-aware retrieval

Every retrieved row is filtered through the capability-token policy (09-access-control.md) before reaching the agent:

RLS: the row-level predicate applies after retrieval; rows that fail are dropped from the result before any token count is consumed.
CLS: columns marked redacted/masked for this subject are projected accordingly (the agent sees NULL or the masked value, not the raw).
Vector matches against forbidden rows: dropped before the top-K is returned. An attacker with vector-search access but not row-read access cannot infer row presence by similarity.
Inference-zone filtering: rows and columns whose inference_zone_allowed doesn’t include the subject’s asserted zone are dropped or masked, regardless of whether the token authorizes the read. Composes with RLS / CLS in the same pass — full mechanics in 15-inference-placement.md. The response envelope reports zone_filtered_rows and zone_masked_columns alongside the RLS / CLS counters.

The MCP response includes a hakiri.policy.applied attribute the agent (and the auditor) can inspect:

{
  "hakiri.policy.applied": {
    "rls_filtered_rows": 12,
    "cls_masked_columns": ["author.email", "patient_name"],
    "suppressed_for_k_anonymity": 0
  }
}

This is not information the agent uses to bypass policy; it is honest disclosure that the result has been filtered.
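The ordering that matters for the vector case is that row-level filtering happens before the top-K cut, so forbidden rows never occupy a slot and never influence which rows the agent sees. A minimal sketch under stated assumptions: the function name, the `row_allowed` callback, and the use of None for masked values are hypothetical, and inference-zone filtering and k-anonymity suppression are left out.

```python
from typing import Callable


def policy_filtered_top_k(
    candidates: list[dict],
    row_allowed: Callable[[dict], bool],
    masked_columns: set[str],
    limit: int,
) -> tuple[list[dict], dict]:
    # RLS first: forbidden rows never compete for a top-K slot.
    allowed = [row for row in candidates if row_allowed(row)]
    top = sorted(allowed, key=lambda row: row["score"], reverse=True)[:limit]
    # CLS: masked columns stay in the projection but carry None (NULL).
    rows = [
        {key: (None if key in masked_columns else value) for key, value in row.items()}
        for row in top
    ]
    applied = {
        "rls_filtered_rows": len(candidates) - len(allowed),
        "cls_masked_columns": sorted(
            column for column in masked_columns if any(column in row for row in top)
        ),
    }
    return rows, {"hakiri.policy.applied": applied}


# Illustrative ANN candidates; the best match is in a repo this subject cannot read.
candidates = [
    {"id": 1, "repo": "acme/secret", "author": "a", "score": 0.95},
    {"id": 2, "repo": "torvalds/linux", "author": "b", "score": 0.80},
    {"id": 3, "repo": "torvalds/linux", "author": "c", "score": 0.60},
    {"id": 4, "repo": "torvalds/linux", "author": "d", "score": 0.70},
]

rows, envelope = policy_filtered_top_k(
    candidates,
    row_allowed=lambda row: row["repo"] != "acme/secret",
    masked_columns={"author"},
    limit=2,
)
```

The agent receives rows 2 and 4 with `author` masked, and the envelope reports one filtered row and one masked column.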

Embedding model in the catalog

Per PRD Pillar 3’s MCP-native commitment, the embedding model is the operator’s choice, recorded in the catalog next to each vector index. Switching is a one-command rebuild:

hakiri index rebuild github_issues vec-body --model bge-large-en-v1.5

The catalog records both old and new model identifiers during cutover so agents that haven’t switched keep working. Vector dimensions, distance metric, and HNSW parameters are part of the index identity — you can have vec-body-bge-large-en-v1.5 and vec-body-openai-text-embedding-3 coexisting on the same table.

This is the load-bearing piece of the provider-agnostic story for retrieval. A model swap is not a context migration.

Agent-description authoring loop

The agent_description field is high-leverage — it’s what a new agent reads to orient. Three authoring patterns:

Human-authored, in the manifest. agent_description = "..." on the table or column. Lives in git, PR-reviewable.
Agent-authored, then reviewed. The MCP server exposes table.draft-description — given a connector’s source spec + sample rows, an LLM drafts the description; a human reviews and commits.
Regenerated on schema drift. When a connector’s schema evolves, hakiri schema describe-changes produces a diff against the current agent_description; the agent updates the affected hints.

The thesis: this is exactly the kind of “documentation that rots if humans alone write it” problem agents handle well, with humans in the review loop.

What this spec leaves to others

The agent runtime (Claude Code, Cursor, custom MCP clients). Hakiri exposes MCP tools; how the agent uses them is its problem. We do not prescribe a prompt format, agent loop, or harness.
Reranking models. Some users want a cross-encoder reranker on top of ANN results. v0 supports it as a per-query option (semantic.rerank_with = "...") but does not ship a reranker — operator brings their own.
Streaming retrieval. v0 returns the full result set; streaming partial results as the agent reads is a v2 concern tied to MCP’s streaming-response surface (still evolving in the MCP spec).

Open questions

Default embedding model. The laptop walkthrough needs something. Leaning local bge-small via fastembed-rs (no API key, ~50 MB, runs CPU-only) for the out-of-the-box experience. Operators upgrade when they care.
Hybrid plan cost model. M2 heuristic vs. v2 cost-based optimizer. Heuristic ships first; replace with CBO once usage data accumulates.
Per-row score normalization across indexes. ANN returns cosine similarity, which is bounded (∈ [−1, 1]); FTS BM25 is unbounded. Combining for hybrid plans needs a normalization the M2 heuristic can’t get exactly right.
Cross-table joins in context.query. Today context.query is single-table. Cross-table is a SQL view the operator defines plus a context.describe(view) that exposes the view’s agent-description. Whether MCP-native join is worth the complexity is open.
Streaming partial results. Tied to MCP spec evolution.
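To make the score-normalization question concrete, here is one common approach, offered as an illustration rather than a decision: rescale cosine similarity from [−1, 1] to [0, 1], min-max normalize BM25 within the candidate set, and take a weighted sum. The function names and the default weight of 0.5 are assumptions. Min-max scaling is sensitive to outliers and depends on which candidates happen to be in the set, which is part of why the question stays open.

```python
def min_max(scores: list[float]) -> list[float]:
    """Rescale scores to [0, 1] relative to the current candidate set."""
    if not scores:
        return []
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All candidates tie; give them equal, neutral weight.
        return [0.5 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def combine(cosine: list[float], bm25: list[float], weight: float = 0.5) -> list[float]:
    """Weighted sum of rescaled cosine similarity and min-max normalized BM25."""
    if len(cosine) != len(bm25):
        raise ValueError("score lists must align row for row")
    cosine_unit = [(c + 1.0) / 2.0 for c in cosine]  # [-1, 1] -> [0, 1]
    bm25_unit = min_max(bm25)
    return [weight * c + (1.0 - weight) * b for c, b in zip(cosine_unit, bm25_unit)]


combined = combine([1.0, 0.0], [10.0, 0.0])
```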