
Inference placement & Incognito mode

Status: outline / RFC. Decision captured in ADR-0015. Open questions at the bottom.

Related specs: 09-access-control.md, 10-collocation.md, 11-compliance.md, 12-semantic-context.md, 13-team-surfaces.md, 16-formal-verification.md.

The data plane is local-first by construction. The inference plane — where the LLM that consumes Hakiri’s MCP responses actually runs — is not, and today users have to mentally track which model endpoint is hot before they ask a sensitive question. This spec defines:

  1. Inference zones — a typed label on where an agent’s model is executing (laptop / on-prem GPU / customer-private cloud / public LLM API).
  2. Data privacy classes — a typed label on tables and columns declaring which inference zones may consume them.
  3. Incognito mode — a client-side switch that asserts “this session’s inference is local/on-prem only,” surfaced into the capability-token subject so Hakiri can refuse to return cloud-disallowed context.
  4. Seamless flip — the user-facing experience: one toggle in the desktop app / one CLI flag, with the data-plane policy doing the actual enforcement.

This spec is policy-only: Hakiri does not host inference, does not embed an LLM gateway, does not call model SDKs. Anti-pillar “Not an LLM SDK or prompt framework” is preserved. What Hakiri owns is the declared zone of the caller and the declared zone allowed by the data — and refusing to serve when they don’t compose. See ADR-0015 for the decision and rejected alternatives.

Why this is a separate concern from access control


09-access-control.md answers who can read what. This spec answers where the bytes are allowed to be processed once they’re read. The two compose: a token can authorize the row read, but a row tagged inference_zone_allowed = ["local:device"] is still withheld from the response if the requesting client’s asserted zone is public-cloud:anthropic.

The distinction matters because:

  • A user may legitimately have read access to a customer’s PHI ticket and a cloud Claude API key in the same hour — but PHI must not leave the on-prem boundary even though the user is authorized.
  • A team can pay for cloud inference for unrestricted tables (release notes, public OSS metadata) while routing private context (internal Slack, customer Stripe records) through a local Ollama or an on-prem vLLM box.
  • Provider-agnostic posture (PRD Challenge 4) requires a story for “which provider for which task” that doesn’t bind to a specific vendor.

A zone is a short string tag. The v0 vocabulary:

| Zone | What it means | Attestation in v0 |
|---|---|---|
| local:device | LLM runs on the same machine as the agent client (Ollama, llama.cpp, LM Studio, mlx-lm). The bytes never leave the device. | Self-asserted by the agent client; hakiri.subject.inference_zone.asserted = true |
| on-prem:<id> | LLM runs on a customer-controlled host inside the customer’s network boundary (vLLM on an internal GPU box, an internal LiteLLM gateway pointed at a self-hosted model). | Self-asserted in v0; host-attested via SPIFFE in M3+ |
| private-cloud:<account> | LLM runs in the customer’s own cloud account against a single-tenant endpoint (AWS Bedrock with a customer-managed KMS key, GCP Vertex with Private Service Connect, Anthropic via AWS PrivateLink). | Self-asserted; deploy-time tag matches the hakiri deploy <cloud> target |
| public-cloud:<vendor> | LLM is a public multi-tenant API (Anthropic, OpenAI, Google AI Studio, Groq, …). | Self-asserted; vendor identifier is opaque to Hakiri |
| unknown | Client did not declare a zone. | Always given the most restrictive interpretation: treated as public-cloud:* for policy purposes |

The zone is a property of the client process making the MCP call, not of Hakiri itself. Hakiri only reads what the client asserts.
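As a sketch of how the vocabulary above might be parsed on the server side, assuming a Rust implementation: the `Zone` type and its methods are hypothetical, not Hakiri’s API. An absent or malformed assertion collapses to `unknown`, which then matches policy only the way a public-cloud zone would.

```rust
/// Hypothetical parsed form of the v0 zone vocabulary.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Zone {
    LocalDevice,
    OnPrem(String),
    PrivateCloud(String),
    PublicCloud(String),
    Unknown,
}

impl Zone {
    /// Parse the zone a client asserted. A missing or malformed value becomes
    /// `Unknown`, so it is never trusted more than a public-cloud zone.
    pub fn parse(raw: Option<&str>) -> Zone {
        let Some(s) = raw else {
            return Zone::Unknown;
        };
        if s == "local:device" {
            return Zone::LocalDevice;
        }
        match s.split_once(':') {
            Some(("on-prem", id)) if !id.is_empty() => Zone::OnPrem(id.to_string()),
            Some(("private-cloud", acct)) if !acct.is_empty() => {
                Zone::PrivateCloud(acct.to_string())
            }
            Some(("public-cloud", vendor)) if !vendor.is_empty() => {
                Zone::PublicCloud(vendor.to_string())
            }
            // Unrecognized kinds (e.g. "cloud:anthropic") are not guessed at.
            _ => Zone::Unknown,
        }
    }

    /// String used for policy matching. `Unknown` maps to a public-cloud
    /// identity, so only `*` or `public-cloud:*` rules can admit it.
    pub fn policy_string(&self) -> String {
        match self {
            Zone::LocalDevice => "local:device".to_string(),
            Zone::OnPrem(id) => format!("on-prem:{id}"),
            Zone::PrivateCloud(acct) => format!("private-cloud:{acct}"),
            Zone::PublicCloud(vendor) => format!("public-cloud:{vendor}"),
            Zone::Unknown => "public-cloud:unknown".to_string(),
        }
    }
}
```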

The same caveat as 09-access-control.md § Subject attestation applies: a compromised client can claim whatever zone it likes. What the policy buys is:

  • Defense against accident, not compromise. A user who switches Claude Code to the Anthropic API without realizing a private-tagged context is loaded is refused at the MCP boundary instead of silently exfiltrating that context.
  • Audit-trail honesty. Every read records the asserted zone. A later review answers “did anyone ever read this row under a public-cloud:* claim?” in one OTel query.
  • Composability with M3+ attestation. When SPIFFE-flavored host attestation lands, the inference_zone.asserted flag flips to inference_zone.verified without policy rewrites.

Tables and columns declare which zones may consume them. The declaration is additive policy data in hakiri.toml, PR-reviewable, replicated through the manifest:

[[pipeline.tables]]
name = "github_issues"
# Default for unlabeled tables: any zone (`*`)
inference_zone_allowed = ["*"]
[[pipeline.tables]]
name = "stripe_customers"
inference_zone_allowed = ["local:device", "on-prem:*", "private-cloud:*"]
# Per-column overrides for sensitive fields
[pipeline.tables.policy.columns]
email = { inference_zone_allowed = ["local:device", "on-prem:*"] }
card_last4 = { inference_zone_allowed = ["local:device"] }
[[pipeline.tables]]
name = "patient_visits"
inference_zone_allowed = ["local:device", "on-prem:*"]
pii_type = "phi" # composes with HIPAA tagging — see 11-compliance.md

Composition rules:

  • Most-restrictive wins. If a table allows ["*"] and a projected column allows only ["local:device"], that column is dropped (CLS-style) from results bound for public-cloud:*.
  • Unspecified means ["*"]. Hakiri does not assume sensitivity. Operators tag what’s sensitive; the rest flows freely.
  • PHI is auto-narrowed. Columns with pii_type = "phi" get an implicit floor of ["local:device", "on-prem:*"] per 11-compliance.md. The validator refuses a manifest that widens this without an explicit phi_inference_override = true flag.
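The composition rules can be sketched as follows. This assumes allow-list patterns take exactly three forms (`*`, an exact zone, or a `kind:*` wildcard) and uses hypothetical function names; it illustrates the rules rather than the shipped evaluator.

```rust
/// Does one allow-list pattern admit the given zone string?
/// Patterns: "*", an exact zone ("local:device"), or "kind:*" ("on-prem:*").
pub fn pattern_admits(pattern: &str, zone: &str) -> bool {
    if pattern == "*" {
        return true;
    }
    match pattern.strip_suffix(":*") {
        // "on-prem:*" admits any "on-prem:<id>".
        Some(kind) => zone
            .split_once(':')
            .map_or(false, |(zone_kind, _)| zone_kind == kind),
        None => pattern == zone,
    }
}

/// `None` means the list was unspecified, which behaves as ["*"].
pub fn allowed(list: Option<&[&str]>, zone: &str) -> bool {
    match list {
        None => true,
        Some(patterns) => patterns.iter().any(|p| pattern_admits(p, zone)),
    }
}

/// A column is served only if its table AND the column itself admit the
/// zone, so the narrower list always decides (most-restrictive wins).
/// PHI columns also need the implicit ["local:device", "on-prem:*"] floor
/// unless the manifest set phi_inference_override = true.
pub fn column_admits(
    table: Option<&[&str]>,
    column: Option<&[&str]>,
    is_phi: bool,
    phi_override: bool,
    zone: &str,
) -> bool {
    const PHI_FLOOR: &[&str] = &["local:device", "on-prem:*"];
    let phi_ok = !is_phi || phi_override || allowed(Some(PHI_FLOOR), zone);
    allowed(table, zone) && allowed(column, zone) && phi_ok
}
```

With the stripe_customers example, card_last4 passes for local:device but is dropped for on-prem:<id>, even though its table admits on-prem:*.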

The validator runs at manifest load and refuses configurations that combine redact = [...] with inference_zone_allowed = ["public-cloud:*"] for the same column — the two settings represent contradictory operator intent.
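A minimal sketch of the two manifest-load checks, assuming each column’s settings have already been parsed into a hypothetical `ColumnPolicy` struct. The field names mirror the TOML keys; `redacted` stands in for membership in the `redact = [...]` list. Only explicit lists are checked: an unspecified PHI column inherits the floor, so it cannot widen it.

```rust
/// Hypothetical parsed view of one column's policy from hakiri.toml.
pub struct ColumnPolicy<'a> {
    pub name: &'a str,
    pub redacted: bool, // column is listed in redact = [...]
    pub phi: bool,      // pii_type = "phi" applies to this column
    pub phi_inference_override: bool,
    pub inference_zone_allowed: Option<Vec<&'a str>>, // None = unspecified
}

/// True if a pattern stays inside the PHI floor ["local:device", "on-prem:*"].
fn within_phi_floor(pattern: &str) -> bool {
    pattern == "local:device" || pattern.starts_with("on-prem:")
}

/// Manifest-load checks described in this spec (sketch only).
pub fn validate(columns: &[ColumnPolicy<'_>]) -> Result<(), String> {
    for c in columns {
        // Unspecified lists are neither contradictory nor a PHI widening.
        let Some(zones) = &c.inference_zone_allowed else {
            continue;
        };
        if c.redacted && zones.contains(&"public-cloud:*") {
            return Err(format!(
                "column `{}`: redact contradicts inference_zone_allowed = [\"public-cloud:*\"]",
                c.name
            ));
        }
        if c.phi && !c.phi_inference_override && !zones.iter().all(|z| within_phi_floor(z)) {
            return Err(format!(
                "column `{}`: widens the PHI floor without phi_inference_override = true",
                c.name
            ));
        }
    }
    Ok(())
}
```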

Incognito mode is the user-visible expression of zone policy. It is one switch in the desktop app, the --incognito flag on the CLI, and an incognito boolean on every MCP request payload. Turning it on asserts:

For the duration of this session, the inference zone is restricted to local:device and on-prem:*. Hakiri must refuse to serve any row whose inference_zone_allowed does not include at least one of these.
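One way to express this row rule, assuming the allow-lists use the same pattern forms as the hakiri.toml examples (`*`, exact zones, `kind:*`); the helper name is hypothetical. A row qualifies if any of its patterns admits `local:device` or some `on-prem:` zone.

```rust
/// Hypothetical Incognito filter: may this row be served at all while the
/// session is restricted to local:device and on-prem:*?
/// `None` means the table is unlabeled, which behaves as ["*"].
pub fn incognito_may_serve(row_allowed: Option<&[&str]>) -> bool {
    match row_allowed {
        None => true,
        Some(patterns) => patterns.iter().any(|p| {
            // "*" admits every zone; "on-prem:*" and "on-prem:<id>" both
            // name an on-prem host; "local:device" is the device itself.
            *p == "*" || *p == "local:device" || p.starts_with("on-prem:")
        }),
    }
}
```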

What this gets the user:

  • A way to ask “what’s the renewal risk on account X” against private Stripe + Salesforce context, knowing the bytes never cross to a cloud LLM API regardless of which model client is configured.
  • A way to flip back out for an unrelated task (debugging an OSS dependency, drafting a public blog post) without re-configuring policy.
  • A way for a team admin to declare Incognito as the default for everyone in the team (set in the team-mode control plane per 13-team-surfaces.md); individuals can disable per-session if their token grants the capability.
# CLI
hakiri mcp --incognito # stdio MCP server bound to local/on-prem zones
hakiri query --incognito "select …" # ad-hoc query, same policy applies
# hakiri.toml
[agent]
default_inference_zone = "local:device" # team-wide default
allow_zone_widening = false # users cannot widen to public-cloud:*
# Per-request override on the MCP wire
{
  "tool": "context.query",
  "input": { ... },
  "subject_overrides": { "inference_zone": "local:device", "incognito": true }
}
In the desktop app:

  • A persistent indicator in the tray and titlebar shows the current zone (local:device · Incognito, private-cloud:cf-prod, public-cloud:anthropic).
  • Switching zones requires a single click; switching out of Incognito while a private-tagged dataset is loaded raises a confirmation dialog citing the affected tables.
  • The renderer feature-detects whether a local model endpoint is reachable (http://localhost:11434 for Ollama, configurable for vLLM/LM Studio) and refuses to enter Incognito if no local endpoint is healthy.
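The reachability check in the last bullet could be approximated with a plain TCP connect, sketched below using only the Rust standard library. The function names and the 300 ms timeout are assumptions. A successful connect only shows that something is listening on the port, so a production check would likely also call the model server’s HTTP API.

```rust
use std::net::{SocketAddr, TcpStream, ToSocketAddrs};
use std::time::Duration;

/// Hypothetical liveness probe: does anything accept TCP connections at
/// host:port within `timeout`? Host names are resolved via the system resolver.
pub fn endpoint_reachable(host: &str, port: u16, timeout: Duration) -> bool {
    let addrs: Vec<SocketAddr> = match (host, port).to_socket_addrs() {
        Ok(iter) => iter.collect(),
        Err(_) => return false,
    };
    addrs
        .iter()
        .any(|addr| TcpStream::connect_timeout(addr, timeout).is_ok())
}

/// Refuse to enter Incognito unless at least one configured local endpoint
/// answers. 11434 is Ollama's default port; others are operator-configured.
pub fn can_enter_incognito(endpoints: &[(&str, u16)]) -> bool {
    endpoints
        .iter()
        .any(|(host, port)| endpoint_reachable(host, *port, Duration::from_millis(300)))
}
```

A typical call would be `can_enter_incognito(&[("127.0.0.1", 11434)])`, extended with whatever vLLM or LM Studio ports the operator configured.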

Every MCP tool call carries (or inherits) a subject tuple. The runtime extends 09-access-control.md § Subject model with one field:

subject = {
  agent: "agent://claude-opus-4-7"
  host: "host://laptop-jarvis-01"
  on_behalf_of: "user://jarvis@fractalbox"
  task: "task://01HXYZ…"
  inference_zone: "local:device" // NEW: self-asserted
  incognito: true // NEW: convenience flag
}
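How the `[agent]` defaults and a per-request `subject_overrides` entry might combine into the subject’s `inference_zone` is sketched below with hypothetical names. Two behaviors are assumptions rather than spec: an Incognito request that asserts a zone outside `local:device` / `on-prem:*` is rejected, and `allow_zone_widening = false` is read as blocking only `public-cloud:` zones.

```rust
/// Hypothetical mirror of the [agent] keys in hakiri.toml.
pub struct AgentConfig<'a> {
    pub default_inference_zone: &'a str,
    pub allow_zone_widening: bool,
}

/// Zones an Incognito session may assert.
fn is_incognito_zone(zone: &str) -> bool {
    zone == "local:device" || zone.starts_with("on-prem:")
}

/// Resolve subject.inference_zone from the team default and an optional
/// per-request override taken from subject_overrides on the MCP wire.
pub fn resolve_zone(
    cfg: &AgentConfig<'_>,
    override_zone: Option<&str>,
    incognito: bool,
) -> Result<String, String> {
    let zone = override_zone.unwrap_or(cfg.default_inference_zone);
    if incognito && !is_incognito_zone(zone) {
        return Err(format!("incognito session cannot assert zone `{zone}`"));
    }
    if !cfg.allow_zone_widening && zone.starts_with("public-cloud:") {
        return Err(format!("widening to `{zone}` is disabled by allow_zone_widening"));
    }
    Ok(zone.to_string())
}
```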

Policy evaluation runs in this order on every context.query / context.execute_query response:

flowchart TB
  Req[MCP request<br/>+ subject.inference_zone] --> Tok{Token authorizes<br/>row read?}
  Tok -- no --> Deny[Deny with reason=token]
  Tok -- yes --> Tag{Row table's<br/>inference_zone_allowed<br/>includes subject.zone?}
  Tag -- no --> Drop[Drop row<br/>increment hakiri.policy.zone_filtered_rows]
  Tag -- yes --> Col{Each projected column's<br/>inference_zone_allowed<br/>includes subject.zone?}
  Col -- no --> Mask[Mask column<br/>add to hakiri.policy.zone_masked_columns]
  Col -- yes --> Emit[Return cell]
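A compact model of the evaluation order in the flowchart, assuming a single table per response and representing rows as string maps. Column masking is modeled as removing the column, matching the CLS-style drop in the composition rules. `evaluate`, `Outcome`, and `ZoneCounters` are hypothetical names, and the RLS/CLS counters are omitted.

```rust
use std::collections::BTreeMap;

/// Illustrative row: column name -> rendered value.
pub type Row = BTreeMap<String, String>;

/// Pattern match: "*", an exact zone, or "kind:*".
fn admits(patterns: &[&str], zone: &str) -> bool {
    patterns.iter().any(|p| {
        *p == "*"
            || *p == zone
            || p.strip_suffix(":*").map_or(false, |kind| {
                zone.split_once(':').map_or(false, |(k, _)| k == kind)
            })
    })
}

#[derive(Debug, Default, PartialEq)]
pub struct ZoneCounters {
    pub zone_filtered_rows: usize,
    pub zone_masked_columns: Vec<String>,
}

#[derive(Debug, PartialEq)]
pub enum Outcome {
    Deny { reason: &'static str },
    Served { rows: Vec<Row>, counters: ZoneCounters },
}

pub fn evaluate(
    token_authorizes: bool,
    subject_zone: &str,
    table_allowed: &[&str],
    column_allowed: &BTreeMap<&str, Vec<&str>>,
    rows: Vec<Row>,
) -> Outcome {
    // 1. Token check: no authorization, no rows.
    if !token_authorizes {
        return Outcome::Deny { reason: "token" };
    }
    let mut counters = ZoneCounters::default();
    // 2. Table-level zone check: every row is dropped and counted.
    if !admits(table_allowed, subject_zone) {
        counters.zone_filtered_rows = rows.len();
        return Outcome::Served { rows: Vec::new(), counters };
    }
    // 3. Column-level zone check: forbidden columns are removed from each row.
    let masked: Vec<String> = column_allowed
        .iter()
        .filter(|(_, zones)| !admits(zones, subject_zone))
        .map(|(col, _)| col.to_string())
        .collect();
    let rows = rows
        .into_iter()
        .map(|mut row| {
            for col in &masked {
                row.remove(col);
            }
            row
        })
        .collect();
    counters.zone_masked_columns = masked;
    Outcome::Served { rows, counters }
}
```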

The response envelope reports the filtering honestly, the same way 12-semantic-context.md § Policy-aware retrieval does:

{
  "hakiri.policy.applied": {
    "rls_filtered_rows": 12,
    "cls_masked_columns": ["author.email"],
    "zone_filtered_rows": 4, // dropped for zone mismatch
    "zone_masked_columns": ["card_last4"], // masked for column-level zone mismatch
    "subject_inference_zone": "public-cloud:anthropic",
    "incognito": false
  }
}

The agent sees that something was filtered. The agent does not see the filtered content or learn whether a forbidden row would have matched semantically — vector and FTS retrieval drop forbidden rows before the top-K cut, same as RLS in 09-access-control.md. These two properties (no forbidden row reaches the response; metadata counters don’t reveal forbidden content) are stated and machine-checked as theorems serve_sound and metadata_aggregates in 16-formal-verification.md, with a Rust ↔ Lean differential test on every PR.
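The ordering requirement (drop forbidden rows before the top-K cut) is easy to get wrong, so here is a minimal sketch with a hypothetical `Candidate` type whose `zone_ok` flag stands in for the full token, RLS, and zone check.

```rust
/// Illustrative retrieval candidate from a vector or FTS index.
#[derive(Debug, Clone)]
pub struct Candidate {
    pub id: u32,
    pub score: f32,
    pub zone_ok: bool, // outcome of the policy check for this subject
}

/// Filter first, then rank and cut. If the cut ran first, a high-scoring
/// forbidden row would shrink the result below K, and that shortfall would
/// itself hint that a forbidden match exists.
pub fn top_k_policy_aware(mut cands: Vec<Candidate>, k: usize) -> Vec<Candidate> {
    cands.retain(|c| c.zone_ok);
    cands.sort_by(|a, b| b.score.total_cmp(&a.score)); // highest score first
    cands.truncate(k);
    cands
}
```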

Every MCP call emits an OTel span with:

| Attribute | Value | Use |
|---|---|---|
| hakiri.subject.inference_zone | the asserted zone | “did anyone ever read this row under public-cloud:*?” |
| hakiri.subject.inference_zone.asserted | true in v0 | distinguishes assertion from attestation |
| hakiri.subject.incognito | true / false | tracks intent independently of zone |
| hakiri.policy.zone_filtered_rows | integer | how many rows were withheld |
| hakiri.policy.zone_masked_columns | array | which columns were masked |

The append-only hash-chained audit log under .hakiri/audit/ (09-access-control.md § Audit trail durability) carries the same fields. Operators answering “have any of our private-tagged tables ever been served to a public-cloud zone?” run one query against the audit log; the substrate makes the answer cheap.
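The audit question above might be answered as follows over an in-memory view of the log. The `AuditRecord` fields are hypothetical (the on-disk format belongs to 09-access-control.md), the caller supplies the list of private-tagged tables, and `unknown` is counted as public-cloud, per the zone table.

```rust
/// Hypothetical in-memory view of one audit-log entry.
#[derive(Debug, Clone)]
pub struct AuditRecord {
    pub table: String,
    pub subject_inference_zone: String,
    pub incognito: bool,
    pub rows_served: u64,
}

/// "Have any private-tagged tables ever been served to a public-cloud zone?"
/// Returns every record where rows from such a table actually left Hakiri.
pub fn public_cloud_exposures<'a>(
    log: &'a [AuditRecord],
    private_tables: &[&str],
) -> Vec<&'a AuditRecord> {
    log.iter()
        .filter(|r| r.rows_served > 0)
        .filter(|r| private_tables.contains(&r.table.as_str()))
        .filter(|r| {
            r.subject_inference_zone.starts_with("public-cloud:")
                || r.subject_inference_zone == "unknown"
        })
        .collect()
}
```

An empty result is the expected answer when zone policy is working; a non-empty one points at the exact entries to review.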

10-collocation.md defines where the data is. This spec defines where the inference is. They compose:

  • A mode = "proxy" replica enforces zone policy at the proxy boundary — a sensitive table that cannot be replicated to a laptop is also not served to a public-cloud:* zone from the proxy.
  • A mode = "full" replica on a laptop can serve private-tagged data to a local:device zone but refuses to serve the same rows to a public-cloud:* zone from the same laptop.
  • A query running on private-cloud:<customer-aws> is allowed to consume data tagged for that zone; the deploy-time pin links Hakiri’s hosting target to the asserted inference zone for in-cloud agent flows.

Non-goals

  • LLM routing. Hakiri does not pick a model, build a prompt, or dispatch an inference call. Anti-pillar “Not an LLM SDK or prompt framework” is preserved. An optional hakiri-llm-gateway component is a future possibility tracked as an open question; v0 ships none.
  • Cryptographic zone proof. v0 zone declarations are self-asserted, on the same honesty as agent:// and host://. SPIFFE-flavored host attestation in M3+ is the upgrade path, not a v0 commitment.
  • Per-token wildcard semantics. A token granting inference_zone = "*" is rejected at issuance — wildcards belong on the data side, not the subject side. Tokens must enumerate the zones they permit.
  • Inference cost accounting. Routing a workload between local and cloud has a cost shape; that’s the operator’s accounting problem, not Hakiri’s. The audit-log fields above are enough to drive an external cost-attribution report.

Open questions

  • Default for unlabeled tables. The v0 default is ["*"]: most-permissive, and the operator opts into restriction. Is “fail closed” (default ["local:device", "on-prem:*"]) the safer default for the compliance-constrained audience? Leaning toward making it an operator choice via [agent] default_inference_zone_policy = "open" | "private" in the project’s hakiri.toml.
  • Per-row zone tags. Today’s tags are per-table and per-column. A multi-tenant table where row A is public and row B is private would need a row-level tag column. Solvable via the same predicate machinery as RLS; out of scope for v0.
  • Optional LLM gateway shim. A hakiri-llm-gateway component (sidecar process, OpenAI-compatible API, routes by declared zone + task tag) would close the loop on the “seamless flip” UX. Keeping it out of v0 to honor the anti-pillar; revisit if user demand is overwhelming. If shipped, it would live as a separate crate / repo (like the agent example tree) — not in the data plane.
  • Zone for embedding calls. When context.query triggers an embedding for a semantic component, the embedding model itself is an inference call. If the operator runs a local embedding (e.g. fastembed-rs, per 12-semantic-context.md § Embedding model in the catalog), the embedding zone is local:device regardless of the agent’s zone. Whether to expose this as a separate embedding_zone field on the subject is open; v0 folds it into the table-level inference_zone_allowed.
  • UX for “loaded private context.” The desktop confirmation when switching out of Incognito needs a definition of “currently loaded private context” — last N queries? Open documents? The MCP server is stateless, so this state lives in the client. Whether the control plane exposes a query-recent-zones API to drive the warning is open.
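For the “Per-token wildcard semantics” item above, the issuance-time check could be as small as the sketch below. Refusing `kind:*` wildcards such as `on-prem:*` and refusing an empty list are assumptions layered on the spec’s “tokens must enumerate the zones they permit”; the function name is hypothetical.

```rust
/// Hypothetical issuance-time check on a token's inference-zone list.
/// Tokens must enumerate concrete zones; any wildcard is refused.
pub fn validate_token_zones(zones: &[&str]) -> Result<(), String> {
    if zones.is_empty() {
        return Err("token must enumerate at least one inference zone".to_string());
    }
    for z in zones {
        // Covers both "*" and kind wildcards such as "on-prem:*".
        if z.contains('*') {
            return Err(format!("wildcard zone `{z}` is not allowed on tokens"));
        }
    }
    Ok(())
}
```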