Inference placement & Incognito mode
Status: outline / RFC. Decision captured in ADR-0015. Open questions at the bottom.
Related specs: 09-access-control.md, 10-collocation.md, 11-compliance.md, 12-semantic-context.md, 13-team-surfaces.md, 16-formal-verification.md.
The data plane is local-first by construction. The inference plane — where the LLM that consumes Hakiri’s MCP responses actually runs — is not, and today users have to mentally track which model endpoint is hot before they ask a sensitive question. This spec defines:
- Inference zones — a typed label on where an agent’s model is executing (laptop / on-prem GPU / customer-private cloud / public LLM API).
- Data privacy classes — a typed label on tables and columns declaring which inference zones may consume them.
- Incognito mode — a client-side switch that asserts “this session’s inference is local/on-prem only,” surfaced into the capability-token subject so Hakiri can refuse to return cloud-disallowed context.
- Seamless flip — the user-facing experience: one toggle in the desktop app / one CLI flag, with the data-plane policy doing the actual enforcement.
This spec is policy-only: Hakiri does not host inference, does not embed an LLM gateway, does not call model SDKs. Anti-pillar “Not an LLM SDK or prompt framework” is preserved. What Hakiri owns is the declared zone of the caller and the declared zone allowed by the data — and refusing to serve when they don’t compose. See ADR-0015 for the decision and rejected alternatives.
Why this is a separate concern from access control
`09-access-control.md` answers who can read what. This spec answers where the bytes are allowed to be processed once they’re read. The two compose: a token can authorize the row read, but a row from a table tagged `inference_zone_allowed = ["local:device"]` is still withheld from the response if the requesting client’s asserted zone is `public-cloud:anthropic`.
The distinction matters because:
- A user may legitimately have read access to a customer’s PHI ticket and a cloud Claude API key in the same hour — but PHI must not leave the on-prem boundary even though the user is authorized.
- A team can pay for cloud inference for unrestricted tables (release notes, public OSS metadata) while routing private context (internal Slack, customer Stripe records) through a local Ollama or an on-prem vLLM box.
- Provider-agnostic posture (PRD Challenge 4) requires a story for “which provider for which task” that doesn’t bind to a specific vendor.
Inference zones
A zone is a short string tag. The v0 vocabulary:
| Zone | What it means | Attestation in v0 |
|---|---|---|
| `local:device` | LLM runs on the same machine as the agent client (Ollama, llama.cpp, LM Studio, mlx-lm). The bytes never leave the device. | Self-asserted by the agent client; `hakiri.subject.inference_zone.asserted = true` |
| `on-prem:<id>` | LLM runs on a customer-controlled host inside the customer’s network boundary (vLLM on an internal GPU box, an internal LiteLLM gateway pointed at a self-hosted model). | Self-asserted in v0; host-attested via SPIFFE in M3+ |
| `private-cloud:<account>` | LLM runs in the customer’s own cloud account against a single-tenant endpoint (AWS Bedrock with a customer-managed KMS key, GCP Vertex AI with Private Service Connect, Anthropic via AWS PrivateLink). | Self-asserted; deploy-time tag matches the `hakiri deploy <cloud>` target |
| `public-cloud:<vendor>` | LLM is a public multi-tenant API (Anthropic, OpenAI, Google AI Studio, Groq, …). | Self-asserted; vendor identifier is opaque to Hakiri |
| `unknown` | Client did not declare a zone. | Always resolved to the most restrictive match: treated as `public-cloud:*` for policy purposes |
The zone is a property of the client process making the MCP call, not of Hakiri itself. Hakiri only reads what the client asserts.
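A minimal sketch of how a runtime might parse and normalize an asserted zone, assuming the `kind:qualifier` shape of the v0 vocabulary. The `Zone` type and `parse_zone` helper are hypothetical names, not part of Hakiri's API. The only behavior taken from this spec is that an undeclared zone is evaluated as `public-cloud:*`; treating malformed strings the same way is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# The four zone kinds from the v0 vocabulary.
ZONE_KINDS = ("local", "on-prem", "private-cloud", "public-cloud")


@dataclass(frozen=True)
class Zone:
    kind: str       # one of ZONE_KINDS
    qualifier: str  # "device", a host id, a cloud account, a vendor, or "*"

    def __str__(self) -> str:
        return f"{self.kind}:{self.qualifier}"


def parse_zone(asserted: Optional[str]) -> Zone:
    """Parse a client-asserted zone string into a Zone.

    A missing, empty, "unknown", or malformed assertion falls back to
    public-cloud:*. Handling malformed input this way is an assumption
    of this sketch, not something the spec states.
    """
    fallback = Zone("public-cloud", "*")
    if not asserted or asserted == "unknown":
        return fallback
    kind, sep, qualifier = asserted.partition(":")
    if not sep or kind not in ZONE_KINDS or not qualifier:
        return fallback
    return Zone(kind, qualifier)
```

Falling back rather than raising keeps the "unknown is most restrictive" rule in one place, so callers never have to special-case a missing assertion.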
Why self-asserted is sufficient for v0
Same honest read as 09-access-control.md § Subject attestation: a compromised client claims whatever it likes. What the policy buys is:
- Defense against accident, not compromise. A user who flipped Claude Code to “use Anthropic API” without realising they had a private-tagged context loaded gets refused at the MCP boundary instead of silently exfiltrating.
- Audit-trail honesty. Every read records the asserted zone. A later review answers “did anyone ever read this row under a `public-cloud:*` claim?” in one OTel query.
- Composability with M3+ attestation. When SPIFFE-flavored host attestation lands, the `inference_zone.asserted` flag flips to `inference_zone.verified` without policy rewrites.
Privacy classes on data
Tables and columns declare which zones may consume them. The declaration is additive policy data in `hakiri.toml`, PR-reviewable, replicated through the manifest:

    [[pipeline.tables]]
    name = "github_issues"
    # Default for unlabeled tables: any zone (`*`)
    inference_zone_allowed = ["*"]

    [[pipeline.tables]]
    name = "stripe_customers"
    inference_zone_allowed = ["local:device", "on-prem:*", "private-cloud:*"]

      # Per-column overrides for sensitive fields
      [pipeline.tables.policy.columns]
      email = { inference_zone_allowed = ["local:device", "on-prem:*"] }
      card_last4 = { inference_zone_allowed = ["local:device"] }

    [[pipeline.tables]]
    name = "patient_visits"
    inference_zone_allowed = ["local:device", "on-prem:*"]
    pii_type = "phi"  # composes with HIPAA tagging; see 11-compliance.md

Composition rules:
- Most-restrictive wins. If a table allows `["*"]` and a projected column allows only `["local:device"]`, that column is dropped (CLS-style) from results bound for `public-cloud:*`.
- Unspecified means `["*"]`. Hakiri does not assume sensitivity. Operators tag what’s sensitive; the rest flows freely.
- PHI is auto-narrowed. Columns with `pii_type = "phi"` get an implicit floor of `["local:device", "on-prem:*"]` per `11-compliance.md`. The validator refuses a manifest that widens this without an explicit `phi_inference_override = true` flag.
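The three composition rules can be sketched as a single visibility check. This assumes allow-list patterns are matched glob-style (via Python's `fnmatch`); the spec does not define the pattern grammar, so that is an assumption. `zone_allowed` and `column_visible` are hypothetical helper names.

```python
from fnmatch import fnmatchcase
from typing import Optional, Sequence

# Implicit floor for PHI columns, as stated in the composition rules.
PHI_FLOOR = ("local:device", "on-prem:*")


def zone_allowed(zone: str, patterns: Optional[Sequence[str]]) -> bool:
    """True if `zone` matches any allow pattern; unspecified means ["*"]."""
    if patterns is None:
        patterns = ["*"]
    return any(fnmatchcase(zone, p) for p in patterns)


def column_visible(zone: str,
                   table_allowed: Optional[Sequence[str]] = None,
                   column_allowed: Optional[Sequence[str]] = None,
                   pii_type: Optional[str] = None,
                   phi_override: bool = False) -> bool:
    """Most-restrictive wins: the zone must pass every layer that applies."""
    layers = [table_allowed, column_allowed]
    if pii_type == "phi" and not phi_override:
        layers.append(PHI_FLOOR)  # PHI is auto-narrowed unless overridden
    return all(zone_allowed(zone, layer) for layer in layers)
```

With the `stripe_customers` policy, `email` is visible to a hypothetical `on-prem:gpu-01` caller but not to `private-cloud:acme`, because the column allow-list is narrower than the table's.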
The validator runs at manifest load and refuses configurations that combine redact = [...] with inference_zone_allowed = ["public-cloud:*"] for the same column — the two settings represent contradictory operator intent.
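A rough sketch of the two manifest-load checks described in this section, assuming each column's policy has already been parsed into a plain dict. The dict layout, the shape of the `redact` value, and the `validate_column` name are all hypothetical; only the two refusal rules come from the spec. Whether a bare `"*"` should also count as a public-cloud allowance is left out here because the spec does not say.

```python
from fnmatch import fnmatchcase

PHI_FLOOR = ("local:device", "on-prem:*")


def validate_column(table: str, column: str, policy: dict) -> list:
    """Return human-readable errors for one column's policy (empty if valid)."""
    errors = []
    # Only explicitly declared patterns are checked; an absent key means the
    # implicit defaults apply and nothing was widened by the operator.
    declared = policy.get("inference_zone_allowed") or []

    # Rule 1: redaction plus an explicit public-cloud allowance is contradictory.
    if policy.get("redact") and any(p.startswith("public-cloud:") for p in declared):
        errors.append(f"{table}.{column}: redact conflicts with public-cloud allowance")

    # Rule 2: PHI columns may not widen past the floor without the override flag.
    if policy.get("pii_type") == "phi" and not policy.get("phi_inference_override", False):
        widened = [p for p in declared
                   if not any(fnmatchcase(p, floor) for floor in PHI_FLOOR)]
        if widened:
            errors.append(f"{table}.{column}: PHI floor widened by {widened}")
    return errors
```

Returning a list of errors rather than raising on the first one lets a PR review surface every contradictory column in a single pass.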
Incognito mode — the seamless flip
Incognito mode is the user-visible expression of zone policy. One switch in the desktop app (and --incognito on the CLI; an incognito boolean on every MCP request payload) asserts:
> For the duration of this session, the inference zone is restricted to `local:device` and `on-prem:*`. Hakiri must refuse to serve any row whose `inference_zone_allowed` does not include at least one of these.
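One way to read that assertion as code, assuming the session still carries a concrete zone (for example `local:device` or a hypothetical `on-prem:gpu-01`) and that patterns match glob-style. `servable_in_incognito` is an illustrative name, and this interpretation of "includes at least one of these" is an assumption rather than the spec's definition.

```python
from fnmatch import fnmatchcase
from typing import Optional, Sequence

# The zones an Incognito session is restricted to.
INCOGNITO_ZONES = ("local:device", "on-prem:*")


def servable_in_incognito(session_zone: str,
                          allowed: Optional[Sequence[str]] = None) -> bool:
    """Decide whether data with this allow-list may be served to an Incognito session."""
    # The session's own zone must sit inside the Incognito boundary.
    if not any(fnmatchcase(session_zone, z) for z in INCOGNITO_ZONES):
        return False
    # The data must admit that zone; an untagged table defaults to ["*"].
    patterns = ["*"] if allowed is None else allowed
    return any(fnmatchcase(session_zone, p) for p in patterns)
```

Under this reading, a table tagged only `["private-cloud:*"]` is refused in Incognito even though the caller is local, which matches the "must refuse" wording.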
What this gets the user:
- A way to ask “what’s the renewal risk on account X” against private Stripe + Salesforce context, knowing the bytes never cross to a cloud LLM API regardless of which model client is configured.
- A way to flip back out for an unrelated task (debugging an OSS dependency, drafting a public blog post) without re-configuring policy.
- A way for a team admin to declare Incognito as the default for everyone in the team (set in the team-mode control plane per `13-team-surfaces.md`); individuals can disable per-session if their token grants the capability.
Surface

    # CLI
    hakiri mcp --incognito                  # stdio MCP server bound to local/on-prem zones
    hakiri query --incognito "select …"     # ad-hoc query, same policy applies

    # hakiri.toml
    [agent]
    default_inference_zone = "local:device"   # team-wide default
    allow_zone_widening = false               # users cannot widen to public-cloud:*

    # Per-request override on the MCP wire
    {
      "tool": "context.query",
      "input": { ... },
      "subject_overrides": {
        "inference_zone": "local:device",
        "incognito": true
      }
    }

- A persistent indicator in the tray and titlebar shows the current zone (`local:device · Incognito`, `private-cloud:cf-prod`, `public-cloud:anthropic`).
- Switching zones requires a single click; switching out of Incognito while a private-tagged dataset is loaded raises a confirmation dialog citing the affected tables.
- The renderer feature-detects whether a local model endpoint is reachable (`http://localhost:11434` for Ollama, configurable for vLLM/LM Studio) and refuses to enter Incognito if no local endpoint is healthy.
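A hedged sketch of the endpoint check that gates the Incognito toggle. It is written in Python for brevity, although the real renderer is presumably not Python. The function names, the 0.5 s timeout, and the rule "any 2xx response counts as healthy" are assumptions. The probe is injectable so the gate can be exercised without a running model server.

```python
import http.client
import urllib.request
from typing import Callable, Iterable

# Ollama's default address as given in the spec; other endpoints are configurable.
DEFAULT_ENDPOINTS = ("http://localhost:11434",)


def http_probe(url: str, timeout: float = 0.5) -> bool:
    """Return True if the endpoint answers an HTTP GET with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, ValueError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error, or malformed URL.
        return False


def can_enter_incognito(endpoints: Iterable[str] = DEFAULT_ENDPOINTS,
                        probe: Callable[[str], bool] = http_probe) -> bool:
    """Allow the Incognito toggle only if at least one local endpoint is healthy."""
    return any(probe(url) for url in endpoints)
```

Refusing the toggle when nothing answers avoids a session that claims `local:device` while the agent client silently falls back to a cloud model.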
Enforcement at the MCP boundary
Every MCP tool call carries (or inherits) a subject tuple. The runtime extends `09-access-control.md` § Subject model with two fields:

    subject = {
      agent:          "agent://claude-opus-4-7"
      host:           "host://laptop-jarvis-01"
      on_behalf_of:   "user://jarvis@fractalbox"
      task:           "task://01HXYZ…"
      inference_zone: "local:device"   // NEW: self-asserted
      incognito:      true             // NEW: convenience flag
    }

Policy evaluation runs in this order on every `context.query` / `context.execute_query` response:
flowchart TB
Req[MCP request<br/>+ subject.inference_zone] --> Tok{Token authorizes<br/>row read?}
Tok -- no --> Deny[Deny with reason=token]
Tok -- yes --> Tag{Row table's<br/>inference_zone_allowed<br/>includes subject.zone?}
Tag -- no --> Drop[Drop row<br/>increment hakiri.policy.zone_filtered_rows]
Tag -- yes --> Col{Each projected column's<br/>inference_zone_allowed<br/>includes subject.zone?}
Col -- no --> Mask[Mask column<br/>add to hakiri.policy.zone_masked_columns]
Col -- yes --> Emit[Return cell]
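The flowchart can be sketched as one function. The dict-shaped policy, the `enforce` name, and the choice to mask a column by omitting its key are assumptions of this sketch; RLS/CLS from `09-access-control.md` are left out to keep the zone checks visible.

```python
from fnmatch import fnmatchcase


def allowed(zone, patterns):
    """Glob-style allow-list match; an unspecified list means ["*"]."""
    patterns = ["*"] if patterns is None else patterns
    return any(fnmatchcase(zone, p) for p in patterns)


def enforce(subject_zone, token_ok, table_policy, rows):
    """Apply the zone checks in flowchart order and report what was withheld.

    table_policy: {"inference_zone_allowed": [...], "columns": {name: [...]}}
    rows: list of dicts, one per row, keyed by column name.
    """
    if not token_ok:
        return {"denied": "token"}

    applied = {"zone_filtered_rows": 0, "zone_masked_columns": [],
               "subject_inference_zone": subject_zone}

    # Table-level check: a mismatch withholds every row from this table.
    if not allowed(subject_zone, table_policy.get("inference_zone_allowed")):
        applied["zone_filtered_rows"] = len(rows)
        return {"rows": [], "hakiri.policy.applied": applied}

    # Column-level check: mask projected columns whose allow-list excludes the zone.
    projected = set().union(*(row.keys() for row in rows))
    column_policy = table_policy.get("columns", {})
    masked = sorted(c for c, pats in column_policy.items()
                    if c in projected and not allowed(subject_zone, pats))
    applied["zone_masked_columns"] = masked
    out = [{k: v for k, v in row.items() if k not in masked} for row in rows]
    return {"rows": out, "hakiri.policy.applied": applied}
```

Running the token check first means an unauthorized caller learns nothing about zone tags, not even the counters.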
The response envelope reports the filtering honestly, the same way `12-semantic-context.md` § Policy-aware retrieval does:

    {
      "hakiri.policy.applied": {
        "rls_filtered_rows": 12,
        "cls_masked_columns": ["author.email"],
        "zone_filtered_rows": 4,                  // dropped for zone mismatch
        "zone_masked_columns": ["card_last4"],    // masked for column-level zone mismatch
        "subject_inference_zone": "public-cloud:anthropic",
        "incognito": false
      }
    }

The agent sees that something was filtered. The agent does not see the filtered content or learn whether a forbidden row would have matched semantically: vector and FTS retrieval drop forbidden rows before the top-K cut, same as RLS in `09-access-control.md`. These two properties (no forbidden row reaches the response; metadata counters don’t reveal forbidden content) are stated and machine-checked as theorems `serve_sound` and `metadata_aggregates` in `16-formal-verification.md`, with a Rust ↔ Lean differential test on every PR.
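Why the filter has to run before the top-K cut can be shown in a few lines. The tuple layout and the `retrieve` name are hypothetical, and glob-style pattern matching is assumed.

```python
import heapq
from fnmatch import fnmatchcase


def retrieve(candidates, subject_zone, k):
    """Return the ids of the k best-scoring candidates the zone may consume.

    candidates: iterable of (score, doc_id, allow_patterns) tuples.
    Forbidden candidates are removed before the ranking is cut to k, so the
    result holds the best k permitted hits and its length does not depend on
    how many forbidden rows scored well.
    """
    permitted = [(score, doc_id) for score, doc_id, pats in candidates
                 if any(fnmatchcase(subject_zone, p) for p in pats)]
    return [doc_id for _, doc_id in heapq.nlargest(k, permitted)]
```

With candidates `a` (0.9, local-only), `b` (0.8), `c` (0.7), and `d` (0.6), a `public-cloud:anthropic` caller asking for two results gets `b` and `c`. Filtering after the cut would have returned only `b`, and the short result would hint that a forbidden row outranked the rest.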
Audit trail
Every MCP call emits an OTel span with:
| Attribute | Value | Use |
|---|---|---|
| `hakiri.subject.inference_zone` | the asserted zone | “did anyone ever read this row under `public-cloud:*`?” |
| `hakiri.subject.inference_zone.asserted` | `true` in v0 | distinguishes assertion from attestation |
| `hakiri.subject.incognito` | `true` / `false` | tracks intent independently of zone |
| `hakiri.policy.zone_filtered_rows` | integer | how many rows were withheld |
| `hakiri.policy.zone_masked_columns` | array | which columns were masked |
The append-only hash-chained audit log under .hakiri/audit/ (09-access-control.md § Audit trail durability) carries the same fields. Operators answering “have any of our private-tagged tables ever been served to a public-cloud zone?” run one query against the audit log; the substrate makes the answer cheap.
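A toy version of a hash-chained log and the "one query" an operator might run against it. The record layout, the use of SHA-256, the `table` field, and all function names are assumptions; the actual on-disk format under `.hakiri/audit/` is defined in `09-access-control.md`, not here.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def append_record(log: list, fields: dict) -> dict:
    """Append a record whose hash covers its fields and the previous hash."""
    prev = log[-1]["hash"] if log else GENESIS
    body = json.dumps(fields, sort_keys=True)
    digest = hashlib.sha256((prev + body).encode("utf-8")).hexdigest()
    record = {"prev": prev, "fields": fields, "hash": digest}
    log.append(record)
    return record


def verify_chain(log: list) -> bool:
    """Recompute every hash; an edited or reordered record breaks the chain."""
    prev = GENESIS
    for record in log:
        body = json.dumps(record["fields"], sort_keys=True)
        expected = hashlib.sha256((prev + body).encode("utf-8")).hexdigest()
        if record["prev"] != prev or record["hash"] != expected:
            return False
        prev = record["hash"]
    return True


def public_cloud_queries(log: list, private_tables: set) -> list:
    """Records where a private-tagged table was queried under a public-cloud claim."""
    return [r["fields"] for r in log
            if r["fields"].get("table") in private_tables
            and r["fields"].get("hakiri.subject.inference_zone", "").startswith("public-cloud:")]
```

Each hit still carries `hakiri.policy.zone_filtered_rows`, so the reviewer can tell a refused attempt from rows that were actually returned.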
Composition with collocation
10-collocation.md defines where the data is. This spec defines where the inference is. They compose:
- A `mode = "proxy"` replica enforces zone policy at the proxy boundary: a sensitive table that cannot be replicated to a laptop is also not served to a `public-cloud:*` zone from the proxy.
- A `mode = "full"` replica on a laptop can serve private-tagged data to a `local:device` zone but refuses to serve the same rows to a `public-cloud:*` zone from the same laptop.
- A query running on `private-cloud:<customer-aws>` is allowed to consume data tagged for that zone; the deploy-time pin links Hakiri’s hosting target to the asserted inference zone for in-cloud agent flows.
What this spec deliberately leaves out
- LLM routing. Hakiri does not pick a model, build a prompt, or dispatch an inference call. Anti-pillar “Not an LLM SDK or prompt framework” is preserved. An optional `hakiri-llm-gateway` component is a future possibility tracked as an open question; v0 ships none.
- Cryptographic zone proof. v0 zone declarations are self-asserted, on the same honesty as `agent://` and `host://`. SPIFFE-flavored host attestation in M3+ is the upgrade path, not a v0 commitment.
- Per-token wildcard semantics. A token granting `inference_zone = "*"` is rejected at issuance; wildcards belong on the data side, not the subject side. Tokens must enumerate the zones they permit.
- Inference cost accounting. Routing a workload between local and cloud has a cost shape; that’s the operator’s accounting problem, not Hakiri’s. The audit-log fields above are enough to drive an external cost-attribution report.
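The wildcard rule for tokens could be enforced at issuance roughly like this. `validate_token_zones` and `TokenIssuanceError` are hypothetical names. Rejecting partial wildcards such as `on-prem:*` is a stricter reading than the spec states, which only names the bare `"*"` case.

```python
# The four zone kinds from the v0 vocabulary.
ZONE_KINDS = ("local", "on-prem", "private-cloud", "public-cloud")


class TokenIssuanceError(ValueError):
    """Raised when a requested token scope violates the zone rules."""


def validate_token_zones(zones):
    """Reject wildcard or malformed zone grants; return the zones as a tuple."""
    for zone in zones:
        # Wildcards belong on the data side, never on the subject side.
        if "*" in zone:
            raise TokenIssuanceError(f"wildcard not allowed on a token: {zone!r}")
        kind, sep, qualifier = zone.partition(":")
        if not sep or kind not in ZONE_KINDS or not qualifier:
            raise TokenIssuanceError(f"not a concrete zone: {zone!r}")
    return tuple(zones)
```

Failing at issuance keeps the per-request path simple: by the time a token reaches the MCP boundary, every zone it carries is a concrete value that can be compared against data-side patterns.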
Open questions
- Default for unlabeled tables. v0 default is `["*"]`: most-permissive, operator opts into restriction. Is “fail closed” (default `["local:device", "on-prem:*"]`) the safer default for the compliance-constrained audience? Leaning toward operator-choice in `[agent] default_inference_zone_policy = "open" | "private"` in the project’s `hakiri.toml`.
- Per-row zone tags. Today’s tags are per-table and per-column. A multi-tenant table where row A is public and row B is private would need a row-level tag column. Solvable via the same predicate machinery as RLS; out of scope for v0.
- Optional LLM gateway shim. A `hakiri-llm-gateway` component (sidecar process, OpenAI-compatible API, routes by declared zone + task tag) would close the loop on the “seamless flip” UX. Keeping it out of v0 to honor the anti-pillar; revisit if user demand is overwhelming. If shipped, it would live as a separate crate / repo (like the agent example tree), not in the data plane.
- Zone for embedding calls. When `context.query` triggers an embedding for a semantic component, the embedding model itself is an inference call. If the operator runs a local embedding (e.g. `fastembed-rs`, per `12-semantic-context.md` § Embedding model in the catalog), the embedding zone is `local:device` regardless of the agent’s zone. Whether to expose this as a separate `embedding_zone` field on the subject is open; v0 folds it into the table-level `inference_zone_allowed`.
- UX for “loaded private context.” The desktop confirmation when switching out of Incognito needs a definition of “currently loaded private context”: last N queries? Open documents? The MCP server is stateless, so this state lives in the client. Whether the control plane exposes a query-recent-zones API to drive the warning is open.