Collocation: indexed context wherever the agent runs

This spec covers the compute-and-data-collocation property from PRD Pillar 3’s “MCP-native context store”: agents run in many places (laptop, edge Worker, Fargate task, on-prem VM, air-gapped enclave), and the context layer brings indexed slices next to wherever they land. Round-tripping every prompt to us-east-1 fails outright for offline use and for sovereign deploys, and costs a 50–200 ms tax even when it works.

Storage layout + sidecar indexes: 04-context-store.md.
Access control across replicas: 09-access-control.md.
The compliance dimension: 11-compliance.md.

Three collocation shapes

Shape	Replica location	p99 read latency	Refresh path
Agent-local	Same machine as the agent (laptop SSD, on-prem box)	Sub-ms	Periodic `hakiri sync pull` over WAN
Edge-local	Replica in object storage in the same region as the edge runtime (CF Worker + R2 same region, Lambda + S3 same region)	Single-digit ms	Push-driven; replica updates as snapshots commit
Region-pinned	Replica in the same VPC as the agent (Fargate task + EFS, EC2 + S3 same region)	Single-digit ms; no egress	Same as edge-local

The same `hakiri.toml` produces all three shapes; the only difference is where `hakiri sync pull --mode replica` materializes the snapshot. Pillar 1’s single binary makes this uniform, and Pillar 3’s cat-able store makes the replica itself a tar you can move around.

Replica modes

[sync.replica]
mode = "full"                                # full | partial | proxy
tables = ["github_issues", "linear_issues"]  # which tables to materialize locally
indexes = ["fts-body", "vec-body-bge-large-en-v1.5"]  # which sidecars to pull
partitions = ["recent_90d"]                   # only these partitions (see § Sharding)

`mode = "full"`

Pulls every snapshot and every declared index for the listed tables. Best for laptops doing serious agent work; worst for resource-constrained edges.

`mode = "partial"`

Pulls only the partitions and indexes the replica declares. A laptop can pull the FTS index but skip the 4 GB HNSW; a Worker can pull only the recent_90d partition. The runtime refuses to start an agent query that requires an index or partition the replica doesn’t have — it suggests hakiri sync pull --add-index <id> instead of silently degrading.
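The refusal behavior can be sketched in Python as a pre-flight check. This is a minimal illustration, not the runtime's actual code: `ReplicaManifest`, `MissingReplicaData`, and `check_query` are hypothetical names, and only the `hakiri sync pull --add-index <id>` hint comes from this spec. The spec names no equivalent flag for partitions, so the sketch only reports them.

```python
from __future__ import annotations

from dataclasses import dataclass, field


class MissingReplicaData(Exception):
    """Raised when a query needs an index or partition the replica lacks."""


@dataclass
class ReplicaManifest:
    # Hypothetical in-memory view of what a partial replica holds.
    indexes: set[str] = field(default_factory=set)
    partitions: set[str] = field(default_factory=set)


def check_query(replica, needed_indexes, needed_partitions):
    """Refuse up front instead of silently degrading the query."""
    missing_idx = sorted(set(needed_indexes) - replica.indexes)
    missing_part = sorted(set(needed_partitions) - replica.partitions)
    if not missing_idx and not missing_part:
        return
    # Suggest the pull command from the spec for each missing index.
    hints = [f"hakiri sync pull --add-index {i}" for i in missing_idx]
    # No partition flag is named in the spec, so partitions are only reported.
    hints += [f"partition not replicated: {p}" for p in missing_part]
    raise MissingReplicaData("; ".join(hints))
```

A replica holding only `fts-body` would pass a full-text query and raise on a vector query, with the suggested pull command in the error message.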

`mode = "proxy"`

The replica is empty locally; the agent’s context.query calls round-trip to the canonical store (via the hakiri sync serve query proxy). Used when:

The table is too large to replicate (multi-TB CDC).
The table is too sensitive to replicate (replicate = false on the source table).
The replica’s footprint budget can’t hold the data.

The proxy enforces capability-token policy (09-access-control.md) on every query — the agent never sees raw rows it isn’t authorized for. Latency is the cost.

Sharding by access pattern

A table can declare an access_pattern so the writer partitions Parquet in a shape that lets replicas pull just the slice they need:

[[pipeline.tables]]
name = "github_issues"
# Pick exactly one access_pattern; the alternatives are shown commented out.
access_pattern = "by_repo"          # one partition per repo
# access_pattern = "recent_90d"     # rolling window; older data goes to the "archive" partition
# access_pattern = "by_account"     # one partition per tenant

Replicas declare the partitions they want:

[sync.replica]
tables = ["github_issues"]
partitions = ["repo=torvalds/linux", "repo=openhackersclub/gctrl"]

A research agent that only ever queries one repo doesn’t drag the rest across the wire. The catalog’s per-partition cursors mean incremental refresh is per-partition, not per-table — a partition the replica doesn’t subscribe to never enters its refresh budget.
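A rough Python sketch of how per-partition cursors keep unsubscribed partitions out of the refresh budget. It assumes cursors are monotonically increasing integers and that the catalog exposes them as plain mappings; neither detail is stated in this spec, and `partitions_to_refresh` is a hypothetical helper.

```python
def partitions_to_refresh(subscribed, local_cursors, remote_cursors):
    """Return subscribed partitions whose remote cursor is ahead of the local one.

    Cursors are modeled as increasing integers (an assumption for this sketch).
    Partitions outside `subscribed` are never inspected.
    """
    stale = []
    for part in subscribed:
        remote = remote_cursors.get(part)
        if remote is None:
            continue  # partition does not exist upstream yet
        if remote > local_cursors.get(part, -1):
            stale.append(part)
    return stale
```

With the two repos subscribed above, a third, much larger partition upstream changes nothing: it is not in `subscribed`, so it is never compared or fetched.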

Incremental refresh

Refresh is a diff, not a re-sync. The replica’s local catalog tracks per-snapshot content hashes. A pull:

Fetches the remote top-level manifest (a few KB).
Diffs against local content hashes.
Downloads only changed snapshot directories (Parquet + sidecars together; see § Atomic snapshot + sidecar commit in 04-context-store.md).
Atomically swaps the current snapshot pointer in the local catalog.
Old snapshots stay around for the retention window.

A laptop offline for a week comes back, pulls one new snapshot per affected table (plus its sidecars), and is current in seconds. The replica never re-downloads unchanged Parquet.
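The pull steps above can be sketched in Python as a diff followed by a pointer swap. This is an illustration under stated assumptions: the remote manifest is modeled as a list of `{"table", "snapshot", "hash"}` entries naming each table's current snapshot, and `LocalCatalog`, `plan_pull`, `apply_pull`, and the `download` callback are hypothetical names, not Hakiri APIs.

```python
from dataclasses import dataclass


@dataclass
class LocalCatalog:
    hashes: dict   # snapshot id -> content hash of snapshots already on disk
    current: dict  # table -> snapshot id the replica currently serves


def plan_pull(catalog, remote_manifest):
    """Diff the remote manifest against local hashes; return entries to fetch."""
    return [
        entry for entry in remote_manifest
        if catalog.hashes.get(entry["snapshot"]) != entry["hash"]
    ]


def apply_pull(catalog, remote_manifest, download):
    """Download changed snapshots, then swap the current pointers in one step."""
    to_fetch = plan_pull(catalog, remote_manifest)
    for entry in to_fetch:
        download(entry["snapshot"])  # Parquet + sidecars travel together
        catalog.hashes[entry["snapshot"]] = entry["hash"]
    # Build the new pointer map completely before assigning it, so a reader
    # never observes a half-updated catalog. Old snapshots are left in place
    # for the retention window.
    new_current = dict(catalog.current)
    for entry in remote_manifest:
        new_current[entry["table"]] = entry["snapshot"]
    catalog.current = new_current
    return [e["snapshot"] for e in to_fetch]
```

Running `apply_pull` twice against the same manifest downloads nothing the second time, which is the property the section relies on: unchanged Parquet is never fetched again.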

Replicate vs proxy — when to choose which

Per-table choice driven by data sensitivity, size, and policy:

Choose `replicate` when…	Choose `proxy` when…
Table is < ~10 GB	Table is > 100 GB
Data is not subject to per-replica policy variance	Data has tenant-scoped or region-scoped access policies
Replica is in a trusted environment (operator-controlled)	Replica is on a laptop or third-party-controlled host with weaker trust
Compaction keeps snapshots small enough to refresh over the available bandwidth	Refresh bandwidth or replica storage is constrained
Latency budget requires local reads	Latency budget tolerates a hop

For tables with pii_type columns (declared in the manifest), replicate-by-default is off. Write-time redaction strips the PII columns before Parquet hits the bucket, so a replica of such a table only ever receives the redacted Parquet. If a replica needs PII access, it uses proxy mode and the proxy enforces the token policy.
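The comparison table and the PII rule can be read as a small decision function. The Python sketch below is one possible encoding, not the shipped policy: `TableProfile` and `choose_mode` are hypothetical, the precedence of the checks is an assumption, and the spec leaves the 10–100 GB band open, so the tie-break there (replicate only under latency pressure) is also an assumption.

```python
from dataclasses import dataclass

GB = 1024 ** 3


@dataclass
class TableProfile:
    size_bytes: int
    has_scoped_policy: bool    # tenant- or region-scoped access policies
    replica_trusted: bool      # operator-controlled environment
    needs_local_latency: bool  # latency budget requires local reads
    needs_pii: bool = False    # replica wants unredacted PII columns


def choose_mode(t):
    """Return "replicate" or "proxy", using the table above as rough rules."""
    # Policy and trust concerns win over size and latency (assumed ordering).
    if t.needs_pii or t.has_scoped_policy or not t.replica_trusted:
        return "proxy"
    if t.size_bytes > 100 * GB:
        return "proxy"
    if t.size_bytes < 10 * GB or t.needs_local_latency:
        return "replicate"
    # 10-100 GB with no latency pressure: default to the hop (assumption).
    return "proxy"
```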

Provenance through replication

Every replica carries the lineage edges (which run, which connector, which agent authored the connector). An agent querying a replica sees the same provenance an operator would see at the source. The lineage table is part of the catalog and replicates with it; it is not a separate concern.

Query-proxy security model

When a replica runs in proxy mode, the proxy (hakiri sync serve running near the canonical store):

Verifies the requesting client’s capability token (biscuit), including cnf.jkt proof-of-possession.
Plans the agent’s context.query against the canonical Parquet + indexes.
Applies RLS / CLS / k-anonymity per the token’s grants (see 09-access-control.md).
Returns only the projected rows, never raw Parquet handles.
Emits an OTel audit span + appends to the local hash-chained audit log.

The agent never sees raw bytes it isn’t authorized for. The proxy is the trust boundary; everything past it is the agent’s runtime.
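The hash-chained audit log in the last step can be illustrated with a short Python sketch. The entry layout, the all-zero genesis value, and the choice of SHA-256 over canonical JSON are assumptions made for illustration; the spec does not define the log format.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder for "no previous entry"


def _link(prev_hash, event):
    """Hash the previous entry's hash together with this event's canonical JSON."""
    body = json.dumps(event, sort_keys=True)
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(log, event):
    """Append an event whose hash covers the previous entry's hash."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"prev": prev, "event": event, "hash": _link(prev, event)})


def verify(log):
    """Recompute every link; an edited or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != _link(prev, entry["event"]):
            return False
        prev = entry["hash"]
    return True
```

Because each hash covers its predecessor, rewriting one audited query after the fact invalidates every later entry, which is what makes the log useful as evidence at the trust boundary.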

What this spec deliberately leaves out

Active-active multi-writer replicas. v0 replicas are read-only (pull-side); writes go through the canonical store. Active-active is a v2 concern that needs CRDT-shaped conflict resolution — out of scope per ADR-0005.
Replica-side recompaction. Replicas pull pre-compacted snapshots; they don’t compact locally. If a replica’s local needs differ (smaller HNSW, different partitioning), the replica declares them as explicit partial-mode choices rather than recompacting silently.
CDN-fronted replicas. Object-store + CDN (R2 free egress, CloudFront for S3) is the operator’s choice; Hakiri doesn’t manage it.

Open questions

Manifest format for partial replicas. A partial replica advertises “I hold partitions X, Y, indexes A, B”. Should this live in the catalog or in a separate replica.toml per replica? Leaning catalog so it’s queryable centrally.
Refresh prioritization. When a replica is bandwidth-constrained, which snapshot should refresh first? “Most recently queried table” is one heuristic; “freshly committed snapshot” is another. M2 ships a combination of LRU and freshness; revisit with usage data.
Replica trust attestation. A laptop replica claims to be in a particular policy zone; what verifies it? Tied to the Subject attestation story — host attestation v0 is self-asserted, M3+ is SPIFFE-flavored.
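For the refresh-prioritization question, one possible reading of "LRU + freshness" is a weighted score over query recency and commit recency. The Python sketch below is purely illustrative: the field names, the linear blend, and the default weight are assumptions, not the M2 design.

```python
def refresh_order(pending, now, weight_query=0.5):
    """Order pending snapshots so recently queried, freshly committed ones go first.

    Each item is a dict with "table", "last_queried", and "committed" timestamps
    (hypothetical fields). A lower score means higher refresh priority.
    """
    def score(s):
        query_age = now - s["last_queried"]
        commit_age = now - s["committed"]
        return weight_query * query_age + (1 - weight_query) * commit_age

    return sorted(pending, key=score)
```

Setting `weight_query` to 1.0 reduces this to pure most-recently-queried ordering, and 0.0 to pure freshest-commit ordering, so the two heuristics named above are the endpoints of one knob.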