Compliance posture

This spec covers the data-sovereignty / compliance dimension from the PRD’s “Properties that follow”. It describes the substrate: the architectural floor that makes attestation possible. It is not itself a regulatory attestation. Attestation is the work of the customer plus the commercial layer at M3+ (PRD § Commercial layer).

Access control mechanics: 09-access-control.md.
Encryption at rest and sidecar encryption: 04-context-store.md § Encryption.
Subject attestation: 09-access-control.md § Attestation.
Audit log: 09-access-control.md § Audit trail.

Compliance reality-check: what v0 actually provides vs. what it doesn’t

Regime	Hakiri provides (substrate)	Hakiri does not provide (operator’s work or commercial-layer work)
GDPR data-residency	Customer-controlled cloud region for first-class deploys (CF + AWS EU); air-gapped on-prem via Topology 2.5; no Hakiri-operated data path	Documented region pinning per customer; legal data-processing agreement (DPA)
GDPR right-to-erasure	Per-row `_subject_id` lineage tagging; `hakiri context forget --subject <id>` rewrites affected snapshots without the subject	Operator must declare which column is the subject identifier; M2 ships the forget tool — until then erasure is not supported
HIPAA — technical safeguards	Encryption at rest (Parquet modular encryption), encryption in transit (TLS), capability-token access control, OTel audit + hash-chained local audit, secrets-via-sandbox-only	Business Associate Agreement (BAA) — requires a commercial entity at M3.5+; risk assessment; access-review processes; incident-response runbook
HIPAA — PHI handling	Write-time redaction of declared `pii_type = "phi"` columns; “PHI must never leave the kernel unmasked” invariant via column tagging	Operator declares which columns are PHI; clinical workflow validation
EU AI Act	Provenance edges from row → connector → connector-author (agent or human); capability-token audit; documented agent reads	High-risk-system classification; human-oversight processes; risk-management documentation
SOC 2	Manifest-as-code with PR review; immutable lineage table; OTel + chained audit; declared change-management surface in `pm/roadmap.md`	SOC 2 Type II audit on the commercial hosted control plane (M3.5+); change-management process; incident-response process
PCI DSS	TLS, encryption at rest, capability-token scoping, audit trail	Cardholder-data scoping (operator’s call); compensating-controls documentation; quarterly scan
FedRAMP	Air-gapped deployment, no telemetry-by-default, sovereign cloud regions	FedRAMP authorization on a specific deploy — not on Hakiri itself

Honest read: v0 ships the substrate a regulated buyer’s compliance team needs to attest against. It does not ship the attestation. The M3.5+ commercial tier is where attestations land (SOC 2 Type II on the hosted control plane; BAA-eligible variant for HIPAA customers; documented data-residency whitepaper).

Data-residency posture

Data plane runs entirely in the customer’s environment. Always. There is no Hakiri-operated data path.
First-class clouds (Cloudflare, AWS) both offer EU regions; the `hakiri deploy <cloud>` command takes `--region` and pins all resources to that region. The runtime fails to start if a configured resource is in a different region than the declared one.
On-prem / air-gapped via Topology 2.5 (self-hosted cluster with bundled hakiri coord) — no public-internet path required.
The optional M3 hosted control plane stores only manifests and schedules, never data. The customer can run an in-region instance of the control plane or rely on the OHC-affiliated hosted instance, depending on their residency obligations.
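
The startup check described above (refuse to start when any configured resource sits outside the declared region) could be sketched roughly as follows. This is a minimal illustration, not the runtime's actual code: the `Resource` type, `find_region_offenders`, `check_region_pinning`, and `RegionMismatchError` are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """Hypothetical view of one deployed resource (bucket, queue, worker)."""
    name: str
    region: str  # region the resource actually lives in


class RegionMismatchError(RuntimeError):
    """Raised at startup when a resource sits outside the declared region."""


def find_region_offenders(declared_region: str, resources: list[Resource]) -> list[Resource]:
    """Return every resource whose region differs from the declared one."""
    return [r for r in resources if r.region != declared_region]


def check_region_pinning(declared_region: str, resources: list[Resource]) -> None:
    """Fail closed: raise instead of starting with a mis-pinned resource."""
    offenders = find_region_offenders(declared_region, resources)
    if offenders:
        detail = ", ".join(f"{r.name} ({r.region})" for r in offenders)
        raise RegionMismatchError(
            f"declared region {declared_region!r} but found: {detail}"
        )
```

The point of raising rather than logging a warning is that a mis-pinned resource never serves traffic, which matches the "fails to start" wording above.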

Encryption posture

See 04-context-store.md § Encryption for the mechanics. Summary for this spec:

Parquet modular encryption with operator-supplied KMS keys (AWS KMS, GCP KMS, HashiCorp Vault, CF Workers Secrets, OS keychain).
Sidecar indexes (HNSW, Tantivy, Bloom) encrypted under the same key. Indexes over redacted columns do not exist on disk.
TLS in transit, non-negotiable for cloud sync backends.
Key rotation: dual-key acceptance for signing keys (24h overlap); per-snapshot key versioning for encryption keys.
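
Dual-key acceptance with a 24h overlap can be illustrated with a small verifier that accepts the previous signing key only while the overlap window is open. This is a sketch under stated assumptions: HMAC-SHA256 stands in for whatever signature scheme the project signing key actually uses (this spec does not say), and the function names are hypothetical.

```python
import hashlib
import hmac
from datetime import datetime, timedelta, timezone
from typing import Optional

OVERLAP = timedelta(hours=24)  # overlap window stated in this spec


def sign(key: bytes, payload: bytes) -> str:
    """HMAC-SHA256 stand-in for the real token signature (assumption)."""
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify(
    payload: bytes,
    signature: str,
    current_key: bytes,
    previous_key: Optional[bytes],
    rotated_at: datetime,
    now: datetime,
) -> bool:
    """Accept the current key always, the previous key only inside the overlap."""
    if hmac.compare_digest(sign(current_key, payload), signature):
        return True
    within_overlap = now - rotated_at <= OVERLAP
    if previous_key is not None and within_overlap:
        return hmac.compare_digest(sign(previous_key, payload), signature)
    return False
```

A token signed with the old key one hour before the window closes still verifies; the same token presented after the window is rejected.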

Recommended rotation cadence

Key	Recommended rotation	Mandatory for
Project signing key (token verification)	90 days	HIPAA, SOC 2
Parquet encryption key (KMS-held)	365 days (KMS-managed); immediately on suspected compromise	HIPAA, PCI DSS
Clean-room pair pepper	Per clean-room session	All
Sync bucket credentials	90 days	SOC 2

`hakiri keys status` reports which keys are due for rotation and how to rotate them.
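
As a rough illustration of the "what's due" calculation, the cadences from the table can be turned into days remaining per key. This is not the output format or logic of `hakiri keys status`; the key labels and the `rotation_status` function are hypothetical, and the per-session pepper is left out because it has no calendar cadence.

```python
from datetime import date, timedelta

# Cadences taken from the table above; the dictionary keys are hypothetical labels.
ROTATION_DAYS = {
    "project_signing_key": 90,
    "parquet_encryption_key": 365,
    "sync_bucket_credentials": 90,
}


def rotation_status(last_rotated: dict[str, date], today: date) -> dict[str, int]:
    """Days remaining until each key is due; a negative value means overdue."""
    status = {}
    for key, rotated in last_rotated.items():
        due = rotated + timedelta(days=ROTATION_DAYS[key])
        status[key] = (due - today).days
    return status
```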

Right-to-erasure under append-only Parquet

GDPR Art. 17 (right to erasure) is non-trivial against an append-only store. The architectural answer:

Operator declares the subject identifier column in the manifest:

[[pipeline.tables]]
name = "customer_events"
subject_id = "user_id"

The catalog maintains a `forget_requests(subject_id, requested_at, completed_at)` table.
`hakiri context forget --subject <id>` triggers a forced compaction that rewrites every affected snapshot without the subject’s rows. The old snapshot’s runs are GC’d on an expedited schedule (≤24h vs the default 7d retention).
The forget operation is itself audited — an OTel span and chained-audit entry records what was forgotten, when, and by whom, with a hash of the forgotten subject id (not the cleartext) so the audit trail itself remains lawful under GDPR.
Replicas pick up the new snapshot on next refresh; old snapshots GC after the retention window. Until refresh completes, replicas may still hold the subject’s data. The right-to-erasure SLA in v0 is “≤72h from request to last replica refresh” — operators with tighter SLAs use proxy-mode replicas or force-refresh.

Lands in M2.
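
The steps above can be sketched in a few lines: rewrite the snapshot without the subject's rows, record an audit entry that carries only a hash of the subject id, and check the 72h window. This is an in-memory illustration with hypothetical function names, not the compaction path itself. Plain SHA-256 is an assumption; the spec does not say whether the audit hash is keyed or salted, and an unkeyed hash of a low-entropy identifier can be brute-forced.

```python
import hashlib
from datetime import datetime, timedelta, timezone

ERASURE_SLA = timedelta(hours=72)  # v0 SLA: request to last replica refresh


def forget_subject(rows: list[dict], subject_column: str, subject_id: str) -> list[dict]:
    """Return the rewritten snapshot: every row except the subject's."""
    return [row for row in rows if row[subject_column] != subject_id]


def audit_entry(subject_id: str, actor: str, requested_at: datetime) -> dict:
    """Audit record holding a hash of the subject id, never the cleartext."""
    digest = hashlib.sha256(subject_id.encode("utf-8")).hexdigest()
    return {
        "event": "forget",
        "subject_sha256": digest,
        "actor": actor,
        "requested_at": requested_at.isoformat(),
    }


def within_sla(requested_at: datetime, last_replica_refresh: datetime) -> bool:
    """True when the last replica refreshed within 72h of the request."""
    return last_replica_refresh - requested_at <= ERASURE_SLA
```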

PHI / sensitive-column tagging

Declaring `pii_type` on a column subjects that column to extra rules:

[[pipeline.tables]]
name = "patient_visits"

  [pipeline.tables.policy.columns]
  patient_id   = { pii_type = "phi", strategy = "tokenize" }
  visit_notes  = { pii_type = "phi", strategy = "redact" }
  patient_name = { pii_type = "phi", strategy = "redact" }
  visit_date   = { pii_type = "phi", strategy = "bucket:1m" }

The runtime enforces:

No index exists on a column with `strategy = "redact"` and `pii_type` set (manifest validator refuses).
No hash masking alone on `pii_type` ∈ {ssn, phone, email, mrn}; it must be combined with bucket or truncate (09-access-control.md § Hash strategy guardrails).
No retrieval through MCP of unmasked `pii_type` columns unless the requesting token carries an explicit `phi_access = true` grant.
Audit attribute `hakiri.row.phi_columns_returned` is logged on every read that returned PHI, for HIPAA accounting-of-disclosures.
Inference-zone floor. Columns tagged `pii_type = "phi"` get an implicit `inference_zone_allowed = ["local:device", "on-prem:*"]` floor. The validator refuses a manifest that widens this without an explicit `phi_inference_override = true` flag. The mechanics, including the Incognito-mode UX customer-facing teams flip when handling PHI, live in 15-inference-placement.md.
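
A minimal sketch of how the validator rules above might be checked against one column policy follows. Several details are assumptions made for illustration: combined strategies are modeled as a `+`-joined string such as `hash+truncate:4` (this spec does not show the combined syntax), `phi_inference_override` is read from the column policy, and all function names are hypothetical.

```python
HASH_GUARDED_TYPES = {"ssn", "phone", "email", "mrn"}
PHI_ZONE_FLOOR = {"local:device", "on-prem:*"}


def strategy_names(strategy: str) -> set[str]:
    """Split 'hash+truncate:4' into {'hash', 'truncate'} (hypothetical syntax)."""
    return {part.split(":", 1)[0] for part in strategy.split("+") if part}


def validate_column(name: str, policy: dict, indexed: bool = False) -> list[str]:
    """Return the violations for one column policy; an empty list means valid."""
    errors = []
    pii_type = policy.get("pii_type")
    names = strategy_names(policy.get("strategy", ""))

    # Rule: no index on a redacted pii column.
    if pii_type and "redact" in names and indexed:
        errors.append(f"{name}: index on a redacted pii column")

    # Rule: hash alone is not enough for low-entropy identifier types.
    if pii_type in HASH_GUARDED_TYPES and "hash" in names and not names & {"bucket", "truncate"}:
        errors.append(f"{name}: hash must be combined with bucket or truncate")

    # Rule: PHI inference zones may not widen past the floor without the override.
    if pii_type == "phi":
        zones = set(policy.get("inference_zone_allowed", PHI_ZONE_FLOOR))
        if zones - PHI_ZONE_FLOOR and not policy.get("phi_inference_override", False):
            errors.append(f"{name}: inference zones widened without override")
    return errors


def may_return_unmasked(policy: dict, token_grants: dict) -> bool:
    """Unmasked reads of pii columns need an explicit phi_access = true grant."""
    if policy.get("pii_type") is None:
        return True
    return token_grants.get("phi_access") is True
```

Returning a list of violations instead of raising on the first one lets a manifest author see every problem in a single validation pass; whether the real validator does so is not stated here.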

Audit trail durability

Per 09-access-control.md § Tamper-evident audit log:

OTel spans are the queryable projection (operator-configured sink — Honeycomb, Tempo, Grafana Cloud, self-hosted).
A parallel append-only hash-chained log under .hakiri/audit/<project>/ is the attestable record.
Signed roots are committed to the sync bucket on a configurable cadence (default every 10 min) so audit history survives a compromised local node.
Optional: commit signed roots to an external transparency log (Sigstore Rekor) for operator-tamper-resistant audit.

If the audit-write path fails (disk full, permission error, bucket unreachable), reads are refused. No fail-open path returns rows without an audit entry.
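
The two properties above (a hash-chained record and a fail-closed read path) can be illustrated with a small in-memory sketch. The on-disk format under `.hakiri/audit/`, the root-signing step, and the real entry schema are not shown; `AuditLog`, `entry_hash`, and `audited_read` are hypothetical names, and SHA-256 over canonical JSON is an assumption.

```python
import hashlib
import json

GENESIS = "0" * 64  # hash used as the predecessor of the first entry


def entry_hash(prev_hash: str, record: dict) -> str:
    """Chain each entry to its predecessor via SHA-256 over canonical JSON."""
    body = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


class AuditLog:
    """Append-only list of (record, hash) pairs; each hash covers the previous one."""

    def __init__(self) -> None:
        self.entries: list[tuple[dict, str]] = []

    def append(self, record: dict) -> str:
        prev = self.entries[-1][1] if self.entries else GENESIS
        digest = entry_hash(prev, record)
        self.entries.append((record, digest))
        return digest

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered entry breaks it."""
        prev = GENESIS
        for record, digest in self.entries:
            if entry_hash(prev, record) != digest:
                return False
            prev = digest
        return True


def audited_read(log: AuditLog, fetch_rows, record: dict):
    """Write the audit entry first; if that raises, no rows are fetched or returned."""
    log.append(record)  # an exception here propagates, so the read is refused
    return fetch_rows()
```

Ordering the audit write before the fetch is what makes the read fail closed: there is no code path that returns rows without an entry already in the chain.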

Telemetry posture

Hakiri does not phone home. No telemetry-by-default; no anonymous usage ping; no auto-update probe unless the operator opts in; no license-server contact. The binary works fully air-gapped.

OTel export is opt-in and operator-controlled: the operator configures the endpoint, sampling rate, and attributes. `OTEL_EXPORTER_OTLP_ENDPOINT` is unset by default, so the runtime emits spans to a no-op sink until the operator points them somewhere.

A weekly auto-update check (HTTP HEAD against a release feed) is off by default in v0 and opt-in via `[update] check = true`. The check never sends usage data; it only fetches a manifest of recent releases. Sovereign deploys leave it off.

What this spec deliberately leaves out

Specific attestations (SOC 2 Type II reports, BAA templates, FedRAMP authorization documents). Those are deliverables of the commercial-layer entity, not the OSS data plane.
Customer-side compliance processes (access reviews, risk assessments, incident response). Hakiri provides the substrate; the customer’s compliance team owns the process.
Region-specific certifications beyond EU. APAC sovereignty (China, India), specific public-sector frameworks (UK G-Cloud, AU IRAP, Canada PBMM) — supported by the architecture but not formally attested in v0.

Open questions

Commercial entity identification. Which entity holds the BAA, signs the DPA, undergoes the SOC 2 audit? FractalBox, an OHC-affiliated entity, or a separate commercial vehicle? Tracked in PRD § Open product questions.
Audit log to transparency-log integration. Sigstore Rekor is the obvious choice but adds an external dependency. Is the value worth the dependency? Probably yes for HIPAA / SOC 2 customers; off by default for everyone else.
EU AI Act high-risk-system classification. Does running an agent over customer data classify the customer’s deployment as a high-risk system, or only when the agent makes automated decisions? We are tracking the EU AI Act’s regulatory guidance; the spec evolves with it.
Right-to-erasure SLA. Default ≤72h to last-replica refresh is the v0 commitment. Tighter SLAs require proxy-mode replicas. Whether to support an explicit “erasure pending” replica state where the replica reports the gap is an M3 question.
Cross-tenant clean-room compliance. When two parties share a clean-room deployment, who is the data controller / processor for each party’s data? 09-access-control.md § Multi-tenant clean rooms covers the security model; the legal model is a per-deployment contract concern.