
Compliance posture

This spec covers the data-sovereignty / compliance dimension from the PRD’s “Properties that follow”. It is the substrate: the architectural floor that makes attestation possible. It is not itself a regulatory attestation. Attestation is the work of the customer plus the commercial layer at M3+ (PRD § Commercial layer).


Compliance reality-check: what v0 actually provides vs. what it doesn’t

| Regime | Hakiri provides (substrate) | Hakiri does not provide (operator’s work or commercial-layer work) |
| --- | --- | --- |
| GDPR data-residency | Customer-controlled cloud region for first-class deploys (CF + AWS EU); air-gapped on-prem via Topology 2.5; no Hakiri-operated data path | Documented region pinning per customer; legal data-processing agreement (DPA) |
| GDPR right-to-erasure | Per-row _subject_id lineage tagging; hakiri context forget --subject <id> rewrites affected snapshots without the subject | Operator must declare which column is the subject identifier; M2 ships the forget tool; until then erasure is not supported |
| HIPAA: technical safeguards | Encryption at rest (Parquet modular encryption), encryption in transit (TLS), capability-token access control, OTel audit + hash-chained local audit, secrets-via-sandbox-only | Business Associate Agreement (BAA), which requires a commercial entity at M3.5+; risk assessment; access-review processes; incident-response runbook |
| HIPAA: PHI handling | Write-time redaction of declared pii_type = "phi" columns; “PHI must never leave the kernel unmasked” invariant via column tagging | Operator declares which columns are PHI; clinical workflow validation |
| EU AI Act | Provenance edges from row → connector → connector-author (agent or human); capability-token audit; documented agent reads | High-risk-system classification; human-oversight processes; risk-management documentation |
| SOC 2 | Manifest-as-code with PR review; immutable lineage table; OTel + chained audit; declared change-management surface in pm/roadmap.md | SOC 2 Type II audit on the commercial hosted control plane (M3.5+); change-management process; incident-response process |
| PCI DSS | TLS, encryption at rest, capability-token scoping, audit trail | Cardholder-data scoping (operator’s call); compensating-controls documentation; quarterly scan |
| FedRAMP | Air-gapped deployment, no telemetry-by-default, sovereign cloud regions | FedRAMP authorization on a specific deploy, not on Hakiri itself |

Honest read: v0 ships the substrate a regulated buyer’s compliance team needs to attest against. It does not ship the attestation. M3.5+ commercial tier is where attestations land (SOC 2 Type II on the hosted control plane; BAA-eligible variant for HIPAA customers; documented data-residency whitepaper).

  • Data plane runs entirely in the customer’s environment. Always. There is no Hakiri-operated data path.
  • First-class clouds (Cloudflare, AWS) both offer EU regions; the hakiri deploy <cloud> command takes --region and pins all resources to that region. The runtime fails to start if a configured resource is in a different region than declared.
  • On-prem / air-gapped via Topology 2.5 (self-hosted cluster with bundled hakiri coord) — no public-internet path required.
  • Optional M3 hosted control plane stores only manifests and schedules — never data. Customer can run an in-region instance of the control plane or rely on the OHC-affiliated hosted instance per their residency obligations.
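
The region-pinning rule above (the runtime fails to start if a configured resource sits outside the declared region) can be pictured with a minimal sketch. The `Resource` shape, the `RegionMismatchError` name, and the `check_region_pinning` function are hypothetical and only illustrate the check; they are not Hakiri's actual startup code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """A deployed resource and the region it actually lives in (hypothetical shape)."""
    name: str
    region: str


class RegionMismatchError(RuntimeError):
    """Raised at startup when a resource sits outside the declared region."""


def check_region_pinning(declared_region: str, resources: list[Resource]) -> None:
    """Refuse to start if any resource is outside the declared region."""
    offenders = [r for r in resources if r.region != declared_region]
    if offenders:
        detail = ", ".join(f"{r.name} ({r.region})" for r in offenders)
        raise RegionMismatchError(
            f"declared region {declared_region!r} but found: {detail}"
        )
```

Raising at startup, rather than logging a warning, matches the fail-closed stance the rest of this spec takes: a misconfigured deploy never begins serving data from the wrong region.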

See 04-context-store.md § Encryption for the mechanics. Summary for this spec:

  • Parquet modular encryption with operator-supplied KMS keys (AWS KMS, GCP KMS, HashiCorp Vault, CF Workers Secrets, OS keychain).
  • Sidecar indexes (HNSW, Tantivy, Bloom) encrypted under the same key. Indexes over redacted columns do not exist on disk.
  • TLS in transit, non-negotiable for cloud sync backends.
  • Key rotation: dual-key acceptance for signing keys (24h overlap); per-snapshot key versioning for encryption keys.
| Key | Recommended rotation | Mandatory for |
| --- | --- | --- |
| Project signing key (token verification) | 90 days | HIPAA, SOC 2 |
| Parquet encryption key (KMS-held) | 365 days (KMS-managed); per-incident-suspicion immediate | HIPAA, PCI DSS |
| Clean-room pair pepper | Per clean-room session | All |
| Sync bucket credentials | 90 days | SOC 2 |

hakiri keys status reports what’s due and how to rotate.
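
As a rough illustration of the kind of calculation a status report like this implies, the sketch below applies the recommended intervals from the table to a set of last-rotation timestamps. The key names, the `rotation_status` function, and the output strings are assumptions made for the example; the real command's output format is not specified here. The clean-room pair pepper is omitted because it rotates per session rather than on a fixed interval.

```python
from datetime import datetime, timedelta, timezone

# Recommended rotation intervals in days, taken from the table above.
# The dictionary keys are illustrative names, not Hakiri identifiers.
ROTATION_DAYS = {
    "project_signing_key": 90,
    "parquet_encryption_key": 365,
    "sync_bucket_credentials": 90,
}


def rotation_status(last_rotated: dict[str, datetime], now: datetime) -> dict[str, str]:
    """Report, per key, whether rotation is due and on which date."""
    status = {}
    for key, days in ROTATION_DAYS.items():
        due = last_rotated[key] + timedelta(days=days)
        if now >= due:
            status[key] = f"overdue since {due.date().isoformat()}"
        else:
            status[key] = f"due {due.date().isoformat()}"
    return status


if __name__ == "__main__":
    utc = timezone.utc
    report = rotation_status(
        {
            "project_signing_key": datetime(2025, 1, 1, tzinfo=utc),
            "parquet_encryption_key": datetime(2025, 1, 1, tzinfo=utc),
            "sync_bucket_credentials": datetime(2025, 5, 1, tzinfo=utc),
        },
        now=datetime(2025, 6, 1, tzinfo=utc),
    )
    for key, line in report.items():
        print(f"{key}: {line}")
```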

Right-to-erasure under append-only Parquet


GDPR Art. 17 (right to erasure) is non-trivial against an append-only store. The architectural answer:

  1. Operator declares the subject identifier column in the manifest:

    [[pipeline.tables]]
    name = "customer_events"
    subject_id = "user_id"
  2. The catalog maintains a forget_requests(subject_id, requested_at, completed_at) table.

  3. hakiri context forget --subject <id> triggers a forced compaction that rewrites every affected snapshot without the subject’s rows. The old snapshot’s runs are GC’d on an expedited schedule (≤24h vs the default 7d retention).

  4. The forget operation is itself audited: an OTel span and a chained-audit entry record what was forgotten, when, and by whom, with a hash of the forgotten subject id (not the cleartext) so the audit trail itself remains lawful under GDPR.

  5. Replicas pick up the new snapshot on next refresh; old snapshots GC after the retention window. Until refresh completes, replicas may still hold the subject’s data. The right-to-erasure SLA in v0 is “≤72h from request to last replica refresh” — operators with tighter SLAs use proxy-mode replicas or force-refresh.

Lands in M2.
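
Steps 3 and 4 above can be sketched in a few lines, assuming a snapshot is modelled as a list of row dictionaries. The `forget_subject` function and the audit-entry field names are hypothetical; the real operation rewrites Parquet snapshots through a forced compaction, which this sketch does not attempt to reproduce.

```python
import hashlib
from datetime import datetime, timezone


def forget_subject(snapshot: list[dict], subject_column: str,
                   subject_id: str, actor: str) -> tuple[list[dict], dict]:
    """Return the snapshot without the subject's rows, plus an audit entry.

    The audit entry carries a SHA-256 digest of the subject id, never the
    cleartext (field names here are illustrative assumptions).
    """
    kept = [row for row in snapshot if row.get(subject_column) != subject_id]
    audit_entry = {
        "event": "forget",
        "subject_id_sha256": hashlib.sha256(subject_id.encode("utf-8")).hexdigest(),
        "rows_removed": len(snapshot) - len(kept),
        "actor": actor,
        "at": datetime.now(timezone.utc).isoformat(),
    }
    return kept, audit_entry
```

One design caveat: an unkeyed hash of a guessable identifier can be reversed by enumerating candidate ids, so an implementation may prefer a keyed hash for the audit record. The spec text only says "a hash", so the choice is left open here.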

Declaring pii_type on a column subjects that column to extra rules:

[[pipeline.tables]]
name = "patient_visits"
[pipeline.tables.policy.columns]
patient_id = { pii_type = "phi", strategy = "tokenize" }
visit_notes = { pii_type = "phi", strategy = "redact" }
patient_name = { pii_type = "phi", strategy = "redact" }
visit_date = { pii_type = "phi", strategy = "bucket:1m" }
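
To make the three strategies in this manifest concrete, here is a minimal sketch of write-time masking. It rests on assumptions the spec does not confirm: that tokenize is a keyed, deterministic mapping (HMAC-SHA256 here), that redact drops the value entirely, and that bucket:1m truncates a date to the first day of its month. All function names are hypothetical.

```python
import hashlib
import hmac
from datetime import date


def tokenize(value: str, secret: bytes) -> str:
    """Keyed, deterministic token: equal inputs map to equal tokens (assumed semantics)."""
    return hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def bucket_month(value: date) -> date:
    """Assumed meaning of "bucket:1m": truncate to the first day of the month."""
    return value.replace(day=1)


def apply_policy(row: dict, policy: dict, secret: bytes) -> dict:
    """Apply each column's declared strategy; columns without a policy pass through."""
    out = {}
    for column, value in row.items():
        strategy = policy.get(column, {}).get("strategy")
        if strategy is None:
            out[column] = value
        elif strategy == "tokenize":
            out[column] = tokenize(str(value), secret)
        elif strategy == "redact":
            out[column] = None  # assumed: the value is not stored at all
        elif strategy == "bucket:1m":
            out[column] = bucket_month(value)
        else:
            raise ValueError(f"unknown strategy {strategy!r} on column {column!r}")
    return out
```

Tokenizing patient_id rather than redacting it keeps rows joinable on the token while hiding the raw identifier, which is presumably why the manifest treats it differently from visit_notes.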

The runtime enforces:

  • No index exists on a column with strategy = "redact" and pii_type set (manifest validator refuses).
  • No hash masking alone on pii_type ∈ {ssn, phone, email, mrn} — must be combined with bucket or truncate (09-access-control.md § Hash strategy guardrails).
  • No retrieval through MCP of unmasked pii_type columns unless the requesting token carries an explicit phi_access = true grant.
  • Audit attribute hakiri.row.phi_columns_returned is logged on every read that returned PHI, for HIPAA accounting-of-disclosures.
  • Inference-zone floor. Columns tagged pii_type = "phi" get an implicit inference_zone_allowed = ["local:device", "on-prem:*"] floor. The validator refuses a manifest that widens this without an explicit phi_inference_override = true flag. The mechanics — including the Incognito-mode UX customer-facing teams flip when handling PHI — live in 15-inference-placement.md.
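
Three of the rules above are static manifest checks, and a sketch of how a validator might express them follows. It is illustrative only: the `validate_column` function, the `indexed` argument, the convention that a bare "hash" strategy string means hash masking alone, and the placement of phi_inference_override on the column are all assumptions.

```python
from fnmatch import fnmatchcase

HASH_GUARDED_TYPES = {"ssn", "phone", "email", "mrn"}
PHI_ZONE_FLOOR = ["local:device", "on-prem:*"]


def within_floor(zone: str) -> bool:
    """True if the zone matches one of the PHI floor patterns."""
    return any(fnmatchcase(zone, pattern) for pattern in PHI_ZONE_FLOOR)


def validate_column(name: str, col: dict, indexed: bool = False) -> list[str]:
    """Return validation errors for one column policy (empty list means valid)."""
    errors = []
    pii_type = col.get("pii_type")
    if pii_type is None:
        return errors
    strategy = col.get("strategy", "")
    if strategy == "redact" and indexed:
        errors.append(f"{name}: index declared on a redacted PII column")
    if pii_type in HASH_GUARDED_TYPES and strategy == "hash":
        errors.append(f"{name}: hash alone on pii_type={pii_type!r}; "
                      "combine with bucket or truncate")
    if pii_type == "phi":
        zones = col.get("inference_zone_allowed", PHI_ZONE_FLOOR)
        widened = [z for z in zones if not within_floor(z)]
        if widened and not col.get("phi_inference_override", False):
            errors.append(f"{name}: inference zones {widened} widen the PHI floor "
                          "without phi_inference_override = true")
    return errors
```

Collecting errors into a list, instead of raising on the first one, lets a validator report every problem in a manifest in a single pass.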

Per 09-access-control.md § Tamper-evident audit log:

  • OTel spans are the queryable projection (operator-configured sink — Honeycomb, Tempo, Grafana Cloud, self-hosted).
  • A parallel append-only hash-chained log under .hakiri/audit/<project>/ is the attestable record.
  • Signed roots are committed to the sync bucket on a configurable cadence (default every 10 min) so audit history survives a compromised local node.
  • Optional: commit signed roots to an external transparency log (Sigstore Rekor) for operator-tamper-resistant audit.

If the audit-write path fails (disk full, permission error, bucket unreachable), reads are refused. No fail-open path returns rows without an audit entry.
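
The two properties described here, tamper evidence through hash chaining and fail-closed reads, can be illustrated with a small in-memory sketch. The entry layout, the all-zero genesis value, and the function names are assumptions for the example; the on-disk format under .hakiri/audit/ is defined in 09-access-control.md, not here.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder for the hash preceding the first entry


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous entry's hash together with a canonical JSON payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(log: list, payload: dict) -> None:
    """Append an entry that commits to everything before it."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"prev": prev, "payload": payload, "hash": entry_hash(prev, payload)})


def verify_chain(log: list) -> bool:
    """Recompute every link; any edited or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True


def audited_read(log: list, fetch_rows, payload: dict, write=append_entry):
    """Fail closed: the audit write happens first, and if it raises, no rows are returned."""
    write(log, payload)
    return fetch_rows()
```

Because each hash covers the previous hash, publishing only the latest root (as the signed-root commits above do) is enough to pin the entire history up to that point.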

Hakiri does not phone home. By default there is no telemetry, no anonymous usage ping, no auto-update probe, and no license-server contact. The binary works fully air-gapped.

OTel export is opt-in and operator-controlled: the operator configures the endpoint, sampling rate, and attributes. The default OTEL_EXPORTER_OTLP_ENDPOINT is unset — the runtime emits spans to a no-op sink until the operator points them somewhere.
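
The opt-in behaviour can be sketched as a sink selection driven by the standard OTEL_EXPORTER_OTLP_ENDPOINT environment variable. The `NoOpSink`, `OtlpSink`, and `select_sink` names are hypothetical stand-ins; a real runtime would use an OpenTelemetry SDK exporter rather than the buffer shown here.

```python
import os


class NoOpSink:
    """Default sink: spans are dropped and nothing leaves the process."""

    def emit(self, span: dict) -> bool:
        return False


class OtlpSink:
    """Stand-in for a real OTLP exporter; it only buffers spans in memory."""

    def __init__(self, endpoint: str):
        self.endpoint = endpoint
        self.buffer = []

    def emit(self, span: dict) -> bool:
        self.buffer.append(span)
        return True


def select_sink(env=None):
    """Export only when the operator has explicitly configured an endpoint."""
    env = os.environ if env is None else env
    endpoint = env.get("OTEL_EXPORTER_OTLP_ENDPOINT", "").strip()
    return OtlpSink(endpoint) if endpoint else NoOpSink()
```

Treating an empty or whitespace-only value the same as an unset variable keeps a stray blank entry in a deployment config from silently enabling export to an invalid target.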

A weekly auto-update check (HTTP HEAD against a release feed) is off by default in v0 and opt-in via [update] check = true. The check never sends usage data; it only fetches a manifest of recent releases. Sovereign deploys leave it off.

This spec does not cover:

  • Specific attestations (SOC 2 Type II reports, BAA templates, FedRAMP authorization documents). Those are deliverables of the commercial-layer entity, not the OSS data plane.
  • Customer-side compliance processes (access reviews, risk assessments, incident response). Hakiri provides the substrate; the customer’s compliance team owns the process.
  • Region-specific certifications beyond EU. APAC sovereignty (China, India), specific public-sector frameworks (UK G-Cloud, AU IRAP, Canada PBMM) — supported by the architecture but not formally attested in v0.

Open questions:

  • Commercial entity identification. Which entity holds the BAA, signs the DPA, undergoes the SOC 2 audit? FractalBox, an OHC-affiliated entity, or a separate commercial vehicle? Tracked in PRD § Open product questions.
  • Audit log to transparency-log integration. Sigstore Rekor is the obvious choice but adds an external dependency. Is the value worth the dependency? Probably yes for HIPAA / SOC 2 customers; off by default for everyone else.
  • EU AI Act high-risk-system classification. Does running an agent over customer data classify the customer’s deployment as a high-risk system, or only when the agent makes automated decisions? We are tracking the EU AI Act’s regulatory guidance; the spec evolves with the guidance.
  • Right-to-erasure SLA. Default ≤72h to last-replica refresh is the v0 commitment. Tighter SLAs require proxy-mode replicas. Whether to support an explicit “erasure pending” replica state where the replica reports the gap is an M3 question.
  • Cross-tenant clean-room compliance. When two parties share a clean-room deployment, who is the data controller / processor for each party’s data? 09-access-control.md § Multi-tenant clean rooms covers the security model; the legal model is a per-deployment contract concern.