Deployment

The same source tree compiles into four feature profiles, each addressing a distinct topology, plus a fifth deployment shape (Cloudflare Workflows) that consumes the binary as a step.

Build profiles

Hakiri ships in four feature-flagged build profiles to honor the footprint budgets in PRD Pillar 1 and Challenge 5. Fitting the agent-retrieval surface (HNSW + Tantivy + Wasmtime + Polars + DuckDB) into the Lambda cold-start budget is structurally impossible, so separate profiles are the honest answer. Rationale and CI gates: ADR-0012.

Profile	Use	Compressed budget	Idle RSS	Cargo features
`hakiri-core`	Lambda zips, edge runtimes, CI runners	≤ 50 MB	≤ 60 MB	`duckdb`, `polars`, `sqlite-catalog`, `s3-sync`, built-in connectors
`hakiri-full`	Daemon serving agent retrieval	≤ 150 MB	≤ 180 MB	`core` + `wasmtime`, `tantivy`, `hnsw`, `mcp-server`, `postgres-catalog`
`hakiri-coord`	Raft coordinator for Topology 2.5 clusters	≤ 50 MB	≤ 50 MB	`raft`, `coord-api` — no query engine, no WASM host
`hakiri-control`	Team-mode control plane (self-hosted alternative to CF Workers)	≤ 60 MB	≤ 80 MB	`core` + `axum`, `tokio-tungstenite`, `loro`, `oauth`, `canonical-doc-storage`

Each profile is built from the same source tree via Cargo features. A pre-M0 measurement spike validates the budgets before any feature implementation begins (Challenge 5 § Pre-M0 spike). The budgets are enforced as CI gates on every PR — a PR that bloats hakiri-core past 50 MB compressed fails CI in the same way a test failure does.
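A minimal sketch of such a gate, assuming an earlier CI step has already measured each artifact. The budget numbers are copied from the table above; the Measurement shape and the checkBudgets name are illustrative and not part of Hakiri.

```typescript
// Budgets copied from the build-profile table (compressed size and idle RSS, in MB).
type Budget = { compressedMb: number; idleRssMb: number };

const BUDGETS: Record<string, Budget> = {
  "hakiri-core": { compressedMb: 50, idleRssMb: 60 },
  "hakiri-full": { compressedMb: 150, idleRssMb: 180 },
  "hakiri-coord": { compressedMb: 50, idleRssMb: 50 },
  "hakiri-control": { compressedMb: 60, idleRssMb: 80 },
};

// Hypothetical record produced by a measurement step earlier in the CI job.
type Measurement = { profile: string; compressedMb: number; idleRssMb: number };

// Returns one message per exceeded budget; an empty list means the gate passes.
function checkBudgets(measurements: Measurement[]): string[] {
  const violations: string[] = [];
  for (const m of measurements) {
    const budget = BUDGETS[m.profile];
    if (budget === undefined) {
      violations.push(`${m.profile}: no budget defined`);
      continue;
    }
    if (m.compressedMb > budget.compressedMb) {
      violations.push(`${m.profile}: compressed ${m.compressedMb} MB > ${budget.compressedMb} MB`);
    }
    if (m.idleRssMb > budget.idleRssMb) {
      violations.push(`${m.profile}: idle RSS ${m.idleRssMb} MB > ${budget.idleRssMb} MB`);
    }
  }
  return violations;
}
```

A CI job would fail the PR whenever the returned list is non-empty, which is what puts a budget breach on the same footing as a failing test.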

Build targets

Target	Profiles available	Use	Toolchain
`x86_64-unknown-linux-musl`	core, full, coord	Linux servers, Docker, Lambda	`cross` or `cargo zigbuild`
`aarch64-unknown-linux-musl`	core, full, coord	ARM servers, Lambda Graviton	same
`aarch64-apple-darwin`	core, full	Apple Silicon dev	native
`x86_64-apple-darwin`	core, full	Intel Mac dev	native
`x86_64-pc-windows-msvc`	core, full	Windows dev	native
`wasm32-wasip2`	core	Cloudflare Containers (WASM), wasmtime hosts	requires WASI 0.2 + Component Model

Distribution: GitHub Releases with checksums, Homebrew tap (brew install hakiri/tap/hakiri — installs hakiri-full by default; brew install hakiri/tap/hakiri-core for the smaller profile), an install script (curl https://hakiri.dev/install.sh | sh), and Docker images (ghcr.io/<org>/hakiri-core:<version>, ghcr.io/<org>/hakiri-full:<version>, ghcr.io/<org>/hakiri-coord:<version>).

First-class targets

Hakiri commits to two production deployment targets as first-class in v0 (with feature parity guarantees and bespoke hakiri deploy <cloud> tooling):

Cloudflare — Workers + Workflows + Containers + Durable Objects + R2
AWS — Lambda + Step Functions + Fargate + RDS/Dynamo + S3

Both are reconciliation-shaped platforms (cron-driven triggers, durable orchestration, object-storage destinations, single-writer state primitives). The declarative manifest from 03-pipelines.md lifts identically onto either; only the catalog backend, orchestration engine, and object-store endpoint differ. Same binary, same WASM connectors, same hakiri.toml.

Other targets (GCP Cloud Run + Workflows, Fly.io Machines, Kubernetes, bare metal) work via Topology 2 (self-hosted daemon) or Topology 2.5 (self-hosted cluster) but don’t ship bespoke hakiri deploy wiring in v0. Rationale for this scope and alternatives: see ADR-0009.

Parity matrix

Capability	Local	Daemon	Cloudflare	AWS
Built-in connectors	✓	✓	✓	✓
Agent-authored WASM connectors	✓	✓	✓ (Container)	✓ (Fargate / EC2; not Lambda)
Pipeline runs > 15 min	✓	✓	✓ (Workflow + Container)	✓ (Fargate; Lambda capped at 15 min)
Cron-driven reconciliation	manual	in-process scheduler	Cron Trigger	EventBridge Schedule
Durable mid-run resume	crash-resume from catalog	crash-resume from catalog	Workflow checkpoints	Step Functions state machine
Single-writer catalog	trivial (one process)	trivial	DO SQLite	RDS Postgres / DynamoDB
Object store sync target	local fs (no sync)	local + S3	R2 (native)	S3 (native)
Cold-start latency	n/a	0	2–8s (Container cold) / <100ms (warm); keep warm for active pipelines	~100ms (Lambda) / 0 (long-running Fargate daemon; on-demand tasks add launch time)
MCP server	stdio + HTTP	HTTP	HTTP (via Worker)	HTTP (via Fargate or API Gateway)

The success criterion for cloud parity (validated in M2): the same hakiri.toml deploys to both clouds and produces byte-identical Parquet in their respective object stores after a 24h soak.

Topology 0 — Team mode (M1, default for team product)

The day-1 team product runs the control plane on Cloudflare and per-machine daemons as LaunchAgents / systemd-user units / Windows services (per Topology 2), with optional CF Workers + Workflows + Containers for scheduled execution that fires regardless of laptop state. Configuration is collaboratively edited from the Electron app and the web UI (13-team-surfaces.md) over a Loro CRDT channel (14-collab-config.md).

For air-gapped / on-prem teams, the hakiri-control build profile ships the same control plane as a Rust daemon — same wire protocol, same Electron / web clients, different substrate. Both options ship in M1, gated by a shared CI acceptance suite. See ADR-0014.

Shape (default — CF substrate)

flowchart TB
  subgraph Surfaces["Surfaces — thin clients"]
    Web[Web UI · SPA]
    Mac[Electron · Mac/Win/Linux]
  end

  subgraph CP["Control plane — Cloudflare"]
    W[Control Worker<br/>HTTP + WebSocket]
    DO[(Durable Object<br/>team state · Loro canonical doc<br/>hibernating WebSocket)]
    Cron[Cron Trigger]
    WF[Workflows]
    Cont[Containers · WASM connectors]
  end

  subgraph Edge["Substrate"]
    R2[(R2 · manifest@vN.toml snapshots<br/>Parquet)]
  end

  subgraph Local["Per-machine — local-first agent context"]
    LD[hakiri-full daemon<br/>LaunchAgent / systemd-user]
  end

  Web <-->|Loro sync<br/>WebSocket| W
  Mac <-->|Loro sync<br/>WebSocket| W
  W <--> DO
  Cron --> W
  W --> WF --> Cont
  WF --> R2
  R2 -.pull manifest.-> LD
  LD --> R2
  Mac <-->|localhost MCP + run-now| LD

How it works

Control plane = Worker + Durable Object. One DO per team holds team membership, the canonical Loro config doc, the Loro op log, and capability-token issuance state. Each Electron / web client keeps a hibernating WebSocket open, which provides a real-time channel without keeping the DO active while the connection is idle.
Manifest snapshots in R2. On apply, the control plane validates the Loro doc against JSON Schema and writes a versioned manifest@<v>.toml to R2. Daemons consume these and never speak CRDT.
Schedules fire on CF Cron. No always-on team worker required. A pipeline with placement = "cf:auto" (default) runs on the team’s CF Worker — Workflow handles long runs, Container handles WASM connectors, sleeps when idle.
Per-machine daemons remain. Each user’s machine runs hakiri-full via LaunchAgent / systemd-user / Windows service for local agent MCP access and DuckDB replica querying. Pipelines tagged placement = "node:alice-mbp" or placement = "any-mac" run on those daemons; CF leases prevent double-run.
OAuth via system browser. Per 13-team-surfaces.md, desktop uses the hakiri:// custom URL scheme; web uses standard https:// redirect. Both flow tokens through the control plane; biscuit (09-access-control.md) is the token format end-to-end.
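The snapshot naming can be sketched as follows. This assumes versions are plain increasing integers and that snapshot keys look exactly like manifest@vN.toml, as in the diagram; the function names are hypothetical, and schema validation and the actual R2 write are left out.

```typescript
// Extracts N from keys shaped like "manifest@v12.toml"; returns null for anything else.
function parseSnapshotVersion(key: string): number | null {
  const match = /^manifest@v(\d+)\.toml$/.exec(key);
  return match ? Number(match[1]) : null;
}

// What a daemon would pull: the newest snapshot, ignoring unrelated objects in the bucket.
function latestSnapshotKey(existingKeys: string[]): string | null {
  let best: { key: string; version: number } | null = null;
  for (const key of existingKeys) {
    const version = parseSnapshotVersion(key);
    if (version !== null && (best === null || version > best.version)) {
      best = { key, version };
    }
  }
  return best === null ? null : best.key;
}

// What the control plane would write on apply: one past the highest existing version.
function nextSnapshotKey(existingKeys: string[]): string {
  const latest = latestSnapshotKey(existingKeys);
  const current = latest === null ? 0 : parseSnapshotVersion(latest) ?? 0;
  return `manifest@v${current + 1}.toml`;
}
```

Comparing versions numerically rather than as strings matters once a team passes v9, since "manifest@v10.toml" sorts before "manifest@v2.toml" lexically.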

Deploy

# fractalbox-hosted multi-tenant (M1.5)
hakiri team init --hosted

# Self-deployed CF in the team's own CF account (M1)
hakiri team init --cloudflare --account $CF_ACCOUNT

# Self-hosted Rust hakiri-control binary (M1, air-gapped path)
hakiri-control --bind 0.0.0.0:7780 \
  --bucket s3://team-context \
  --catalog sqlite:///data/control.db

Placement choices

Each pipeline declares a placement; the UI surfaces friendly pickers and writes typed values to the manifest:

UI picker	Manifest value
“Run on team’s Cloudflare worker (default)”	`placement = "cf:auto"`
“Pin to a region (e.g. western North America)”	`placement = "cf:wnam"` (etc.)
“Run on Alice’s Mac only”	`placement = "node:alice-mbp"`
“Run on any teammate’s machine that’s online”	`placement = "any-mac"`
“Run on the self-hosted Rust control plane”	`placement = "self-hosted:control-1"`

Single-writer leases (per ADR-0005) ensure that pipelines with any-mac placement do not double-run when several teammates are online.
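One way to read the typed placement values is as a small tagged union. The sketch below only covers the value shapes listed in the table; the Placement type and parsePlacement function are illustrative names, and the real manifest grammar may accept more forms.

```typescript
// Typed view of the manifest's placement string.
type Placement =
  | { kind: "cf"; region: string | null } // "cf:auto" maps to region null
  | { kind: "node"; name: string }
  | { kind: "any-mac" }
  | { kind: "self-hosted"; name: string };

function parsePlacement(value: string): Placement {
  if (value === "any-mac") return { kind: "any-mac" };
  const sep = value.indexOf(":");
  // Reject values with no scheme or no target, such as "cf" or "node:".
  if (sep <= 0 || sep === value.length - 1) {
    throw new Error(`invalid placement: ${value}`);
  }
  const scheme = value.slice(0, sep);
  const rest = value.slice(sep + 1);
  switch (scheme) {
    case "cf":
      return { kind: "cf", region: rest === "auto" ? null : rest };
    case "node":
      return { kind: "node", name: rest };
    case "self-hosted":
      return { kind: "self-hosted", name: rest };
    default:
      throw new Error(`unknown placement scheme: ${scheme}`);
  }
}
```

Parsing once at manifest load keeps the scheduler free of string matching: it switches on kind and only consults the lease service for placements that have more than one eligible runner.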

Cost shape (default CF substrate)

For a team of 10 with 20 pipelines firing hourly:

Resource	Monthly cost
Worker requests	~$5 (Cron + API + WebSocket frames)
Durable Object storage + transactions	~$2
Workflow executions	~$3
Container active time	~$5
R2 storage + egress	~$2
Total	~$17/month

Self-hosted Rust hakiri-control on a small VM (Hetzner CPX21 or equivalent): ~$10/month flat, regardless of pipeline volume.

No always-on team worker required

The team-mode default does not require a 24/7 task. CF Cron + Workflows wake only when work needs doing, hold state in the DO between runs, and sleep otherwise. Air-gapped teams needing always-on availability run the Rust hakiri-control as a small daemon (one binary, no orchestrator).

Constraints

Worker CPU and wall-time limits cap the reconciler logic. Heavy work moves to Workflows + Containers (see Topology 3 below).
DO single-thread execution. Each team’s mutations serialize through one DO. Practical at 1–10 ops/sec/team; shard by pipeline group if a team approaches DO bandwidth limits.
WebSocket count. Each connected client holds one hibernating WebSocket. Free tier supports 1000 concurrent; paid much higher.
CF availability is load-bearing for CF-substrate teams. Mitigation: hakiri-control Rust fallback ships in the same M1 release.

Topology 1 — Local CLI (M0)

hakiri init my-project
cd my-project
hakiri run github-issues
hakiri query

Stateless process, exits per command
All state under ./.hakiri/
No network listener
This is the dev loop and the “I just want to dump some data into Parquet” loop

Topology 2 — Self-hosted daemon (M1)

hakiri serve --port 7700 --mcp-stdio false --mcp-http true

Long-running process
Owns the in-process scheduler (cron-style triggers fire)
Exposes HTTP API (/v1/pipelines, /v1/runs, …) and MCP-over-HTTP
Manages the WASM connector pool
Survives Hakiri-binary upgrades via SIGHUP reload (best-effort)
Recommended: behind Caddy/Nginx with TLS; or a Tailscale-served port for tailnet-only access

A systemd unit and a docker-compose.yml snippet ship in examples/deploy/.

Topology 3 — Cloudflare (M2, first-class)

Cloudflare is a natural home for Hakiri because every CF primitive maps onto a piece of the declarative reconciliation model: Cron Triggers are the reconciliation tick, Workflows are the durable orchestrator, Durable Objects are the single-writer catalog, R2 is both data store and sync target. The manifest from 03-pipelines.md lifts onto CF without translation.

Topology

flowchart LR
  Cron[Cron Trigger<br/>every 15m] --> Recon[Reconciler<br/>Worker]
  Recon -->|reads manifest| R2m[(R2: hakiri.toml<br/>+ pipelines/*.json)]
  Recon -->|kicks| WF[CF Workflow<br/>one per pipeline run]
  WF -->|step.do| Container[hakiri Container<br/>wasmtime + WASM connectors]
  Container --> DO[(Durable Object<br/>SQLite catalog)]
  Container --> R2d[(R2: Parquet + snapshots)]

What each piece does

Cron Trigger — the reconciliation tick. Fires the Reconciler Worker on the schedule declared in the manifest.
Reconciler Worker — reads the manifest from R2, computes which pipelines need a run (cursor-vs-schedule diff), dispatches a Workflow per pipeline. Stays well under the Worker CPU budget because it only decides and dispatches.
CF Workflow — the durable orchestrator. hakiri apply decomposes into step.do(...) calls: discover, pull-page-1, pull-page-2, write-batch, commit-cursor. Each step is checkpointed; a crash mid-run resumes from the last completed step.
Container — the heavy lifting: HTTP fetches, Parquet encoding, WASM connector execution. Bound to the Worker via service binding (env.HAKIRI_CONTAINER.fetch(...)). Wasmtime runs inside it so WASM Component connectors execute natively.
Durable Object — the catalog. One DO per project, embedded SQLite (DO’s SQLite feature), single-writer-per-key by construction. Stores cursors, run history, schema versions, evolution decisions.
R2 — the only thing other teammates and systems read from. Manifest and Parquet data live here; sync is “this bucket is the team-shared context”.
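The Reconciler's decide-and-dispatch step can be sketched as a pure function. This assumes each pipeline's schedule has been reduced to a fixed interval and that the catalog exposes the last run start time; PipelineState and duePipelines are hypothetical names, and real cursor-vs-schedule logic is likely richer.

```typescript
// Per-pipeline facts the Reconciler would read from the manifest and the catalog.
type PipelineState = {
  id: string;
  intervalMs: number;              // desired cadence, from the manifest
  lastRunStartedAt: number | null; // epoch ms from the catalog, null if never run
  runInFlight: boolean;            // true while a Workflow instance is still active
};

// Returns the ids that need a Workflow dispatched on this tick.
function duePipelines(states: PipelineState[], nowMs: number): string[] {
  return states
    .filter((s) => !s.runInFlight)
    .filter((s) => s.lastRunStartedAt === null || nowMs - s.lastRunStartedAt >= s.intervalMs)
    .map((s) => s.id);
}
```

Because the function only compares timestamps, its cost is proportional to the number of pipelines, which is what keeps the Worker inside its CPU budget.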

Catalog backend: Durable Object SQLite

One DO per project, embedded SQLite. The DO’s single-writer guarantee maps onto the cursor invariant. D1 is reserved for the M3 hosted control plane. Rationale and alternatives: see ADR-0006.

Workflow step decomposition

The runtime understands when it’s running under Workflows and decomposes accordingly. A generated Workflow class looks like:

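import { WorkflowEntrypoint } from "cloudflare:workers";

export class HakiriPipeline extends WorkflowEntrypoint {
  async run(event, step) {
    // Bindings are reached through this.env. The Reconciler is assumed to pass
    // { id } as the Workflow params, which arrive as event.payload.
    const id = event.payload.id;
    // fetch() on a binding needs an absolute URL; the host part is a placeholder.
    // Every endpoint is assumed to return a small JSON body.
    const call = (path, init) =>
      this.env.HAKIRI_CONTAINER.fetch(`http://hakiri/v1/pipelines/${id}/${path}`, init)
        .then(r => r.json());

    const plan = await step.do("plan", () => call("plan"));

    for (const page of plan.pages) {
      // Step results must be serializable, so return the JSON pointer, not the Response.
      await step.do(`pull-${page.id}`, () => call(`pull?page=${page.id}`, { method: "POST" }));
    }

    await step.do("commit", () => call("commit", { method: "POST" }));
  }
}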

The Container is stateless across step boundaries — all state lives in the DO. Crashes between steps re-run only the missing steps.
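The resume behavior can be illustrated with a toy step runner that checkpoints results by step name. It is a synchronous, in-memory stand-in for the idea behind step.do, not the Workflows implementation; CheckpointedRun and runPipeline are hypothetical names.

```typescript
// Results are checkpointed by step name; the map stands in for durable Workflow state.
class CheckpointedRun {
  private checkpoints: Map<string, string>;
  executed: string[] = []; // steps whose body actually ran in this attempt

  constructor(checkpoints: Map<string, string>) {
    this.checkpoints = checkpoints;
  }

  step(name: string, body: () => string): string {
    const saved = this.checkpoints.get(name);
    if (saved !== undefined) return saved; // completed in an earlier attempt: skip
    const result = body();                 // may throw, in which case nothing is saved
    this.checkpoints.set(name, result);
    this.executed.push(name);
    return result;
  }
}

// Same plan / pull / commit shape as the generated Workflow class.
function runPipeline(run: CheckpointedRun, pages: string[], pull: (page: string) => string): void {
  run.step("plan", () => pages.join(","));
  for (const page of pages) {
    run.step(`pull-${page}`, () => pull(page));
  }
  run.step("commit", () => "committed");
}
```

If the first attempt fails while pulling page 2, a second attempt over the same checkpoint map executes only pull-2 and commit; plan and pull-1 are answered from their checkpoints.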

Deploy

hakiri deploy cloudflare \
  --account $CF_ACCOUNT \
  --bucket oh-context \
  --do-namespace hakiri-catalog \
  --container-region wnam

Generates .hakiri/deploy/cloudflare/wrangler.toml + Container Dockerfile, then runs wrangler deploy. The generated files are checked in — operators can edit, fork, or replace them.

Constraints

Worker CPU/wall time (~30s wall, 10ms CPU on free tier; paid plans allow more). The Reconciler Worker only decides; all real work runs in Workflows + Container.
CF Workflows hard caps (load-bearing invariants the runtime must honor):
- 1 MiB step result size — step results are pointers (R2 key, DO row id), never payloads. The Container writes batches to R2; the step returns the R2 key.
- 1024 steps per workflow instance — for >1024-page sources, the Container batches pages (e.g. one step per 50-page chunk). The runtime’s plan API knows the cap and groups accordingly.
- ~6h practical retry window — long backfills span multiple workflow instances, chained by the Reconciler on the next tick.
Container cold start ≈ 2–8s for non-trivial images (this is not the Worker isolate cold-start). The Reconciler keep-warms Containers for active pipelines.
WASM Component Model on workerd is less mature than wasmtime. Agent-authored connectors run in the Container only.
Source proximity matters. A Container in wnam pulling from Postgres in us-east-1 adds latency. Pin the Container’s region (--container-region) near the source.
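The step cap can be honored by sizing page chunks from the page count. The sketch below picks the smallest chunk size that fits, which is a different policy from the fixed 50-page example above; it reserves two steps for plan and commit, and groupPages is a hypothetical name rather than the plan API itself.

```typescript
const MAX_STEPS = 1024;   // Workflows cap on steps per instance, as quoted above
const RESERVED_STEPS = 2; // "plan" and "commit" in the generated Workflow

type PageRange = { first: number; last: number }; // inclusive, zero-based page indexes

// Splits pageCount pages into at most (maxSteps - RESERVED_STEPS) contiguous ranges.
function groupPages(pageCount: number, maxSteps: number = MAX_STEPS): PageRange[] {
  const pullSteps = maxSteps - RESERVED_STEPS;
  if (pullSteps < 1) throw new Error("maxSteps leaves no room for pull steps");
  // Smallest chunk size that keeps the number of ranges within pullSteps.
  const chunkSize = Math.max(1, Math.ceil(pageCount / pullSteps));
  const ranges: PageRange[] = [];
  for (let first = 0; first < pageCount; first += chunkSize) {
    ranges.push({ first, last: Math.min(first + chunkSize, pageCount) - 1 });
  }
  return ranges;
}
```

Small sources keep one step per page, so a crash loses at most one page of work; larger sources trade resume granularity for staying under the cap.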

Topology 4 — AWS (M2, first-class)

The AWS equivalent of the CF topology — same conceptual model, different primitives. A manifest tested locally deploys to either cloud with one hakiri deploy <cloud> invocation, no manifest rewrites.

Topology

flowchart LR
  EB[EventBridge Schedule<br/>rate 15 minutes] --> Recon[Reconciler<br/>Lambda]
  Recon -->|reads manifest| S3m[(S3: hakiri.toml<br/>+ pipelines/*.json)]
  Recon -->|kicks| SF[Step Functions<br/>one per pipeline run]
  SF -->|invokes| Task[hakiri Fargate Task<br/>wasmtime + WASM connectors]
  Task --> Cat[(Catalog: RDS Postgres /<br/>DynamoDB)]
  Task --> S3d[(S3: Parquet + snapshots)]

Mapping to Cloudflare

Cloudflare	AWS
Cron Trigger	EventBridge Schedule
Reconciler Worker	Lambda
Workflow `step.do`	Step Functions state machine
Container	Fargate task (or Lambda for short jobs)
Durable Object SQLite	RDS Postgres / DynamoDB
R2	S3

The conceptual shape is identical. AWS gives more knobs (VPC placement, IAM, region choice) at the cost of more setup.

Three sub-shapes

a. Pure Lambda (≤15-min pipelines)

EventBridge → Lambda (cargo-lambda build, ~15MB compressed) → S3 + DynamoDB
Cheapest and simplest. Hard 15-minute wall-time limit; OK for incremental pulls, not first-time backfills.
Built-in connectors only; WASM Components need Fargate.

b. Lambda Reconciler + Fargate Worker (recommended)

EventBridge → Lambda Reconciler → Step Functions → Fargate Task
Lambda decides and dispatches; Fargate does the work. No wall-time limit on Fargate.
Direct parity with the CF topology.

c. Long-running Fargate Daemon

ALB → Fargate task running hakiri serve continuously
No cold starts; the scheduler runs in-process.
Simplest mental model, highest fixed cost.

Catalog backends on AWS

Backend	When to use	Trade-offs
RDS Postgres (M2, default)	Any AWS deployment	RDS fixed cost; requires SQL port of catalog DDL
DynamoDB (M2.5 adapter)	Already on Dynamo, want pay-per-request	Different consistency model; some catalog rewrites

EFS-mounted SQLite is not a shipped option — NFS locking + SQLite WAL semantics make it unsafe for catalog use. Rationale and full alternatives matrix: see ADR-0007.

The catalog port is a trait Catalog defined in M0 (hakiri-core); each backend is an adapter. M2 ships local SQLite + RDS Postgres + DO SQLite; DynamoDB lands in M2.5.
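The port-and-adapter shape can be illustrated as follows. The real port is a Rust trait in hakiri-core; this TypeScript rendering, its method names, and the in-memory adapter are assumptions made only to show the pattern, and a real adapter would be asynchronous.

```typescript
// Hypothetical method set; the actual trait may differ.
interface Catalog {
  getCursor(pipeline: string): string | null;
  // Compare-and-set: succeeds only if the stored cursor still equals `expected`.
  commitCursor(pipeline: string, expected: string | null, next: string): boolean;
  recordRun(pipeline: string, status: "ok" | "failed"): void;
  runHistory(pipeline: string): string[];
}

// Simplest possible adapter; Postgres, DO SQLite, and DynamoDB would implement the same port.
class InMemoryCatalog implements Catalog {
  private cursors = new Map<string, string>();
  private runs = new Map<string, string[]>();

  getCursor(pipeline: string): string | null {
    return this.cursors.get(pipeline) ?? null;
  }

  commitCursor(pipeline: string, expected: string | null, next: string): boolean {
    if (this.getCursor(pipeline) !== expected) return false; // another writer got there first
    this.cursors.set(pipeline, next);
    return true;
  }

  recordRun(pipeline: string, status: "ok" | "failed"): void {
    const history = this.runs.get(pipeline) ?? [];
    history.push(status);
    this.runs.set(pipeline, history);
  }

  runHistory(pipeline: string): string[] {
    return [...(this.runs.get(pipeline) ?? [])]; // copy so callers cannot mutate state
  }
}
```

Expressing the cursor commit as compare-and-set lets each backend enforce the single-writer invariant with its own primitive, for example a conditional UPDATE in Postgres or a conditional write in DynamoDB.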

Deploy

hakiri deploy aws \
  --profile production \
  --region us-east-1 \
  --shape lambda-fargate \
  --catalog rds

Generates a CDK app under .hakiri/deploy/aws/, then runs cdk deploy. The CDK code is checked in; operators can edit, fork, or replace it.

Constraints

Cold starts (~100ms with minimal cargo-lambda builds). Fine for ≥1-min reconciliation cadence.
VPC egress costs for sources outside AWS. Plan accordingly or pin the Fargate task to the source’s network.
Step Functions Standard vs Express. Standard ($25/M state transitions) supports long workflows but gets expensive at scale: 50 steps × 4 transitions × 24 runs/day × 30 days × 100 pipelines ≈ 14.4M transitions ≈ $360/month. Use Express (billed at $1/M requests plus duration charges, ≤5 min workflow duration, at-least-once execution for asynchronous workflows) for short pipelines; reserve Standard for long backfills where durable replay matters. hakiri deploy aws picks per-pipeline based on declared schedule and estimated run duration.
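The Standard-workflow arithmetic can be made explicit. The sketch assumes a 30-day month and uses the per-transition price quoted above; the Workload type and function names are illustrative.

```typescript
// Standard workflow price quoted above, in USD per million state transitions.
const STANDARD_USD_PER_MILLION = 25;

type Workload = {
  stepsPerRun: number;
  transitionsPerStep: number;
  runsPerDay: number;
  pipelines: number;
  daysPerMonth: number; // assumption: 30 for a rough monthly figure
};

function monthlyTransitions(w: Workload): number {
  return w.stepsPerRun * w.transitionsPerStep * w.runsPerDay * w.pipelines * w.daysPerMonth;
}

function standardMonthlyCostUsd(w: Workload): number {
  // Multiply before dividing so the intermediate value stays an exact integer.
  return (monthlyTransitions(w) * STANDARD_USD_PER_MILLION) / 1e6;
}

// The workload from the constraint above: 50 steps, 4 transitions each, hourly, 100 pipelines.
const hourlyFleet: Workload = {
  stepsPerRun: 50,
  transitionsPerStep: 4,
  runsPerDay: 24,
  pipelines: 100,
  daysPerMonth: 30,
};
console.log(monthlyTransitions(hourlyFleet), standardMonthlyCostUsd(hourlyFleet));
```

For that workload monthlyTransitions comes to 14,400,000. Halving stepsPerRun through page batching halves the bill, which is why step granularity is a cost knob as well as a resume knob.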

CDK / Terraform examples

Ship under examples/deploy/aws/:

cdk-lambda-only/ — pure Lambda (sub-shape a)
cdk-lambda-fargate/ — Lambda + Step Functions + Fargate (sub-shape b)
cdk-fargate-daemon/ — long-running Fargate (sub-shape c)
terraform-lambda-fargate/ — Terraform flavor of sub-shape b

Topology 2.5 — Self-hosted cluster (M2, no orchestrator)

The horizontal scale-out path for Topology 2. Copy the binary to N VMs and point them at a bundled coordinator — no Kubernetes, no Nomad, no service mesh required. Rationale for bundling the coordinator instead of requiring external etcd: see ADR-0008.

Shape

flowchart LR
  subgraph Coord["Coordination (Raft, 3 nodes)"]
    C1[hakiri coord]
    C2[hakiri coord]
    C3[hakiri coord]
    C1 <--> C2 <--> C3 <--> C1
  end
  subgraph Workers["Workers (N nodes, scale horizontally)"]
    W1[hakiri serve]
    W2[hakiri serve]
    WN[hakiri serve …]
  end
  Workers -->|leases, cursors| Coord
  Workers --> Catalog[(Catalog<br/>Postgres / shared SQLite)]
  Workers --> Bucket[(S3-compatible bucket<br/>R2 / S3 / MinIO)]

How it works

Sharding by source partition. A pipeline declares a shard key (shard_by = "repo" for GitHub, shard_by = "table" for Postgres CDC, default shard_by = "pipeline"). Workers claim shard leases from the coordinator; each shard is owned by exactly one worker at a time.
Replication via the catalog backend. Cursors, run history, and schema decisions live in Postgres (or shared SQLite over a NAS for small deploys). Any worker can resume any shard from the catalog.
Coordinator is bundled. hakiri coord runs the same binary in coordination mode — a small Raft KV (target: <50 MB RSS). Three coordinator nodes for HA, or one for dev. External etcd / Consul are optional via coord_backend = "etcd", not required.
No load balancer required. Workers pull work; they don’t receive inbound traffic from sources. The only inbound surface is the HTTP API + MCP, which any reverse proxy (Caddy, Nginx, Tailscale Serve) fronts.
Adding capacity = scp + systemctl start. A new worker joins by pointing at the coordinator address; the coordinator rebalances shard leases on next reconciliation tick.
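Shard ownership can be sketched as a lease table with expiry. In the cluster this state would sit behind the Raft coordinator, which serializes claims; the in-memory LeaseTable below, its TTL policy, and its method names are illustrative assumptions.

```typescript
type Lease = { owner: string; expiresAt: number };

class LeaseTable {
  private leases = new Map<string, Lease>();
  private ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  // Grants the shard if it is free, expired, or already held by the same worker (renewal).
  claim(shard: string, worker: string, nowMs: number): boolean {
    const current = this.leases.get(shard);
    if (current !== undefined && current.owner !== worker && current.expiresAt > nowMs) {
      return false; // a live lease held by someone else
    }
    this.leases.set(shard, { owner: worker, expiresAt: nowMs + this.ttlMs });
    return true;
  }

  // Current owner, or null when the shard is unclaimed or its lease has lapsed.
  ownerOf(shard: string, nowMs: number): string | null {
    const current = this.leases.get(shard);
    return current !== undefined && current.expiresAt > nowMs ? current.owner : null;
  }
}
```

A worker that stops renewing loses its shards after one TTL, so a crashed node is replaced without operator action; the new owner resumes from the cursor stored in the catalog.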

Deploy

# Coordinator nodes (run once per coord VM)
hakiri coord --bind 0.0.0.0:7701 --peers c1:7701,c2:7701,c3:7701

# Worker nodes (run once per worker VM)
hakiri serve --coord c1:7701,c2:7701,c3:7701 --catalog postgres://...

Or via the shipped Ansible playbook (examples/deploy/cluster/) which provisions both roles across a host inventory.

When to use this vs. cloud-first topologies

Use Topology 2.5 when…	Use Topology 3/4 when…
You already run VMs (Hetzner, OVH, on-prem, air-gapped)	You’re already on Cloudflare or AWS and want managed primitives
You need >100 GB catalog or >1 TB/day throughput cheaply	You want zero-ops cron + workflow orchestration
You’re in a regulated/sovereign environment that can’t use Cloudflare/AWS	You want sub-100ms cold start on infrequent pipelines
You want one operational model from laptop → 50-node cluster	You want per-invocation pricing

This topology explicitly does not require Kubernetes. The M3 Helm chart wraps the same binary + config for teams that already run K8s — it’s a convenience, not the canonical deploy path.

Topology 5 — fractalbox-hosted SaaS control plane (M3)

A multi-tenant managed instance of the M1 Topology 0 — Team mode control plane, operated by fractalbox at app.hakiri.dev. Sits on the same CF Workers + Durable Object + R2 primitives, but extends with:

Multi-tenant team / membership state
Centralized OAuth IdP integration and SSO
A commercial onboarding flow (pricing, billing, support tiers)
SOC 2 + compliance attestation scope (see 11-compliance.md)
Run-history retention beyond what’s economical to bundle in a single-tenant deploy

This is opt-in and OSS (no closed-source upsell). The data plane never leaves the customer’s environment — the hosted SaaS holds metadata only: team membership, manifest snapshots, audit log, capability-token issuance state.

Customers can migrate between fractalbox-hosted, self-deployed CF, and self-hosted Rust (hakiri-control) with a one-command export.

Distribution: the fractalbox-hosted instance + a helm install hakiri-control chart for self-hosting variants that want Kubernetes-shaped operations.

Secrets

Topology	Where secrets live
Local CLI	OS keychain via the `keyring` crate; `.env` fallback for dev
Daemon	Same, or `HAKIRI_SECRETS_BACKEND=vault|aws-secrets|gcp-sm`
Workflows step	Cloudflare Workers Secrets, surfaced as env to the container
Lambda	AWS Secrets Manager; the binary fetches at start, caches in memory
Fargate / EC2	Secrets Manager or Parameter Store via instance role

Connectors never see raw secrets — they receive a token reference (secret://github-token) and the host injects the value across the WASM boundary at call time. This is the same pattern as Cloudflare Workers’ secrets binding.
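Host-side injection can be sketched as a substitution over the outbound request headers. This assumes references appear inline in header values and that secret names use letters, digits, hyphens, and underscores; injectSecrets and the lookup callback are hypothetical, and the real host would do this on the far side of the WASM boundary.

```typescript
// Replaces every secret://name reference in the headers with the value from `lookup`.
// The connector only ever builds and sees the unresolved form.
function injectSecrets(
  headers: Record<string, string>,
  lookup: (name: string) => string | undefined,
): Record<string, string> {
  const resolved: Record<string, string> = {};
  for (const [key, value] of Object.entries(headers)) {
    resolved[key] = value.replace(/secret:\/\/([A-Za-z0-9_-]+)/g, (_match, name: string) => {
      const secret = lookup(name);
      // Fail closed: never send a request with an unresolved reference.
      if (secret === undefined) throw new Error(`unknown secret reference: ${name}`);
      return secret;
    });
  }
  return resolved;
}
```

A connector would emit a header such as Authorization: Bearer secret://github-token; the host resolves it immediately before the HTTP call and never hands the resolved headers back to the connector.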

Telemetry

All deployments export OTel traces + metrics to an OTLP endpoint (env: OTEL_EXPORTER_OTLP_ENDPOINT)
Default sink: stdout JSON (so Cloud Run / Fargate / Lambda capture it natively)
Optional: send to SigNoz, Honeycomb, Tempo, Grafana Cloud

Open questions

D1 as the catalog for Workers-native deployments. D1 is SQLite-compatible enough that a thin shim should work. Worth a prototype before committing.
Cold-start budget for Lambda. With cargo-lambda’s tier-zero binaries we should be well under 200ms. Validate with a soak test.
Multi-region sync. R2 is single-region with global read; for true multi-region writers we’d need active-active sync conflict handling beyond LWW. Defer to v2.