
Deployment

The same source tree compiled in three feature profiles, each addressing a distinct topology. Plus a fifth deployment shape (Cloudflare Workflows) that consumes the binary as a step.

Hakiri ships in three feature-flagged build profiles to honor the footprint budgets in PRD Pillar 1 and Challenge 5. Trying to fit the agent-retrieval surface (HNSW + Tantivy + Wasmtime + Polars + DuckDB) into the Lambda cold-start budget is structurally impossible; three profiles is the honest answer. Rationale and CI gates: ADR-0012.

| Profile | Use | Compressed budget | Idle RSS | Cargo features |
| --- | --- | --- | --- | --- |
| hakiri-core | Lambda zips, edge runtimes, CI runners | ≤ 50 MB | ≤ 60 MB | duckdb, polars, sqlite-catalog, s3-sync, built-in connectors |
| hakiri-full | Daemon serving agent retrieval | ≤ 150 MB | ≤ 180 MB | core + wasmtime, tantivy, hnsw, mcp-server, postgres-catalog |
| hakiri-coord | Raft coordinator for Topology 2.5 clusters | ≤ 50 MB | ≤ 50 MB | raft, coord-api (no query engine, no WASM host) |
| hakiri-control | Team-mode control plane (self-hosted alternative to CF Workers) | ≤ 60 MB | ≤ 80 MB | core + axum, tokio-tungstenite, loro, oauth, canonical-doc-storage |

Each profile is built from the same source tree via Cargo features. A pre-M0 measurement spike validates the budgets before any feature implementation begins (Challenge 5 § Pre-M0 spike). The budgets are enforced as CI gates on every PR — a PR that bloats hakiri-core past 50 MB compressed fails CI in the same way a test failure does.
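
The CI gate described above amounts to comparing measured sizes against the per-profile budgets. A minimal sketch, assuming sizes are reported in MB; `Budget`, `BUDGETS`, and `checkBudget` are hypothetical names used for illustration, not Hakiri's actual CI tooling:

```typescript
// Hypothetical sketch of a footprint gate; the numbers are the budgets listed per profile.
type Profile = "hakiri-core" | "hakiri-full" | "hakiri-coord" | "hakiri-control";

interface Budget {
  compressedMB: number; // maximum compressed artifact size
  idleRssMB: number; // maximum idle resident set size
}

const BUDGETS: Record<Profile, Budget> = {
  "hakiri-core": { compressedMB: 50, idleRssMB: 60 },
  "hakiri-full": { compressedMB: 150, idleRssMB: 180 },
  "hakiri-coord": { compressedMB: 50, idleRssMB: 50 },
  "hakiri-control": { compressedMB: 60, idleRssMB: 80 },
};

// Returns a list of violations; an empty list means the gate passes.
function checkBudget(profile: Profile, compressedMB: number, idleRssMB: number): string[] {
  const budget = BUDGETS[profile];
  const violations: string[] = [];
  if (compressedMB > budget.compressedMB) {
    violations.push(`${profile}: compressed ${compressedMB} MB > ${budget.compressedMB} MB`);
  }
  if (idleRssMB > budget.idleRssMB) {
    violations.push(`${profile}: idle RSS ${idleRssMB} MB > ${budget.idleRssMB} MB`);
  }
  return violations;
}
```

A CI job built this way would fail the PR whenever the returned list is non-empty, which mirrors the "fails like a test" behaviour the text describes.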

| Target | Profiles available | Use | Toolchain |
| --- | --- | --- | --- |
| x86_64-unknown-linux-musl | core, full, coord | Linux servers, Docker, Lambda | cross or cargo zigbuild |
| aarch64-unknown-linux-musl | core, full, coord | ARM servers, Lambda Graviton | same |
| aarch64-apple-darwin | core, full | Apple Silicon dev | native |
| x86_64-apple-darwin | core, full | Intel Mac dev | native |
| x86_64-pc-windows-msvc | core, full | Windows dev | native |
| wasm32-wasip2 | core | Cloudflare Containers (WASM), wasmtime hosts | requires WASI 0.2 + Component Model |

Distribution: GitHub Releases with checksums, Homebrew tap (brew install hakiri/tap/hakiri — installs hakiri-full by default; brew install hakiri/tap/hakiri-core for the smaller profile), an install script (curl https://hakiri.dev/install.sh | sh), and Docker images (ghcr.io/<org>/hakiri-core:<version>, ghcr.io/<org>/hakiri-full:<version>, ghcr.io/<org>/hakiri-coord:<version>).

Hakiri commits to two production deployment targets as first-class in v0 (with feature parity guarantees and bespoke hakiri deploy <cloud> tooling):

  • Cloudflare — Workers + Workflows + Containers + Durable Objects + R2
  • AWS — Lambda + Step Functions + Fargate + RDS/Dynamo + S3

Both are reconciliation-shaped platforms (cron-driven triggers, durable orchestration, object-storage destinations, single-writer state primitives). The declarative manifest from 03-pipelines.md lifts identically onto either; only the catalog backend, orchestration engine, and object-store endpoint differ. Same binary, same WASM connectors, same hakiri.toml.

Other targets (GCP Cloud Run + Workflows, Fly.io Machines, Kubernetes, bare metal) work via Topology 2 (self-hosted daemon) or Topology 2.5 (self-hosted cluster) but don’t ship bespoke hakiri deploy wiring in v0. Rationale for this scope and alternatives: see ADR-0009.

| Capability | Local | Daemon | Cloudflare | AWS |
| --- | --- | --- | --- | --- |
| Built-in connectors | ✓ | ✓ | ✓ | ✓ |
| Agent-authored WASM connectors | ✓ | ✓ | ✓ (Container) | ✓ (Fargate / EC2; not Lambda) |
| Pipeline runs > 15 min | ✓ | ✓ | ✓ (Workflow + Container) | ✓ (Fargate; Lambda capped at 15 min) |
| Cron-driven reconciliation | manual | in-process scheduler | Cron Trigger | EventBridge Schedule |
| Durable mid-run resume | crash-resume from catalog | crash-resume from catalog | Workflow checkpoints | Step Functions state machine |
| Single-writer catalog | trivial (one process) | trivial | DO SQLite | RDS / Dynamo / EFS-SQLite |
| Object store sync target | local fs (no sync) | local + S3 | R2 (native) | S3 (native) |
| Cold-start latency | n/a | 0 | 2–8s (Container cold) / <100ms (warm); keep warm for active pipelines | ~100ms (Lambda) / 0 (Fargate) |
| MCP server | stdio + HTTP | HTTP | HTTP (via Worker) | HTTP (via Fargate or API Gateway) |

The success criterion for cloud parity (validated in M2): the same hakiri.toml deploys to both clouds and produces byte-identical Parquet in their respective object stores after a 24h soak.

Topology 0 — Team mode (M1, default for team product)


The day-1 team product runs the control plane on Cloudflare and per-machine daemons as LaunchAgents / systemd-user units / Windows services (per Topology 2), with optional CF Workers + Workflows + Containers for scheduled execution that fires regardless of laptop state. Configuration is collaboratively edited from the Electron app and the web UI (13-team-surfaces.md) over a Loro CRDT channel (14-collab-config.md).

For air-gapped / on-prem teams, the hakiri-control build profile ships the same control plane as a Rust daemon — same wire protocol, same Electron / web clients, different substrate. Both options ship in M1, gated by a shared CI acceptance suite. See ADR-0014.

flowchart TB
  subgraph Surfaces["Surfaces — thin clients"]
    Web[Web UI · SPA]
    Mac[Electron · Mac/Win/Linux]
  end

  subgraph CP["Control plane — Cloudflare"]
    W[Control Worker<br/>HTTP + WebSocket]
    DO[(Durable Object<br/>team state · Loro canonical doc<br/>hibernating WebSocket)]
    Cron[Cron Trigger]
    WF[Workflows]
    Cont[Containers · WASM connectors]
  end

  subgraph Edge["Substrate"]
    R2[(R2 · manifest@vN.toml snapshots<br/>Parquet)]
  end

  subgraph Local["Per-machine — local-first agent context"]
    LD[hakiri-full daemon<br/>LaunchAgent / systemd-user]
  end

  Web <-->|Loro sync<br/>WebSocket| W
  Mac <-->|Loro sync<br/>WebSocket| W
  W <--> DO
  Cron --> W
  W --> WF --> Cont
  WF --> R2
  R2 -.pull manifest.-> LD
  LD --> R2
  Mac <-->|localhost MCP + run-now| LD
  • Control plane = Worker + Durable Object. One DO per team. It holds team membership, the canonical Loro config doc, the Loro op log, and capability-token issuance state. A hibernating WebSocket from each Electron / web client keeps a real-time channel open without keeping the DO active while the connection is idle.
  • Manifest snapshots in R2. On apply, the control plane validates the Loro doc against JSON Schema and writes a versioned manifest@<v>.toml to R2. Daemons consume these and never speak CRDT.
  • Schedules fire on CF Cron. No always-on team worker required. A pipeline with placement = "cf:auto" (default) runs on the team’s CF Worker — Workflow handles long runs, Container handles WASM connectors, sleeps when idle.
  • Per-machine daemons remain. Each user’s machine runs hakiri-full via LaunchAgent / systemd-user / Windows service for local agent MCP access and DuckDB replica querying. Pipelines tagged placement = "node:alice-mbp" or placement = "any-mac" run on those daemons; CF leases prevent double-run.
  • OAuth via system browser. Per 13-team-surfaces.md, desktop uses the hakiri:// custom URL scheme; web uses standard https:// redirect. Both flow tokens through the control plane; biscuit (09-access-control.md) is the token format end-to-end.
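
The snapshot handoff in the list above (control plane writes versioned manifests, daemons read them) can be sketched as a key convention plus a "pick the newest" rule. This is only an illustration: it assumes integer versions and the `manifest@vN.toml` shape shown in the diagram, and the helper names are hypothetical.

```typescript
// Hypothetical helpers; the key shape follows the manifest@vN.toml naming in the diagram.
function manifestKey(version: number): string {
  return `manifest@v${version}.toml`;
}

// Parses the version out of a snapshot key, or returns null for unrelated keys.
function parseManifestVersion(key: string): number | null {
  const match = /^manifest@v(\d+)\.toml$/.exec(key);
  return match ? Number(match[1]) : null;
}

// Picks the newest snapshot key from a bucket listing, ignoring non-manifest objects.
function latestManifestKey(keys: string[]): string | null {
  let best: number | null = null;
  for (const key of keys) {
    const version = parseManifestVersion(key);
    if (version !== null && (best === null || version > best)) best = version;
  }
  return best === null ? null : manifestKey(best);
}
```

Comparing versions numerically rather than as strings matters here: a plain lexical sort would rank `v10` below `v2`.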
Terminal window
# fractalbox-hosted multi-tenant (M1.5)
hakiri team init --hosted
# Self-deployed CF in the team's own CF account (M1)
hakiri team init --cloudflare --account $CF_ACCOUNT
# Self-hosted Rust hakiri-control binary (M1, air-gapped path)
hakiri-control --bind 0.0.0.0:7780 \
  --bucket s3://team-context \
  --catalog sqlite:///data/control.db

Each pipeline declares a placement; the UI surfaces friendly pickers and writes typed values to the manifest:

| UI picker | Manifest value |
| --- | --- |
| “Run on team’s Cloudflare worker (default)” | placement = "cf:auto" |
| “Pin to region eu-west” | placement = "cf:weur" (etc.) |
| “Run on Alice’s Mac only” | placement = "node:alice-mbp" |
| “Run on any teammate’s machine that’s online” | placement = "any-mac" |
| “Run on the self-hosted Rust control plane” | placement = "self-hosted:control-1" |

Single-writer leases (per ADR-0005) ensure that pipelines with any-mac placement do not double-run when several teammates are online.
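
The placement strings lend themselves to a small tagged union. The sketch below is an assumption about how a client might parse them, not the manifest's actual schema; `Placement`, `parsePlacement`, and `needsLease` are hypothetical names.

```typescript
// Hypothetical typed view of the placement values listed above.
type Placement =
  | { kind: "cf"; region: string } // "cf:auto" or a pinned region such as "cf:wnam"
  | { kind: "node"; name: string } // "node:alice-mbp"
  | { kind: "any-mac" } // any online teammate machine
  | { kind: "self-hosted"; name: string }; // "self-hosted:control-1"

function parsePlacement(value: string): Placement {
  if (value === "any-mac") return { kind: "any-mac" };
  const sep = value.indexOf(":");
  if (sep > 0) {
    const prefix = value.slice(0, sep);
    const rest = value.slice(sep + 1);
    if (rest.length > 0) {
      if (prefix === "cf") return { kind: "cf", region: rest };
      if (prefix === "node") return { kind: "node", name: rest };
      if (prefix === "self-hosted") return { kind: "self-hosted", name: rest };
    }
  }
  throw new Error(`unrecognized placement: ${value}`);
}

// Per the text, any-mac is the case where several daemons could claim the same run.
function needsLease(p: Placement): boolean {
  return p.kind === "any-mac";
}
```

Rejecting unknown strings at parse time keeps a typo in the manifest from silently falling back to some default placement.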

For a team of 10 with 20 pipelines firing hourly:

| Resource | Monthly cost |
| --- | --- |
| Worker requests | ~$5 (Cron + API + WebSocket frames) |
| Durable Object storage + transactions | ~$2 |
| Workflow executions | ~$3 |
| Container active time | ~$5 |
| R2 storage + egress | ~$2 |
| Total | ~$17/month |

Self-hosted Rust hakiri-control on a small VM (Hetzner CPX21 or equivalent): ~$10/month flat, regardless of pipeline volume.

The team-mode default does not require a 24/7 task. CF Cron + Workflows wake only when there is work to do and hold state in the DO between runs. Air-gapped teams needing always-on availability run the Rust hakiri-control as a small daemon (one binary, no orchestrator).

  • Worker CPU and wall-time limits cap the reconciler logic. Heavy work moves to Workflows + Containers (documented in Topology 3 below).
  • DO single-thread execution. Each team’s mutations serialize through one DO. Practical at 1–10 ops/sec/team; shard by pipeline group if a team approaches DO bandwidth limits.
  • WebSocket count. Each connected client holds one hibernating WebSocket. Free tier supports 1000 concurrent; paid much higher.
  • CF availability is load-bearing for CF-substrate teams. Mitigation: hakiri-control Rust fallback ships in the same M1 release.
Terminal window
hakiri init my-project
cd my-project
hakiri run github-issues
hakiri query
  • Stateless process, exits per command
  • All state under ./.hakiri/
  • No network listener
  • This is the dev loop and the “I just want to dump some data into Parquet” loop
Terminal window
hakiri serve --port 7700 --mcp-stdio false --mcp-http true
  • Long-running process
  • Owns the in-process scheduler (cron-style triggers fire)
  • Exposes HTTP API (/v1/pipelines, /v1/runs, …) and MCP-over-HTTP
  • Manages the WASM connector pool
  • Survives Hakiri-binary upgrades via SIGHUP reload (best-effort)
  • Recommended: behind Caddy/Nginx with TLS; or a Tailscale-served port for tailnet-only access

A systemd unit and a docker-compose.yml snippet ship in examples/deploy/.

Topology 3 — Cloudflare (M2, first-class)


Cloudflare is a natural home for Hakiri because every CF primitive maps onto a piece of the declarative reconciliation model: Cron Triggers are the reconciliation tick, Workflows are the durable orchestrator, Durable Objects are the single-writer catalog, R2 is both data store and sync target. The manifest from 03-pipelines.md lifts onto CF without translation.

flowchart LR
  Cron[Cron Trigger<br/>every 15m] --> Recon[Reconciler<br/>Worker]
  Recon -->|reads manifest| R2m[(R2: hakiri.toml<br/>+ pipelines/*.json)]
  Recon -->|kicks| WF[CF Workflow<br/>one per pipeline run]
  WF -->|step.do| Container[hakiri Container<br/>wasmtime + WASM connectors]
  Container --> DO[(Durable Object<br/>SQLite catalog)]
  Container --> R2d[(R2: Parquet + snapshots)]
  • Cron Trigger — the reconciliation tick. Fires the Reconciler Worker on the schedule declared in the manifest.
  • Reconciler Worker — reads the manifest from R2, computes which pipelines need a run (cursor-vs-schedule diff), dispatches a Workflow per pipeline. Stays well under the Worker CPU budget because it only decides and dispatches.
  • CF Workflow — the durable orchestrator. hakiri apply decomposes into step.do(...) calls: discover, pull-page-1, pull-page-2, write-batch, commit-cursor. Each step is checkpointed; a crash mid-run resumes from the last completed step.
  • Container — the heavy lifting: HTTP fetches, Parquet encoding, WASM connector execution. Bound to the Worker via service binding (env.HAKIRI_CONTAINER.fetch(...)). Wasmtime runs inside it so WASM Component connectors execute natively.
  • Durable Object — the catalog. One DO per project, embedded SQLite (DO’s SQLite feature), single-writer-per-key by construction. Stores cursors, run history, schema versions, evolution decisions.
  • R2 — the only thing other teammates and systems read from. Manifest and Parquet data live here; sync is “this bucket is the team-shared context”.
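
The Reconciler's "decide and dispatch" step can be pictured as a pure function over catalog state. A minimal sketch, assuming each schedule reduces to a fixed interval and the catalog records the time of the last committed cursor; `PipelineState` and `duePipelines` are hypothetical names.

```typescript
// Hypothetical view of what the Reconciler reads for each pipeline.
interface PipelineState {
  id: string;
  intervalMs: number; // schedule declared in the manifest, simplified to a fixed interval
  lastCommitAt: number | null; // epoch ms of the last committed cursor, null if never run
}

// Returns the ids of pipelines that should get a Workflow dispatched on this tick.
function duePipelines(states: PipelineState[], nowMs: number): string[] {
  return states
    .filter((s) => s.lastCommitAt === null || nowMs - s.lastCommitAt >= s.intervalMs)
    .map((s) => s.id);
}
```

Because the function only compares timestamps, it fits comfortably inside a Worker's CPU budget; everything expensive happens after dispatch.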

One DO per project, embedded SQLite. The DO’s single-writer guarantee maps onto the cursor invariant. D1 is reserved for the M3 hosted control plane. Rationale and alternatives: see ADR-0006.

The runtime understands when it’s running under Workflows and decomposes accordingly. A generated Workflow class looks like:

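import { WorkflowEntrypoint } from "cloudflare:workers";

export class HakiriPipeline extends WorkflowEntrypoint {
  async run(event, step) {
    // Bindings are exposed on this.env inside a WorkflowEntrypoint.
    const container = this.env.HAKIRI_CONTAINER;
    // fetch() needs an absolute URL; the host here is a placeholder.
    const base = `http://hakiri/v1/pipelines/${event.id}`;

    // Step results are checkpointed, so return parsed data rather than a Response.
    const plan = await step.do("plan", () =>
      container.fetch(`${base}/plan`).then((r) => r.json()));

    for (const page of plan.pages) {
      await step.do(`pull-${page.id}`, () =>
        container.fetch(`${base}/pull?page=${page.id}`, { method: "POST" }).then((r) => r.text()));
    }

    await step.do("commit", () =>
      container.fetch(`${base}/commit`, { method: "POST" }).then((r) => r.text()));
  }
}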

The Container is stateless across step boundaries — all state lives in the DO. Crashes between steps re-run only the missing steps.
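
The resume behaviour can be modelled without any Cloudflare APIs. The toy below is not the Workflows runtime: `StepLog` is a hypothetical in-memory stand-in for the platform's durable step log, used only to show why a retry re-runs just the steps that never completed.

```typescript
// Toy model of checkpointed steps; the Map stands in for the platform's durable step log.
class StepLog {
  private done = new Map<string, unknown>();
  executed: string[] = []; // names of steps whose body actually ran

  async do<T>(name: string, body: () => Promise<T>): Promise<T> {
    if (this.done.has(name)) return this.done.get(name) as T; // replay: skip completed step
    const result = await body();
    this.done.set(name, result); // checkpoint only after the body succeeds
    this.executed.push(name);
    return result;
  }
}

// Runs plan, one pull per page, then commit; optionally crashes on one page.
async function runPipeline(log: StepLog, pages: number[], failOn?: number): Promise<void> {
  await log.do("plan", async () => pages.length);
  for (const page of pages) {
    await log.do(`pull-${page}`, async () => {
      if (page === failOn) throw new Error(`crash on page ${page}`);
      return `r2-key-for-page-${page}`; // a pointer, not a payload
    });
  }
  await log.do("commit", async () => "ok");
}
```

Running `runPipeline` once with `failOn` set and then again without it on the same `StepLog` executes each step body exactly once overall, which is the property the stateless Container relies on.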

Terminal window
hakiri deploy cloudflare \
  --account $CF_ACCOUNT \
  --bucket oh-context \
  --do-namespace hakiri-catalog \
  --container-region wnam

Generates .hakiri/deploy/cloudflare/wrangler.toml + Container Dockerfile, then runs wrangler deploy. The generated files are checked in — operators can edit, fork, or replace them.

  • Worker CPU/wall time (~30s wall, 50ms CPU on free tier). The Reconciler Worker only decides; all real work runs in Workflows + Container.
  • CF Workflows hard caps (load-bearing invariants the runtime must honor):
    • 1 MiB step result size — step results are pointers (R2 key, DO row id), never payloads. The Container writes batches to R2; the step returns the R2 key.
    • 1024 steps per workflow instance — for >1024-page sources, the Container batches pages (e.g. one step per 50-page chunk). The runtime’s plan API knows the cap and groups accordingly.
    • ~6h practical retry window — long backfills span multiple workflow instances, chained by the Reconciler on the next tick.
  • Container cold start ≈ 2–8s for non-trivial images (this is not the Worker isolate cold-start). The Reconciler keep-warms Containers for active pipelines.
  • WASM Component Model on workerd is less mature than wasmtime. Agent-authored connectors run in the Container only.
  • Source proximity matters. A Container in wnam pulling from Postgres in us-east-1 adds latency. Pin the Container’s region (--container-region) near the source.
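
The step-count cap above implies a grouping rule for large sources. A minimal sketch of one way to pick a chunk size, assuming two fixed steps (plan and commit) per run; `pagesPerStep` and `planChunks` are hypothetical and are not the runtime's actual plan API.

```typescript
const MAX_STEPS = 1024; // per-instance step cap cited in the list above

// Smallest chunk size that keeps pull steps plus fixed steps within the cap.
function pagesPerStep(pageCount: number, fixedSteps = 2): number {
  const available = MAX_STEPS - fixedSteps; // steps left for pulling pages
  return Math.max(1, Math.ceil(pageCount / available));
}

// Splits page indices [0, pageCount) into half-open [start, end) chunks, one per step.
function planChunks(pageCount: number, fixedSteps = 2): Array<[number, number]> {
  const size = pagesPerStep(pageCount, fixedSteps);
  const chunks: Array<[number, number]> = [];
  for (let start = 0; start < pageCount; start += size) {
    chunks.push([start, Math.min(start + size, pageCount)]);
  }
  return chunks;
}
```

Each chunk's step would still return only a pointer (for example an R2 key), so the result-size cap is unaffected by how many pages a chunk holds.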

The AWS equivalent of the CF topology — same conceptual model, different primitives. A manifest tested locally deploys to either cloud with one hakiri deploy <cloud> invocation, no manifest rewrites.

flowchart LR
  EB[EventBridge Schedule<br/>rate 15 minutes] --> Recon[Reconciler<br/>Lambda]
  Recon -->|reads manifest| S3m[(S3: hakiri.toml<br/>+ pipelines/*.json)]
  Recon -->|kicks| SF[Step Functions<br/>one per pipeline run]
  SF -->|invokes| Task[hakiri Fargate Task<br/>wasmtime + WASM connectors]
  Task --> Cat[(Catalog: RDS / Dynamo /<br/>EFS-mounted SQLite)]
  Task --> S3d[(S3: Parquet + snapshots)]

| Cloudflare | AWS |
| --- | --- |
| Cron Trigger | EventBridge Schedule |
| Reconciler Worker | Lambda |
| Workflow step.do | Step Functions state machine |
| Container | Fargate task (or Lambda for short jobs) |
| Durable Object SQLite | EFS-mounted SQLite (single-writer) / RDS Postgres / DynamoDB |
| R2 | S3 |

The conceptual shape is identical. AWS gives more knobs (VPC placement, IAM, region choice) at the cost of more setup.

  • EventBridge → Lambda (cargo-lambda build, ~15MB compressed) → S3 + DynamoDB
  • Cheapest and simplest. Hard 15-minute wall-time limit; OK for incremental pulls, not first-time backfills.
  • Built-in connectors only; WASM Components need Fargate.

b. Lambda Reconciler + Fargate Worker (recommended)

  • EventBridge → Lambda Reconciler → Step Functions → Fargate Task
  • Lambda decides and dispatches; Fargate does the work. No wall-time limit on Fargate.
  • Direct parity with the CF topology.
  • ALB → Fargate task running hakiri serve continuously
  • No cold starts; the scheduler runs in-process.
  • Simplest mental model, highest fixed cost.

| Backend | When to use | Trade-offs |
| --- | --- | --- |
| RDS Postgres (M2, default) | Any AWS deployment | RDS fixed cost; requires SQL port of catalog DDL |
| DynamoDB (M2.5 adapter) | Already on Dynamo, want pay-per-request | Different consistency model; some catalog rewrites |

EFS-mounted SQLite is not a shipped option — NFS locking + SQLite WAL semantics make it unsafe for catalog use. Rationale and full alternatives matrix: see ADR-0007.

The catalog port is a trait Catalog defined in M0 (hakiri-core); each backend is an adapter. M2 ships local SQLite + RDS Postgres + DO SQLite; DynamoDB lands in M2.5.
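
The port-and-adapter split can be illustrated with a small interface. The real port is a Rust trait whose methods are not shown in this document, so everything below is an assumption: the method names are hypothetical, and the compare-and-set commit is just one way to express a single-writer cursor invariant.

```typescript
// Hypothetical shape of a catalog port; the real one is a Rust trait in hakiri-core.
interface Catalog {
  getCursor(pipeline: string): string | null;
  // Commits `next` only if the stored cursor still equals `expected`.
  commitCursor(pipeline: string, expected: string | null, next: string): boolean;
}

// In-memory adapter, standing in for a SQLite, Postgres, or DO SQLite backend.
class InMemoryCatalog implements Catalog {
  private cursors = new Map<string, string>();

  getCursor(pipeline: string): string | null {
    return this.cursors.get(pipeline) ?? null;
  }

  commitCursor(pipeline: string, expected: string | null, next: string): boolean {
    if (this.getCursor(pipeline) !== expected) return false; // another writer advanced it
    this.cursors.set(pipeline, next);
    return true;
  }
}
```

Code written against `Catalog` does not change when the adapter does, which is the property that lets the same manifest run against different backends.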

Terminal window
hakiri deploy aws \
  --profile production \
  --region us-east-1 \
  --shape lambda-fargate \
  --catalog rds

Generates a Rust CDK app under .hakiri/deploy/aws/, then runs cdk deploy. The CDK code is checked in — operators can edit, fork, or replace it.

  • Cold starts (~100 ms with minimal cargo-lambda builds). Fine for ≥1-min reconciliation cadence.
  • VPC egress costs for sources outside AWS. Plan accordingly or pin the Fargate task to the source’s network.
  • Step Functions Standard vs Express. Standard ($25/M state transitions) supports long workflows but gets expensive at scale: 50 steps × 4 transitions × 100 pipelines on a 15-minute cadence is about 57.6M transitions, or ≈ $1.4k/month (≈ $360/month at an hourly cadence). Use Express ($1/M requests plus duration charges, ≤5 min workflow duration, at-most-once semantics for synchronous executions and at-least-once for asynchronous ones) for short pipelines; reserve Standard for long backfills where durable replay matters. hakiri deploy aws picks per-pipeline based on declared schedule and estimated run duration.
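
The Standard-pricing arithmetic is easy to check directly. The sketch below uses the $25 per million transitions figure from the text and assumes a 30-day month; the function name is hypothetical.

```typescript
const STANDARD_USD_PER_MILLION = 25; // Standard price per million state transitions, as cited above

// Monthly Standard-workflow cost for a fleet of identical pipelines.
function monthlyStandardCostUsd(
  stepsPerRun: number,
  transitionsPerStep: number,
  runsPerHour: number,
  pipelines: number,
  daysPerMonth = 30,
): number {
  const transitions =
    stepsPerRun * transitionsPerStep * runsPerHour * 24 * daysPerMonth * pipelines;
  return (transitions / 1_000_000) * STANDARD_USD_PER_MILLION;
}

const hourly = monthlyStandardCostUsd(50, 4, 1, 100); // 14.4M transitions, about $360
const every15m = monthlyStandardCostUsd(50, 4, 4, 100); // 57.6M transitions, about $1,440
```

Cadence dominates the bill: moving 100 pipelines from hourly to every 15 minutes quadruples the transition count, and the cost with it.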

Ship under examples/deploy/aws/:

  • cdk-lambda-only/ — pure Lambda (sub-shape a)
  • cdk-lambda-fargate/ — Lambda + Step Functions + Fargate (sub-shape b)
  • cdk-fargate-daemon/ — long-running Fargate (sub-shape c)
  • terraform-lambda-fargate/ — Terraform flavor of sub-shape b

Topology 2.5 — Self-hosted cluster (M2, no orchestrator)


The horizontal scale-out path for Topology 2. Copy the binary to N VMs and point them at a bundled coordinator — no Kubernetes, no Nomad, no service mesh required. Rationale for bundling the coordinator instead of requiring external etcd: see ADR-0008.

flowchart LR
  subgraph Coord["Coordination (Raft, 3 nodes)"]
    C1[hakiri coord]
    C2[hakiri coord]
    C3[hakiri coord]
    C1 <--> C2 <--> C3 <--> C1
  end
  subgraph Workers["Workers (N nodes, scale horizontally)"]
    W1[hakiri serve]
    W2[hakiri serve]
    WN[hakiri serve …]
  end
  Workers -->|leases, cursors| Coord
  Workers --> Catalog[(Catalog<br/>Postgres / shared SQLite)]
  Workers --> Bucket[(S3-compatible bucket<br/>R2 / S3 / MinIO)]
  • Sharding by source partition. A pipeline declares a shard key (shard_by = "repo" for GitHub, shard_by = "table" for Postgres CDC, default shard_by = "pipeline"). Workers claim shard leases from the coordinator; each shard is owned by exactly one worker at a time.
  • Replication via the catalog backend. Cursors, run history, and schema decisions live in Postgres (or shared SQLite over a NAS for small deploys). Any worker can resume any shard from the catalog.
  • Coordinator is bundled. hakiri coord runs the same binary in coordination mode — a small Raft KV (target: <50 MB RSS). Three coordinator nodes for HA, or one for dev. External etcd / Consul are optional via coord_backend = "etcd", not required.
  • No load balancer required. Workers pull work; they don’t receive inbound traffic from sources. The only inbound surface is the HTTP API + MCP, which any reverse proxy (Caddy, Nginx, Tailscale Serve) fronts.
  • Adding capacity = scp + systemctl start. A new worker joins by pointing at the coordinator address; the coordinator rebalances shard leases on next reconciliation tick.
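
The "exactly one owner per shard" rule in the list above can be sketched as a lease table with expiry. This is a toy, not the coordinator's protocol: in the cluster the state would live in the Raft KV, and the TTL-based expiry and the `LeaseTable` API are assumptions made for illustration.

```typescript
interface Lease {
  owner: string;
  expiresAt: number; // epoch ms after which the lease may be taken over
}

// Toy lease table; in the cluster this state would live in the coordinator.
class LeaseTable {
  private leases = new Map<string, Lease>();

  // Grants the shard to `worker` if it is free, expired, or already held by that worker.
  acquire(shard: string, worker: string, nowMs: number, ttlMs: number): boolean {
    const current = this.leases.get(shard);
    if (current && current.owner !== worker && current.expiresAt > nowMs) return false;
    this.leases.set(shard, { owner: worker, expiresAt: nowMs + ttlMs });
    return true;
  }

  // Returns the current owner, or null if the shard is unowned or its lease has expired.
  ownerOf(shard: string, nowMs: number): string | null {
    const current = this.leases.get(shard);
    return current && current.expiresAt > nowMs ? current.owner : null;
  }
}
```

A worker that stops renewing simply lets its lease lapse, after which another worker can claim the shard and resume from the cursor stored in the catalog.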
Terminal window
# Coordinator nodes (run once per coord VM)
hakiri coord --bind 0.0.0.0:7701 --peers c1:7701,c2:7701,c3:7701
# Worker nodes (run once per worker VM)
hakiri serve --coord c1:7701,c2:7701,c3:7701 --catalog postgres://...

Or via the shipped Ansible playbook (examples/deploy/cluster/) which provisions both roles across a host inventory.

When to use this vs. cloud-first topologies


| Use Topology 2.5 when… | Use Topology 3/4 when… |
| --- | --- |
| You already run VMs (Hetzner, OVH, on-prem, air-gapped) | You’re already on Cloudflare or AWS and want managed primitives |
| You need >100 GB catalog or >1 TB/day throughput cheaply | You want zero-ops cron + workflow orchestration |
| You’re in a regulated/sovereign environment that can’t use Cloudflare/AWS | You want sub-100ms cold start on infrequent pipelines |
| You want one operational model from laptop → 50-node cluster | You want per-invocation pricing |

This topology explicitly does not require Kubernetes. The M3 Helm chart wraps the same binary + config for teams that already run K8s — it’s a convenience, not the canonical deploy path.

Topology 5 — fractalbox-hosted SaaS control plane (M3)


A multi-tenant managed instance of the M1 Topology 0 — Team mode control plane, operated by fractalbox at app.hakiri.dev. Sits on the same CF Workers + Durable Object + R2 primitives, but extends with:

  • Multi-tenant team / membership state
  • Centralized OAuth IdP integration and SSO
  • A commercial onboarding flow (pricing, billing, support tiers)
  • SOC 2 + compliance attestation scope (see 11-compliance.md)
  • Run-history retention beyond what’s economical to bundle in a single-tenant deploy

This is opt-in and OSS (no closed-source upsell). The data plane never leaves the customer’s environment — the hosted SaaS holds metadata only: team membership, manifest snapshots, audit log, capability-token issuance state.

Customers can migrate between fractalbox-hosted, self-deployed CF, and self-hosted Rust (hakiri-control) with a one-command export.

Distribution: the fractalbox-hosted instance + a helm install hakiri-control chart for self-hosting variants that want Kubernetes-shaped operations.

| Topology | Where secrets live |
| --- | --- |
| Local CLI | OS keychain via the keyring crate; .env fallback for dev |
| Daemon | Same, or HAKIRI_SECRETS_BACKEND=vault\|aws-secrets\|gcp-sm |
| Workflows step | Cloudflare Workers Secrets, surfaced as env to the container |
| Lambda | AWS Secrets Manager; the binary fetches at start, caches in memory |
| Fargate / EC2 | Secrets Manager or Parameter Store via instance role |

Connectors never see raw secrets — they receive a token reference (secret://github-token) and the host injects the value across the WASM boundary at call time. This is the same pattern as Cloudflare Workers’ secrets binding.
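
The reference-then-inject pattern can be sketched from the host's side. The `secret://` scheme comes from the text; the `Map` standing in for a secrets backend, the function names, and the Bearer header are assumptions made for illustration.

```typescript
const SECRET_PREFIX = "secret://";

// Host-side resolution: connectors hold references, the host swaps in values at call time.
function resolveSecretRef(ref: string, store: Map<string, string>): string {
  if (!ref.startsWith(SECRET_PREFIX)) throw new Error(`not a secret reference: ${ref}`);
  const name = ref.slice(SECRET_PREFIX.length);
  const value = store.get(name);
  if (value === undefined) throw new Error(`unknown secret: ${name}`);
  return value;
}

// Builds the headers for an outbound call made on the connector's behalf.
function authHeaders(tokenRef: string, store: Map<string, string>): Record<string, string> {
  return { Authorization: `Bearer ${resolveSecretRef(tokenRef, store)}` };
}
```

In this arrangement the connector's own configuration only ever contains strings like `secret://github-token`, so logging or serializing that configuration does not leak the value.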

  • All deployments export OTel traces + metrics to an OTLP endpoint (env: OTEL_EXPORTER_OTLP_ENDPOINT)
  • Default sink: stdout JSON (so Cloud Run / Fargate / Lambda capture it natively)
  • Optional: send to SigNoz, Honeycomb, Tempo, Grafana Cloud
  • D1 as the catalog for Workers-native deployments. D1 is SQLite-compatible enough that a thin shim should work. Worth a prototype before committing.
  • Cold-start budget for Lambda. With cargo-lambda’s tier-zero binaries we should be well under 200ms. Validate with a soak test.
  • Multi-region sync. R2 is single-region with global read; for true multi-region writers we’d need active-active sync conflict handling beyond LWW. Defer to v2.