ADR-0008 — Embedded `hakiri coord` over external etcd/ZooKeeper for clustering

Status: Accepted
Date: 2026-05-11
Related specs: PRD Pillar 1 (simple deployment), 06-deployment.md Topology 2.5

Context

The Topology 2.5 self-hosted cluster (multiple hakiri serve processes on plain VMs) needs a coordination service for:

shard lease acquisition (one worker owns a pipeline’s shard at a time),
worker membership and health,
leader election for the reconciliation scheduler,
small-volume cluster-wide state (current shard assignments).
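
The lease requirement above can be sketched as a small state machine. This is a minimal illustration only, not the coordinator's actual data model: `Lease`, `LeaseTable`, `try_acquire`, and the millisecond TTL are hypothetical names and choices, and a real coordinator would apply these transitions through the replicated log rather than to a local map.

```rust
use std::collections::HashMap;

/// Hypothetical lease record: which worker owns a shard, and until when.
#[derive(Debug, Clone, PartialEq)]
pub struct Lease {
    pub owner: String,
    pub expires_at_ms: u64,
}

/// Hypothetical in-memory stand-in for the coordinator's lease state.
#[derive(Default)]
pub struct LeaseTable {
    leases: HashMap<String, Lease>,
}

impl LeaseTable {
    /// Grants or renews a lease. Succeeds if the shard is unowned, the
    /// existing lease has expired, or the caller already owns it.
    pub fn try_acquire(&mut self, shard: &str, worker: &str, now_ms: u64, ttl_ms: u64) -> bool {
        let available = match self.leases.get(shard) {
            None => true,
            Some(l) => l.owner == worker || l.expires_at_ms <= now_ms,
        };
        if available {
            self.leases.insert(
                shard.to_string(),
                Lease {
                    owner: worker.to_string(),
                    expires_at_ms: now_ms.saturating_add(ttl_ms),
                },
            );
        }
        available
    }

    /// Returns the current owner, or None if the lease is missing or expired.
    pub fn owner(&self, shard: &str, now_ms: u64) -> Option<&str> {
        self.leases
            .get(shard)
            .filter(|l| l.expires_at_ms > now_ms)
            .map(|l| l.owner.as_str())
    }
}
```

Passing `now_ms` in explicitly keeps the sketch deterministic; in a replicated setting the timestamp would presumably come from the leader so that every replica applies the same decision.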

Three deployment shapes were available: bundle a coordinator into the Hakiri binary (the ClickHouse Keeper model), require an external etcd / ZooKeeper / Consul cluster (the Kafka and Pulsar model), or use a cloud-managed primitive (Cloudflare Durable Objects (DO) SQLite, DynamoDB, S3 conditional puts).

Decision

The Hakiri binary ships with an embedded coordinator role: hakiri coord runs the same binary in coordinator mode — a small Raft-replicated KV store, modeled on ClickHouse Keeper. One coordinator for solo deploys; three or five for HA.
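
The one / three / five sizing follows from Raft's majority rule. The arithmetic can be sketched as below; the function names are illustrative and not part of Hakiri.

```rust
/// Votes needed for a majority in a Raft cluster of `n` voters.
pub fn quorum(n: u32) -> u32 {
    n / 2 + 1
}

/// Number of coordinator failures a cluster of `n` voters can absorb
/// while the survivors still form a majority.
pub fn tolerated_failures(n: u32) -> u32 {
    n.saturating_sub(quorum(n))
}
```

One coordinator tolerates no failures, three tolerate one, and five tolerate two. An even count buys nothing extra: four voters still tolerate only one failure, which is why odd sizes are the usual choice.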

External etcd / Consul are supported as optional backends behind a coord_backend = "etcd" config switch but are not required. The cloud topologies (Cloudflare, AWS) use their respective managed primitives (DO SQLite, RDS Postgres) instead of hakiri coord.
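
One way the switch could be modeled in code is an enum parsed from the config value. Only `coord_backend = "etcd"` appears in this ADR; the `"embedded"` and `"consul"` strings, the `CoordBackend` type, and the choice of embedded as the default are assumptions made for illustration.

```rust
use std::str::FromStr;

/// Hypothetical representation of the `coord_backend` config value.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum CoordBackend {
    /// Embedded `hakiri coord`; assumed here to be the default.
    #[default]
    Embedded,
    Etcd,
    Consul,
}

impl FromStr for CoordBackend {
    type Err = String;

    /// Parses the config string; unknown values are rejected rather than
    /// silently falling back to the embedded coordinator.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "embedded" => Ok(Self::Embedded),
            "etcd" => Ok(Self::Etcd),
            "consul" => Ok(Self::Consul),
            other => Err(format!("unknown coord_backend: {other}")),
        }
    }
}
```

Each extra variant is a separate code path that has to be tested against the same lease, membership, and election semantics as the embedded coordinator.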

Consequences

Positive

One artifact to deploy. A production cluster is scp + a config file + a systemd unit per node — no second piece of infrastructure with its own version skew, upgrade procedure, and operational runbook.
Coordinator and workers share build, test, and release cycles. No “is the coord version compatible with this worker?” matrix.
Sovereign / air-gapped deploys work end-to-end. There is no upstream dependency on a managed service.
The same binary still scales down to a laptop (single process, in-process coordination) and to one VM (single hakiri serve with embedded coord). Topology 2.5 is just “scale that out.”

Negative

We carry the operational responsibility for a Raft implementation. Even if we use openraft or a similar mature crate, the binary’s size and complexity grow.
The coordinator target footprint is <50 MB RSS for a 3-node cluster. If Raft + storage pushes past 100 MB, we reconsider — bundling stops being free.
Teams with an existing etcd/Consul deployment may prefer to use it instead of running our coordinator. The coord_backend = "etcd" switch exists for them, but it adds a second code path.

Neutral

The Cloudflare and AWS topologies don’t run hakiri coord — they use managed primitives. The coordinator is specifically for self-hosted multi-VM deploys.

Alternatives considered

External etcd / ZooKeeper / Consul required. The Kafka model. Rejected because it makes the simplest production deploy — three VMs in a colo — require a second three-VM cluster just for coordination. That violates Pillar 1 (laptop → cluster with one artifact, no orchestrator required) and is a known reason teams stay on managed Kafka instead of self-hosting.

Cloud-managed primitives only (no self-hosted cluster path). Skip Topology 2.5 entirely; tell on-prem and Hetzner-style users to “use the daemon.” Rejected because a substantial slice of the target persona (sovereign deploys, on-prem, regulated environments) cannot use the public clouds, and the daemon alone doesn’t horizontally scale.

Postgres as a coordination substrate. Postgres advisory locks plus a LISTEN/NOTIFY channel can carry leases and membership. Works, but presumes a Postgres deployment that’s not otherwise needed (the local catalog is SQLite). It also couples cluster control-plane availability to the Postgres deployment’s availability, which is a weaker story than embedded Raft.

S3 conditional puts as a poor man’s coordinator. Works for leases. Fails for leader election and membership at any scale, and ties coordination latency to S3 latency (~50–200 ms). Rejected.

References

ClickHouse Keeper — the model we’re borrowing
openraft — likely Rust Raft implementation
Specs: 06-deployment.md Topology 2.5
Product framing: PRD Pillar 1 (simple deployment)