ADR-0008 — Embedded `hakiri coord` over external etcd/ZooKeeper for clustering

The Topology 2.5 self-hosted cluster (multiple hakiri serve processes on plain VMs) needs a coordination service for:

  • shard lease acquisition (one worker owns a pipeline’s shard at a time),
  • worker membership and health,
  • leader election for the reconciliation scheduler,
  • small-volume cluster-wide state (current shard assignments).
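
The first requirement, exclusive shard ownership, can be illustrated with a minimal sketch. It assumes a lease is a record of (owner, expiry) and that a worker keeps ownership by re-acquiring before the TTL runs out. The names `LeaseTable`, `acquire`, `owner`, and the 10-second TTL are hypothetical and are not Hakiri's API. A real coordinator would replicate this table rather than hold it in one process.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Lease:
    owner: str         # worker id currently holding the shard
    expires_at: float  # time in seconds at which the lease becomes free


class LeaseTable:
    """Toy single-process lease table (hypothetical, not Hakiri's API)."""

    def __init__(self, ttl: float = 10.0) -> None:
        self.ttl = ttl
        self._leases: Dict[str, Lease] = {}

    def acquire(self, shard: str, worker: str, now: float) -> bool:
        """Grant or renew the lease if the shard is free, expired, or already ours."""
        current = self._leases.get(shard)
        held_by_other = (
            current is not None
            and current.owner != worker
            and current.expires_at > now
        )
        if held_by_other:
            return False  # another worker still holds a live lease
        self._leases[shard] = Lease(owner=worker, expires_at=now + self.ttl)
        return True

    def owner(self, shard: str, now: float) -> Optional[str]:
        """Return the live owner, or None if the shard is unowned or expired."""
        current = self._leases.get(shard)
        if current is None or current.expires_at <= now:
            return None
        return current.owner
```

Time is passed in explicitly so the behaviour is deterministic: a second worker is refused while the first worker's lease is live, and succeeds once `now` reaches the expiry.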

Three deployment shapes were available: bundle a coordinator into the Hakiri binary (the ClickHouse Keeper model), require an external etcd / ZooKeeper / Consul cluster (the Kafka and Pulsar model), or use a cloud-managed primitive (Durable Objects SQLite, DynamoDB, S3 conditional puts).

The Hakiri binary ships with an embedded coordinator role: hakiri coord runs the same binary in coordinator mode — a small Raft-replicated KV store, modeled on ClickHouse Keeper. One coordinator for solo deploys; three or five for HA.
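
The choice of one, three, or five coordinators follows from majority-quorum arithmetic, which is general to Raft rather than specific to Hakiri. The sketch below works it through; the function names are illustrative only.

```python
def quorum(nodes: int) -> int:
    """Smallest majority of a cluster with `nodes` voting members."""
    if nodes < 1:
        raise ValueError("cluster needs at least one node")
    return nodes // 2 + 1


def tolerated_failures(nodes: int) -> int:
    """How many members can be lost while a majority remains reachable."""
    return nodes - quorum(nodes)


for n in (1, 3, 4, 5):
    print(f"{n} nodes: quorum={quorum(n)}, tolerates={tolerated_failures(n)}")
```

A single coordinator tolerates no failures, three tolerate one, and five tolerate two. Four nodes tolerate only one failure, the same as three, which is why odd cluster sizes are the usual recommendation.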

External etcd / Consul are supported as optional backends behind a coord_backend = "etcd" config switch but are not required. The cloud topologies (Cloudflare, AWS) use their respective managed primitives (DO SQLite, RDS Postgres) instead of hakiri coord.

Positive

  • One artifact to deploy. A production cluster is scp + a config file + a systemd unit per node — no second piece of infrastructure with its own version skew, upgrade procedure, and operational runbook.
  • Coordinator and workers share build, test, and release cycles. No “is the coord version compatible with this worker?” matrix.
  • Sovereign / air-gapped deploys work end-to-end. There is no upstream dependency on a managed service.
  • The same binary still scales down to a laptop (single process, in-process coordination) and to one VM (single hakiri serve with embedded coord). Topology 2.5 is just “scale that out.”

Negative

  • We carry the operational responsibility for a Raft implementation. Even if we use openraft or a similar mature crate, the binary’s size and complexity grow.
  • The coordinator target footprint is <50 MB RSS for a 3-node cluster. If Raft + storage pushes past 100 MB, we reconsider — bundling stops being free.
  • Teams with an existing etcd/Consul deployment may prefer to use it instead of running our coordinator. The coord_backend = "etcd" switch exists for them, but it adds a second code path.

Neutral

  • The Cloudflare and AWS topologies don’t run hakiri coord — they use managed primitives. The coordinator is specifically for self-hosted multi-VM deploys.

External etcd / ZooKeeper / Consul required. The Kafka model. Rejected because it makes the simplest production deploy — three VMs in a colo — require a second three-VM cluster just for coordination. That violates Pillar 1 (laptop → cluster with one artifact, no orchestrator required) and is a known reason teams stay on managed Kafka instead of self-hosting.

Cloud-managed primitives only (no self-hosted cluster path). Skip Topology 2.5 entirely; tell on-prem and Hetzner-style users to “use the daemon.” Rejected because a substantial slice of the target persona (sovereign deploys, on-prem, regulated environments) cannot use the public clouds, and the daemon alone doesn’t horizontally scale.

Postgres as a coordination substrate. Postgres advisory locks plus a LISTEN/NOTIFY channel can carry leases and membership. Works, but presumes a Postgres deployment that’s not otherwise needed (the local catalog is SQLite). It also couples cluster control-plane availability to the Postgres deployment’s availability, which is a weaker story than embedded Raft.

S3 conditional puts as a poor man’s coordinator. Works for leases. Fails for leader election and membership at any scale, and ties coordination latency to S3 latency (~50–200 ms). Off the table.