
ADR-0012 — Binary feature profiles and CI footprint gates

Challenge 5’s footprint commitments (≤ 50 MB compressed binary, ≤ 60 MB idle RSS, ≤ 200 ms Lambda cold-start, ≤ 30 MB Lambda zip) were originally specified as one set of numbers applied to “the Hakiri binary.” The cumulative size of the dependencies makes that unachievable as a single artifact.

Realistic linked footprint of the full feature set:

| Dependency | Compressed size (musl, release) |
| --- | --- |
| Polars (Rust core) | 8–12 MB |
| DuckDB (static) | 8–15 MB (more with extensions) |
| Wasmtime (with Cranelift) | 15–25 MB |
| Tantivy | 3–5 MB |
| HNSW (instant-distance or similar) | 1–3 MB |
| arrow-rs (full kernels) | 5–10 MB |
| openraft | 2–4 MB |
| sqlx + reqwest + tokio + TLS stack | 5–8 MB |
| Misc (serde, schemars, biscuit, etc.) | 5–10 MB |
| Total (lower bound) | ~52–92 MB |
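The total row can be reproduced by summing the per-dependency bounds. A minimal sketch, with the figures copied from the table in row order (the helper name `sum_mb` is illustrative, not part of the project):

```shell
# Lower and upper bounds in MB, one entry per table row (Polars ... Misc).
lows="8 8 15 3 1 5 2 5 5"
highs="12 15 25 5 3 10 4 8 10"

# sum_mb N...: print the sum of its integer arguments.
sum_mb() {
  total=0
  for n in "$@"; do
    total=$((total + n))
  done
  echo "$total"
}

# Word splitting on the unquoted lists is intentional here.
low_total=$(sum_mb $lows)
high_total=$(sum_mb $highs)

echo "estimated linked footprint: ${low_total}-${high_total} MB (single-binary budget: 50 MB)"
```

Even the optimistic sum lands above the 50 MB compressed budget, which is the arithmetic behind splitting the build into profiles.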

Even the low end of that range (52 MB) already exceeds the 50 MB compressed budget before any source code is written. Lambda’s 30 MB zip cap is unreachable for the full feature set under any realistic compression. Pretending one binary hits all the budgets simultaneously forces ugly trade-offs late (rip out Wasmtime to fit Lambda, then re-add it via a separate build, then lose track of which build does what).

Hakiri ships in three feature-flagged build profiles, compiled from the same source tree via Cargo features:

| Profile | Use | Compressed budget | Idle RSS budget | Cargo features |
| --- | --- | --- | --- | --- |
| hakiri-core | Lambda zips, edge runtimes (Cloudflare Containers WASI), CI runners, the smallest standalone install | ≤ 50 MB | ≤ 60 MB | duckdb, polars, sqlite-catalog, s3-sync, built-in connectors |
| hakiri-full | Daemon serving the agent retrieval surface (MCP, indexes, WASM connector host) | ≤ 150 MB | ≤ 180 MB | core + wasmtime, tantivy, hnsw, mcp-server, postgres-catalog, dynamo-catalog |
| hakiri-coord | Raft coordinator nodes for the Topology 2.5 self-hosted cluster | ≤ 50 MB | ≤ 50 MB | raft, coord-api; explicitly not duckdb, wasmtime, query engine, MCP server |
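The "explicitly not" column for hakiri-coord is the kind of rule that can be checked mechanically. A sketch of one way to do that, assuming the excluded features map to crates named `duckdb` and `wasmtime` and that the profile is selected with a `hakiri-coord` Cargo feature (both are assumptions; `find_forbidden_crates` is a hypothetical helper). It reads `cargo tree --prefix none` output on stdin so it can be exercised without a real workspace:

```shell
# find_forbidden_crates CRATE...
# Reads a flat dependency list (one "name vX.Y.Z" per line) on stdin,
# prints any line whose crate name is in the forbidden list, and returns
# non-zero if at least one was found.
find_forbidden_crates() {
  # Build an anchored alternation such as "^(duckdb|wasmtime) ".
  pattern=$(printf '%s|' "$@")
  pattern="^(${pattern%|}) "
  if grep -E "$pattern" | sort -u | grep .; then
    return 1
  fi
  return 0
}

# Hypothetical CI usage (not run here, needs the real workspace):
#   cargo tree --no-default-features --features hakiri-coord --prefix none \
#     | find_forbidden_crates duckdb wasmtime
```

The trailing space in the pattern keeps a crate such as `wasmtime-runtime` from being reported as `wasmtime`. Whether transitive crates like that should also be forbidden is a policy choice the table does not settle.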

Footprint budgets are enforced as CI gates on every PR, not aspirations:

# .github/workflows/footprint.yml (sketch)
- run: cargo build --profile=release --features=hakiri-core --target=x86_64-unknown-linux-musl
- run: |
    # compress first so the gate measures the shipped size, not the raw binary
    gzip -9 -kf target/x86_64-unknown-linux-musl/release/hakiri-core
    SIZE=$(stat -c%s target/x86_64-unknown-linux-musl/release/hakiri-core.gz)
    [[ $SIZE -le 52428800 ]] || { echo "hakiri-core compressed $SIZE > 50 MB"; exit 1; }
- run: cargo build --profile=release --features=hakiri-full --target=x86_64-unknown-linux-musl
- run: |
    gzip -9 -kf target/.../hakiri-full
    SIZE=$(stat -c%s target/.../hakiri-full.gz)
    [[ $SIZE -le 157286400 ]] || { echo "hakiri-full compressed $SIZE > 150 MB"; exit 1; }
- run: cargo build --profile=release --features=hakiri-coord --target=x86_64-unknown-linux-musl
- run: |
    gzip -9 -kf target/.../hakiri-coord
    SIZE=$(stat -c%s target/.../hakiri-coord.gz)
    [[ $SIZE -le 52428800 ]] || { echo "hakiri-coord compressed $SIZE > 50 MB"; exit 1; }
- run: ./scripts/measure-cold-start.sh hakiri-core # asserts ≤ 200 ms
- run: ./scripts/measure-idle-rss.sh hakiri-full # asserts ≤ 180 MB
- run: ldd target/.../hakiri-* | tee /dev/stderr | grep -v libc.musl- && exit 1 || true # no surprise deps
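The three size gates repeat the same compress, measure, compare pattern, so a shared helper could keep the workflow shorter. A minimal sketch, assuming `gzip -9` is the compression the budgets refer to (the source does not say) and using a hypothetical function name:

```shell
# check_compressed_size FILE BUDGET_MB
# Gzips FILE into a temporary file and returns non-zero if the compressed
# size exceeds BUDGET_MB (MB = 1024 * 1024 bytes, so 50 MB = 52428800).
check_compressed_size() {
  file=$1
  budget_mb=$2
  budget_bytes=$((budget_mb * 1024 * 1024))
  tmp=$(mktemp)
  gzip -9 -c "$file" > "$tmp"
  # The arithmetic expansion strips any padding wc adds on some platforms.
  size=$(( $(wc -c < "$tmp") ))
  rm -f "$tmp"
  if [ "$size" -gt "$budget_bytes" ]; then
    echo "$file compressed $size > $budget_mb MB" >&2
    return 1
  fi
  echo "$file compressed $size bytes (budget $budget_mb MB)"
}
```

A workflow step would then call something like `check_compressed_size <path-to-binary> 50` per profile, leaving the original artifact untouched because the compressed copy lives only in the temporary file.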

A PR that bloats any profile past its budget fails CI like any other regression. A 7-day soak test (run on main weekly) asserts RSS-at-day-7 ≈ RSS-at-day-1 ± 5%.
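The soak assertion reduces to a relative-drift check between two RSS samples. A small sketch in integer arithmetic (the function name and the kB unit are assumptions for illustration):

```shell
# within_tolerance BASELINE CURRENT PERCENT
# Succeeds when |CURRENT - BASELINE| is at most PERCENT percent of BASELINE.
# Written without division so it stays exact in integer shell arithmetic:
#   |cur - base| / base <= pct / 100   <=>   |cur - base| * 100 <= base * pct
within_tolerance() {
  diff=$(( $2 - $1 ))
  if [ "$diff" -lt 0 ]; then
    diff=$(( 0 - diff ))
  fi
  [ $(( diff * 100 )) -le $(( $1 * $3 )) ]
}

# Example: day-1 RSS of 100000 kB, day-7 RSS of 104000 kB, 5% tolerance.
if within_tolerance 100000 104000 5; then
  echo "soak: RSS drift within 5%"
else
  echo "soak: RSS drift exceeds 5%"
fi
```

Comparing against the day-1 sample rather than the fixed budget catches slow leaks that are still under budget at day 7.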

Before any M0 code is written, build hakiri-full with all feature dependencies declared in Cargo.toml and a stub fn main() { /* touch each library */ } in place of real source, then measure compressed size, idle RSS, and Lambda cold-start. If any number exceeds its budget, the budgets get rewritten before M0, not after. This is the load-bearing measurement that the rest of the deploy story rests on.
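For the idle-RSS part of that measurement, one possible approach is to start the stub binary, let it settle, and read its resident set size. The sketch below is an assumption about how such a script could work, not the contents of scripts/measure-idle-rss.sh. The function names and the settle time are hypothetical. It reads VmRSS from /proc on Linux and falls back to `ps` elsewhere:

```shell
# rss_kb PID: print the resident set size of PID in kB.
rss_kb() {
  if [ -r "/proc/$1/status" ]; then
    awk '/^VmRSS:/ { print $2 }' "/proc/$1/status"
  else
    ps -o rss= -p "$1" | tr -d ' '
  fi
}

# measure_idle_rss SETTLE_SECONDS CMD [ARGS...]
# Starts CMD in the background, waits SETTLE_SECONDS, prints its RSS in kB,
# then stops it.
measure_idle_rss() {
  settle=$1
  shift
  "$@" >/dev/null 2>&1 &
  pid=$!
  sleep "$settle"
  rss=$(rss_kb "$pid")
  kill "$pid" 2>/dev/null || true
  wait "$pid" 2>/dev/null || true
  echo "$rss"
}
```

A gate would compare the printed value against the profile budget, for example 180 MB = 184320 kB for hakiri-full. A single sample after a short settle is only a rough proxy for "idle"; a real script would likely take several samples.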

Each profile ships as its own artifact:

  • Homebrew tap: brew install hakiri/tap/hakiri installs hakiri-full; brew install hakiri/tap/hakiri-core for the smaller profile; brew install hakiri/tap/hakiri-coord for cluster coordinator nodes.
  • Docker: ghcr.io/<org>/hakiri-core, ghcr.io/<org>/hakiri-full, ghcr.io/<org>/hakiri-coord — each tagged independently.
  • GitHub Releases: three asset trees per release, with SHA-256 checksums and SBOMs.
  • Install script (curl https://hakiri.dev/install.sh | sh) detects the platform and defaults to hakiri-full; pass --profile=core or --profile=coord to override.
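The profile selection in the install script could be as small as the following sketch. It is written in POSIX sh because the script is piped to `sh`. The function name `resolve_profile` is hypothetical, and the real installer may parse its arguments differently:

```shell
# resolve_profile [ARGS...]
# Prints the artifact name for the requested profile, defaulting to
# hakiri-full when no --profile=... argument is given.
resolve_profile() {
  profile=full
  for arg in "$@"; do
    case $arg in
      --profile=*) profile=${arg#--profile=} ;;
    esac
  done
  case $profile in
    core|full|coord) echo "hakiri-$profile" ;;
    *) echo "unknown profile: $profile" >&2; return 1 ;;
  esac
}

resolve_profile                  # prints hakiri-full
resolve_profile --profile=core   # prints hakiri-core
```

When a script is piped into `sh`, arguments reach it via `sh -s --`, as in `curl https://hakiri.dev/install.sh | sh -s -- --profile=core`.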

Positive

  • Honest budgets. Each profile has a number the architecture can actually meet.
  • CI catches footprint regressions before they ship. A PR that adds a heavy dependency surfaces the cost on the same PR, not at release.
  • The three profiles map cleanly onto the three deployment shapes: edge (core), daemon (full), cluster control (coord).
  • Operators can pick the profile that fits their environment — a regulated VPC running the daemon doesn’t carry coordinator code; a Lambda function doesn’t carry HNSW.

Negative

  • Three build artifacts, three release pipelines, three Docker images. More release infrastructure than one binary.
  • The “one binary, many roles” pitch becomes “one source tree, three feature-flagged binaries that share configuration.” Slightly less crisp but more honest.
  • Operators who want all three roles on one VM run all three binaries — slightly heavier than a single combined binary would be (small overhead since the heavy deps don’t overlap).

Neutral

  • Profile boundaries (which features go in which profile) are decisions that may shift as features land. The Cargo feature flags make this cheap to revisit; the budgets stay fixed.

Single binary, one budget. What the PRD originally claimed. Rejected because the math doesn’t work — cumulative dependencies exceed the budget before code is written.

Single binary, larger budget (e.g., 150 MB everywhere). Honest about full feature size, but kills Lambda deploy (30 MB zip cap) and inflates the cold-start budget unacceptably.

Many small binaries (one per role). hakiri-cli, hakiri-serve, hakiri-mcp, hakiri-sync, hakiri-coord, hakiri-worker, etc. Rejected because (a) shared configuration across many binaries is fiddly, (b) the role boundaries blur in practice (the daemon already does scheduling + WASM host + MCP server in one process), (c) operationally heavier without benefit. Three profiles is the right granularity.

Dynamically loaded plugins. Heavy deps like Tantivy and Wasmtime loaded as .so/.dylib at runtime. Rejected because Rust ABI is unstable, sandboxing is harder, and the operational story (which plugin is loaded?) is worse than three profiles.

Embed Lambda-specific build only when targeting Lambda. A single binary with a --lambda flag that skips heavy features at runtime. Doesn’t help — the binary still has to link the heavy deps to have the flag, so compressed size doesn’t shrink.