ADR-0011 — Catalog port semantics across backends
- Status: Accepted
- Date: 2026-05-12
- Related specs: 01-architecture.md, 03-pipelines.md § Crash resume, 06-deployment.md, ADR-0006, ADR-0007
Context
The catalog (`trait Catalog`) is implemented against at least three backends across topologies:
- Local SQLite — Topology 1 (CLI) and Topology 2 (single-VM daemon).
- Durable Object SQLite — Topology 3 (Cloudflare), per-project DO (ADR-0006).
- RDS Postgres — Topology 4 (AWS) and Topology 2.5 (multi-node self-hosted cluster) (ADR-0007).
DynamoDB is a planned M2.5 adapter; the same semantics apply.
The catalog carries load-bearing state: pipeline cursors, per-chunk leases (backfill), schema-evolution history, snapshot manifests, OTel-audit lineage edges. Semantic divergence across backends produces backend-specific bugs. Concrete examples:
- `SELECT FOR UPDATE SKIP LOCKED` (the backfill chunk-claim primitive) does not exist in SQLite. The naïve port (`BEGIN IMMEDIATE`) has different contention semantics.
- DO SQLite enforces single-writer per DO (i.e., per project) by construction. Local SQLite under `hakiri serve` allows multiple in-process writers. RDS Postgres allows N-way write concurrency with row-level locks.
- Postgres supports point-in-time recovery; SQLite point-in-time is only as good as the last on-disk snapshot.
The M2 success criterion (“the same hakiri.toml produces byte-identical Parquet on both clouds after a 24h soak”) requires the catalog to behave identically across backends — not just expose the same Rust API.
Decision
We pin the `trait Catalog` contract as a small set of semantic invariants every backend must provide. Each backend implements the contract using its native primitive; the contract is verified by a single conformance test suite run against all backends in CI.
The contract
For every backend:
1. Linearizability per `pipeline_id`. All writes to rows keyed by a single `pipeline_id` are linearizable. Any two writes against the same `pipeline_id` appear in the same total order to every reader.
2. Monotonic schema-history reads. Once a `schema_history(pipeline_id, version, schema_json, applied_at)` row is written, no reader subsequently sees an older version of that row, and the version sequence is dense and monotonic.
3. At-most-once chunk dispatch. A `pipeline_chunks(chunk_id, attempt) → holder_node` row cannot be claimed by two workers concurrently. When two workers attempt to claim the same chunk, exactly one succeeds.
4. Append-only lineage edges. `lineage(run_id, record_id, source_run_id, ...)` rows are append-only. The catalog refuses `UPDATE` or `DELETE` on this table; only `INSERT` and `SELECT` are allowed.
5. Atomic snapshot commit. A snapshot row (`snapshots(table, snapshot_ts, includes_runs, indexes, ...)`) becomes visible to readers only after all referenced sidecar manifests are durable. The contract requires a `commit_snapshot()` API that performs this atomically.
6. Capability-token revocation epoch reads. `revocation_epochs(project, tenant, principal_class) → epoch` is read on every token verification. Reads must reflect writes within a bounded staleness window (default 60s; configurable). A stale read is safe only in the bounded sense that it can allow extra access for at most the staleness window; the catalog never returns an epoch newer than the latest write, and never returns rows that were never written.
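As a rough illustration of how invariants (2), (4), and (5) could surface in an API, the sketch below defines a hypothetical `CatalogSketch` trait and an in-memory `MemCatalog`. None of these names, signatures, or error variants come from the real `trait Catalog`; they are assumptions chosen to make the invariants concrete. Density is modeled by rejecting any version that is not exactly `latest + 1`, append-only lineage by simply offering no update or delete method, and atomic commit by refusing a snapshot whose sidecars are not yet marked durable.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical error type; the real catalog's errors may look different.
#[derive(Debug, PartialEq)]
pub enum CatalogError {
    /// Invariant (2): the proposed version is not exactly `latest + 1`.
    VersionGap { expected: u64, got: u64 },
    /// Invariant (5): a referenced sidecar manifest is not durable yet.
    SidecarNotDurable(String),
}

/// Illustrative subset of the contract (assumed shape, not the real trait).
/// There is deliberately no method to update or delete lineage (invariant 4).
pub trait CatalogSketch {
    fn append_schema_version(&mut self, pipeline_id: &str, version: u64, schema_json: &str)
        -> Result<(), CatalogError>;
    fn append_lineage(&mut self, run_id: &str, record_id: &str, source_run_id: &str);
    fn mark_sidecar_durable(&mut self, manifest: &str);
    fn commit_snapshot(&mut self, table: &str, snapshot_ts: u64, sidecars: &[&str])
        -> Result<(), CatalogError>;
    fn visible_snapshots(&self, table: &str) -> Vec<u64>;
}

/// In-memory stand-in used only to demonstrate the invariants.
#[derive(Default)]
pub struct MemCatalog {
    schema_history: BTreeMap<String, Vec<String>>, // index i holds version i + 1
    lineage: Vec<(String, String, String)>,
    durable_sidecars: BTreeSet<String>,
    snapshots: BTreeMap<String, Vec<u64>>,
}

impl MemCatalog {
    /// Number of lineage edges appended so far.
    pub fn lineage_len(&self) -> usize {
        self.lineage.len()
    }
}

impl CatalogSketch for MemCatalog {
    fn append_schema_version(&mut self, pipeline_id: &str, version: u64, schema_json: &str)
        -> Result<(), CatalogError> {
        let history = self.schema_history.entry(pipeline_id.to_string()).or_default();
        let expected = history.len() as u64 + 1;
        if version != expected {
            return Err(CatalogError::VersionGap { expected, got: version });
        }
        history.push(schema_json.to_string());
        Ok(())
    }

    fn append_lineage(&mut self, run_id: &str, record_id: &str, source_run_id: &str) {
        self.lineage
            .push((run_id.to_string(), record_id.to_string(), source_run_id.to_string()));
    }

    fn mark_sidecar_durable(&mut self, manifest: &str) {
        self.durable_sidecars.insert(manifest.to_string());
    }

    fn commit_snapshot(&mut self, table: &str, snapshot_ts: u64, sidecars: &[&str])
        -> Result<(), CatalogError> {
        // Refuse the commit if any referenced sidecar is not durable yet.
        if let Some(missing) = sidecars.iter().find(|m| !self.durable_sidecars.contains(**m)) {
            return Err(CatalogError::SidecarNotDurable(missing.to_string()));
        }
        self.snapshots.entry(table.to_string()).or_default().push(snapshot_ts);
        Ok(())
    }

    fn visible_snapshots(&self, table: &str) -> Vec<u64> {
        self.snapshots.get(table).cloned().unwrap_or_default()
    }
}
```

A real backend would enforce the same checks inside its native transaction rather than behind `&mut self`; the exclusive borrow here merely stands in for that isolation.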
Per-backend primitive mapping
| Invariant | Local SQLite | DO SQLite | RDS Postgres | DynamoDB (M2.5) |
|---|---|---|---|---|
| (1) Linearizability per pipeline_id | BEGIN IMMEDIATE + per-pipeline row lock | DO actor scope (single writer per project DO) | SELECT FOR UPDATE on the pipeline row | Conditional update with IF version = X |
| (2) Monotonic schema-history | INSERT with WAL fsync; version is PRIMARY KEY AUTOINCREMENT | DO transactional write | Postgres INSERT with version UNIQUE constraint | Single-item conditional INSERT keyed by (pipeline_id, version) |
| (3) At-most-once chunk dispatch | BEGIN IMMEDIATE + UPDATE WHERE status='pending' + retry | DO actor scope | SELECT FOR UPDATE SKIP LOCKED LIMIT 1 | Conditional update with status = 'pending' |
| (4) Append-only lineage | View-level constraint (CREATE TRIGGER) | View-level constraint | View-level constraint + revoked UPDATE/DELETE grants | Streams + immutable item attribute |
| (5) Atomic snapshot commit | Single SQLite transaction touching snapshots + each indexes row | DO transactional write across keys | Single Postgres transaction | TransactWriteItems across catalog tables |
| (6) Revocation epoch reads | Direct read (no replication, no staleness) | Direct read | Read with optional read-replica routing; staleness ≤ replica lag | Eventually-consistent read with bounded staleness |
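The DynamoDB column for invariant (1), "Conditional update with IF version = X", is an optimistic compare-and-set. The sketch below models that shape in memory so the accept and reject paths are visible. `VersionedCursors`, `CursorRow`, and `write_if_version` are hypothetical names, and the `Mutex` only stands in for the atomicity a real conditional write would provide; this is not DynamoDB client code.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// A pipeline cursor row with an optimistic-concurrency version (assumed layout).
#[derive(Debug, Clone, PartialEq)]
pub struct CursorRow {
    pub version: u64,
    pub cursor: String,
}

#[derive(Debug, PartialEq)]
pub enum WriteError {
    /// The stored version did not match the caller's expectation.
    VersionConflict { current: u64 },
}

/// In-memory model of a table that only accepts version-conditional writes.
#[derive(Default)]
pub struct VersionedCursors {
    rows: Mutex<HashMap<String, CursorRow>>,
}

impl VersionedCursors {
    pub fn read(&self, pipeline_id: &str) -> Option<CursorRow> {
        self.rows.lock().unwrap().get(pipeline_id).cloned()
    }

    /// Applies the write only if the stored version equals `expected_version`
    /// (0 means "the row must not exist yet"). Returns the new version.
    pub fn write_if_version(
        &self,
        pipeline_id: &str,
        expected_version: u64,
        cursor: &str,
    ) -> Result<u64, WriteError> {
        let mut rows = self.rows.lock().unwrap();
        let current = rows.get(pipeline_id).map_or(0, |r| r.version);
        if current != expected_version {
            return Err(WriteError::VersionConflict { current });
        }
        let next = current + 1;
        rows.insert(
            pipeline_id.to_string(),
            CursorRow { version: next, cursor: cursor.to_string() },
        );
        Ok(next)
    }
}
```

A writer that loses the race sees `VersionConflict`, re-reads the row, and retries. Because every accepted write bumps the version by exactly one, writes to a given `pipeline_id` fall into a single total order, which is the property invariant (1) asks for.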
Conformance test suite
A single test suite under `crates/hakiri-context/tests/catalog_conformance/` exercises:
- Concurrent chunk-claim under load (100 workers competing for 1000 chunks, expect exactly-once dispatch).
- Schema-history insert + read under contention (no torn reads, no version skips).
- Lease acquisition with crash + recovery (simulated holder death, verify takeover after TTL).
- Snapshot commit + read (commit not visible until all sidecars referenced; visible immediately after).
- Revocation epoch propagation (bump epoch, verify rejection within staleness window across N readers).
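The lease crash-and-recovery case in the list above can be made deterministic by passing the clock in explicitly. The following is a minimal sketch under that assumption: `LeaseTable`, its fields, and the millisecond clock are all hypothetical and not taken from the real suite. A holder that "dies" simply stops renewing, and a second worker takes over once the TTL has elapsed.

```rust
use std::collections::HashMap;

/// A lease on one chunk (assumed shape, for illustration only).
struct Lease {
    holder: String,
    expires_at_ms: u64,
}

/// In-memory lease table driven by a caller-supplied clock.
pub struct LeaseTable {
    leases: HashMap<u32, Lease>,
    ttl_ms: u64,
}

impl LeaseTable {
    pub fn new(ttl_ms: u64) -> Self {
        LeaseTable { leases: HashMap::new(), ttl_ms }
    }

    /// Tries to acquire or renew the lease on `chunk_id` at time `now_ms`.
    /// Fails only if a different worker holds a lease that has not expired.
    pub fn acquire(&mut self, chunk_id: u32, worker: &str, now_ms: u64) -> bool {
        if let Some(lease) = self.leases.get(&chunk_id) {
            let live = now_ms < lease.expires_at_ms;
            if live && lease.holder != worker {
                return false;
            }
        }
        self.leases.insert(
            chunk_id,
            Lease { holder: worker.to_string(), expires_at_ms: now_ms + self.ttl_ms },
        );
        true
    }

    /// Current holder on record, whether or not the lease has expired.
    pub fn holder(&self, chunk_id: u32) -> Option<&str> {
        self.leases.get(&chunk_id).map(|l| l.holder.as_str())
    }
}
```

With a 1000 ms TTL, a lease taken at time 0 blocks a competitor at time 500 and yields to it at time 1000. Injecting the clock keeps the test independent of wall-clock sleeps.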
The suite runs against every backend in CI. A backend whose conformance test fails cannot be released. The local-SQLite backend is the reference implementation; backend-specific divergence (e.g., DynamoDB’s lack of strict linearizability across items in a transaction) requires either (a) the backend implementing a compensating pattern or (b) an explicit divergence in the contract scoped to that backend.
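One way to run the same check against every backend is to write it once against a narrow trait. The sketch below assumes a hypothetical `ChunkClaimer` trait and uses an in-memory `MemBackend` as a stand-in; a real suite would plug the SQLite, DO, and Postgres adapters in behind the same trait. Every worker attempts every chunk, and the check counts how many claims succeeded for each chunk.

```rust
use std::collections::HashMap;
use std::sync::Mutex;
use std::thread;

/// Narrow, hypothetical view of a backend for the chunk-claim check.
pub trait ChunkClaimer: Sync {
    /// Returns true if `worker` now holds `chunk_id`, false if it was already claimed.
    fn try_claim(&self, chunk_id: u32, worker: u32) -> bool;
}

/// In-memory stand-in; the mutex plays the role of the backend's transaction.
#[derive(Default)]
pub struct MemBackend {
    holders: Mutex<HashMap<u32, u32>>,
}

impl ChunkClaimer for MemBackend {
    fn try_claim(&self, chunk_id: u32, worker: u32) -> bool {
        let mut holders = self.holders.lock().unwrap();
        if holders.contains_key(&chunk_id) {
            return false;
        }
        holders.insert(chunk_id, worker);
        true
    }
}

/// Spawns `workers` threads that each attempt every chunk, then returns the
/// number of successful claims observed per chunk.
pub fn claims_per_chunk<B: ChunkClaimer>(backend: &B, workers: u32, chunks: u32) -> Vec<u32> {
    let per_worker: Vec<Vec<u32>> = thread::scope(|s| {
        // Spawn all workers first so they actually compete, then join them.
        let handles: Vec<_> = (0..workers)
            .map(|w| {
                s.spawn(move || {
                    (0..chunks)
                        .filter(|&c| backend.try_claim(c, w))
                        .collect::<Vec<u32>>()
                })
            })
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("worker panicked"))
            .collect()
    });

    let mut counts = vec![0u32; chunks as usize];
    for chunk in per_worker.into_iter().flatten() {
        counts[chunk as usize] += 1;
    }
    counts
}
```

A conformance test would call `claims_per_chunk(&backend, 100, 1000)` and assert that every count equals 1. A count of 0 means a chunk was never dispatched, and a count above 1 means invariant (3) was violated.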
Consequences
Positive
- One contract, many backends — operators can swap topologies without code changes (only config).
- The M2 byte-identical-Parquet soak test has a fighting chance because the catalog cannot silently produce different results on different backends.
- New backends (DynamoDB, eventually FoundationDB or others) ship by passing the conformance suite; no surprise behavior.
Negative
- DynamoDB’s consistency model requires more catalog code than the Postgres adapter (single-item conditional writes plus `TransactWriteItems`). The conformance suite is the discipline that exposes the gaps.
- The conformance suite itself is non-trivial: easily 2k lines of Rust integration tests. Worth it.
- Some Postgres-native conveniences (e.g., `LISTEN/NOTIFY` for lease expiration push) are not in the contract because SQLite/DO can’t provide them. Backends may use them internally as performance optimizations but not as semantic primitives.
Neutral
- The contract is intentionally small. Features that don’t fit (full-text search, complex joins) live in DuckDB over Parquet, not in the catalog.
Alternatives considered
One backend (Postgres everywhere). Cleanest semantically, but requires running Postgres for the local CLI and for the Cloudflare topology, which directly conflicts with Pillar 1 and the no-orchestrator promise.
Loose contract (“the implementations are similar enough”). What we started with. Rejected because the M2 soak test will surface a backend-specific bug a week before launch otherwise. The conformance suite is the only honest path.
Embed a single Rust embedded DB (redb, sled) and ignore Postgres. Doesn’t address DO SQLite or the AWS multi-node case where shared catalog state across Fargate tasks is required.
References
- 01-architecture.md § trait Catalog
- 03-pipelines.md § Crash resume
- ADR-0006, ADR-0007
- Jepsen analyses — the testing-shape reference for conformance suites of this kind