Roadmap

Phased milestones. Each milestone has explicit success criteria that double as acceptance tests. We don’t move to the next milestone until the previous one’s criteria pass on CI.

M0 — Walking skeleton (8 weeks)

Goal: prove the architecture end-to-end with the smallest viable scope. Establish the port boundaries M1 and M2 will need, so later milestones aren’t refactors.

Scope

hakiri-core: Schema, Record, Batch, Cursor, CursorKind (monotonic | opaque-token | snapshot-id), PipelineSpec, port traits (Source, Destination, Transform, Catalog, Clock), error types
hakiri-runtime: linear executor for one source → one destination, no scheduling
hakiri-context: SQLite as one adapter of the Catalog port + Parquet writer + DuckDB query wrapper
hakiri-connectors: two built-ins only — http (paginated REST) and context (the local store as destination)
hakiri-cli: init, run, query, plan, pipeline list/describe
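
The "linear executor for one source → one destination" can be pictured with a minimal sketch. The roadmap only names the port traits, so the method names (`next_batch`, `write`), the `Record` shape, and the two test doubles below are assumptions for illustration, not the hakiri-core API.

```rust
use std::collections::VecDeque;

/// Hypothetical minimal shapes; the real Record and Batch types live in hakiri-core.
#[derive(Debug, Clone, PartialEq)]
pub struct Record(pub String);
pub type Batch = Vec<Record>;

/// Source port (assumed method name): yields batches until exhausted.
pub trait Source {
    fn next_batch(&mut self) -> Option<Batch>;
}

/// Destination port (assumed method name): accepts one batch at a time.
pub trait Destination {
    fn write(&mut self, batch: Batch) -> Result<(), String>;
}

/// Linear executor: drain one source into one destination, with no scheduling.
/// Returns the number of records written, or the first destination error.
pub fn run_linear<S: Source, D: Destination>(
    source: &mut S,
    dest: &mut D,
) -> Result<usize, String> {
    let mut written = 0;
    while let Some(batch) = source.next_batch() {
        let count = batch.len();
        dest.write(batch)?;
        written += count;
    }
    Ok(written)
}

/// Test double standing in for a paginated source: one batch per page.
pub struct PagedSource {
    pages: VecDeque<Batch>,
}

impl PagedSource {
    pub fn new(pages: Vec<Batch>) -> Self {
        PagedSource { pages: VecDeque::from(pages) }
    }
}

impl Source for PagedSource {
    fn next_batch(&mut self) -> Option<Batch> {
        self.pages.pop_front()
    }
}

/// Test double standing in for the local context store.
#[derive(Default)]
pub struct VecDestination {
    pub rows: Vec<Record>,
}

impl Destination for VecDestination {
    fn write(&mut self, batch: Batch) -> Result<(), String> {
        self.rows.extend(batch);
        Ok(())
    }
}
```

Keeping the executor generic over both traits is what would let the http and context built-ins, and later WASM connectors, plug in without changing the loop.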

Success criteria

From a fresh checkout: hakiri init demo && hakiri run httpbin && hakiri query "select count(*) from httpbin_anything" returns a row count.
A crash mid-run leaves the catalog in a consistent state (the run is recorded as RunStatus::PartialFailure); re-running the pipeline resumes it.
Acceptance test under tests/acceptance/ covers the happy path on Linux in CI; macOS runs as a smoke job.
The adapter contract test for trait Catalog passes: swapping SQLite for an in-memory mock passes the same conformance suite. This keeps M2’s RDS and DO adapters from turning into refactors.
Single static binary < 100 MB stripped (Linux musl, ARM64). DuckDB is gated by --features duckdb; without it the binary is < 50 MB and hakiri query is disabled with a clear message.
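
The crash-consistency and adapter-contract criteria above can be sketched as one conformance check written against the port rather than against SQLite. Only `Catalog` and `RunStatus::PartialFailure` come from the roadmap; the trait methods, the other status variants, the integer cursor, and `MemoryCatalog` are hypothetical, and the crash is simulated by recording the status directly.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RunStatus {
    Running,
    Succeeded,
    PartialFailure,
}

/// Assumed shape of the Catalog port; the real trait is defined in hakiri-core.
pub trait Catalog {
    fn begin_run(&mut self, pipeline: &str) -> u64;
    fn commit_cursor(&mut self, pipeline: &str, cursor: u64);
    fn finish_run(&mut self, run_id: u64, status: RunStatus);
    fn cursor(&self, pipeline: &str) -> Option<u64>;
    fn status(&self, run_id: u64) -> Option<RunStatus>;
}

/// In-memory mock adapter, standing in for the SQLite adapter.
#[derive(Default)]
pub struct MemoryCatalog {
    next_id: u64,
    runs: HashMap<u64, RunStatus>,
    cursors: HashMap<String, u64>,
}

impl Catalog for MemoryCatalog {
    fn begin_run(&mut self, _pipeline: &str) -> u64 {
        self.next_id += 1;
        self.runs.insert(self.next_id, RunStatus::Running);
        self.next_id
    }
    fn commit_cursor(&mut self, pipeline: &str, cursor: u64) {
        self.cursors.insert(pipeline.to_string(), cursor);
    }
    fn finish_run(&mut self, run_id: u64, status: RunStatus) {
        self.runs.insert(run_id, status);
    }
    fn cursor(&self, pipeline: &str) -> Option<u64> {
        self.cursors.get(pipeline).copied()
    }
    fn status(&self, run_id: u64) -> Option<RunStatus> {
        self.runs.get(&run_id).copied()
    }
}

/// One conformance check, generic over the adapter: a run that fails partway
/// keeps its committed cursor, and a second run picks up from there.
pub fn check_resume_after_partial_failure<C: Catalog>(catalog: &mut C) -> Result<(), String> {
    let first = catalog.begin_run("demo");
    catalog.commit_cursor("demo", 2);
    catalog.finish_run(first, RunStatus::PartialFailure);

    if catalog.status(first) != Some(RunStatus::PartialFailure) {
        return Err("failed run must be recorded as PartialFailure".to_string());
    }
    if catalog.cursor("demo") != Some(2) {
        return Err("committed cursor must survive the failed run".to_string());
    }

    let second = catalog.begin_run("demo");
    if second == first {
        return Err("run ids must be unique".to_string());
    }
    catalog.commit_cursor("demo", 5);
    catalog.finish_run(second, RunStatus::Succeeded);

    if catalog.cursor("demo") != Some(5) {
        return Err("resumed run must advance the cursor".to_string());
    }
    Ok(())
}
```

Because the check takes any `C: Catalog`, the same function could later be pointed at the RDS and DO adapters unchanged.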

Explicitly not in M0

WASM connectors (built-ins only)
Sync to S3
MCP server
Scheduling
JSON manifest support — TOML-only in M0; JSON support lands in M1 alongside the MCP server

M1 — WASM connectors + agent operating surface + sync + team mode

Goal: WASM-sandboxed connectors run alongside built-ins; agents can operate (discover, query, run) pipelines via MCP; teams can sync a context store via R2; and a team can collaboratively configure pipelines from an Electron desktop app and a web UI, with scheduled execution on Cloudflare (or on the self-hosted Rust hakiri-control daemon for air-gapped deployments). Authoring connectors via agents is not in M1; that is M2.

Duration is open: adding team mode roughly doubles the original 16-week estimate, and an updated estimate will land when the M1 spike work completes.

Scope — engine (carried from earlier M1 plan)

hakiri-runtime: wasmtime Component host, capability enforcement, per-call resource limits (fuel, memory cap, wall-clock deadline, log bytes/sec — see 02-connectors.md)
hakiri-agent: MCP server (stdio + HTTP) exposing the operating surface only — catalog.list_connectors, source.list_tables, source.sample_table, source.discover, context.tables, context.schema, context.query, pipeline.list, pipeline.describe, pipeline.dry_run, pipeline.run, run.tail, run.trace
hakiri-sync: push/pull against any S3-compatible bucket, manifest-based diff, lease-based single-writer for opaque-token / snapshot-id cursors (per 04-context-store.md)
hakiri-runtime: in-process scheduler (cron expressions and fixed “every X” intervals)
More built-ins: postgres (snapshot only), file, s3, github
Schema evolution decisions surface as catalog events; agents read history via context.schema
JSON manifest support — TOML and JSON both deserialize to PipelineSpec; hakiri schema export emits JSON Schema
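
The lease-based single-writer rule for opaque-token and snapshot-id cursors can be sketched as a pure decision function. This is an illustration under assumptions: the `Lease` fields, the time units, and `try_acquire` are hypothetical, and a real implementation would also need an atomic conditional write against the bucket, which this sketch leaves out.

```rust
/// Hypothetical lease record stored next to the sync manifest.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Lease {
    pub holder: String,
    /// Expiry as seconds on some shared clock (assumed unit).
    pub expires_at: u64,
}

#[derive(Debug, PartialEq, Eq)]
pub enum LeaseError {
    /// Another writer holds an unexpired lease.
    HeldBy(String),
}

/// Decide whether `who` may take (or renew) the lease at time `now`.
/// A lease can be taken when none exists, when it has expired,
/// or when the caller already holds it.
pub fn try_acquire(
    current: Option<&Lease>,
    who: &str,
    now: u64,
    ttl: u64,
) -> Result<Lease, LeaseError> {
    match current {
        Some(lease) if lease.holder.as_str() != who && lease.expires_at > now => {
            Err(LeaseError::HeldBy(lease.holder.clone()))
        }
        _ => Ok(Lease {
            holder: who.to_string(),
            expires_at: now + ttl,
        }),
    }
}
```

Under this reading, a handoff between two machines is simply the second machine acquiring after the first lease expires or is released.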

Scope — team mode (new for M1, per 13-team-surfaces.md and 14-collab-config.md)

hakiri-control build profile — Rust daemon implementing the team-mode control plane (axum + tokio-tungstenite + loro + OAuth + canonical doc storage). Per 06-deployment.md § Topology 0. See ADR-0014.
CF team-mode deploy — Worker + Durable Object + Cron + Workflows + Containers. hakiri team init --cloudflare generates wrangler.toml for the team’s CF account. Hibernating WebSocket for Loro sync.
Electron desktop app — Mac / Windows / Linux. Renderer shared with web SPA via pnpm monorepo (apps/desktop, apps/web, packages/{ui,api-client,platform,platform-browser,platform-electron}). LaunchAgent / systemd-user / Windows-service installs the bundled Rust daemon.
Web UI — static SPA bundled with hakiri-control for self-hosted; fractalbox-hosted at app.hakiri.dev lands in M1.5.
Loro CRDT collaborative editing — per 14-collab-config.md. Doc held in DO (CF) or SQLite (hakiri-control); snapshots materialized as versioned TOML in R2 on apply. See ADR-0013.
OAuth identity — biscuit token issuance via the control plane; system browser via hakiri:// scheme on desktop, standard https:// redirect on web.
Placement model — pipelines declare placement (cf:auto, cf:<region>, node:<id>, any-mac, self-hosted:<id>); UI shows friendly pickers; control plane dispatches runs accordingly.
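
The placement strings listed above map naturally onto an enum. The sketch below is one possible parsing, assuming the five forms are exhaustive; the `Placement` type and its variant names are hypothetical, and region and id values are accepted without validation.

```rust
use std::str::FromStr;

/// Hypothetical typed form of a pipeline's placement declaration.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Placement {
    CfAuto,
    CfRegion(String),
    Node(String),
    AnyMac,
    SelfHosted(String),
}

impl FromStr for Placement {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Fixed forms first, so "cf:auto" is not treated as a region.
        match s {
            "cf:auto" => return Ok(Placement::CfAuto),
            "any-mac" => return Ok(Placement::AnyMac),
            _ => {}
        }
        let (scheme, target) = s
            .split_once(':')
            .ok_or_else(|| format!("unknown placement: {}", s))?;
        if target.is_empty() {
            return Err(format!("missing target in placement: {}", s));
        }
        match scheme {
            "cf" => Ok(Placement::CfRegion(target.to_string())),
            "node" => Ok(Placement::Node(target.to_string())),
            "self-hosted" => Ok(Placement::SelfHosted(target.to_string())),
            _ => Err(format!("unknown placement: {}", s)),
        }
    }
}
```

With a typed value, the control plane's dispatch could be a single `match` rather than repeated string checks.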

Success criteria

Operating eval: in a fresh Claude Code session, given a running Hakiri project with three pipelines, the agent answers “how many GitHub issues did we ingest yesterday?” via context.query correctly without human prompts beyond the first.
Two laptops + one R2 bucket: hakiri sync push on A, hakiri sync pull on B, queries return identical results across both monotonic and opaque-token cursor pipelines (the latter via lease handoff).
A hand-authored WASM connector (Rust + WIT, no agent involvement) that attempts unlisted network access fails to load with a clear capability-denied error.
hakiri serve runs a 24-hour soak on a 15-minute schedule with memory growth of no more than 50 MB.
WIT contract conformance test suite passes for both built-in connectors (compiled native) and at least one hand-written WASM connector.
Team mode — collaborative editing eval: 3 users editing the same pipeline doc concurrently for 10 minutes (mix of disjoint and same-field edits, including transform reorders) — all clients converge to byte-identical Loro doc state, snapshots apply cleanly.
Team mode — CF execution: a pipeline with placement = "cf:auto" fires on schedule, executes a Workflow with at least one mid-step Container restart, and recovers without duplicate writes.
Team mode — self-hosted parity: the hakiri-control Rust binary passes the same acceptance suite as the CF substrate (sync protocol, snapshot semantics, OAuth callback, placement dispatch).
Team mode — Loro perf spike: 200-pipeline × 20-transform doc with 5 simulated concurrent editors for 30 min — sync message p95 < 100 ms, doc binary < 1 MB. (Gate before locking Loro in per ADR-0013.)
Team mode — Electron install: end-to-end install on macOS / Windows / Linux registers the LaunchAgent / service / systemd-user unit, daemon starts at login, control plane sign-in via OAuth completes.
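
The Loro perf-spike gate (sync message p95 < 100 ms, doc binary < 1 MB) can be expressed as a small check. This sketch assumes a nearest-rank percentile and a decimal megabyte; the roadmap specifies neither, and the function names are hypothetical.

```rust
/// Nearest-rank percentile over latency samples in milliseconds.
/// Returns None for an empty sample set or a percentile above 100.
pub fn percentile(samples: &[f64], pct: u32) -> Option<f64> {
    if samples.is_empty() || pct > 100 {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.partial_cmp(b).expect("latencies must not be NaN"));
    // rank = ceil(pct * n / 100), computed in integers to avoid float rounding.
    let rank = (pct as usize * sorted.len() + 99) / 100;
    Some(sorted[rank.max(1) - 1])
}

/// Gate from the success criterion: p95 under 100 ms and doc under 1 MB.
pub fn passes_loro_gate(sync_latencies_ms: &[f64], doc_bytes: usize) -> bool {
    const P95_LIMIT_MS: f64 = 100.0;
    const DOC_LIMIT_BYTES: usize = 1_000_000; // assumption: 1 MB = 10^6 bytes
    match percentile(sync_latencies_ms, 95) {
        Some(p95) => p95 < P95_LIMIT_MS && doc_bytes < DOC_LIMIT_BYTES,
        None => false,
    }
}
```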

Explicitly not in M1

Agent-authored connectors (OpenAPI scaffolder, connector-authoring eval, connector.scaffold/build/install tools) — moved to M2 per the design review
Cloudflare or AWS data-plane deploys (M2) — distinct from team-mode CF deploy (M1)
Singer shim (cut entirely; see “What we’ll deliberately defer”)
Encryption at rest
fractalbox-hosted multi-tenant SaaS — that’s M1.5 / M3 (commercial layer)
App Store / Microsoft Store distribution — sandboxes break LaunchAgent install
Mobile clients

M2 — Cloudflare + AWS first-class deploys, agent authoring loop, observability (9 months)

Goal: same binary, same manifest, two clouds. Agents can author new connectors against OpenAPI specs and install them with a per-call human confirmation step. Production-grade observability.

Scope

Cloudflare topology (per 06-deployment.md)

Reconciler Worker + Workflow step.do decomposition (bounded by 1 MiB step result, 1024 steps per workflow — step results are pointers, never payloads)
Container (wasmtime) hosting the runtime; 2–8s cold-start budget with keep-warm for active pipelines
Durable Object SQLite catalog (single-writer per project, ≤10 GB per project)
R2 sync via the M1 sync engine
hakiri deploy cloudflare generates wrangler.toml + Container Dockerfile
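
The "step results are pointers, never payloads" rule and the two Workflow bounds can be sketched as follows. The limits come from the list above; `StepPointer`, its fields, the size estimate, and `plan_steps` are hypothetical names for illustration.

```rust
/// Bounds quoted in the Cloudflare topology list.
pub const MAX_STEP_RESULT_BYTES: usize = 1024 * 1024; // 1 MiB
pub const MAX_STEPS_PER_WORKFLOW: usize = 1024;

/// Hypothetical step result: a pointer to data already written to object
/// storage, rather than the batch itself.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct StepPointer {
    pub object_key: String,
    pub row_count: u64,
}

impl StepPointer {
    /// Rough serialized size: the key bytes plus one 8-byte counter.
    pub fn encoded_len(&self) -> usize {
        self.object_key.len() + 8
    }

    pub fn fits_step_result(&self) -> bool {
        self.encoded_len() <= MAX_STEP_RESULT_BYTES
    }
}

/// Number of Workflow steps needed when each step handles `batches_per_step`
/// batches; errors if the plan would exceed the per-workflow step limit.
pub fn plan_steps(total_batches: usize, batches_per_step: usize) -> Result<usize, String> {
    if batches_per_step == 0 {
        return Err("batches_per_step must be at least 1".to_string());
    }
    let steps = (total_batches + batches_per_step - 1) / batches_per_step;
    if steps > MAX_STEPS_PER_WORKFLOW {
        Err(format!(
            "{} steps exceeds the limit of {}",
            steps, MAX_STEPS_PER_WORKFLOW
        ))
    } else {
        Ok(steps)
    }
}
```

A pointer of this shape stays far below the 1 MiB bound regardless of batch size, which is the point of keeping payloads out of step results.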

AWS topology (per 06-deployment.md)

Lambda Reconciler + Step Functions (Express for short pipelines, Standard for long backfills) + Fargate worker
RDS Postgres catalog adapter
S3 sync
hakiri deploy aws generates a Rust CDK app
EFS-mounted SQLite is not shipped — networked-filesystem SQLite is unsafe; RDS is the only documented AWS catalog backend

Agent connector authoring

connector.scaffold from OpenAPI — host emits WIT from a fixed template; the agent fills only the Rust impl (per 02-connectors.md)
connector.build returns structured cargo JSON diagnostics, not stdout
connector.test, connector.install
openapi.lint (pre-flight: assess whether a spec is worth attempting), connector.diagnose (structured failure analysis), fixture.record (live HTTP replay capture)
Tiered eval harness — (a) compiles, (b) discover() returns a non-empty schema, (c) one open() batch round-trips, (d) full contract conformance. v0 target: 60% reach (b), 30% reach (d) on a benchmark of 10 PAT-authed REST sources with OAS 3.1 specs
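
The tiered eval scoring can be made concrete with a short sketch. It assumes tiers are cumulative (a connector that reaches (d) also counts as reaching (b)), which the roadmap implies but does not state; the `Tier` enum and function names are hypothetical.

```rust
/// Highest tier a generated connector reached, in ascending order.
/// Deriving Ord makes `>=` mean "reached at least this tier".
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Tier {
    Failed,      // did not compile
    Compiles,    // (a)
    Discover,    // (b) discover() returns a non-empty schema
    RoundTrip,   // (c) one open() batch round-trips
    Conformance, // (d) full contract conformance
}

/// How many benchmark sources reached at least `tier`.
pub fn reach_count(results: &[Tier], tier: Tier) -> usize {
    results.iter().filter(|&&r| r >= tier).count()
}

/// v0 target: 60% reach (b) and 30% reach (d).
/// Integer arithmetic avoids float comparisons at the exact thresholds.
pub fn meets_v0_target(results: &[Tier]) -> bool {
    let total = results.len();
    total > 0
        && reach_count(results, Tier::Discover) * 100 >= 60 * total
        && reach_count(results, Tier::Conformance) * 100 >= 30 * total
}
```

On a 10-source benchmark this means at least 6 sources at tier (b) or above and at least 3 at tier (d).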

Observability + ops

OTel traces + metrics, OpenLineage event emission
hakiri context compact background task
Encryption at rest for the context store

Success criteria

Cloud parity test: same hakiri.toml deploys to both clouds. After a 24h soak running identical pipelines, R2 and S3 contain byte-identical Parquet (modulo file ordering).
CF deploy: a pipeline run interrupted mid-flight resumes from the last completed Workflow step on the next reconciliation tick.
AWS deploy: same durable-resume guarantee via Step Functions Express (short) and Standard (long).
DO SQLite catalog handles 1000 sequential cursor updates in under 5 s of wall time. (DO’s execution model is single-threaded: 1000 concurrent calls queue rather than parallelize.)
RDS Postgres catalog adapter passes the same conformance suite as M0’s SQLite adapter.
Agent-authoring eval on 10 PAT-authed REST sources: 60% reach a working discover() and 30% reach full conformance, on Claude Opus 4.7. Run the eval on every new model release.
End-to-end OTel trace from a pipeline run viewable in SigNoz / Grafana Tempo.
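
For the cloud parity criterion, "byte-identical Parquet (modulo file ordering)" amounts to comparing the two buckets as multisets of file contents. A minimal sketch, assuming the files are small enough to hold in memory (a real check would more likely compare content digests); the function name is hypothetical.

```rust
/// True when both sides hold the same file contents, ignoring listing order.
/// Duplicates count: two identical files on one side need two on the other.
pub fn same_files_modulo_order(left: &[Vec<u8>], right: &[Vec<u8>]) -> bool {
    let mut l: Vec<&Vec<u8>> = left.iter().collect();
    let mut r: Vec<&Vec<u8>> = right.iter().collect();
    l.sort();
    r.sort();
    l == r
}
```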

Explicitly not in M2

Singer/Meltano compatibility shim (cut indefinitely per the data-engineering review)
DynamoDB catalog adapter (M2.5, if users demand it)
OAuth refresh flow connectors (M2.5)
Hosted control plane (M3)
Distributed multi-node execution

M3 — fractalbox-hosted SaaS + commercial layer (open-ended)

Goal: a multi-tenant fractalbox-operated instance of the M1 team-mode control plane, plus the commercial scaffolding (billing, support tiers, attestation) needed for regulated buyers. The hosted variant is OSS and opt-in, and the data plane stays with the customer. The single-tenant hakiri-control Rust binary and self-deployed CF variant from M1 remain available; M3 is the hosted/managed variant.

Scope

Multi-tenant control plane — extension of the M1 hakiri-control for multi-team operation. Per-tenant DOs, shared admin surface, tenant-scoped audit.
Centralized IdP — Google Workspace / Microsoft Entra / Okta SSO at the SaaS level. Self-hosted instances continue to bring their own.
Run-history retention — beyond what’s economical for single-tenant deploys.
Commercial onboarding — pricing, billing, support tiers. See 06-deployment.md § Topology 5.
SOC 2 + compliance attestation: scoped to the hosted SaaS instance (per 11-compliance.md). The OSS data plane is unaffected; attestation for it remains the operator’s or buyer’s responsibility.
Helm chart for hakiri-control (installed with helm install): for teams that want Kubernetes-shaped operations of the self-hosted variant.

Success criteria

A customer can helm install the self-hosted hakiri-control chart in their own Kubernetes cluster and run the same Electron / web clients against it.
The fractalbox-hosted public instance at app.hakiri.dev runs the same code as the OSS chart (no closed-source upsell).
The hosted control plane never reads customer pipeline data — only metadata, manifest snapshots, run summaries, audit events.
A regulated customer can purchase a tier with SOC 2-scoped operations and receive a BAA (where applicable) without code modifications to the underlying OSS stack.

Cross-cutting workstreams

Run in parallel across milestones:

Documentation site: docs.hakiri.dev, MDX-based, generated examples that match the test suite. Start in M0; ship a real site in M1.
Eval harness: connector-authoring benchmark, run on every model release. Start in M1.
Connector inventory: a vetted list of community-authored connectors with conformance test results. Start in M1, expand throughout.
Performance benchmarks: nightly runs against synthetic 1M / 10M / 100M row sources to track regressions. Start in M2.

What we’ll deliberately defer

These come up often but are not on the roadmap until evidence demands them:

Singer/Meltano compatibility shim — would drag Singer’s stdio JSONL throughput ceiling and brittle state semantics into the new design. Cut from M2 per the data-engineering review. Either invest in real connector ports or skip; the middle ground poisons the brand.
EFS-mounted SQLite catalog: unsafe on NFS (unreliable advisory locks, no WAL support, ECS deploy-window races). See 06-deployment.md. RDS Postgres is the documented AWS catalog instead.
Streaming SQL / continuous queries
Pipeline DAGs with cross-pipeline dependencies
A connector marketplace with auth/billing
Reverse ETL UI
Distributed multi-node execution
Kafka source (user-authored connector or external Kafka→Hakiri shim)
Iceberg/Delta table format (Parquet + manifest covers v0)

Decision log

Significant architecture decisions live as ADRs under specs/adr/. See adr/README.md for the index, status of each ADR, and the authoring template.