Roadmap

Phased milestones. Each milestone has explicit success criteria that double as acceptance tests. We don’t move to the next milestone until the previous one’s criteria pass on CI.

Goal (M0): prove the architecture end-to-end with the smallest viable scope. Establish the port boundaries M1 and M2 will need, so later milestones don’t turn into refactors.

  • hakiri-core: Schema, Record, Batch, Cursor, CursorKind (monotonic | opaque-token | snapshot-id), PipelineSpec, port traits (Source, Destination, Transform, Catalog, Clock), error types
  • hakiri-runtime: linear executor for one source → one destination, no scheduling
  • hakiri-context: SQLite as one adapter of the Catalog port + Parquet writer + DuckDB query wrapper
  • hakiri-connectors: two built-ins only — http (paginated REST) and context (the local store as destination)
  • hakiri-cli: init, run, query, plan, pipeline list/describe
  1. From a fresh checkout: hakiri init demo && hakiri run httpbin && hakiri query "select count(*) from httpbin_anything" returns a row count.
  2. A crash mid-run leaves the catalog in a consistent state (RunStatus::PartialFailure); re-running resumes the pipeline.
  3. Acceptance test under tests/acceptance/ covers the happy path on Linux in CI; macOS runs as a smoke job.
  4. The Catalog trait’s adapter contract test passes: swapping SQLite for an in-memory mock leaves the same conformance suite green. This keeps M2’s RDS / DO adapters from requiring a refactor.
  5. Single static binary < 100 MB stripped (Linux musl, ARM64). DuckDB is gated by --features duckdb; without it the binary is < 50 MB and hakiri query is disabled with a clear message.
  • WASM connectors (M0 ships built-ins only)
  • Sync to S3
  • MCP server
  • Scheduling
  • JSON manifest support — TOML-only in M0; JSON support lands in M1 alongside the MCP server
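
Success criterion 4 above (one conformance suite, many Catalog adapters) can be sketched as follows. This is a minimal illustration, not the project's actual trait: the `Catalog` methods, the `InMemoryCatalog` type, and `catalog_conformance` are hypothetical names invented here, and the real port in hakiri-core will have a richer surface.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for the Catalog port.
/// The real trait is not shown in this roadmap.
pub trait Catalog {
    /// Returns the last committed cursor for a pipeline, if any.
    fn load_cursor(&self, pipeline: &str) -> Option<String>;
    /// Records a new cursor for a pipeline, replacing the previous one.
    fn commit_cursor(&mut self, pipeline: &str, cursor: &str);
}

/// In-memory mock adapter (assumed shape), used only to exercise the suite.
#[derive(Default)]
pub struct InMemoryCatalog {
    cursors: HashMap<String, String>,
}

impl Catalog for InMemoryCatalog {
    fn load_cursor(&self, pipeline: &str) -> Option<String> {
        self.cursors.get(pipeline).cloned()
    }

    fn commit_cursor(&mut self, pipeline: &str, cursor: &str) {
        self.cursors.insert(pipeline.to_string(), cursor.to_string());
    }
}

/// Conformance checks written once against the trait, so every adapter
/// (SQLite, in-memory, later RDS or DO) runs exactly the same assertions.
pub fn catalog_conformance<C: Catalog>(mut catalog: C) -> Result<(), String> {
    if catalog.load_cursor("demo").is_some() {
        return Err("a fresh catalog must have no cursor".into());
    }
    catalog.commit_cursor("demo", "page=1");
    catalog.commit_cursor("demo", "page=2");
    if catalog.load_cursor("demo").as_deref() != Some("page=2") {
        return Err("the latest committed cursor must win".into());
    }
    if catalog.load_cursor("other").is_some() {
        return Err("cursors must be scoped per pipeline".into());
    }
    Ok(())
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn in_memory_adapter_conforms() {
        assert!(catalog_conformance(InMemoryCatalog::default()).is_ok());
    }
}
```

Under this assumption, adding an adapter means adding one more test that passes its instance to `catalog_conformance`; the checks themselves never change.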

M1 — WASM connectors + agent operating surface + sync + team mode

Goal: WASM-sandboxed connectors run alongside built-ins; agents can operate (discover, query, run) pipelines via MCP; teams can sync a context store via R2; and a team can collaboratively configure pipelines from an Electron desktop app and a web UI, with scheduled execution on Cloudflare (or on the self-hosted Rust hakiri-control for air-gapped environments). Authoring connectors via agents is not in M1; that is M2.

Duration is open: adding team mode roughly doubles the original 16-week estimate, and an updated estimate will land when the M1 spike work completes.

Scope — engine (carried from earlier M1 plan)

  • hakiri-runtime: wasmtime Component host, capability enforcement, per-call resource limits (fuel, memory cap, wall-clock deadline, log bytes/sec — see 02-connectors.md)
  • hakiri-agent: MCP server (stdio + HTTP) exposing the operating surface only — catalog.list_connectors, source.list_tables, source.sample_table, source.discover, context.tables, context.schema, context.query, pipeline.list, pipeline.describe, pipeline.dry_run, pipeline.run, run.tail, run.trace
  • hakiri-sync: push/pull against any S3-compatible bucket, manifest-based diff, lease-based single-writer for opaque-token / snapshot-id cursors (per 04-context-store.md)
  • hakiri-runtime: in-process scheduler (cron + every X)
  • More built-ins: postgres (snapshot only), file, s3, github
  • Schema evolution decisions surface as catalog events; agents read history via context.schema
  • JSON manifest support — TOML and JSON both deserialize to PipelineSpec; hakiri schema export emits JSON Schema
  • hakiri-control build profile — Rust daemon implementing the team-mode control plane (axum + tokio-tungstenite + loro + OAuth + canonical doc storage). Per 06-deployment.md § Topology 0. See ADR-0014.
  • CF team-mode deploy — Worker + Durable Object + Cron + Workflows + Containers. hakiri team init --cloudflare generates wrangler.toml for the team’s CF account. Hibernating WebSocket for Loro sync.
  • Electron desktop app — Mac / Windows / Linux. Renderer shared with web SPA via pnpm monorepo (apps/desktop, apps/web, packages/{ui,api-client,platform,platform-browser,platform-electron}). LaunchAgent / systemd-user / Windows-service installs the bundled Rust daemon.
  • Web UI — static SPA bundled with hakiri-control for self-hosted; fractalbox-hosted at app.hakiri.dev lands in M1.5.
  • Loro CRDT collaborative editing — per 14-collab-config.md. Doc held in DO (CF) or SQLite (hakiri-control); snapshots materialized as versioned TOML in R2 on apply. See ADR-0013.
  • OAuth identity — biscuit token issuance via the control plane; system browser via hakiri:// scheme on desktop, standard https:// redirect on web.
  • Placement model — pipelines declare placement (cf:auto, cf:<region>, node:<id>, any-mac, self-hosted:<id>); UI shows friendly pickers; control plane dispatches runs accordingly.
  1. Operating eval: in a fresh Claude Code session, given a running Hakiri project with three pipelines, the agent answers “how many GitHub issues did we ingest yesterday?” via context.query correctly without human prompts beyond the first.
  2. Two laptops + one R2 bucket: hakiri sync push on A, hakiri sync pull on B, queries return identical results across both monotonic and opaque-token cursor pipelines (the latter via lease handoff).
  3. A hand-authored WASM connector (Rust + WIT, no agent involvement) that attempts unlisted network access fails to load with a clear capability-denied error.
  4. hakiri serve runs a 24-hour soak with a 15-minute schedule and no memory growth > 50 MB.
  5. WIT contract conformance test suite passes for both built-in connectors (compiled native) and at least one hand-written WASM connector.
  6. Team mode — collaborative editing eval: 3 users editing the same pipeline doc concurrently for 10 minutes (mix of disjoint and same-field edits, including transform reorders) — all clients converge to byte-identical Loro doc state, snapshots apply cleanly.
  7. Team mode — CF execution: a pipeline with placement = "cf:auto" fires on schedule, executes a Workflow with at least one mid-step Container restart, and recovers without duplicate writes.
  8. Team mode — self-hosted parity: the hakiri-control Rust binary passes the same acceptance suite as the CF substrate (sync protocol, snapshot semantics, OAuth callback, placement dispatch).
  9. Team mode — Loro perf spike: 200-pipeline × 20-transform doc with 5 simulated concurrent editors for 30 min — sync message p95 < 100 ms, doc binary < 1 MB. (Gate before locking Loro in per ADR-0013.)
  10. Team mode — Electron install: end-to-end install on macOS / Windows / Linux registers the LaunchAgent / service / systemd-user unit, daemon starts at login, control plane sign-in via OAuth completes.
  • Agent-authored connectors (OpenAPI scaffolder, connector-authoring eval, connector.scaffold/build/install tools) — moved to M2 per the design review
  • Cloudflare or AWS data-plane deploys (M2) — distinct from team-mode CF deploy (M1)
  • Singer shim (cut entirely; see “What we’ll deliberately defer”)
  • Encryption at rest
  • fractalbox-hosted multi-tenant SaaS — that’s M1.5 / M3 (commercial layer)
  • App Store / Microsoft Store distribution — sandboxes break LaunchAgent install
  • Mobile clients
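
The placement values listed in the M1 scope (cf:auto, cf:<region>, node:<id>, any-mac, self-hosted:<id>) lend themselves to a small parsed type. The sketch below is one possible shape, assuming the values are plain `kind:value` strings; the `Placement` enum and its variant names are hypothetical, and the region string in the usage note is a placeholder rather than a real Cloudflare region.

```rust
use std::str::FromStr;

/// Hypothetical model of the placement values named in the M1 scope.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Placement {
    CfAuto,
    CfRegion(String),
    Node(String),
    AnyMac,
    SelfHosted(String),
}

impl FromStr for Placement {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Split "kind:value" at the first colon; "any-mac" has no value part.
        let (kind, arg) = match s.split_once(':') {
            Some((k, a)) => (k, Some(a)),
            None => (s, None),
        };
        match (kind, arg) {
            ("any-mac", None) => Ok(Placement::AnyMac),
            ("cf", Some("auto")) => Ok(Placement::CfAuto),
            ("cf", Some(region)) if !region.is_empty() => {
                Ok(Placement::CfRegion(region.to_string()))
            }
            ("node", Some(id)) if !id.is_empty() => Ok(Placement::Node(id.to_string())),
            ("self-hosted", Some(id)) if !id.is_empty() => {
                Ok(Placement::SelfHosted(id.to_string()))
            }
            // Anything else is rejected rather than silently defaulted.
            _ => Err(format!("unrecognized placement: {s:?}")),
        }
    }
}
```

With this shape, `"cf:auto".parse::<Placement>()` yields `Ok(Placement::CfAuto)` and `"cf:example-region".parse::<Placement>()` yields a `CfRegion`, while malformed values such as `"cf:"` return an error the control plane could surface before dispatch.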

M2 — Cloudflare + AWS first-class deploys, agent authoring loop, observability (9 months)

Goal: same binary, same manifest, two clouds. Agents can author new connectors against OpenAPI specs and install them with a per-call human confirmation step. Production-grade observability.

  • Reconciler Worker + Workflow step.do decomposition (bounded by 1 MiB step result, 1024 steps per workflow — step results are pointers, never payloads)
  • Container (wasmtime) hosting the runtime; 2–8s cold-start budget with keep-warm for active pipelines
  • Durable Object SQLite catalog (single-writer per project, ≤10 GB per project)
  • R2 sync via the M1 sync engine
  • hakiri deploy cloudflare generates wrangler.toml + Container Dockerfile
  • Lambda Reconciler + Step Functions (Express for short pipelines, Standard for long backfills) + Fargate worker
  • RDS Postgres catalog adapter
  • S3 sync
  • hakiri deploy aws generates a Rust CDK app
  • EFS-mounted SQLite is not shipped — networked-filesystem SQLite is unsafe; RDS is the only documented AWS catalog backend
  • connector.scaffold from OpenAPI — host emits WIT from a fixed template; the agent fills only the Rust impl (per 02-connectors.md)
  • connector.build returns structured cargo JSON diagnostics, not stdout
  • connector.test, connector.install
  • openapi.lint (pre-flight: assess whether a spec is worth attempting), connector.diagnose (structured failure analysis), fixture.record (live HTTP replay capture)
  • Tiered eval harness — (a) compiles, (b) discover() returns a non-empty schema, (c) one open() batch round-trips, (d) full contract conformance. v0 target: 60% reach (b), 30% reach (d) on a benchmark of 10 PAT-authed REST sources with OAS 3.1 specs
  • OTel traces + metrics, OpenLineage event emission
  • hakiri context compact background task
  • Encryption at rest for the context store
  1. Cloud parity test: same hakiri.toml deploys to both clouds. After a 24h soak running identical pipelines, R2 and S3 contain byte-identical Parquet (modulo file ordering).
  2. CF deploy: a pipeline run interrupted mid-flight resumes from the last completed Workflow step on the next reconciliation tick.
  3. AWS deploy: same durable-resume guarantee via Step Functions Express (short) and Standard (long).
  4. DO SQLite catalog handles 1000 sequential cursor updates in under 5 s of wall time (acknowledging DO’s single-threaded execution model: 1000 concurrent calls queue rather than parallelize).
  5. RDS Postgres catalog adapter passes the same conformance suite as M0’s SQLite adapter.
  6. Agent-authoring eval on 10 PAT-authed REST sources: 60% reach discover() working, 30% reach full conformance, on Claude Opus 4.7. Run the eval on every new model release.
  7. End-to-end OTel trace from a pipeline run viewable in SigNoz / Grafana Tempo.
  • Singer/Meltano compatibility shim (cut indefinitely per the data-engineering review)
  • DynamoDB catalog adapter (M2.5 if user demand)
  • OAuth refresh flow connectors (M2.5)
  • Hosted control plane (M3)
  • Distributed multi-node execution
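
The tiered eval target from the M2 scope (60% reach (b), 30% reach (d)) can be scored with a few lines of code. The sketch below assumes the tiers are cumulative, so a connector that reaches (d) also counts toward (b); the `Tier` enum, `reached`, and `meets_v0_target` are hypothetical names, not part of the harness described here.

```rust
/// Highest tier a generated connector reached, declared in ascending order
/// so the derived `Ord` treats later tiers as "at least" the earlier ones.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Tier {
    Failed,     // did not compile
    Compiles,   // (a)
    Discovers,  // (b) discover() returns a non-empty schema
    RoundTrips, // (c) one open() batch round-trips
    Conforms,   // (d) full contract conformance
}

/// Number of runs that reached at least `tier`.
pub fn reached(results: &[Tier], tier: Tier) -> usize {
    results.iter().filter(|&&t| t >= tier).count()
}

/// True when at least `percent` of runs reached `tier`.
/// Integer arithmetic avoids floating-point edge cases at the threshold.
pub fn meets(results: &[Tier], tier: Tier, percent: usize) -> bool {
    !results.is_empty() && reached(results, tier) * 100 >= percent * results.len()
}

/// The v0 target quoted in the roadmap: 60% reach (b), 30% reach (d).
pub fn meets_v0_target(results: &[Tier]) -> bool {
    meets(results, Tier::Discovers, 60) && meets(results, Tier::Conforms, 30)
}
```

For a 10-source benchmark this means at least 6 connectors with a working discover() and at least 3 passing full conformance.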

M3 — fractalbox-hosted SaaS + commercial layer (open-ended)

Goal: a multi-tenant fractalbox-operated instance of the M1 team-mode control plane, plus the commercial scaffolding (billing, support tiers, attestation) needed for regulated buyers. The offering is OSS and opt-in, and the data plane stays with the customer. The single-tenant hakiri-control Rust binary and self-deployed CF variant from M1 remain available; M3 is the hosted/managed variant.

  • Multi-tenant control plane — extension of the M1 hakiri-control for multi-team operation. Per-tenant DOs, shared admin surface, tenant-scoped audit.
  • Centralized IdP — Google Workspace / Microsoft Entra / Okta SSO at the SaaS level. Self-hosted instances continue to bring their own.
  • Run-history retention — beyond what’s economical for single-tenant deploys.
  • Commercial onboarding — pricing, billing, support tiers. See 06-deployment.md § Topology 5.
  • SOC 2 + compliance attestation — scoped to the hosted SaaS instance (per 11-compliance.md). The OSS data plane is unaffected — attestation is operator/buyer work.
  • helm install hakiri-control chart — for teams that want Kubernetes-shaped operations of the self-hosted variant.
  1. A customer can helm install the self-hosted hakiri-control chart in their own Kubernetes cluster and run the same Electron / web clients against it.
  2. The fractalbox-hosted public instance at app.hakiri.dev runs the same code as the OSS chart (no closed-source upsell).
  3. The hosted control plane never reads customer pipeline data — only metadata, manifest snapshots, run summaries, audit events.
  4. A regulated customer can purchase a tier with SOC 2-scoped operations and receive a BAA (where applicable) without code modifications to the underlying OSS stack.

Run in parallel across milestones:

  • Documentation site: docs.hakiri.dev, MDX-based, generated examples that match the test suite. Start in M0; ship a real site in M1.
  • Eval harness: connector-authoring benchmark, run on every model release. Start in M1.
  • Connector inventory: a vetted list of community-authored connectors with conformance test results. Start in M1, expand throughout.
  • Performance benchmarks: nightly runs against synthetic 1M / 10M / 100M row sources to track regression. Start in M2.

What we’ll deliberately defer

These come up often but are not on the roadmap until evidence demands them:

  • Singer/Meltano compatibility shim — would drag Singer’s stdio JSONL throughput ceiling and brittle state semantics into the new design. Cut from M2 per the data-engineering review. Either invest in real connector ports or skip; the middle ground poisons the brand.
  • EFS-mounted SQLite catalog — unsafe on NFS (advisory locks, WAL-not-supported, ECS deploy-window races). See 06-deployment.md. RDS Postgres is the documented AWS catalog instead.
  • Streaming SQL / continuous queries
  • Pipeline DAGs with cross-pipeline dependencies
  • A connector marketplace with auth/billing
  • Reverse ETL UI
  • Distributed multi-node execution
  • Kafka source (user-authored connector or external Kafka→Hakiri shim)
  • Iceberg/Delta table format (Parquet + manifest covers v0)

Significant architecture decisions live as ADRs under specs/adr/. See adr/README.md for the index, status of each ADR, and the authoring template.