Collaborative config editing
Status: outline / RFC. Decisions are proposed, not final. Open questions at the bottom.
Related specs: 01-architecture.md, 03-pipelines.md, 06-deployment.md, 13-team-surfaces.md. Related ADRs: 0005, 0013, 0014.
This spec describes how a team edits pipeline config concurrently across the Electron app and the web UI without rejected writes. The substrate is a Loro CRDT doc held in the team’s Durable Object (or in the hakiri-control Rust daemon for self-hosted deployments), synced to all connected clients over a hibernating WebSocket, and materialized to versioned TOML snapshots in R2 on apply.
Daemons never speak CRDT. They read materialized TOML from R2 — the existing reconciliation path from 03-pipelines.md is unchanged. This spec is purely about the editing surface.
Why CRDT on config
In a team product, two teammates editing the same pipeline at the same moment is normal, not exceptional. There are three models for handling that:
- Pessimistic locking: one person holds the edit lock; others wait. Wrong for laptop-class clients that may go offline while holding the lock, and the wrong UX for multiplayer.
- Optimistic concurrency: each write carries the version it was based on; if someone else committed first, the write fails with a “your edit was rejected, please refresh” error. Acceptable for low-frequency edits; painful when Alice tweaks a schedule while Bob renames a destination on the same pipeline.
- CRDT: concurrent edits merge automatically per field; same-field conflicts surface as structured UI affordances.
For pipeline config (mostly disjoint-field edits, occasional same-field), CRDT is the right shape. See ADR-0013 for the choice of Loro specifically.
Important scope note. This spec applies to collaborative config editing. It does not override ADR-0005, which rejected CRDTs for the data sync layer (run files, snapshots, cursors, schema-evolution decisions). Data sync remains LWW. The two scopes are orthogonal — the daemon-side reconciler never imports a CRDT library.
What’s in the CRDT doc
CRDT-merge semantics are safe for editable fields where “both changes are probably valid” is the right default. They’re unsafe for authorization fields where the answer must be exactly yes or no.
| In the Loro doc | In normal DB tables (hard authorization) |
|---|---|
| Pipeline definitions | Team membership |
| Source / destination references (by id) | RBAC, capability tokens, biscuit issuance |
| Schedules (cron, every, event triggers) | Audit log (append-only stream) |
| Placement (cf:auto, node:alice-mbp, …) | Run history, run status |
| Transform definitions (ordered list) | Source secret references (the secrets themselves live in Keychain / Workers Secrets) |
| Schema overrides | Quotas, billing |
| Pipeline groups / folders | Per-user UI preferences |
| Variable definitions | Connector capability grants |
The split rule: collaborative editing belongs in Loro; authoritative state belongs in DB rows.
A user editing “Bob’s role” in a Team Settings UI is using the normal authenticated control-plane API, not the Loro channel. The visual distinction (a separate “Admin” surface in 13-team-surfaces.md) signals to users which fields have merge semantics and which don’t.
Loro doc shape
The doc is a top-level Loro container mirroring the manifest’s JSON-shaped form:
```
// pseudocode for the doc structure
LoroDoc {
  pipelines: LoroMap<PipelineId, LoroMap<{
    source:      LoroMap<{ kind: string, ref: string }>,
    destination: LoroMap<{ table: string, partition_by?: string }>,
    schedule:    LoroMap<{ kind: "cron" | "every" | "on-event", expr: string }>,
    placement:   LoroMap<{ kind: PlacementKind, target?: string }>,
    transforms:  LoroMovableList<TransformId>,  // ordered, reorderable
    schema:      LoroMap<ColumnName, ColumnDef>,
    enabled:     boolean,
    name:        LoroText,
    description: LoroText,
  }>>,

  transforms: LoroMap<TransformId, LoroMap<{
    kind:   "polars" | "wasm",
    body:   LoroText,                       // multi-line expression body
    inputs: LoroMovableList<ColumnRef>,
  }>>,

  groups:    LoroTree<GroupId>,             // hierarchical pipeline folders
  variables: LoroMap<string, LoroText>,
}
```

Type choices:
| Loro type | Where used | Why |
|---|---|---|
| LoroMap | Keyed records (pipelines, transforms, schedule fields) | Concurrent updates to different keys merge without conflict; same-key writes resolve last-writer-wins |
| LoroMovableList | Ordered transforms, input column refs | Drag-to-reorder produces stable move ops, not delete+insert |
| LoroText | Pipeline name / description / transform body | Preserves intra-string concurrent edits (multi-cursor friendly) |
| LoroTree | Pipeline group hierarchy | Supports reparenting without breaking sibling order |
These four types cover every editable field. Automerge lacks a native MovableList equivalent — see ADR-0013.
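To make the id-based layout concrete, here is a minimal TypeScript sketch of the plain-JSON form the doc materializes to. This is roughly what loro-crdt's `doc.toJSON()` returns for these containers. The sketch also includes a hypothetical `resolveTransforms` helper. The interface names are illustrative, and `schema` and `groups` are left out for brevity.

```typescript
// Hypothetical plain-JSON view of the doc: LoroText becomes a string,
// LoroMovableList becomes an array. schema and groups are omitted.
interface TransformDef {
  kind: "polars" | "wasm";
  body: string;
  inputs: string[];
}

interface PipelineDef {
  source: { kind: string; ref: string };
  destination: { table: string; partition_by?: string };
  schedule: { kind: "cron" | "every" | "on-event"; expr: string };
  placement: { kind: string; target?: string };
  transforms: string[]; // ordered TransformIds, resolved against Manifest.transforms
  enabled: boolean;
  name: string;
  description: string;
}

interface Manifest {
  pipelines: Record<string, PipelineDef>;
  transforms: Record<string, TransformDef>;
  variables: Record<string, string>;
}

// Pipelines hold transform ids; definitions live in the top-level map.
function resolveTransforms(m: Manifest, pipelineId: string): TransformDef[] {
  const p = m.pipelines[pipelineId];
  if (p === undefined) throw new Error(`unknown pipeline ${pipelineId}`);
  return p.transforms.map((id) => {
    const t = m.transforms[id];
    // A dangling id can occur if one user deletes a transform another still references.
    if (t === undefined) throw new Error(`${pipelineId} references missing transform ${id}`);
    return t;
  });
}
```

Keeping definitions in the top-level `transforms` map means a drag-to-reorder only moves ids in the pipeline's `LoroMovableList`. The transform bodies themselves are untouched.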
Sync protocol
```mermaid
sequenceDiagram
    participant U1 as User 1 (Electron)
    participant U2 as User 2 (Web)
    participant DO as Team DO / hakiri-control
    participant R2 as R2 bucket
    U1->>DO: WebSocket connect (auth: biscuit)
    DO-->>U1: full doc snapshot (Loro update)
    U2->>DO: WebSocket connect (auth: biscuit)
    DO-->>U2: full doc snapshot (Loro update)
    Note over U1,U2: live edits flow as Loro update messages
    U1->>U1: local edit (schedule "*/15 * * * *" → "0 * * * *")
    U1->>DO: Loro update msg (binary)
    DO->>DO: apply to canonical doc, persist
    DO-->>U2: broadcast Loro update
    U2->>U2: merge into local doc (no conflict)
    U2->>U2: concurrent edit on different field
    U2->>DO: Loro update msg
    DO->>DO: apply, persist
    DO-->>U1: broadcast
    Note over U1: "Apply" button clicked
    U1->>DO: POST /v1/manifest/apply { pipelines: ["gh-issues"] }
    DO->>DO: validate doc subset against JSON Schema
    DO->>R2: PUT manifest@v43.toml (snapshot)
    DO-->>U1: applied @v43
    DO-->>U2: broadcast: manifest applied @v43
```
Wire format: Loro’s binary update messages, framed as WebSocket binary frames. The DO is the canonical doc holder and the broadcast hub. With WebSocket hibernation, the DO can be evicted from memory while connections stay open, so idle connections add no duration charges.
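Presence messages travel over the same WebSocket as Loro updates, so each binary frame needs a discriminator. The following is a minimal sketch assuming a one-byte tag prefix. `MsgType`, the tag values, and both functions are hypothetical and are not part of Loro's own update format. On Cloudflare, `decodeFrame` would run inside the DO's `webSocketMessage` handler.

```typescript
// Hypothetical tag values; one leading byte per frame, rest is the opaque payload.
const MsgType = { LoroUpdate: 0x01, Awareness: 0x02 } as const;
type MsgType = (typeof MsgType)[keyof typeof MsgType];

function encodeFrame(type: MsgType, payload: Uint8Array): Uint8Array {
  const frame = new Uint8Array(payload.length + 1);
  frame[0] = type;
  frame.set(payload, 1); // copy payload after the tag byte
  return frame;
}

function decodeFrame(frame: Uint8Array): { type: MsgType; payload: Uint8Array } {
  if (frame.length === 0) throw new Error("empty frame");
  const tag = frame[0];
  if (tag !== MsgType.LoroUpdate && tag !== MsgType.Awareness) {
    // Reject rather than guess, so a newer client's message type is never misread.
    throw new Error(`unknown message type 0x${tag.toString(16)}`);
  }
  // subarray is a view, not a copy; the Loro payload goes straight to the doc import.
  return { type: tag as MsgType, payload: frame.subarray(1) };
}
```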
Initial sync
A client connecting fresh receives a full doc snapshot, then incremental updates. A client reconnecting after a brief offline period sends its known version vector; the DO responds with deltas since that point. A client offline long enough that the op log was compacted past its version receives a full doc snapshot — Loro handles this transparently.
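The reconnect decision reduces to a version-vector comparison. The following is a minimal sketch. It assumes the DO remembers the version vector at which it last compacted the op log; `compactedUpTo`, `planSync` and `SyncPlan` are hypothetical names. If the client has already seen every compacted op, a delta is possible. In loro-crdt 1.x, that delta would come from something like `doc.export({ mode: "update", from })`. Otherwise the client needs the full snapshot.

```typescript
// peer id -> number of ops seen from that peer
type VersionVector = Map<string, number>;

// True if `a` has seen every op that `b` has seen.
function dominates(a: VersionVector, b: VersionVector): boolean {
  for (const [peer, counter] of b) {
    if ((a.get(peer) ?? 0) < counter) return false;
  }
  return true;
}

type SyncPlan =
  | { kind: "full-snapshot" }
  | { kind: "delta"; since: VersionVector };

// compactedUpTo: hypothetical record of the version at which the op log was last compacted.
function planSync(clientVV: VersionVector | null, compactedUpTo: VersionVector): SyncPlan {
  if (clientVV === null) return { kind: "full-snapshot" }; // fresh connection
  // Ops older than compactedUpTo no longer exist, so a delta only works
  // if the client already has all of them.
  return dominates(clientVV, compactedUpTo)
    ? { kind: "delta", since: clientVV }
    : { kind: "full-snapshot" };
}
```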
Awareness / presence
Presence is layered on top of the sync channel via a separate message type. Each client periodically sends an awareness message (its user id, the field it has focused, and a last-active timestamp) over the same WebSocket. The DO doesn’t persist awareness; it broadcasts it to the other connected clients with a 60s TTL.
This drives the “Bob is editing pipeline X” indicator in the UI. Loro’s awareness API is newer than Yjs’s, so plan to build this layer in hakiri-control rather than expect it for free.
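Presence is ephemeral and TTL-bound, so the relay can be a plain in-memory map. The following is a minimal sketch of what the DO or hakiri-control might keep. `PresenceRelay`, the field-path format and the message shape are all hypothetical.

```typescript
interface Presence {
  userId: string;
  focusedField: string | null; // path format like "pipelines/gh-issues/schedule" is assumed
  lastActive: number;          // sender-reported timestamp, for display only
}

const PRESENCE_TTL_MS = 60_000;

class PresenceRelay {
  // Never persisted: lives only in memory, keyed by connection id.
  private readonly entries = new Map<string, { presence: Presence; receivedAt: number }>();

  // Each awareness message replaces that connection's previous state.
  receive(connectionId: string, presence: Presence, now: number): void {
    this.entries.set(connectionId, { presence, receivedAt: now });
  }

  // Evict entries past the TTL (by server receive time), then return everyone except `viewer`.
  visibleTo(viewer: string, now: number): Presence[] {
    const out: Presence[] = [];
    for (const [id, e] of this.entries) {
      if (now - e.receivedAt > PRESENCE_TTL_MS) this.entries.delete(id);
      else if (id !== viewer) out.push(e.presence);
    }
    return out;
  }

  // Users focused on a given field, for the "Bob is editing ..." indicator.
  editing(field: string, viewer: string, now: number): string[] {
    return this.visibleTo(viewer, now)
      .filter((p) => p.focusedField === field)
      .map((p) => p.userId);
  }
}
```

Expiry uses the server's receive time rather than the sender's `lastActive`. This prevents a client with a skewed clock from pinning a stale indicator.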
Edit → snapshot → apply flow
The flow has three boundaries with different validation guarantees:
| Boundary | What happens | Validation |
|---|---|---|
| Edit | User types; Loro ops produced; broadcast to other clients | None — intermediate states may be invalid (half-typed cron expression) |
| Apply | User clicks “Apply” on a pipeline or pipeline group | JSON Schema validation; rejected if invalid; sub-doc materialized to TOML snapshot in R2 |
| Reconcile | Daemon polls R2, sees new snapshot, picks up changes | TOML deserialization + runtime invariants (cursor compatibility, etc.) |
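The Apply boundary is where invalid intermediate states get caught. The following is a minimal sketch of those checks, standing in for the JSON Schema step. `PipelineDraft` and `validateForApply` are hypothetical. The 5-field cron rule is assumed from the examples in the sync diagram. The transform-id check is a cross-reference that plain JSON Schema does not express well, so it likely needs code either way.

```typescript
// Hypothetical subset of a materialized pipeline, enough for these checks.
interface PipelineDraft {
  schedule: { kind: "cron" | "every" | "on-event"; expr: string };
  destination: { table: string };
  transforms: string[];
}

// Returns human-readable errors; an empty array means the scoped pipelines may be applied.
function validateForApply(
  pipelines: Record<string, PipelineDraft>,
  knownTransformIds: Set<string>,
  scope: string[],
): string[] {
  const errors: string[] = [];
  for (const id of scope) {
    const p = pipelines[id];
    if (p === undefined) {
      errors.push(`${id}: pipeline does not exist`);
      continue;
    }
    const expr = p.schedule.expr.trim();
    if (expr === "") {
      errors.push(`${id}: schedule.expr is empty`);
    } else if (p.schedule.kind === "cron" && expr.split(/\s+/).length !== 5) {
      // Assumes standard 5-field cron, as in "*/15 * * * *".
      errors.push(`${id}: cron expression must have 5 fields, got "${expr}"`);
    }
    if (p.destination.table.trim() === "") errors.push(`${id}: destination.table is empty`);
    for (const t of p.transforms) {
      // Cross-reference check: every transform id must exist in the doc.
      if (!knownTransformIds.has(t)) errors.push(`${id}: unknown transform ${t}`);
    }
  }
  return errors;
}
```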
No write rejection during edits. Two users editing the same pipeline simultaneously merge automatically by Loro semantics:
- Different fields → both edits apply, no conflict surface.
- Same field, same time → Loro chooses one deterministically by op id; the UI shows “Bob also changed schedule 2s ago; Alice’s value won, click to see Bob’s” as a non-blocking affordance.
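The spec says Loro picks a same-field winner deterministically. The following is a minimal sketch of that kind of rule. It assumes ordering by Lamport timestamp with peer id as the tiebreak, which is the usual last-writer-wins map rule. Confirm Loro's exact ordering against its documentation before relying on which side wins. `FieldWrite`, `resolveSameField` and `conflictNotice` are hypothetical.

```typescript
// Hypothetical record of one concurrent write to a single map key.
interface FieldWrite {
  value: string;
  lamport: number; // logical timestamp of the op
  peer: number;    // Loro peer ids are 64-bit; a number suffices for this sketch
  author: string;  // display name, assumed to come from the session
}

// Higher Lamport timestamp wins; ties go to the higher peer id.
// Every replica sorts the same writes identically, so all converge on one winner.
function resolveSameField(writes: FieldWrite[]): { winner: FieldWrite; losers: FieldWrite[] } {
  if (writes.length === 0) throw new Error("no writes to resolve");
  const sorted = [...writes].sort((a, b) => b.lamport - a.lamport || b.peer - a.peer);
  return { winner: sorted[0], losers: sorted.slice(1) };
}

// Non-blocking banner text; null when there was nothing to reconcile.
function conflictNotice(field: string, writes: FieldWrite[], secondsAgo: number): string | null {
  const { winner, losers } = resolveSameField(writes);
  if (losers.length === 0) return null;
  const other = losers[0].author;
  return `${other} also changed ${field} ${secondsAgo}s ago; ${winner.author}'s value won, click to see ${other}'s`;
}
```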
Apply scope
Apply can be scoped:
- Apply pipeline — one pipeline goes live; everything else stays in edit state.
- Apply group — all pipelines under a group go live atomically (one TOML snapshot).
- Apply all — the whole doc becomes a new snapshot.
This means the team can stage a coordinated change across multiple pipelines and ship them together.
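A scoped Apply starts by turning the scope into the set of pipelines whose sub-docs go into the snapshot. The following is a minimal sketch, assuming the group tree has been flattened into parent pointers. `ApplyScope`, `Layout`, `isUnder` and `pipelinesInScope` are hypothetical names.

```typescript
type ApplyScope =
  | { kind: "pipeline"; id: string }
  | { kind: "group"; id: string }
  | { kind: "all" };

// Hypothetical flattened view of the LoroTree of groups plus pipeline membership.
interface Layout {
  groupParent: Map<string, string | null>;   // group id -> parent group (null = top level)
  pipelineGroup: Map<string, string | null>; // pipeline id -> containing group (null = ungrouped)
}

// Walk up the group tree. LoroTree rejects moves that would create a cycle,
// so this loop terminates.
function isUnder(group: string | null, ancestor: string, layout: Layout): boolean {
  let g: string | null = group;
  while (g !== null) {
    if (g === ancestor) return true;
    g = layout.groupParent.get(g) ?? null;
  }
  return false;
}

// Pipeline ids whose sub-docs go into the single TOML snapshot for this Apply.
function pipelinesInScope(scope: ApplyScope, layout: Layout): string[] {
  const all = Array.from(layout.pipelineGroup.keys());
  if (scope.kind === "all") return all;
  const target = scope.id;
  if (scope.kind === "pipeline") return all.filter((id) => id === target);
  // Group apply: everything in the group or any nested subgroup.
  return all.filter((id) => isUnder(layout.pipelineGroup.get(id) ?? null, target, layout));
}
```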
Auto-apply vs explicit apply
Some fields don’t affect runtime: pipeline display name, description, tags, group membership, variable comments. These auto-apply continuously: the control plane debounces and snapshots them into the latest manifest on a 5s window.
Runtime-affecting fields require explicit Apply: schedule, placement, source / destination references, transforms, schema, enabled state. The UI shows pipelines with unapplied runtime changes with a visible “Apply changes” affordance.
This split mirrors how IDEs treat “rename a local variable” (live) vs “change a function signature” (deploy boundary).
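The auto/explicit split comes down to classifying each changed field path, plus a trailing debounce on the auto side. The following is a minimal sketch. The path layout, `applyModeFor` and `TrailingDebounce` are hypothetical. The auto-apply field list is taken from the paragraphs above.

```typescript
// Display-only fields from the spec; anything else requires an explicit Apply.
const AUTO_APPLY_FIELDS = new Set(["name", "description", "tags"]);

// Paths are hypothetical: ["pipelines", id, field, ...] or ["groups", ...].
function applyModeFor(path: string[]): "auto" | "explicit" {
  if (path[0] === "groups") return "auto"; // group membership / folder moves
  if (path[0] === "pipelines" && path.length > 2 && AUTO_APPLY_FIELDS.has(path[2])) return "auto";
  return "explicit"; // runtime-affecting or unknown: fail safe
}

// Trailing-edge debounce: flush once auto-apply edits have been quiet for a full window.
class TrailingDebounce {
  private lastChange: number | null = null;
  private readonly windowMs: number;

  constructor(windowMs: number) {
    this.windowMs = windowMs;
  }

  touch(now: number): void {
    this.lastChange = now;
  }

  // Returns true at most once per burst of edits.
  shouldFlush(now: number): boolean {
    if (this.lastChange === null || now - this.lastChange < this.windowMs) return false;
    this.lastChange = null;
    return true;
  }
}
```

Defaulting unknown fields to explicit Apply means a newly added runtime field cannot slip into production through the auto-apply path.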
Snapshots, history, revisions
The Loro op log is the granular history; named snapshots are the human-readable revisions surface.
```mermaid
flowchart LR
    Edit1[op 1] --> Edit2[op 2] --> Snap1["@v42 — apply manifest.toml"]
    Snap1 --> Edit3[op 3] --> Edit4[op 4] --> Snap2["@v43 — apply manifest.toml"]
```
| Surface | What it shows |
|---|---|
| Loro op log (internal) | Every keystroke-shaped change, retained between snapshots; compacted at each snapshot |
| Revisions UI (@v42, @v43) | Each snapshot is a versioned TOML file in R2; diff-able, git-able, downloadable |
| “Revert to @v42” | Replay snapshot into the Loro doc as a single update; new snapshot created |
The TOML snapshot is the source of truth for the daemon. The Loro doc is the source of truth for the editing surface. They synchronize at apply boundaries.
Retention
The op log is compacted on every named snapshot — ops older than the previous snapshot are dropped. Snapshots themselves are retained indefinitely (small TOML files; R2 storage is cheap). Older snapshots can be archived to a cold bucket after N revisions if storage becomes an issue.
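The compaction rule keeps exactly one inter-snapshot window of granular ops. The following is a minimal sketch of that rule over a flat log. `LogEntry` and `appendSnapshotAndCompact` are hypothetical. A real implementation would compact Loro's own history, for instance via its shallow-snapshot export, rather than a plain array.

```typescript
interface LogEntry {
  kind: "op" | "snapshot";
  id: string; // op id, or a snapshot label such as "@v43"
}

// Append a named snapshot and drop everything older than the previous snapshot.
function appendSnapshotAndCompact(log: LogEntry[], label: string): LogEntry[] {
  let prev = -1;
  for (let i = log.length - 1; i >= 0; i--) {
    if (log[i].kind === "snapshot") {
      prev = i;
      break;
    }
  }
  // Keep the previous snapshot marker and the ops after it.
  const kept = prev === -1 ? log : log.slice(prev);
  return [...kept, { kind: "snapshot", id: label }];
}
```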
Diffing snapshots
Standard diff manifest@v42.toml manifest@v43.toml works because snapshots are plain TOML. The UI shows a structured diff (added pipelines highlighted green, removed red, changed fields side-by-side) by parsing both TOMLs and walking the tree. Power users can git diff snapshots fetched directly from R2.
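The structured diff can be a plain recursive walk over the two parsed TOML trees. The following is a minimal sketch. It assumes the TOML has already been parsed into plain objects and ignores TOML datetimes, which most parsers turn into `Date` objects. `diffTrees` and the dotted path format are hypothetical.

```typescript
type Json = string | number | boolean | null | Json[] | { [k: string]: Json };

type Change =
  | { kind: "added"; path: string; after: Json }
  | { kind: "removed"; path: string; before: Json }
  | { kind: "changed"; path: string; before: Json; after: Json };

function isTable(v: Json): v is { [k: string]: Json } {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Recurse into tables; compare leaves and arrays by value.
function diffTrees(before: Json, after: Json, path = ""): Change[] {
  if (isTable(before) && isTable(after)) {
    const out: Change[] = [];
    const keys = Array.from(new Set([...Object.keys(before), ...Object.keys(after)])).sort();
    for (const k of keys) {
      const p = path ? `${path}.${k}` : k;
      if (!(k in after)) out.push({ kind: "removed", path: p, before: before[k] });
      else if (!(k in before)) out.push({ kind: "added", path: p, after: after[k] });
      else out.push(...diffTrees(before[k], after[k], p));
    }
    return out;
  }
  return JSON.stringify(before) === JSON.stringify(after) ? [] : [{ kind: "changed", path, before, after }];
}
```

Arrays such as a pipeline's transform list are compared whole. A reorder therefore shows as one changed field rather than a cascade of per-index edits.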
Daemon side — TOML consumption unchanged
Daemons never speak CRDT. The reconciliation loop is:
- Daemon polls R2 manifest path (or receives SSE notification from control plane).
- Reads manifest@<latest>.toml.
- Diffs against last-applied version.
- Reconciles state: starts new pipelines, stops removed ones, updates schedules, etc.
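The diff-and-reconcile steps amount to comparing two maps keyed by pipeline id. The following is a minimal sketch of the planning half. `planReconcile` and the action shape are hypothetical. The value comparison assumes the materializer writes keys in a stable order, since `JSON.stringify` is sensitive to key order.

```typescript
type PipelineConfig = Record<string, unknown>; // one pipeline's table from the parsed TOML

type Action =
  | { op: "start"; id: string }
  | { op: "stop"; id: string }
  | { op: "update"; id: string };

// Compare last-applied and latest manifests; this sketch lists stops, then updates, then starts.
function planReconcile(
  lastApplied: Record<string, PipelineConfig>,
  latest: Record<string, PipelineConfig>,
): Action[] {
  const stops: Action[] = [];
  const updates: Action[] = [];
  const starts: Action[] = [];
  for (const id of Object.keys(lastApplied).sort()) {
    if (!(id in latest)) stops.push({ op: "stop", id });
    else if (JSON.stringify(lastApplied[id]) !== JSON.stringify(latest[id])) updates.push({ op: "update", id });
  }
  for (const id of Object.keys(latest).sort()) {
    if (!(id in lastApplied)) starts.push({ op: "start", id });
  }
  return [...stops, ...updates, ...starts];
}
```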
From the daemon’s perspective, the editing surface is invisible. It sees a TOML file change — like a git pull of a config repo. This means:
- Existing daemon code from 03-pipelines.md and 06-deployment.md needs no changes for the CRDT surface to land.
- Self-hosted teams who skip the control plane entirely (CLI-only, git-managed manifests) work exactly as before; the CRDT surface is an opt-in editing layer, not a required substrate.
Storage
| Where | What |
|---|---|
| Durable Object (CF) or SQLite (self-hosted hakiri-control) | Canonical Loro doc binary, awareness state (ephemeral), op log between snapshots |
| R2 bucket | Named TOML snapshots (manifest@v42.toml, manifest@v43.toml, …), index file (manifest-revisions.json) |
| Browser session / Electron Keychain | Local Loro doc replica, pending unsent ops |
The DO storage holds the authoritative doc state. R2 holds the daemon-consumed materialization. The split mirrors the spec’s general pattern of “engine consumes boring storage; editing surface uses a richer model.”
Self-hosted parity
The hakiri-control Rust binary (M1) provides the same surface for teams that can’t use Cloudflare:
- WebSocket server via axum + tokio-tungstenite
- Embedded Loro core via the loro Rust crate
- Local SQLite for canonical doc + auth state
- Local filesystem or S3-compatible bucket for snapshots
The wire protocol (Loro update messages over WebSocket frames) is identical, so the same Electron / web client connects to either backend.
Conflict semantics
| Concurrent edit | Loro outcome | UI surface |
|---|---|---|
| Different fields of same pipeline | Both apply, deterministic order | Silent merge |
| Same field, different values | One wins by Loro op id ordering | “Both Alice and Bob edited schedule; Alice’s value (newer) shown, click to see Bob’s” (non-blocking banner) |
| One adds a transform, other removes a different transform | Both apply | Silent merge |
| Both add a transform at the same position | Both added in deterministic order | Silent merge |
| One reorders transforms, other adds a new one | Reorder applies, new one inserted at intended position | Silent merge |
| Both delete the same pipeline | Idempotent, one tombstone | Silent merge |
| One edits a field, other deletes the parent pipeline | Delete wins; field edit silently dropped | “Bob’s edit to schedule was discarded; Alice deleted the pipeline 3s ago” |
| Two users type in the same LoroText body | Multi-cursor merge (Loro’s text CRDT) | Cursors visible via awareness; merge silent |
Open questions
- Op-log retention beyond compaction. Should we keep a separate, long-retention op stream for audit / time-travel (“what was the schedule at 14:30 on 2026-04-12”)? Or rely solely on named snapshots? Leaning snapshots-only for v0; granular op retention is a future feature if audit demands sub-second granularity.
- Schema migration of the Loro doc. If we change the doc shape (add a new field type, rename a field), how do existing docs upgrade? Loro doesn’t have native schema migration; we’ll write migration functions in hakiri-control that run on doc load. Worth a documented procedure before M1 ships.
- Maximum doc size. A team with thousands of pipelines × tens of transforms each could produce a large Loro doc. At what doc size does sync latency exceed 100ms? Plan a perf spike in M1 against representative workloads (per ADR-0013).
- Offline editing on the Electron app. A laptop offline for an hour, then reconnects with 50 local edits — does the merge UX hold up? Needs a soak test in M1.
- Snapshot debounce timing for auto-apply fields. 5s feels right for “renamed pipeline” but maybe wrong for “edited tag.” Tune with real usage.
- Multi-doc per team. Currently one doc per team. If a team grows past Loro’s comfortable doc size, do we shard by pipeline group? Defer to evidence.
- Browser-tab conflict. Same user, two browser tabs, divergent edits. Loro merges fine but the user might be surprised. Consider tab-coordination via BroadcastChannel API.
- Loro version mismatch handling. Server upgrades Loro to a wire-incompatible major version while older clients are still connected. Plan: control plane advertises supported version range; clients with stale Loro bundles get a hard “please update” prompt rather than silent partial compatibility.
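The proposed version handshake is a range check on the client's Loro major version. The following is a minimal sketch. The advertisement shape, `loroMajor` and `loroCompatibility` are hypothetical. The plan above only specifies the behavior for stale clients.

```typescript
// Hypothetical range the control plane advertises when a client connects.
interface SupportedLoroRange {
  minMajor: number;
  maxMajor: number;
}

function loroMajor(version: string): number {
  const major = Number.parseInt(version.split(".")[0], 10);
  if (Number.isNaN(major)) throw new Error(`unparseable Loro version: ${version}`);
  return major;
}

// "client-stale" maps to the hard "please update" prompt.
function loroCompatibility(
  clientVersion: string,
  server: SupportedLoroRange,
): "ok" | "client-stale" | "client-newer" {
  const major = loroMajor(clientVersion);
  if (major < server.minMajor) return "client-stale";
  if (major > server.maxMajor) return "client-newer";
  return "ok";
}
```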