
Collaborative config editing

Status: outline / RFC. Decisions are proposed, not final. Open questions at the bottom.

Related specs: 01-architecture.md, 03-pipelines.md, 06-deployment.md, 13-team-surfaces.md. Related ADRs: 0005, 0013, 0014.

This spec covers how a team edits pipeline config concurrently across the Electron app and the web UI without rejected writes. The substrate is a Loro CRDT doc held in the team’s Durable Object (or in the hakiri-control Rust binary for self-hosted teams), synced to all connected clients over a hibernating WebSocket, and materialized to versioned TOML snapshots in R2 on apply.

Daemons never speak CRDT. They read materialized TOML from R2 — the existing reconciliation path from 03-pipelines.md is unchanged. This spec is purely about the editing surface.

A team product means two teammates editing the same pipeline at the same moment is normal, not exceptional. Three models for handling that:

  1. Pessimistic locking — one person holds the edit lock; others wait. Wrong for laptop-class clients that may be offline; wrong UX for multiplayer.
  2. Optimistic concurrency — last commit wins, others get a “your edit was rejected, please refresh” error. Acceptable for low-frequency edits; painful when Alice tweaks a schedule while Bob renames a destination on the same pipeline.
  3. CRDT — concurrent edits merge automatically by field; same-field conflicts surface as structured UI affordances.

For pipeline config (mostly disjoint-field edits, occasional same-field), CRDT is the right shape. See ADR-0013 for the choice of Loro specifically.

Important scope note. This spec applies to collaborative config editing. It does not override ADR-0005, which rejected CRDTs for the data sync layer (run files, snapshots, cursors, schema-evolution decisions). Data sync remains LWW. The two scopes are orthogonal — the daemon-side reconciler never imports a CRDT library.

CRDT-merge semantics are safe for editable fields where “both changes are probably valid” is the right default. They’re unsafe for authorization fields where the answer must be exactly yes or no.

| In the Loro doc | In normal DB tables (hard authorization) |
| --- | --- |
| Pipeline definitions | Team membership |
| Source / destination references (by id) | RBAC, capability tokens, biscuit issuance |
| Schedules (cron, every, event triggers) | Audit log (append-only stream) |
| Placement (cf:auto, node:alice-mbp, …) | Run history, run status |
| Transform definitions (ordered list) | Source secret references (the secrets themselves live in Keychain / Workers Secrets) |
| Schema overrides | Quotas, billing |
| Pipeline groups / folders | Per-user UI preferences |
| Variable definitions | Connector capability grants |

The split rule: collaborative editing belongs in Loro; authoritative state belongs in DB rows.

A user editing “Bob’s role” in a Team Settings UI is using the normal authenticated control-plane API, not the Loro channel. The visual distinction (a separate “Admin” surface in 13-team-surfaces.md) signals to users which fields have merge semantics and which don’t.

The doc is a top-level Loro container mirroring the manifest’s JSON-shaped form:

// pseudocode for the doc structure
LoroDoc {
  pipelines: LoroMap<PipelineId, LoroMap<{
    source: LoroMap<{ kind: string, ref: string }>,
    destination: LoroMap<{ table: string, partition_by?: string }>,
    schedule: LoroMap<{ kind: "cron" | "every" | "on-event", expr: string }>,
    placement: LoroMap<{ kind: PlacementKind, target?: string }>,
    transforms: LoroMovableList<TransformId>, // ordered, reorderable
    schema: LoroMap<ColumnName, ColumnDef>,
    enabled: boolean,
    name: LoroText,
    description: LoroText,
  }>>,
  transforms: LoroMap<TransformId, LoroMap<{
    kind: "polars" | "wasm",
    body: LoroText, // multi-line expression body
    inputs: LoroMovableList<ColumnRef>,
  }>>,
  groups: LoroTree<GroupId>, // hierarchical pipeline folders
  variables: LoroMap<string, LoroText>,
}

Type choices:

| Loro type | Where used | Why |
| --- | --- | --- |
| LoroMap | Keyed records (pipelines, transforms, schedule fields) | Supports concurrent field updates without conflict |
| LoroMovableList | Ordered transforms, input column refs | Drag-to-reorder produces stable move ops, not delete+insert |
| LoroText | Pipeline name / description / transform body | Preserves intra-string concurrent edits (multi-cursor friendly) |
| LoroTree | Pipeline group hierarchy | Supports reparenting without breaking sibling order |

These four types cover every editable field. Automerge lacks a native MovableList equivalent — see ADR-0013.

sequenceDiagram
  participant U1 as User 1 (Electron)
  participant U2 as User 2 (Web)
  participant DO as Team DO / hakiri-control
  participant R2 as R2 bucket

  U1->>DO: WebSocket connect (auth: biscuit)
  DO-->>U1: full doc snapshot (Loro update)
  U2->>DO: WebSocket connect (auth: biscuit)
  DO-->>U2: full doc snapshot (Loro update)

  Note over U1,U2: live edits flow as Loro update messages

  U1->>U1: local edit (schedule "*/15 * * * *" → "0 * * * *")
  U1->>DO: Loro update msg (binary)
  DO->>DO: apply to canonical doc, persist
  DO-->>U2: broadcast Loro update
  U2->>U2: merge into local doc (no conflict)

  U2->>U2: concurrent edit on different field
  U2->>DO: Loro update msg
  DO->>DO: apply, persist
  DO-->>U1: broadcast

  Note over U1: "Apply" button clicked

  U1->>DO: POST /v1/manifest/apply { pipelines: ["gh-issues"] }
  DO->>DO: validate doc subset against JSON Schema
  DO->>R2: PUT manifest@v43.toml (snapshot)
  DO-->>U1: applied @v43
  DO-->>U2: broadcast: manifest applied @v43

Wire format: Loro’s binary update messages, framed as WebSocket binary frames. The DO is the canonical doc holder and the broadcast hub. Hibernating WebSocket support means idle connections cost nothing.

A client connecting fresh receives a full doc snapshot, then incremental updates. A client reconnecting after a brief offline period sends its known version vector; the DO responds with deltas since that point. A client offline long enough that the op log was compacted past its version receives a full doc snapshot — Loro handles this transparently.
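
The connect and reconnect cases above can be sketched as a single decision on the server. This is a minimal model, not Loro's API: `VersionVector`, `compactedThrough`, and `chooseSyncReply` are hypothetical names, and in practice Loro's own import/export machinery would make this decision.

```typescript
// Version vector: peer id -> highest op counter seen from that peer.
type VersionVector = Record<string, number>;

type SyncReply =
  | { kind: "snapshot" }                     // send the full doc
  | { kind: "delta"; since: VersionVector }; // send ops after the client's version

// `compactedThrough` is an assumed bookkeeping structure: for each peer,
// ops with a counter <= this value have been dropped from the op log.
function chooseSyncReply(
  client: VersionVector | null,
  compactedThrough: VersionVector,
): SyncReply {
  if (client === null) return { kind: "snapshot" }; // fresh connect
  for (const peer of Object.keys(compactedThrough)) {
    const seen = client[peer] ?? 0;
    // The client is missing ops that no longer exist in the log.
    if (seen < compactedThrough[peer]) return { kind: "snapshot" };
  }
  return { kind: "delta", since: client };
}
```

A client that has seen everything up to the compaction point still gets a delta; only a client that falls behind it is sent the full snapshot.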

Presence (awareness) is layered on top of the sync channel as a separate message type. Each client periodically sends an awareness message (its user id, the field it has focused, and a last-active timestamp) over the same WebSocket. The DO doesn’t persist awareness; it broadcasts each message to the other connected clients with a 60s TTL.

This drives the “Bob is editing pipeline X” indicator in the UI. Loro’s awareness API is newer than Yjs’s, so plan to layer this in hakiri-control rather than expect it for free.
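
A minimal sketch of the in-memory presence state, assuming the three fields named above and the 60s TTL. `AwarenessEntry` and `AwarenessRegistry` are hypothetical names; nothing here is persisted, which matches the intent that awareness is ephemeral.

```typescript
interface AwarenessEntry {
  userId: string;
  focusedField: string | null; // e.g. "pipelines/gh-issues/schedule" (assumed path format)
  lastActiveMs: number;
}

const AWARENESS_TTL_MS = 60 * 1000; // 60s TTL from the text above

class AwarenessRegistry {
  private entries: Record<string, AwarenessEntry> = {};

  // Record the latest heartbeat for a user, replacing any earlier one.
  update(entry: AwarenessEntry): void {
    this.entries[entry.userId] = entry;
  }

  // Drop entries not refreshed within the TTL and return the live ones.
  active(nowMs: number): AwarenessEntry[] {
    const live: AwarenessEntry[] = [];
    for (const userId of Object.keys(this.entries)) {
      const entry = this.entries[userId];
      if (nowMs - entry.lastActiveMs > AWARENESS_TTL_MS) delete this.entries[userId];
      else live.push(entry);
    }
    return live;
  }
}
```

Passing the clock in as `nowMs` keeps the expiry logic testable without timers.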

The flow has three boundaries with different validation guarantees:

| Boundary | What happens | Validation |
| --- | --- | --- |
| Edit | User types; Loro ops produced; broadcast to other clients | None; intermediate states may be invalid (half-typed cron expression) |
| Apply | User clicks “Apply” on a pipeline or pipeline group | JSON Schema validation; rejected if invalid; sub-doc materialized to TOML snapshot in R2 |
| Reconcile | Daemon polls R2, sees new snapshot, picks up changes | TOML deserialization + runtime invariants (cursor compatibility, etc.) |
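
The Apply boundary can be sketched as a validate-then-version step. This is an illustration only: `validateDraft` is a hand-written stand-in for the JSON Schema validation, and `PipelineDraft`, `ApplyResult`, and `applyPipelines` are hypothetical names. The TOML serialization and the R2 write are left out.

```typescript
interface Schedule { kind: "cron" | "every" | "on-event"; expr: string; }
interface PipelineDraft { schedule: Schedule; enabled: boolean; }

type ApplyResult =
  | { ok: true; version: number; objectKey: string }
  | { ok: false; errors: string[] };

// Stand-in for schema validation: only checks that cron expressions have five fields.
function validateDraft(id: string, draft: PipelineDraft): string[] {
  const errors: string[] = [];
  const fields = draft.schedule.expr.trim().split(/\s+/);
  if (draft.schedule.kind === "cron" && fields.length !== 5) {
    errors.push(`${id}: cron expression must have 5 fields`);
  }
  return errors;
}

function applyPipelines(
  drafts: Record<string, PipelineDraft>,
  ids: string[],
  currentVersion: number,
): ApplyResult {
  const errors: string[] = [];
  for (const id of ids) {
    const draft = drafts[id];
    if (draft === undefined) errors.push(`${id}: unknown pipeline`);
    else errors.push(...validateDraft(id, draft));
  }
  // On rejection nothing is written and the version does not advance.
  if (errors.length > 0) return { ok: false, errors };
  const version = currentVersion + 1;
  return { ok: true, version, objectKey: `manifest@v${version}.toml` };
}
```

A half-typed cron expression is a legal edit state but fails here, which is the point of keeping validation at the Apply boundary.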

No write rejection during edits. Two users editing the same pipeline simultaneously merge automatically by Loro semantics:

  • Different fields → both edits apply, no conflict surface.
  • Same field, same time → Loro chooses one deterministically by op id; the UI shows “Bob also changed schedule 2s ago — Alice’s value won, click to see Bob’s” as a non-blocking affordance.
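
To make “deterministically by op id” concrete, here is a small model of a same-field conflict. It assumes an op id made of a Lamport timestamp plus a peer id, which is a common CRDT scheme; it is not a description of Loro's internals, and `OpId`, `FieldWrite`, and `resolveSameField` are hypothetical names.

```typescript
// Illustrative op id: Lamport timestamp plus the peer that produced the op.
interface OpId { lamport: number; peer: string; }
interface FieldWrite<T> { id: OpId; author: string; value: T; }

function compareOpIds(a: OpId, b: OpId): number {
  if (a.lamport !== b.lamport) return a.lamport - b.lamport;
  // Tie-break on peer id so every replica picks the same winner.
  return a.peer < b.peer ? -1 : a.peer > b.peer ? 1 : 0;
}

// Returns the winner and the shadowed write, so the UI can offer
// "click to see Bob's" instead of discarding the losing value.
function resolveSameField<T>(x: FieldWrite<T>, y: FieldWrite<T>) {
  const xWins = compareOpIds(x.id, y.id) >= 0;
  return xWins ? { winner: x, shadowed: y } : { winner: y, shadowed: x };
}
```

Because the comparison depends only on the op ids, both clients reach the same result regardless of the order in which the updates arrive.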

Apply can be scoped:

  • Apply pipeline — one pipeline goes live; everything else stays in edit state.
  • Apply group — all pipelines under a group go live atomically (one TOML snapshot).
  • Apply all — the whole doc becomes a new snapshot.

This means the team can stage a coordinated change across multiple pipelines and ship them together.
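
The three scopes can be sketched as a function from a scope to the set of pipeline ids that go into one snapshot. `DocView` is an assumed flattened read of the doc (pipeline to group, group to parent), and all names here are hypothetical. The walk up the parent chain relies on the group tree being acyclic.

```typescript
type ApplyScope =
  | { kind: "pipeline"; id: string }
  | { kind: "group"; groupId: string }
  | { kind: "all" };

interface DocView {
  pipelineGroup: Record<string, string>;      // pipeline id -> group id
  groupParent: Record<string, string | null>; // group id -> parent group id (null at root)
}

// True if `group` is `ancestor` or sits anywhere beneath it.
function isUnder(view: DocView, group: string, ancestor: string): boolean {
  let cur: string | null = group;
  while (cur !== null) {
    if (cur === ancestor) return true;
    cur = view.groupParent[cur] ?? null;
  }
  return false;
}

function pipelinesInScope(view: DocView, scope: ApplyScope): string[] {
  const all = Object.keys(view.pipelineGroup).sort();
  if (scope.kind === "pipeline") return all.filter((id) => id === scope.id);
  if (scope.kind === "group") {
    return all.filter((id) => isUnder(view, view.pipelineGroup[id], scope.groupId));
  }
  return all;
}
```

Resolving the scope to an explicit id list up front is what lets a group apply be written as a single snapshot.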

Some fields don’t affect runtime: pipeline display name, description, tags, group membership, variable comments. These auto-apply continuously: the control plane debounces and snapshots them into the latest manifest on a 5s window.

Runtime-affecting fields require explicit Apply: schedule, placement, source / destination references, transforms, schema, enabled state. The UI shows pipelines with unapplied runtime changes with a visible “Apply changes” affordance.

This split mirrors how IDEs treat “rename a local variable” (live) vs “change a function signature” (deploy boundary).
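
A sketch of how a client or the control plane might sort a batch of changed fields into the two buckets. The field names follow the doc-structure pseudocode; `tags` and `group` are assumed names for the tag and group-membership fields, and treating unknown fields as runtime-affecting is an assumption, not a stated decision.

```typescript
// Fields that never affect runtime and can be snapshotted automatically.
const AUTO_APPLY_FIELDS = ["name", "description", "tags", "group"];
const AUTO_APPLY_DEBOUNCE_MS = 5000; // the 5s window from the text above

interface Staging { autoApply: string[]; needsApply: string[]; }

function stageChanges(changedFields: string[]): Staging {
  const staging: Staging = { autoApply: [], needsApply: [] };
  for (const field of changedFields) {
    // Anything not explicitly cosmetic waits for an explicit Apply.
    if (AUTO_APPLY_FIELDS.indexOf(field) >= 0) staging.autoApply.push(field);
    else staging.needsApply.push(field);
  }
  return staging;
}

// True once the debounce window has elapsed since the last cosmetic edit.
function autoSnapshotDue(lastEditMs: number, nowMs: number): boolean {
  return nowMs - lastEditMs >= AUTO_APPLY_DEBOUNCE_MS;
}
```

A non-empty `needsApply` list is what would drive the “Apply changes” affordance.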

The Loro op log is the granular history; named snapshots are the human-readable revisions surface.

flowchart LR
  Edit1[op 1] --> Edit2[op 2] --> Snap1["@v42 — apply
manifest.toml"] --> Edit3[op 3] --> Edit4[op 4] --> Snap2["@v43 — apply
manifest.toml"]

| Surface | What it shows |
| --- | --- |
| Loro op log (internal) | Every keystroke-shaped change, retained between snapshots; compacted at each snapshot |
| Revisions UI (@v42, @v43) | Each snapshot is a versioned TOML file in R2; diff-able, git-able, downloadable |
| “Revert to @v42” | Replay snapshot into the Loro doc as a single update; new snapshot created |

The TOML snapshot is the source of truth for the daemon. The Loro doc is the source of truth for the editing surface. They synchronize at apply boundaries.

The op log is compacted on every named snapshot — ops older than the previous snapshot are dropped. Snapshots themselves are retained indefinitely (small TOML files; R2 storage is cheap). Older snapshots can be archived to a cold bucket after N revisions if storage becomes an issue.

Standard diff manifest@v42.toml manifest@v43.toml works because snapshots are plain TOML. The UI shows a structured diff (added pipelines highlighted green, removed red, changed fields side-by-side) by parsing both TOMLs and walking the tree. Power users can git diff snapshots fetched directly from R2.
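
The tree walk behind the structured diff might look like the following. It assumes both snapshots have already been parsed into plain objects (the TOML parser is out of scope here), and `Change` and `diffTrees` are hypothetical names.

```typescript
type Tree = { [key: string]: Value };
type Value = string | number | boolean | Value[] | Tree;

interface Change {
  path: string;
  kind: "added" | "removed" | "changed";
  before?: Value;
  after?: Value;
}

function isTree(v: Value | undefined): v is Tree {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Walks both trees in sorted key order and reports leaf-level differences.
function diffTrees(before: Tree, after: Tree, prefix = ""): Change[] {
  const changes: Change[] = [];
  const keys = Object.keys({ ...before, ...after }).sort();
  for (const key of keys) {
    const path = prefix ? `${prefix}.${key}` : key;
    const a = before[key];
    const b = after[key];
    if (a === undefined) changes.push({ path, kind: "added", after: b });
    else if (b === undefined) changes.push({ path, kind: "removed", before: a });
    else if (isTree(a) && isTree(b)) changes.push(...diffTrees(a, b, path));
    else if (JSON.stringify(a) !== JSON.stringify(b)) {
      changes.push({ path, kind: "changed", before: a, after: b });
    }
  }
  return changes;
}
```

Added and removed subtrees are reported once at their root, which maps directly onto the green and red highlighting; changed leaves carry both values for the side-by-side view.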

Daemon side — TOML consumption unchanged


Daemons never speak CRDT. The reconciliation loop is:

  1. Daemon polls R2 manifest path (or receives SSE notification from control plane).
  2. Reads manifest@<latest>.toml.
  3. Diffs against last-applied version.
  4. Reconciles state: starts new pipelines, stops removed ones, updates schedules, etc.
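
Steps 3 and 4 amount to computing a plan from two manifests. The sketch below is written in TypeScript only to match the other examples in this spec; the real daemon code lives in 03-pipelines.md and is not changed by this document. `PipelineSpec` is reduced to two fields for brevity, and `planReconcile` is a hypothetical name.

```typescript
interface PipelineSpec { schedule: string; enabled: boolean; }
type Manifest = Record<string, PipelineSpec>; // pipeline id -> spec, as parsed from TOML

interface Plan { start: string[]; stop: string[]; update: string[]; }

function planReconcile(lastApplied: Manifest, latest: Manifest): Plan {
  const plan: Plan = { start: [], stop: [], update: [] };
  for (const id of Object.keys(latest).sort()) {
    const prev = lastApplied[id];
    const next = latest[id];
    if (prev === undefined) plan.start.push(id); // new pipeline
    else if (prev.schedule !== next.schedule || prev.enabled !== next.enabled) {
      plan.update.push(id); // existing pipeline with changed settings
    }
  }
  for (const id of Object.keys(lastApplied).sort()) {
    if (latest[id] === undefined) plan.stop.push(id); // removed pipeline
  }
  return plan;
}
```

Nothing in the plan refers to Loro, ops, or authorship: the inputs are two plain manifests, which is why the editing surface stays invisible to the daemon.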

From the daemon’s perspective, the editing surface is invisible. It sees a TOML file change — like a git pull of a config repo. This means:

  • Existing daemon code from 03-pipelines.md and 06-deployment.md needs no changes for the CRDT surface to land.
  • Self-hosted teams who skip the control plane entirely (CLI-only, git-managed manifests) work exactly as before — the CRDT surface is an opt-in editing layer, not a required substrate.

| Where | What |
| --- | --- |
| Durable Object (CF) or SQLite (self-hosted hakiri-control) | Canonical Loro doc binary, awareness state (ephemeral), op log between snapshots |
| R2 bucket | Named TOML snapshots (manifest@v42.toml, manifest@v43.toml, …), index file (manifest-revisions.json) |
| Browser session / Electron Keychain | Local Loro doc replica, pending unsent ops |

The DO storage holds the authoritative doc state. R2 holds the daemon-consumed materialization. The split mirrors the spec’s general pattern of “engine consumes boring storage; editing surface uses a richer model.”

The hakiri-control Rust binary (M1) provides the same surface for teams that can’t use Cloudflare:

  • WebSocket server via axum + tokio-tungstenite
  • Embedded Loro core via the loro Rust crate
  • Local SQLite for canonical doc + auth state
  • Local filesystem or S3-compatible bucket for snapshots

The wire protocol (Loro update messages over WebSocket frames) is identical, so the same Electron / web client connects to either backend.

| Concurrent edit | Loro outcome | UI surface |
| --- | --- | --- |
| Different fields of same pipeline | Both apply, deterministic order | Silent merge |
| Same field, different values | One wins by Loro op id ordering | “Both Alice and Bob edited schedule; Alice’s value (newer) shown, click to see Bob’s” (non-blocking banner) |
| One adds a transform, other removes a different transform | Both apply | Silent merge |
| Both add a transform at the same position | Both added in deterministic order | Silent merge |
| One reorders transforms, other adds a new one | Reorder applies, new one inserted at intended position | Silent merge |
| Both delete the same pipeline | Idempotent: one tombstone | Silent merge |
| One edits a field, other deletes the parent pipeline | Delete wins; field edit silently dropped | “Bob’s edit to schedule was discarded; Alice deleted the pipeline 3s ago” |
| Two users type in the same LoroText body | Multi-cursor merge (Loro’s text CRDT) | Cursors visible via awareness; merge silent |

Open questions

  • Op-log retention beyond compaction. Should we keep a separate, long-retention op stream for audit / time-travel (“what was the schedule at 14:30 on 2026-04-12”)? Or rely solely on named snapshots? Leaning snapshots-only for v0; granular op retention is a future feature if audit demands sub-second granularity.
  • Schema migration of the Loro doc. If we change the doc shape (add a new field type, rename a field), how do existing docs upgrade? Loro doesn’t have native schema migration; we’ll write migration functions in hakiri-control that run on doc load. Worth a documented procedure before M1 ships.
  • Maximum doc size. A team with thousands of pipelines × tens of transforms each could produce a large Loro doc. At what doc size does sync latency exceed 100ms? Plan a perf spike in M1 against representative workloads (per ADR-0013).
  • Offline editing on the Electron app. A laptop offline for an hour, then reconnects with 50 local edits — does the merge UX hold up? Needs a soak test in M1.
  • Snapshot debounce timing for auto-apply fields. 5s feels right for “renamed pipeline” but maybe wrong for “edited tag.” Tune with real usage.
  • Multi-doc per team. Currently one doc per team. If a team grows past Loro’s comfortable doc size, do we shard by pipeline group? Defer to evidence.
  • Browser-tab conflict. Same user, two browser tabs, divergent edits. Loro merges fine but the user might be surprised. Consider tab-coordination via BroadcastChannel API.
  • Loro version mismatch handling. Server upgrades Loro to a wire-incompatible major version while older clients are still connected. Plan: control plane advertises supported version range; clients with stale Loro bundles get a hard “please update” prompt rather than silent partial compatibility.
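
For the version-mismatch question, the proposed handshake could be as small as the check below. It assumes compatibility is decided on the Loro major version alone, which is itself an open assumption; `SupportedRange` and `checkCompat` are hypothetical names.

```typescript
interface SupportedRange { minMajor: number; maxMajor: number; }
type Compat = "ok" | "client-too-old" | "client-too-new";

// Decide whether a client may join the sync channel, given the range
// the control plane advertises.
function checkCompat(clientMajor: number, server: SupportedRange): Compat {
  if (clientMajor < server.minMajor) return "client-too-old"; // show "please update"
  if (clientMajor > server.maxMajor) return "client-too-new"; // server has not upgraded yet
  return "ok";
}
```

Running this before any Loro update is exchanged is what turns a silent partial incompatibility into an explicit prompt.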