Collaborative config editing

Status: outline / RFC. Decisions are proposed, not final. Open questions at the bottom.

Related specs: 01-architecture.md, 03-pipelines.md, 06-deployment.md, 13-team-surfaces.md. Related ADRs: 0005, 0013, 0014.

How a team edits pipeline config concurrently across the Electron app and the web UI without rejected writes. The substrate is a Loro CRDT doc held in the team’s Durable Object (or the hakiri-control Rust binary for self-hosted teams), synced to all connected clients over a hibernating WebSocket, and materialized to versioned TOML snapshots in R2 on apply.

Daemons never speak CRDT. They read materialized TOML from R2 — the existing reconciliation path from 03-pipelines.md is unchanged. This spec is purely about the editing surface.

Why CRDT on config

A team product means two teammates editing the same pipeline at the same moment is normal, not exceptional. Three models for handling that:

Pessimistic locking: one person holds the edit lock; others wait. Wrong for laptop-class clients that may be offline; wrong UX for multiplayer.
Optimistic concurrency: the first commit wins; anyone committing against a stale version gets a “your edit was rejected, please refresh” error. Acceptable for low-frequency edits; painful when Alice tweaks a schedule while Bob renames a destination on the same pipeline.
CRDT: concurrent edits merge automatically by field; same-field conflicts surface as structured UI affordances.

For pipeline config (mostly disjoint-field edits, occasional same-field), CRDT is the right shape. See ADR-0013 for the choice of Loro specifically.

Important scope note. This spec applies to collaborative config editing. It does not override ADR-0005, which rejected CRDTs for the data sync layer (run files, snapshots, cursors, schema-evolution decisions). Data sync remains LWW. The two scopes are orthogonal — the daemon-side reconciler never imports a CRDT library.

What’s in the CRDT doc

CRDT-merge semantics are safe for editable fields where “both changes are probably valid” is the right default. They’re unsafe for authorization fields where the answer must be exactly yes or no.

In the Loro doc	In normal DB tables (authoritative state)
Pipeline definitions	Team membership
Source / destination references (by id)	RBAC, capability tokens, biscuit issuance
Schedules (`cron`, `every`, event triggers)	Audit log (append-only stream)
Placement (`cf:auto`, `node:alice-mbp`, …)	Run history, run status
Transform definitions (ordered list)	Source secret references (the secrets themselves live in Keychain / Workers Secrets)
Schema overrides	Quotas, billing
Pipeline groups / folders	Per-user UI preferences
Variable definitions	Connector capability grants

The split rule: collaborative editing belongs in Loro; authoritative state belongs in DB rows.

A user editing “Bob’s role” in a Team Settings UI is using the normal authenticated control-plane API, not the Loro channel. The visual distinction (a separate “Admin” surface in 13-team-surfaces.md) signals to users which fields have merge semantics and which don’t.

Loro doc shape

The doc is a top-level Loro container mirroring the manifest’s JSON-shaped form:

// pseudocode for the doc structure
LoroDoc {
  pipelines: LoroMap<PipelineId, LoroMap<{
    source:      LoroMap<{ kind: string, ref: string }>,
    destination: LoroMap<{ table: string, partition_by?: string }>,
    schedule:    LoroMap<{ kind: "cron" | "every" | "on-event", expr: string }>,
    placement:   LoroMap<{ kind: PlacementKind, target?: string }>,
    transforms:  LoroMovableList<TransformId>,   // ordered, reorderable
    schema:      LoroMap<ColumnName, ColumnDef>,
    enabled:     boolean,
    name:        LoroText,
    description: LoroText,
  }>>,

  transforms: LoroMap<TransformId, LoroMap<{
    kind: "polars" | "wasm",
    body: LoroText,   // multi-line expression body
    inputs: LoroMovableList<ColumnRef>,
  }>>,

  groups:    LoroTree<GroupId>,            // hierarchical pipeline folders
  variables: LoroMap<string, LoroText>,
}

Type choices:

Loro type	Where used	Why
`LoroMap`	Keyed records (pipelines, transforms, schedule fields)	Supports concurrent field updates without conflict
`LoroMovableList`	Ordered transforms, input column refs	Drag-to-reorder produces stable move ops, not delete+insert
`LoroText`	Pipeline name / description / transform body	Preserves intra-string concurrent edits (multi-cursor friendly)
`LoroTree`	Pipeline group hierarchy	Supports reparenting without breaking sibling order

These four types cover every editable field. Automerge lacks a native MovableList equivalent — see ADR-0013.

Sync protocol

sequenceDiagram
  participant U1 as User 1 (Electron)
  participant U2 as User 2 (Web)
  participant DO as Team DO / hakiri-control
  participant R2 as R2 bucket

  U1->>DO: WebSocket connect (auth: biscuit)
  DO-->>U1: full doc snapshot (Loro update)
  U2->>DO: WebSocket connect (auth: biscuit)
  DO-->>U2: full doc snapshot (Loro update)

  Note over U1,U2: live edits flow as Loro update messages

  U1->>U1: local edit (schedule "*/15 * * * *" → "0 * * * *")
  U1->>DO: Loro update msg (binary)
  DO->>DO: apply to canonical doc, persist
  DO-->>U2: broadcast Loro update
  U2->>U2: merge into local doc (no conflict)

  U2->>U2: concurrent edit on different field
  U2->>DO: Loro update msg
  DO->>DO: apply, persist
  DO-->>U1: broadcast

  Note over U1: "Apply" button clicked

  U1->>DO: POST /v1/manifest/apply { pipelines: ["gh-issues"] }
  DO->>DO: validate doc subset against JSON Schema
  DO->>R2: PUT manifest@v43.toml (snapshot)
  DO-->>U1: applied @v43
  DO-->>U2: broadcast: manifest applied @v43

Wire format: Loro’s binary update messages, framed as WebSocket binary frames. The DO is the canonical doc holder and the broadcast hub. With hibernating WebSockets, the DO can be evicted from memory while connections stay open, so idle connections do not accrue duration charges.
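If other message types (such as awareness) share the socket with Loro updates, each binary frame needs a discriminator. A minimal sketch, assuming a one-byte tag in front of the payload; the tag values and the names `encodeFrame` / `decodeFrame` are hypothetical and not part of Loro or of this spec:

```typescript
// Hypothetical frame layout: one tag byte, then the payload bytes.
// The tag values below are assumptions for illustration, not part of Loro.
const TAG_LORO_UPDATE = 0x01;
const TAG_AWARENESS = 0x02;

type WireMessage =
  | { kind: "update"; payload: Uint8Array } // opaque Loro update bytes
  | { kind: "awareness"; payload: Uint8Array }; // presence payload

function encodeFrame(msg: WireMessage): Uint8Array {
  const frame = new Uint8Array(1 + msg.payload.length);
  frame[0] = msg.kind === "update" ? TAG_LORO_UPDATE : TAG_AWARENESS;
  frame.set(msg.payload, 1);
  return frame;
}

function decodeFrame(frame: Uint8Array): WireMessage {
  if (frame.length === 0) throw new Error("empty frame");
  const payload = frame.subarray(1); // a view over the frame, not a copy
  switch (frame[0]) {
    case TAG_LORO_UPDATE:
      return { kind: "update", payload };
    case TAG_AWARENESS:
      return { kind: "awareness", payload };
    default:
      throw new Error(`unknown message tag ${frame[0]}`);
  }
}
```

The hub never needs to inspect an update payload to route it; it only reads the tag, applies or relays the bytes, and rejects unknown tags.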

Initial sync

A client connecting fresh receives a full doc snapshot, then incremental updates. A client reconnecting after a brief offline period sends its known version vector; the DO responds with deltas since that point. A client offline long enough that the op log was compacted past its version receives a full doc snapshot — Loro handles this transparently.
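The three cases above reduce to one comparison between the client's version vector and the point up to which the server has compacted its log. A sketch under stated assumptions: a version vector is modelled as "number of ops seen per peer", and `compactedBelow` is a hypothetical server-side marker, not a Loro API:

```typescript
// Version vector: how many ops from each peer this replica has seen.
type VersionVector = Record<string, number>;

type SyncPlan = { mode: "snapshot" } | { mode: "delta"; since: VersionVector };

// `compactedBelow` is a hypothetical marker: for each peer, ops with a counter
// below this value are no longer in the server's op log.
function planInitialSync(
  clientVersion: VersionVector | undefined,
  compactedBelow: VersionVector,
): SyncPlan {
  if (clientVersion === undefined) return { mode: "snapshot" }; // fresh client
  for (const [peer, floor] of Object.entries(compactedBelow)) {
    // The client still needs ops the log has dropped: deltas cannot bridge the gap.
    if ((clientVersion[peer] ?? 0) < floor) return { mode: "snapshot" };
  }
  return { mode: "delta", since: clientVersion };
}
```

A client that has seen 10 ops from peer `a` while the log starts at 5 gets deltas; one that has seen only 3 gets a full snapshot.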

Awareness / presence

Layered on top of the sync channel via a separate message type. Each client periodically sends an awareness message — its user id, the field it has focused, last-active timestamp — over the same WebSocket. The DO doesn’t persist awareness; it broadcasts to other connected clients with a 60s TTL.

This drives the “Bob is editing pipeline X” indicator in the UI. Loro’s awareness API is newer than Yjs’s — plan to layer this in hakiri-control rather than expect it free.
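Since awareness is expected to be layered by hand, the hub-side state can be very small. A sketch of an in-memory presence board with the 60s TTL; the class name, field names, and the path format in `focusedField` are illustrative assumptions:

```typescript
interface PresenceEntry {
  userId: string;
  focusedField: string; // e.g. "pipelines/gh-issues/schedule"
  lastActiveMs: number; // client-reported timestamp, epoch milliseconds
}

const PRESENCE_TTL_MS = 60_000; // the 60s TTL from the text

class PresenceBoard {
  private entries = new Map<string, PresenceEntry>();

  // Record or refresh one client's awareness message. Nothing is persisted.
  update(entry: PresenceEntry): void {
    this.entries.set(entry.userId, entry);
  }

  // Entries still inside the TTL at `nowMs`; expired ones are evicted first.
  active(nowMs: number): PresenceEntry[] {
    this.entries.forEach((entry, userId) => {
      if (nowMs - entry.lastActiveMs > PRESENCE_TTL_MS) this.entries.delete(userId);
    });
    return Array.from(this.entries.values());
  }
}
```

Passing `nowMs` in rather than reading the clock inside the class keeps the TTL logic testable without timers.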

Edit → snapshot → apply flow

The flow has three boundaries with different validation guarantees:

Boundary	What happens	Validation
Edit	User types; Loro ops produced; broadcast to other clients	None — intermediate states may be invalid (half-typed cron expression)
Apply	User clicks “Apply” on a pipeline or pipeline group	JSON Schema validation; rejected if invalid; sub-doc materialized to TOML snapshot in R2
Reconcile	Daemon polls R2, sees new snapshot, picks up changes	TOML deserialization + runtime invariants (cursor compatibility, etc.)

No write rejection during edits. Two users editing the same pipeline simultaneously merge automatically by Loro semantics:

Different fields → both edits apply, no conflict surface.
Same field, same time → Loro chooses one deterministically by op id; the UI shows “Bob also changed schedule 2s ago — Alice’s value won, click to see Bob’s” as a non-blocking affordance.
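The two cases can be illustrated with a simplified per-field register. This is a stand-in for intuition only, not Loro's implementation: it assumes each write carries a logical clock and a peer id, and that ties break on peer id. The losing write is kept so the UI can offer it back:

```typescript
// Simplified stand-in for map merge: each field is a last-writer-wins register
// ordered by (lamport, peer). This is not Loro's implementation.
interface FieldWrite {
  value: string;
  lamport: number; // logical clock of the write
  peer: string; // tie-breaker when clocks are equal
}

type FieldWrites = Record<string, FieldWrite>;

interface MergeResult {
  merged: FieldWrites;
  overridden: FieldWrites; // losing writes, kept for the non-blocking banner
}

function wins(a: FieldWrite, b: FieldWrite): boolean {
  return a.lamport !== b.lamport ? a.lamport > b.lamport : a.peer > b.peer;
}

function mergeFields(left: FieldWrites, right: FieldWrites): MergeResult {
  const merged: FieldWrites = { ...left };
  const overridden: FieldWrites = {};
  for (const [field, incoming] of Object.entries(right)) {
    const current = merged[field];
    if (current === undefined) {
      merged[field] = incoming; // different field: both edits apply
      continue;
    }
    const incomingWins = wins(incoming, current);
    if (current.value !== incoming.value) {
      overridden[field] = incomingWins ? current : incoming;
    }
    if (incomingWins) merged[field] = incoming;
  }
  return { merged, overridden };
}
```

Merging in either order yields the same values, which is the property that lets every replica converge without a write ever being rejected.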

Apply scope

Apply can be scoped:

Apply pipeline — one pipeline goes live; everything else stays in edit state.
Apply group — all pipelines under a group go live atomically (one TOML snapshot).
Apply all — the whole doc becomes a new snapshot.

This means the team can stage a coordinated change across multiple pipelines and ship them together.
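One way to read the three scopes is as a function from the last applied snapshot and the current editing state to the next snapshot. The sketch below assumes a scoped apply copies everything outside the scope from the previous snapshot; the spec does not state this, and the type and function names are hypothetical:

```typescript
type PipelineConfig = Record<string, unknown>;
type PipelineSet = Record<string, PipelineConfig>;

type ApplyScope =
  | { kind: "pipeline"; id: string }
  | { kind: "group"; pipelineIds: string[] } // ids already resolved from the group tree
  | { kind: "all" };

// Builds the next snapshot's pipeline set. Pipelines outside the scope keep
// their last applied form (an assumption; see the lead-in).
function nextSnapshot(applied: PipelineSet, editing: PipelineSet, scope: ApplyScope): PipelineSet {
  if (scope.kind === "all") return { ...editing };
  const ids = scope.kind === "pipeline" ? [scope.id] : scope.pipelineIds;
  const next: PipelineSet = { ...applied };
  for (const id of ids) {
    const edited = editing[id];
    if (edited !== undefined) next[id] = edited; // created or changed
    else delete next[id]; // deleted in the editing doc
  }
  return next;
}
```

Because a group apply produces a single result object, it maps onto one TOML snapshot and the group goes live atomically.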

Auto-apply vs explicit apply

Some fields don’t affect runtime: pipeline display name, description, tags, group membership, variable comments. These auto-apply continuously: the control plane debounces and snapshots them into the latest manifest on a 5s window.

Runtime-affecting fields require explicit Apply: schedule, placement, source / destination references, transforms, schema, enabled state. The UI shows pipelines with unapplied runtime changes with a visible “Apply changes” affordance.

This split mirrors how IDEs treat “rename a local variable” (live) vs “change a function signature” (deploy boundary).
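The split can be expressed as a lookup on the top-level field of each changed path. A sketch, assuming changes arrive as dotted paths relative to a pipeline (for example `schedule.expr`); the path format and the function name are illustrative:

```typescript
// Top-level pipeline fields the text lists as runtime-affecting.
const RUNTIME_FIELDS = new Set([
  "schedule", "placement", "source", "destination", "transforms", "schema", "enabled",
]);

// Splits changed field paths into those needing an explicit Apply and those
// that can ride the debounced auto-apply window.
function splitByApplyMode(changedPaths: string[]): { explicit: string[]; auto: string[] } {
  const explicit: string[] = [];
  const auto: string[] = [];
  for (const path of changedPaths) {
    const dot = path.indexOf(".");
    const topLevel = dot === -1 ? path : path.slice(0, dot);
    (RUNTIME_FIELDS.has(topLevel) ? explicit : auto).push(path);
  }
  return { explicit, auto };
}
```

A pipeline shows the “Apply changes” affordance whenever `explicit` is non-empty; anything in `auto` is picked up by the next debounced snapshot.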

Snapshots, history, revisions

The Loro op log is the granular history; named snapshots are the human-readable revisions surface.

flowchart LR
  Edit1[op 1] --> Edit2[op 2] --> Snap1["@v42: apply<br/>manifest.toml"] --> Edit3[op 3] --> Edit4[op 4] --> Snap2["@v43: apply<br/>manifest.toml"]

Surface	What it shows
Loro op log (internal)	Every keystroke-shaped change, retained between snapshots; compacted at each snapshot
Revisions UI (`@v42`, `@v43`)	Each snapshot is a versioned TOML file in R2; diff-able, `git`-able, downloadable
“Revert to @v42”	Replay snapshot into the Loro doc as a single update; new snapshot created

The TOML snapshot is the source of truth for the daemon. The Loro doc is the source of truth for the editing surface. They synchronize at apply boundaries.
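On the revisions side, a revert is an append, never a rewrite. A sketch of that bookkeeping only (replaying the content into the Loro doc is out of scope here); the `RevisionEntry` shape and the `revertedFrom` field are hypothetical:

```typescript
interface RevisionEntry {
  version: number; // 42 for "@v42"
  toml: string; // snapshot body as stored
  revertedFrom?: number; // hypothetical marker for the revisions UI
}

// Reverting appends a new revision whose content equals the target's.
// Returns a new list; the input is left untouched.
function revertTo(revisions: RevisionEntry[], targetVersion: number): RevisionEntry[] {
  const target = revisions.find((r) => r.version === targetVersion);
  if (target === undefined) throw new Error(`unknown revision @v${targetVersion}`);
  const latest = Math.max(...revisions.map((r) => r.version));
  return [...revisions, { version: latest + 1, toml: target.toml, revertedFrom: targetVersion }];
}
```

Reverting `@v43` to `@v42` therefore yields `@v44` with the content of `@v42`, and both earlier revisions remain diff-able.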

Retention

The op log is compacted on every named snapshot — ops older than the previous snapshot are dropped. Snapshots themselves are retained indefinitely (small TOML files; R2 storage is cheap). Older snapshots can be archived to a cold bucket after N revisions if storage becomes an issue.
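Read literally, the retention rule keeps one snapshot interval of ops behind the newest snapshot. A sketch of that reading, assuming the server numbers ops sequentially and remembers the sequence number at which the previous snapshot was taken; both are assumptions about bookkeeping the spec does not describe:

```typescript
interface LoggedOp {
  seq: number; // position in the server's op log
  bytes: Uint8Array; // opaque update payload
}

// Called when a new named snapshot is taken. `previousSnapshotAfterSeq` is the
// seq of the last op included in the previous snapshot, or undefined if this
// is the first snapshot.
function compactOpLog(log: LoggedOp[], previousSnapshotAfterSeq: number | undefined): LoggedOp[] {
  if (previousSnapshotAfterSeq === undefined) return log; // nothing older to drop
  return log.filter((op) => op.seq > previousSnapshotAfterSeq);
}
```

Under this reading, a client whose version falls before the previous snapshot is the one that receives a full doc snapshot on reconnect.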

Diffing snapshots

A standard `diff manifest@v42.toml manifest@v43.toml` works because snapshots are plain TOML. The UI shows a structured diff (added pipelines highlighted green, removed red, changed fields side-by-side) by parsing both TOMLs and walking the tree. Power users can `git diff` snapshots fetched directly from R2.
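The structured diff is a walk over two parsed manifests. A sketch that assumes the caller has already parsed each TOML file and flattened every pipeline to a map of field path to rendered value; the types and the flattening are illustrative assumptions:

```typescript
type FlatPipeline = Record<string, string>; // field path -> rendered value
type ParsedManifest = Record<string, FlatPipeline>;

interface FieldEdit { field: string; before?: string; after?: string }

interface ManifestDiff {
  added: string[]; // rendered green
  removed: string[]; // rendered red
  changed: Record<string, FieldEdit[]>; // rendered side-by-side
}

// Assumes both TOML files were already parsed and flattened by the caller.
function diffManifests(before: ParsedManifest, after: ParsedManifest): ManifestDiff {
  const diff: ManifestDiff = { added: [], removed: [], changed: {} };
  for (const id of Object.keys(after)) {
    if (!(id in before)) diff.added.push(id);
  }
  for (const [id, prev] of Object.entries(before)) {
    const next = after[id];
    if (next === undefined) {
      diff.removed.push(id);
      continue;
    }
    const fields = new Set([...Object.keys(prev), ...Object.keys(next)]);
    const edits = Array.from(fields)
      .filter((field) => prev[field] !== next[field])
      .map((field) => ({ field, before: prev[field], after: next[field] }));
    if (edits.length > 0) diff.changed[id] = edits;
  }
  return diff;
}
```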

Daemon side — TOML consumption unchanged

Daemons never speak CRDT. The reconciliation loop is:

Daemon polls R2 manifest path (or receives SSE notification from control plane).
Reads manifest@<latest>.toml.
Diffs against last-applied version.
Reconciles state: starts new pipelines, stops removed ones, updates schedules, etc.
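The diff and reconcile steps amount to turning two manifests into a list of actions. The daemon's real implementation is defined in 03-pipelines.md; the following is only a shape sketch, with a pipeline reduced to its schedule and with hypothetical action names:

```typescript
type PipelineSpec = { schedule: string };
type AppliedManifest = Record<string, PipelineSpec>;

type ReconcileAction =
  | { op: "start"; id: string }
  | { op: "stop"; id: string }
  | { op: "reschedule"; id: string; schedule: string };

// Compares the last applied manifest with the latest one and lists what to do.
function planReconcile(lastApplied: AppliedManifest, latest: AppliedManifest): ReconcileAction[] {
  const actions: ReconcileAction[] = [];
  for (const [id, spec] of Object.entries(latest)) {
    const previous = lastApplied[id];
    if (previous === undefined) {
      actions.push({ op: "start", id }); // new pipeline
    } else if (previous.schedule !== spec.schedule) {
      actions.push({ op: "reschedule", id, schedule: spec.schedule });
    }
  }
  for (const id of Object.keys(lastApplied)) {
    if (!(id in latest)) actions.push({ op: "stop", id }); // removed pipeline
  }
  return actions;
}
```

Nothing in this plan depends on how the manifest was edited, which is why the CRDT surface can land without daemon changes.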

From the daemon’s perspective, the editing surface is invisible. It sees a TOML file change — like a git pull of a config repo. This means:

Existing daemon code from 03-pipelines.md and 06-deployment.md needs no changes for the CRDT surface to land.
Self-hosted teams who skip the control plane entirely (CLI-only, git-managed manifests) work exactly as before — the CRDT surface is an opt-in editing layer, not a required substrate.

Storage

Where	What
Durable Object (CF) or SQLite (self-hosted `hakiri-control`)	Canonical Loro doc binary, awareness state (ephemeral), op log between snapshots
R2 bucket	Named TOML snapshots (`manifest@v42.toml`, `manifest@v43.toml`, …), index file (`manifest-revisions.json`)
Browser session / Electron Keychain	Local Loro doc replica, pending unsent ops

The DO storage holds the authoritative doc state. R2 holds the daemon-consumed materialization. The split mirrors the spec’s general pattern of “engine consumes boring storage; editing surface uses a richer model.”

Self-hosted parity

The hakiri-control Rust binary (M1) provides the same surface for teams that can’t use Cloudflare:

WebSocket server via axum + tokio-tungstenite
Embedded Loro core via the loro Rust crate
Local SQLite for canonical doc + auth state
Local filesystem or S3-compatible bucket for snapshots

The wire protocol (Loro update messages over WebSocket frames) is identical, so the same Electron / web client connects to either backend.

Conflict semantics

Concurrent edit	Loro outcome	UI surface
Different fields of same pipeline	Both apply, deterministic order	Silent merge
Same field, different values	One wins by Loro op id ordering	“Both Alice and Bob edited `schedule`; Alice’s value (the winner) shown, click to see Bob’s” as a non-blocking banner
One adds a transform, other removes a different transform	Both apply	Silent merge
Both add a transform at the same position	Both added in deterministic order	Silent merge
One reorders transforms, other adds a new one	Reorder applies, new one inserted at intended position	Silent merge
Both delete the same pipeline	Idempotent — one tombstone	Silent merge
One edits a field, other deletes the parent pipeline	Delete wins; field edit dropped from the merged doc	“Bob’s edit to `schedule` was discarded: Alice deleted the pipeline 3s ago”
Two users type in the same `LoroText` body	Multi-cursor merge (Loro’s text CRDT)	Cursors visible via awareness; merge silent

Open questions

Op-log retention beyond compaction. Should we keep a separate, long-retention op stream for audit / time-travel (“what was the schedule at 14:30 on 2026-04-12”)? Or rely solely on named snapshots? Leaning snapshots-only for v0; granular op retention is a future feature if audit demands sub-second granularity.
Schema migration of the Loro doc. If we change the doc shape (add a new field type, rename a field), how do existing docs upgrade? Loro doesn’t have native schema migration; we’ll write migration functions in hakiri-control that run on doc load. Worth a documented procedure before M1 ships.
Maximum doc size. A team with thousands of pipelines × tens of transforms each could produce a large Loro doc. At what doc size does sync latency exceed 100ms? Plan a perf spike in M1 against representative workloads (per ADR-0013).
Offline editing on the Electron app. A laptop goes offline for an hour, then reconnects with 50 local edits. Does the merge UX hold up? Needs a soak test in M1.
Snapshot debounce timing for auto-apply fields. 5s feels right for “renamed pipeline” but maybe wrong for “edited tag.” Tune with real usage.
Multi-doc per team. Currently one doc per team. If a team grows past Loro’s comfortable doc size, do we shard by pipeline group? Defer to evidence.
Browser-tab conflict. Same user, two browser tabs, divergent edits. Loro merges fine but the user might be surprised. Consider tab-coordination via BroadcastChannel API.
Loro version mismatch handling. Server upgrades Loro to a wire-incompatible major version while older clients are still connected. Plan: control plane advertises supported version range; clients with stale Loro bundles get a hard “please update” prompt rather than silent partial compatibility.