Agent Interface
The agent interface is the differentiator: an LLM agent should be able to (1) discover the available capabilities, (2) author a new connector against an unfamiliar API, (3) safely test and install it, and (4) observe its runs — all through a stable contract.
Surface: MCP server, ships in the binary
`hakiri mcp` starts a Model Context Protocol server over stdio (for Claude Code et al.) or HTTP+SSE (for hosted clients). All capabilities listed below are exposed as MCP tools.
```sh
# stdio (Claude Code / desktop clients)
hakiri mcp

# HTTP (hosted/remote agents)
hakiri mcp --http 0.0.0.0:7700 --token $TOKEN
```

Discovery
| Tool | Input | Output | Use |
|---|---|---|---|
| `catalog.list_connectors` | `{ kind?: "source" \| "destination" }` | list of `{ name, version, kind, summary }` | What can I use? |
| `catalog.describe_connector` | `{ name }` | full WIT-derived metadata + config schema | What does this take? |
| `pipeline.list` | none | list of `{ id, source, tables, schedule, last_run }` | What’s already wired? |
| `pipeline.describe` | `{ id }` | full manifest + last run trace | Tell me about this pipeline |
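
On the wire, an agent invokes any of these tools with MCP's JSON-RPC 2.0 `tools/call` method, passing the tool name and its arguments; over the stdio transport each message is one line of JSON. The sketch below builds such a request with only the Rust standard library. The helper name `tools_call_request` is hypothetical, and a real client would use a JSON library rather than string formatting.

```rust
/// Build an MCP `tools/call` JSON-RPC 2.0 request as a single line,
/// suitable for the newline-delimited stdio transport.
/// Assumes `tool` needs no JSON escaping (Hakiri tool names are [a-z._])
/// and that `arguments_json` is already a valid JSON object.
fn tools_call_request(id: u64, tool: &str, arguments_json: &str) -> String {
    format!(
        r#"{{"jsonrpc":"2.0","id":{id},"method":"tools/call","params":{{"name":"{tool}","arguments":{arguments_json}}}}}"#
    )
}
```

For example, `tools_call_request(1, "catalog.list_connectors", r#"{"kind":"source"}"#)` yields a single line that the client writes to the server's stdin, followed by a newline.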
Schema introspection
| Tool | Input | Output | Use |
|---|---|---|---|
| `source.list_tables` | `{ connector, config }` | list of `{ name, summary, row_count_estimate? }` | What tables does this source expose? Concise enough to fit in a tool result. |
| `source.sample_table` | `{ connector, config, table, n?: 10 }` | `{ schema, rows: [...] }` | What does this table look like? Schema plus `n` example rows. |
| `source.discover` | `{ connector, config, table? }` | full table schemas (paginated if large) | Deep discovery; call after `list_tables` narrows the surface. |
| `context.tables` | none | list of tables with row counts and `last_updated` | What’s in my context store? |
| `context.schema` | `{ table }` | Arrow schema JSON + evolution history | What columns, and how have they changed? |
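
Splitting concise `source.list_tables` from paginated `source.discover` keeps each tool result inside the agent's context budget. A minimal sketch of cursor-based pagination under a token budget follows; the 4-characters-per-token estimate, the `Page` type, and the numeric cursor are assumptions for illustration, not Hakiri's actual scheme.

```rust
/// One page of a paginated tool result (hypothetical shape).
#[derive(Debug, PartialEq)]
struct Page<'a> {
    items: Vec<&'a str>,
    /// Cursor to pass back on the next call; `None` when everything was returned.
    next_cursor: Option<usize>,
}

/// Rough token estimate: about 4 characters per token (a heuristic, not a tokenizer).
fn estimate_tokens(s: &str) -> usize {
    (s.chars().count() + 3) / 4
}

/// Return items starting at `cursor` until the next one would exceed
/// `budget_tokens`. Always includes at least one item so callers make progress.
fn paginate<'a>(items: &[&'a str], cursor: usize, budget_tokens: usize) -> Page<'a> {
    let mut used = 0;
    let mut page = Vec::new();
    let mut i = cursor;
    while i < items.len() {
        let cost = estimate_tokens(items[i]);
        if !page.is_empty() && used + cost > budget_tokens {
            break;
        }
        used += cost;
        page.push(items[i]);
        i += 1;
    }
    Page {
        items: page,
        next_cursor: if i < items.len() { Some(i) } else { None },
    }
}
```

Returning an opaque cursor rather than a page number lets the server change its budget between calls without the agent re-reading items it already has.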

| Tool | Input | Output | Use |
|---|---|---|---|
| `context.query` | `{ sql, limit?: number }` | result rows (capped) | Read from the context store |
| `context.explain` | `{ sql }` | DuckDB `EXPLAIN` output | Is this query reasonable? |
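
`context.query` caps its results so a careless `SELECT *` cannot flood the agent's context. One way to enforce the cap, sketched here under assumptions (the `MAX_ROWS` value and the helper name are hypothetical, and Hakiri may instead truncate rows after execution), is to wrap the agent's SQL in an outer query that DuckDB evaluates with a hard `LIMIT`:

```rust
/// Hypothetical server-side ceiling on rows returned to an agent.
const MAX_ROWS: u64 = 1_000;

/// Wrap agent-supplied SQL so the result never exceeds the cap.
/// The requested `limit` is honored only when it is below `MAX_ROWS`.
fn capped_query(sql: &str, limit: Option<u64>) -> String {
    let effective = limit.unwrap_or(MAX_ROWS).min(MAX_ROWS);
    // A trailing semicolon would be a syntax error inside a subquery.
    let inner = sql.trim().trim_end_matches(';').trim_end();
    format!("SELECT * FROM ({inner}) AS agent_q LIMIT {effective}")
}
```

String wrapping only bounds result size: a trailing `--` comment in the agent's SQL would swallow the closing parenthesis, and read-only access is a separate concern, so a production version would parse the statement rather than concatenate it.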
Authoring
| Tool | Input | Output | Use |
|---|---|---|---|
| `connector.scaffold` | `{ name, kind, spec_url? \| openapi? \| sql? }` | path to generated crate | Generate a connector skeleton |
| `connector.build` | `{ name }` | build status + warnings | Compile WIT → WASM |
| `connector.test` | `{ name, fixtures?: string[] }` | test run summary | Verify against contract |
| `connector.install` | `{ name, version }` | install status | Register in the project catalog |
| `pipeline.create` | `{ id, source, tables, schedule?, format?: "json" \| "toml" }` | manifest path | Writes `pipelines/<id>.json` by default; `format: "toml"` appends a `[[pipeline]]` block to `hakiri.toml` |
| `pipeline.edit` | `{ id, patch }` | new manifest content | Modify an existing pipeline; preserves the file’s format (TOML round-trips via `toml_edit`) |
| `pipeline.convert` | `{ id, to: "json" \| "toml" }` | new path | Switch format; lossless except for TOML comments |
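
With `format: "toml"`, `pipeline.create` appends a `[[pipeline]]` block to `hakiri.toml` instead of writing a separate JSON file. The sketch below renders such a block from the tool's inputs. The key names mirror the `pipeline.create` input fields, but the exact manifest layout (including the inline `source` table) and the function name are assumptions.

```rust
/// Render a hypothetical `[[pipeline]]` array-of-tables entry for hakiri.toml.
/// Strings use Rust's Debug formatting, which produces valid TOML basic strings
/// for simple identifier-like values such as the ones used here.
fn render_pipeline_block(id: &str, connector: &str, tables: &[&str], schedule: Option<&str>) -> String {
    let tables = tables
        .iter()
        .map(|t| format!("{t:?}"))
        .collect::<Vec<_>>()
        .join(", ");
    // Leading newline separates the block from whatever precedes it in the file.
    let mut out = format!(
        "\n[[pipeline]]\nid = {id:?}\nsource = {{ connector = {connector:?} }}\ntables = [{tables}]\n"
    );
    if let Some(s) = schedule {
        out.push_str(&format!("schedule = {s:?}\n"));
    }
    out
}
```

For the Linear example, `render_pipeline_block("linear-issues", "linear", &["issues"], Some("every 30m"))` produces a block with `id`, `source`, `tables`, and `schedule` keys. Appending text like this is fine for creation; edits to an existing file are where a format-preserving editor matters.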
Diagnostics
| Tool | Input | Output | Use |
|---|---|---|---|
| `openapi.lint` | `{ url \| spec }` | summary: OAS version, path count, response-body completeness, auth style, pagination idiom, gaps | Pre-flight: is this spec worth scaffolding from? |
| `connector.diagnose` | `{ name, run_id }` | structured diagnosis: failing assertion + last HTTP exchange + suggested fix | Why did this run fail? Far better signal than raw stderr. |
| `fixture.record` | `{ connector, calls }` | recorded cassette path | Capture live HTTP under a capability grant for replay-based testing |
`connector.build` returns cargo's structured JSON diagnostics rather than raw terminal text: frontier models parse JSON errors 2–3× more reliably than terminal output.
Execution
| Tool | Input | Output | Use |
|---|---|---|---|
| `pipeline.dry_run` | `{ id, sample_size?: number }` | preview rows, no writes | Validate without committing |
| `pipeline.run` | `{ id }` | `{ run_id, status, row_count }` | Execute now |
| `run.tail` | `{ run_id, since?: string }` | log/event stream | Watch a running pipeline |
| `run.trace` | `{ run_id }` | OTel trace JSON | Full distributed trace |
All write-side tools (`*.install`, `*.run`, `*.edit`) require either:
- a `--yes` flag in the project’s `hakiri.toml` `[agent]` section, or
- a per-call confirmation token that the human approves out-of-band.
By default, MCP write tools return a diff and a confirmation prompt rather than acting. Agents propose; humans (or operators with --yes) commit.
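
The gating rule reads as a small decision function: a write-side call executes only if the project opted in with `--yes` or the call carries an approved confirmation token; otherwise the server returns a proposal. In this sketch the `Decision` enum, the suffix matching on tool names, and the boolean token check are hypothetical; a real host would validate the token against its out-of-band approval record.

```rust
/// Outcome of an MCP tool call under the write-gating policy.
#[derive(Debug, PartialEq)]
enum Decision {
    Execute,
    /// Return a diff and a confirmation prompt instead of acting.
    Propose,
}

/// Write-side tools are matched by suffix, mirroring `*.install`, `*.run`, `*.edit`.
fn is_write_tool(tool: &str) -> bool {
    [".install", ".run", ".edit"].iter().any(|suffix| tool.ends_with(suffix))
}

/// `auto_yes` models the `--yes` opt-in from hakiri.toml's [agent] section;
/// `token_approved` models a per-call token a human approved out-of-band.
fn gate(tool: &str, auto_yes: bool, token_approved: bool) -> Decision {
    if !is_write_tool(tool) || auto_yes || token_approved {
        Decision::Execute
    } else {
        Decision::Propose
    }
}
```

The suffix rule deliberately does not match `pipeline.dry_run` (it ends in `_run`, not `.run`), which fits the contract since a dry run writes nothing.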
The connector authoring loop
```mermaid
sequenceDiagram
  autonumber
  actor U as User
  participant A as Agent
  participant H as hakiri MCP
  participant S as Sandbox
  participant C as Catalog
  U->>A: "add a Linear issues source"
  A->>H: catalog.list_connectors {kind:"source"}
  H-->>A: no "linear" connector
  A->>H: connector.scaffold {name:"linear", kind:"source", spec_url:"https://developers.linear.app/openapi.json"}
  H->>S: generate Rust crate + connector.toml from OpenAPI
  S-->>H: path
  H-->>A: connectors/linear/
  A->>H: connector.build {name:"linear"}
  H->>S: cargo build --target wasm32-wasip1
  S-->>H: build OK / warnings
  A->>H: connector.test {name:"linear", fixtures:["issues-page-1.json"]}
  H->>S: run WIT-conformance + replay
  S-->>H: tests pass
  A->>U: "Connector ready. Diff: …. Install?"
  U->>A: yes
  A->>H: connector.install {name:"linear", version:"0.1.0"}
  H->>C: register
  A->>H: pipeline.create {id:"linear-issues", source:{connector:"linear", ...}, tables:["issues"], schedule:"every 30m"}
  A->>H: pipeline.dry_run {id:"linear-issues", sample_size:10}
  H-->>A: 10 sample rows, schema check OK
  A->>U: "Dry-run looks good. Run for real?"
```
Note the symmetry between human and agent: every step is also a CLI verb (hakiri connector scaffold, hakiri pipeline dry-run, …). The MCP surface is just a wire format for the same operations.
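
The two CLI examples suggest a mechanical naming rule: the dot in a tool name becomes a space and underscores become hyphens (`pipeline.dry_run` becomes `hakiri pipeline dry-run`). The sketch below assumes that rule holds for every tool; the source shows only these two instances, and `cli_verb` is a hypothetical helper.

```rust
/// Map an MCP tool name to its CLI equivalent, assuming the convention
/// `noun.verb_name` -> `hakiri noun verb-name`.
fn cli_verb(tool: &str) -> Option<String> {
    let (noun, verb) = tool.split_once('.')?;
    if noun.is_empty() || verb.is_empty() {
        return None;
    }
    Some(format!("hakiri {} {}", noun, verb.replace('_', "-")))
}
```

If the mapping really is this mechanical, the CLI subcommands and the MCP tool list could both be generated from a single table of operations, which is one way to keep the two surfaces from drifting apart.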
Provenance
Every agent-driven action records:
- `actor`: `agent:claude-opus-4-7` / `human:<id>` / `system`
- `intent`: the tool call’s input
- `effect`: diff of catalog/manifest/state
- `trace_id`: OTel trace covering the whole call
Stored in `meta.sqlite`:

```sql
CREATE TABLE actor_event (
  id          INTEGER PRIMARY KEY AUTOINCREMENT,
  actor       TEXT NOT NULL,
  tool        TEXT NOT NULL,
  input_json  TEXT NOT NULL,
  output_json TEXT,
  diff_json   TEXT,
  trace_id    TEXT,
  at          TEXT NOT NULL
);
```

`hakiri provenance --since 1h` prints a human-readable timeline. This is the answer to “what did the agent change while I was at lunch?”
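
`--since 1h` takes a shorthand duration, in the same style as the `every 30m` schedule in the authoring example. A minimal parser, assuming only the suffixes `s`, `m`, `h`, and `d` (the source shows just `1h` and `30m`), could look like this:

```rust
use std::time::Duration;

/// Parse shorthand durations such as "90s", "30m", "1h", or "2d".
/// Returns None for empty input, a missing number, an unknown unit, or overflow.
fn parse_since(s: &str) -> Option<Duration> {
    let s = s.trim();
    let unit = s.chars().last()?;
    let number: u64 = s[..s.len() - unit.len_utf8()].parse().ok()?;
    let secs_per_unit = match unit {
        's' => 1,
        'm' => 60,
        'h' => 3_600,
        'd' => 86_400,
        _ => return None,
    };
    number.checked_mul(secs_per_unit).map(Duration::from_secs)
}
```

Because `at` is stored as `TEXT`, writing it as fixed-width ISO-8601 UTC timestamps (an assumption; the schema does not say) would keep lexicographic order equal to chronological order, so the `--since` cutoff becomes a plain string comparison in the `WHERE` clause.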
Sandboxing the scaffolded code
The connector authoring loop runs inside two safety nets:
- Build-time: `cargo build --target wasm32-wasip1` against a curated set of dependencies. No arbitrary `build.rs` execution; we fail builds that try to escape into native code.
- Test-time: contract tests run the produced `.wasm` under `wasmtime` with the same capability grants the connector will receive in production. The host denies any capability the manifest didn’t declare.
A connector that an agent generated but that tries to read ~/.aws/credentials simply fails to load — both at test time and at runtime.
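
The host-side rule (deny anything the manifest didn't declare) reduces to a default-deny check. In wasmtime the enforcement happens when the host builds the connector's WASI context, for example by preopening only the directories the manifest grants; the sketch below models just the decision, with hypothetical `Grants` and `Request` types covering outbound hosts and readable directory roots.

```rust
use std::path::{Component, Path};

/// Capabilities a connector's manifest declares (hypothetical shape).
struct Grants {
    http_hosts: Vec<String>,
    read_roots: Vec<String>,
}

/// A capability the connector attempts to use at runtime.
enum Request<'a> {
    Http { host: &'a str },
    ReadFile { path: &'a str },
}

/// Default-deny: a request is allowed only if the manifest granted it.
fn allowed(grants: &Grants, req: &Request<'_>) -> bool {
    match req {
        Request::Http { host } => grants.http_hosts.iter().any(|h| h.as_str() == *host),
        Request::ReadFile { path } => {
            let p = Path::new(*path);
            // Reject `..` outright instead of trying to normalize it.
            let escapes = p.components().any(|c| c == Component::ParentDir);
            // Path::starts_with compares whole components, so "/data" does not match "/database".
            !escapes && grants.read_roots.iter().any(|root| p.starts_with(root))
        }
    }
}
```

With no filesystem roots granted, a read of `~/.aws/credentials` is denied no matter how the connector spells the path.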
Eval harness
For Hakiri-the-project: a small benchmark of connector-authoring tasks (“write a Linear source from this OpenAPI”, “write a Stripe destination”, …) that we run against new model releases. The metric: does the generated connector pass contract tests on the first try, on the second try after one round of agent feedback, or never? This is how we measure whether the connector model is actually agent-friendly.
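
Each task lands in one of three buckets, so the harness naturally reports a small distribution rather than a single pass rate. A minimal tally, with a hypothetical `Outcome` type, could compute the share of tasks solved on the first try and within two tries:

```rust
/// Result of one connector-authoring task in the benchmark.
#[derive(Clone, Copy, PartialEq)]
enum Outcome {
    FirstTry,
    SecondTry, // passed after one round of agent feedback
    Never,
}

/// Returns (first-try rate, within-two-tries rate) as fractions in [0, 1].
fn summarize(results: &[Outcome]) -> (f64, f64) {
    if results.is_empty() {
        return (0.0, 0.0);
    }
    let n = results.len() as f64;
    let first = results.iter().filter(|&&o| o == Outcome::FirstTry).count() as f64;
    let second = results.iter().filter(|&&o| o == Outcome::SecondTry).count() as f64;
    (first / n, (first + second) / n)
}
```

Tracking both rates across model releases separates "models write better connectors outright" from "models use `connector.diagnose` feedback better".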
Open questions
- Multi-agent collaboration. Two agents editing the same manifest concurrently face the same conflict story as humans, but scripted resolution patterns are worth designing.
- Cost accounting. An MCP tool call that triggers a `pipeline.run` could move gigabytes of data. Worth a `--budget` flag and a `cost_estimate` tool that previews the cost before anything runs.
- Tool-call truncation. Big schemas don’t fit in a 25K-token tool result. We’ll return summaries by default and require explicit pagination tools.
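
A `--budget` flag paired with `cost_estimate` could gate runs the same way write tools are gated: estimate first, then surface the estimate instead of running when it exceeds the budget. Both names come from the open question above; measuring cost in bytes moved, and the `BudgetCheck` type, are assumptions for this sketch.

```rust
/// Outcome of comparing a run's estimated cost against the caller's budget.
#[derive(Debug, PartialEq)]
enum BudgetCheck {
    Within,
    /// Over budget: return the numbers so a human can decide.
    Exceeds { estimate_bytes: u64, budget_bytes: u64 },
}

/// `budget_bytes = None` means no `--budget` was given, so nothing is blocked.
fn check_budget(estimate_bytes: u64, budget_bytes: Option<u64>) -> BudgetCheck {
    match budget_bytes {
        Some(budget) if estimate_bytes > budget => BudgetCheck::Exceeds {
            estimate_bytes,
            budget_bytes: budget,
        },
        _ => BudgetCheck::Within,
    }
}
```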