
Agent Interface

The agent interface is the differentiator: an LLM agent should be able to (1) discover the available capabilities, (2) author a new connector against an unfamiliar API, (3) safely test and install it, and (4) observe its runs — all through a stable contract.

hakiri mcp starts a Model Context Protocol server over stdio (for Claude Code et al.) or HTTP+SSE (for hosted clients). All capabilities listed below are exposed as MCP tools.

# stdio (Claude Code / desktop clients)
hakiri mcp
# HTTP (hosted/remote agents)
hakiri mcp --http 0.0.0.0:7700 --token "$TOKEN"
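
On the wire, an MCP tool invocation is a JSON-RPC 2.0 request with the method tools/call. The sketch below builds the message an agent might send for catalog.list_connectors. The helper name tool_call and the id counter are illustrative, not part of hakiri.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids; any value unique per request works


def tool_call(name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request (JSON-RPC 2.0)."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


request = tool_call("catalog.list_connectors", {"kind": "source"})
line = json.dumps(request)  # serialized form, one message per line on stdio
print(line)
```

The same payload shape applies to every tool listed below; only the name and arguments change.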

| Tool | Input | Output | Use |
| --- | --- | --- | --- |
| catalog.list_connectors | { kind?: "source" \| "destination" } | list of { name, version, kind, summary } | What can I use? |
| catalog.describe_connector | { name } | full WIT-derived metadata + config schema | What does this take? |
| pipeline.list | | list of { id, source, tables, schedule, last_run } | What’s already wired? |
| pipeline.describe | { id } | full manifest + last run trace | Tell me about this pipeline |

| Tool | Input | Output | Use |
| --- | --- | --- | --- |
| source.list_tables | { connector, config } | list of { name, summary, row_count_estimate? } | What tables does this source expose? Concise; fits in tokens. |
| source.sample_table | { connector, config, table, n?: 10 } | { schema, rows: [...] } | What does this table look like? Schema + n example rows. |
| source.discover | { connector, config, table? } | full table schemas (paginated if large) | Deep discovery: call after list_tables narrows the surface. |
| context.tables | | list of tables with row counts and last_updated | What’s in my context store? |
| context.schema | { table } | Arrow schema JSON + evolution history | What columns, and how have they changed? |

| Tool | Input | Output | Use |
| --- | --- | --- | --- |
| context.query | { sql, limit?: number } | result rows (capped) | Read from the context store |
| context.explain | { sql } | DuckDB EXPLAIN | Is this query reasonable? |

| Tool | Input | Output | Use |
| --- | --- | --- | --- |
| connector.scaffold | { name, kind, spec_url? \| openapi? \| sql? } | path to generated crate | Generate a connector skeleton |
| connector.build | { name } | build status + warnings | Compile WIT → WASM |
| connector.test | { name, fixtures?: string[] } | test run summary | Verify against contract |
| connector.install | { name, version } | install status | Register into the project catalog |
| pipeline.create | { id, source, tables, schedule?, format?: "json" \| "toml" } | manifest path | Writes pipelines/<id>.json by default; format: "toml" appends a [[pipeline]] block to hakiri.toml |
| pipeline.edit | { id, patch } | new manifest content | Modify an existing pipeline; preserves the file’s format (TOML round-trips via toml_edit) |
| pipeline.convert | { id, to: "json" \| "toml" } | new path | Switch format; lossless except for TOML comments |

| Tool | Input | Output | Use |
| --- | --- | --- | --- |
| openapi.lint | { url \| spec } | summary: OAS version, path count, response-body completeness, auth style, pagination idiom, gaps | Pre-flight: is this spec worth scaffolding from? |
| connector.diagnose | { name, run_id } | structured diagnosis: failing assertion + last HTTP exchange + suggested fix | Why did this run fail? Far better signal than raw stderr. |
| fixture.record | { connector, calls } | recorded cassette path | Capture live HTTP under capability grant for replay-based testing |

connector.build returns structured cargo JSON diagnostics rather than raw terminal output: frontier models parse JSON errors 2–3× more reliably than terminal text.
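
As a rough illustration of why structured diagnostics help, the sketch below pulls error entries out of output in the format cargo emits with --message-format=json (one JSON object per line, with compiler diagnostics under reason "compiler-message"). Whether connector.build forwards cargo's messages unchanged is an assumption, and the sample input is hand-written, not real build output.

```python
import json


def extract_errors(cargo_json_lines: str) -> list:
    """Collect error diagnostics from cargo's JSON message stream."""
    errors = []
    for raw in cargo_json_lines.splitlines():
        raw = raw.strip()
        if not raw.startswith("{"):
            continue  # skip anything that is not a JSON object
        msg = json.loads(raw)
        if msg.get("reason") != "compiler-message":
            continue  # e.g. compiler-artifact, build-finished
        diag = msg["message"]
        if diag.get("level") != "error":
            continue
        # The primary span points at the code the diagnostic is about.
        primary = next(
            (s for s in diag.get("spans", []) if s.get("is_primary")), None
        )
        errors.append({
            "code": (diag.get("code") or {}).get("code"),
            "message": diag["message"],
            "file": primary["file_name"] if primary else None,
            "line": primary["line_start"] if primary else None,
        })
    return errors


# Hand-written sample in cargo's message shape (illustrative only).
sample = "\n".join([
    json.dumps({"reason": "compiler-artifact", "target": {"name": "linear"}}),
    json.dumps({"reason": "compiler-message", "message": {
        "level": "error",
        "message": "mismatched types",
        "code": {"code": "E0308"},
        "spans": [{"file_name": "src/lib.rs", "line_start": 42,
                   "is_primary": True}],
    }}),
])
errors = extract_errors(sample)
print(errors)
```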

| Tool | Input | Output | Use |
| --- | --- | --- | --- |
| pipeline.dry_run | { id, sample_size?: number } | preview rows, no writes | Validate without committing |
| pipeline.run | { id } | { run_id, status, row_count } | Execute now |
| run.tail | { run_id, since?: string } | log/event stream | Watch a running pipeline |
| run.trace | { run_id } | OTel trace JSON | Full distributed trace |

All write-side tools (*.install, *.run, *.edit) require either:

  • a --yes flag in the project’s hakiri.toml [agent] section, or
  • a per-call confirmation token the human approves out-of-band

By default, MCP write tools return a diff and a confirmation prompt rather than acting. Agents propose; humans (or operators with --yes) commit.
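
The propose-then-commit behavior can be sketched as a small gate in front of every write tool. Everything here is hypothetical: the WriteGate class, its field names, and the single-use token policy are illustrative assumptions, not hakiri's implementation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class WriteGate:
    """Hypothetical gate for write-side tools (not hakiri's actual API)."""
    auto_approve: bool = False  # stands in for --yes in the [agent] section
    approved_tokens: set = field(default_factory=set)  # approved out-of-band

    def handle(self, tool: str, diff: str, token: Optional[str] = None) -> dict:
        token_ok = token is not None and token in self.approved_tokens
        if self.auto_approve or token_ok:
            # Assumption: a confirmation token is valid for one call only.
            self.approved_tokens.discard(token)
            return {"status": "applied", "tool": tool}
        # Default path: return the proposal instead of acting.
        return {"status": "needs_confirmation", "tool": tool, "diff": diff}


gate = WriteGate()
proposal = gate.handle("connector.install", "+ linear 0.1.0")

gate.approved_tokens.add("tok-123")  # a human approves out-of-band
applied = gate.handle("connector.install", "+ linear 0.1.0", token="tok-123")
replayed = gate.handle("connector.install", "+ linear 0.1.0", token="tok-123")
print(proposal["status"], applied["status"], replayed["status"])
```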

sequenceDiagram
  autonumber
  actor U as User
  participant A as Agent
  participant H as hakiri MCP
  participant S as Sandbox
  participant C as Catalog

  U->>A: "add a Linear issues source"
  A->>H: catalog.list_connectors {kind:"source"}
  H-->>A: no "linear" connector
  A->>H: connector.scaffold {name:"linear", kind:"source", spec_url:"https://developers.linear.app/openapi.json"}
  H->>S: generate Rust crate + connector.toml from OpenAPI
  S-->>H: path
  H-->>A: connectors/linear/
  A->>H: connector.build {name:"linear"}
  H->>S: cargo build --target wasm32-wasi
  S-->>H: build OK / warnings
  A->>H: connector.test {name:"linear", fixtures:["issues-page-1.json"]}
  H->>S: run WIT-conformance + replay
  S-->>H: tests pass
  A->>U: "Connector ready. Diff: …. Install?"
  U->>A: yes
  A->>H: connector.install {name:"linear", version:"0.1.0"}
  H->>C: register
  A->>H: pipeline.create {id:"linear-issues", source:{connector:"linear", ...}, tables:["issues"], schedule:"every 30m"}
  A->>H: pipeline.dry_run {id:"linear-issues", sample_size:10}
  H-->>A: 10 sample rows, schema check OK
  A->>U: "Dry-run looks good. Run for real?"

Note the symmetry between human and agent: every step is also a CLI verb (hakiri connector scaffold, hakiri pipeline dry-run, …). The MCP surface is just a wire format for the same operations.

Every agent-driven action records:

  • actor: agent:claude-opus-4-7 / human:<id> / system
  • intent: the tool call’s input
  • effect: diff of catalog/manifest/state
  • trace_id: OTel trace covering the whole call

Stored in meta.sqlite:

CREATE TABLE actor_event (
  id          INTEGER PRIMARY KEY AUTOINCREMENT,
  actor       TEXT NOT NULL,
  tool        TEXT NOT NULL,
  input_json  TEXT NOT NULL,
  output_json TEXT,
  diff_json   TEXT,
  trace_id    TEXT,
  at          TEXT NOT NULL
);

hakiri provenance --since 1h prints a human-readable timeline. This is the answer to “what did the agent change while I was at lunch?”
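
A minimal sketch of how such a timeline could be produced from the actor_event table, using an in-memory SQLite database as a stand-in for meta.sqlite. It assumes the at column holds ISO 8601 UTC timestamps in one fixed format, so that string comparison matches time order; the record and timeline helpers and the sample events are illustrative.

```python
import json
import sqlite3
from datetime import datetime, timedelta, timezone

SCHEMA = """
CREATE TABLE actor_event (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  actor TEXT NOT NULL, tool TEXT NOT NULL, input_json TEXT NOT NULL,
  output_json TEXT, diff_json TEXT, trace_id TEXT, at TEXT NOT NULL
);
"""


def stamp(t: datetime) -> str:
    # Fixed-width UTC timestamps sort lexicographically in time order.
    return t.isoformat(timespec="microseconds")


def record(db, actor, tool, input_obj, at, diff=None, trace_id=None):
    db.execute(
        "INSERT INTO actor_event (actor, tool, input_json, diff_json, trace_id, at) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (actor, tool, json.dumps(input_obj),
         json.dumps(diff) if diff is not None else None, trace_id, stamp(at)),
    )


def timeline(db, since):
    rows = db.execute(
        "SELECT at, actor, tool FROM actor_event WHERE at >= ? ORDER BY at, id",
        (stamp(since),),
    )
    return [f"{at}  {actor}  {tool}" for at, actor, tool in rows]


db = sqlite3.connect(":memory:")  # stand-in for meta.sqlite
db.executescript(SCHEMA)
now = datetime.now(timezone.utc)
record(db, "human:alice", "pipeline.create", {"id": "linear-issues"},
       now - timedelta(hours=3))
record(db, "agent:claude-opus-4-7", "connector.install",
       {"name": "linear", "version": "0.1.0"}, now - timedelta(minutes=20))

recent = timeline(db, since=now - timedelta(hours=1))
print("\n".join(recent))
```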

The connector authoring loop is protected by two safety nets:

  1. Build-time: cargo build --target wasm32-wasi against a curated set of dependencies. No arbitrary build.rs execution; we fail builds that try to escape into native code.
  2. Test-time: contract tests run the produced .wasm under wasmtime with the same capability grants the connector will receive in production. The host denies any capability the manifest didn’t declare.

An agent-generated connector that tries to read ~/.aws/credentials simply fails to load, both at test time and at runtime.
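
The deny-by-default rule amounts to a set difference between what a connector requests and what its manifest declares. The sketch below is a simplified model of that check, not the wasmtime host itself; the capability strings and the load_connector function are hypothetical.

```python
def undeclared(declared: set, requested: set) -> set:
    """Return the requested capabilities the manifest did not declare."""
    return requested - declared


def load_connector(name: str, declared: set, requested: set) -> str:
    denied = undeclared(declared, requested)
    if denied:
        # Refuse to load at all rather than run with partial grants.
        raise PermissionError(f"{name}: undeclared capabilities {sorted(denied)}")
    return f"{name} loaded"


# Hypothetical capability names, for illustration only.
declared = {"http:api.linear.app"}

ok = load_connector("linear", declared, {"http:api.linear.app"})
try:
    load_connector(
        "linear", declared,
        {"http:api.linear.app", "fs:read:~/.aws/credentials"},
    )
    outcome = "loaded"
except PermissionError as exc:
    outcome = f"denied: {exc}"

print(ok)
print(outcome)
```

Because the same check runs under test and in production, a connector that passes contract tests cannot gain a capability later without a manifest change.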

For Hakiri-the-project: a small benchmark of connector authoring tasks (“write a Linear source from this OpenAPI”, “write a Stripe destination”, …) that we run against new model releases. The metric has three outcomes per task: does the generated connector pass contract tests on the first try, on the second try after one round of agent feedback, or never? This is how we measure whether the connector model is actually agent-friendly.
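
One way to score such a benchmark is to classify each task by the attempt on which contract tests first passed, then report the share of tasks in each bucket. The sketch below assumes at most two attempts per task; the task names and results are invented for illustration and are not measurements.

```python
from collections import Counter

BUCKETS = ("first_try", "second_try", "never")


def classify(attempts: list) -> str:
    """attempts[i] is True if contract tests passed on try i + 1."""
    if attempts and attempts[0]:
        return "first_try"
    if len(attempts) > 1 and attempts[1]:
        return "second_try"
    return "never"


def summarize(results: dict) -> dict:
    """Return the fraction of tasks that landed in each bucket."""
    if not results:
        return {b: 0.0 for b in BUCKETS}
    counts = Counter(classify(a) for a in results.values())
    return {b: counts[b] / len(results) for b in BUCKETS}


# Invented results for four hypothetical tasks.
results = {
    "linear-source": [True],
    "stripe-destination": [False, True],
    "task-c": [False, False],
    "task-d": [True],
}
summary = summarize(results)
print(summary)
```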

  • Multi-agent collaboration. Two agents editing the same manifest concurrently — same conflict story as humans, but worth scripted resolution patterns.
  • Cost accounting. An MCP tool call that triggers a pipeline.run could move gigabytes of data. Worth a --budget flag and a cost_estimate tool that previews.
  • Tool-call truncation. Big schemas don’t fit in a 25K-token tool result. We’ll return summaries by default and require explicit pagination tools.
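
For the truncation item, one possible shape for explicit pagination is a cursor over the list of table schemas, with each page sized against a budget. The sketch below uses serialized character count as a crude proxy for tokens; the paginate function, the cursor format, and the budget value are all assumptions rather than a settled design.

```python
import json


def paginate(items: list, max_chars: int, cursor: int = 0) -> dict:
    """Return the items that fit in max_chars of JSON, plus a cursor for the rest."""
    page, used, i = [], 0, cursor
    while i < len(items):
        size = len(json.dumps(items[i]))
        # Always emit at least one item so the caller makes progress.
        if page and used + size > max_chars:
            break
        page.append(items[i])
        used += size
        i += 1
    return {"items": page, "next_cursor": i if i < len(items) else None}


tables = [{"name": f"t{n}", "columns": ["id", "title"]} for n in range(5)]

pages, cursor = [], 0
while cursor is not None:
    result = paginate(tables, max_chars=100, cursor=cursor)
    pages.append(result["items"])
    cursor = result["next_cursor"]

print(len(pages), [len(p) for p in pages])
```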