Agent Interface

The agent interface is the differentiator: an LLM agent should be able to (1) discover the available capabilities, (2) author a new connector against an unfamiliar API, (3) safely test and install it, and (4) observe its runs — all through a stable contract.

Surface: MCP server, ships in the binary

hakiri mcp starts a Model Context Protocol server over stdio (for Claude Code et al.) or HTTP+SSE (for hosted clients). All capabilities listed below are exposed as MCP tools.

```shell
# stdio (Claude Code / desktop clients)
hakiri mcp

# HTTP (hosted/remote agents)
hakiri mcp --http 0.0.0.0:7700 --token "$TOKEN"
```
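Both transports carry the same JSON-RPC 2.0 messages. Over stdio, each message is one line of JSON. A client invokes a tool with a `tools/call` request whose `params` hold the tool `name` and its `arguments`.

The sketch below builds such a line for `catalog.list_connectors`. It omits the `initialize` handshake that a real session performs first. The helper name `tool_call_line` is our own and is not part of Hakiri.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids must be unique within a session

def tool_call_line(tool: str, arguments: dict) -> str:
    """Encode an MCP tools/call request as one newline-delimited JSON-RPC message."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    # stdio transport: one JSON object per line; json.dumps escapes any
    # newlines inside strings, so the message never spans lines
    return json.dumps(request, separators=(",", ":")) + "\n"
```

Calling `tool_call_line("catalog.list_connectors", {"kind": "source"})` yields a single line. The client writes it to the server's stdin and reads the matching response (same `id`) from stdout.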

Tools

Discovery

| Tool | Input | Output | Use |
|---|---|---|---|
| `catalog.list_connectors` | `{ kind?: "source" \| "destination" }` | list of `{ name, version, kind, summary }` | What can I use? |
| `catalog.describe_connector` | `{ name }` | full WIT-derived metadata + config schema | What does this take? |
| `pipeline.list` | none | list of `{ id, source, tables, schedule, last_run }` | What’s already wired? |
| `pipeline.describe` | `{ id }` | full manifest + last run trace | Tell me about this pipeline |

Schema introspection

| Tool | Input | Output | Use |
|---|---|---|---|
| `source.list_tables` | `{ connector, config }` | list of `{ name, summary, row_count_estimate? }` | What tables does this source expose? Concise enough to fit in a single tool result. |
| `source.sample_table` | `{ connector, config, table, n?: 10 }` | `{ schema, rows: [...] }` | What does this table look like? Schema plus n example rows. |
| `source.discover` | `{ connector, config, table? }` | full table schemas (paginated if large) | Deep discovery; call after `list_tables` narrows the surface. |
| `context.tables` | none | list of tables with row counts and `last_updated` | What’s in my context store? |
| `context.schema` | `{ table }` | Arrow schema JSON + evolution history | What columns, and how have they changed? |
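The table above implies a narrowing workflow: list tables cheaply, then pull full schemas only for the tables that matter. The sketch below shows that loop from the agent side.

It rests on several assumptions:

- `source.discover` is only described as "paginated if large". The `cursor`/`next_cursor` fields are therefore hypothetical, modelled on MCP's own `cursor`/`nextCursor` convention for list requests.
- The response keys `tables` and `schemas` are also assumptions.
- `call_tool` stands in for whatever MCP client the agent uses.

```python
from typing import Callable, Iterator

# Hypothetical client hook: call_tool(tool_name, arguments) -> parsed JSON result.
CallTool = Callable[[str, dict], dict]

def discover_schemas(call_tool: CallTool, connector: str, config: dict,
                     wanted: set[str]) -> Iterator[dict]:
    """Yield full schemas for the wanted tables, following pagination cursors."""
    listed = call_tool("source.list_tables", {"connector": connector, "config": config})
    for table in listed["tables"]:  # assumed response key
        if table["name"] not in wanted:
            continue  # never pay for deep discovery of irrelevant tables
        cursor = None
        while True:
            args = {"connector": connector, "config": config, "table": table["name"]}
            if cursor is not None:
                args["cursor"] = cursor  # hypothetical pagination field
            page = call_tool("source.discover", args)
            yield from page["schemas"]  # assumed response key
            cursor = page.get("next_cursor")
            if cursor is None:
                break  # last page reached
```

Because `discover_schemas` is a generator, the agent can stop consuming pages as soon as it has enough context. No further tool calls are made after that point.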

Query

| Tool | Input | Output | Use |
|---|---|---|---|
| `context.query` | `{ sql, limit?: number }` | result rows (capped) | Read from the context store |
| `context.explain` | `{ sql }` | DuckDB `EXPLAIN` output | Is this query reasonable? |
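`context.query` returns "result rows (capped)". One way to cap results without hiding the fact that rows were dropped is to fetch one row past the limit and report a `truncated` flag.

The sketch below does this with a few assumptions:

- The context store is DuckDB, but the sketch swaps in Python's built-in `sqlite3` so it runs anywhere.
- The default cap of 100 is an assumption.
- The `{columns, rows, truncated}` response shape is also an assumption.

```python
import sqlite3

DEFAULT_LIMIT = 100  # assumed default cap

def capped_query(conn: sqlite3.Connection, sql: str, limit: int = DEFAULT_LIMIT) -> dict:
    """Run sql and return at most `limit` rows plus a flag saying whether more existed."""
    if limit < 1:
        raise ValueError("limit must be positive")
    cur = conn.execute(sql)
    if cur.description is None:
        raise ValueError("statement returned no result set")
    columns = [d[0] for d in cur.description]
    rows = cur.fetchmany(limit + 1)  # the one extra row reveals truncation
    return {"columns": columns, "rows": rows[:limit], "truncated": len(rows) > limit}
```

The `description` check only notices a statement without a result set after that statement has already run. Opening the store read-only is the real safeguard; for SQLite that means a `file:...?mode=ro` URI with `uri=True`.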

Authoring

| Tool | Input | Output | Use |
|---|---|---|---|
| `connector.scaffold` | `{ name, kind, spec_url? \| openapi? \| sql? }` | path to generated crate | Generate a connector skeleton |
| `connector.build` | `{ name }` | build status + warnings | Compile the crate to WASM against its WIT interface |
| `connector.test` | `{ name, fixtures?: string[] }` | test run summary | Verify against the connector contract |
| `connector.install` | `{ name, version }` | install status | Register into the project catalog |
| `pipeline.create` | `{ id, source, tables, schedule?, format?: "json" \| "toml" }` | manifest path | Create a pipeline. Writes `pipelines/<id>.json` by default; `format: "toml"` appends a `[[pipeline]]` block to `hakiri.toml` |
| `pipeline.edit` | `{ id, patch }` | new manifest content | Modify an existing pipeline; preserves the file’s format (TOML round-trips via `toml_edit`) |
| `pipeline.convert` | `{ id, to: "json" \| "toml" }` | new path | Switch formats; lossless except that TOML comments do not survive |

Diagnostics

| Tool | Input | Output | Use |
|---|---|---|---|
| `openapi.lint` | `{ url \| spec }` | summary: OAS version, path count, response-body completeness, auth style, pagination idiom, gaps | Pre-flight: is this spec worth scaffolding from? |
| `connector.diagnose` | `{ name, run_id }` | structured diagnosis: failing assertion + last HTTP exchange + suggested fix | Why did this run fail? Far better signal than raw stderr. |
| `fixture.record` | `{ connector, calls }` | recorded cassette path | Capture live HTTP traffic under the connector’s capability grants for replay-based testing |

`connector.build` returns structured cargo JSON diagnostics (cargo’s `--message-format=json` output) rather than rendered terminal text, since frontier models parse JSON errors 2–3× more reliably than terminal text.
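With `--message-format=json`, cargo emits one JSON object per line. Compiler diagnostics carry `"reason": "compiler-message"` and embed rustc's structured `message`, which includes:

- `level`
- message text
- an optional error `code`
- source `spans`

The sketch below reduces that stream to compact records an agent can act on. The output field names are our own choice, not a documented `connector.build` schema.

```python
import json

def summarize_cargo_json(stream: str) -> list[dict]:
    """Extract errors and warnings from `cargo build --message-format=json` output."""
    out = []
    for line in stream.splitlines():
        if not line.startswith("{"):
            continue  # ignore any non-JSON lines mixed into the stream
        msg = json.loads(line)
        if msg.get("reason") != "compiler-message":
            continue  # skip compiler-artifact, build-finished, etc.
        diag = msg["message"]
        if diag["level"] not in ("error", "warning"):
            continue  # drop notes and help messages
        primary = next((s for s in diag["spans"] if s["is_primary"]), None)
        out.append({
            "level": diag["level"],
            "code": (diag.get("code") or {}).get("code"),  # e.g. "E0308", or None
            "message": diag["message"],
            "file": primary["file_name"] if primary else None,
            "line": primary["line_start"] if primary else None,
        })
    return out
```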

Execution

| Tool | Input | Output | Use |
|---|---|---|---|
| `pipeline.dry_run` | `{ id, sample_size?: number }` | preview rows, no writes | Validate without committing |
| `pipeline.run` | `{ id }` | `{ run_id, status, row_count }` | Execute now |
| `run.tail` | `{ run_id, since?: string }` | log/event stream | Watch a running pipeline |
| `run.trace` | `{ run_id }` | OTel trace JSON | Full distributed trace |

All write-side tools (such as `*.install`, `*.run`, and `*.edit`) require either:

- a `--yes` flag in the `[agent]` section of the project’s `hakiri.toml`, or
- a per-call confirmation token that a human approves out-of-band.

By default, MCP write tools return a diff and a confirmation prompt rather than acting. Agents propose; humans (or operators with --yes) commit.
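This default-deny behaviour can be sketched as a small gate in front of the tool dispatcher. Everything here is illustrative:

- The glob patterns are the three named in the text.
- `yes` mirrors the `--yes` setting.
- The single-use token set is a hypothetical stand-in for however Hakiri records out-of-band approvals.

```python
import difflib
import fnmatch
from dataclasses import dataclass, field
from typing import Optional

# Write-side patterns named in the text; a real deployment may gate more tools.
# Note that "pipeline.dry_run" does not match "*.run", so dry runs stay ungated.
WRITE_PATTERNS = ("*.install", "*.run", "*.edit")

def is_write_tool(tool: str) -> bool:
    return any(fnmatch.fnmatchcase(tool, p) for p in WRITE_PATTERNS)

@dataclass
class WriteGate:
    yes: bool = False  # mirrors the --yes setting in [agent]
    approved_tokens: set[str] = field(default_factory=set)  # hypothetical approvals

    def check(self, tool: str, before: str, after: str,
              token: Optional[str] = None) -> dict:
        """Return {"status": "apply"}, or a proposal carrying a unified diff."""
        if not is_write_tool(tool) or self.yes:
            return {"status": "apply"}
        if token is not None and token in self.approved_tokens:
            self.approved_tokens.remove(token)  # tokens are single-use
            return {"status": "apply"}
        diff = "".join(difflib.unified_diff(
            before.splitlines(keepends=True), after.splitlines(keepends=True),
            fromfile="current", tofile="proposed"))
        return {"status": "needs_confirmation", "diff": diff}
```

Making tokens single-use means an approval covers exactly one proposed change, not every later call the agent makes.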

The connector authoring loop

```mermaid
sequenceDiagram
  autonumber
  actor U as User
  participant A as Agent
  participant H as hakiri MCP
  participant S as Sandbox
  participant C as Catalog

  U->>A: "add a Linear issues source"
  A->>H: catalog.list_connectors {kind:"source"}
  H-->>A: no "linear" connector
  A->>H: connector.scaffold {name:"linear", kind:"source", spec_url:"https://developers.linear.app/openapi.json"}
  H->>S: generate Rust crate + connector.toml from OpenAPI
  S-->>H: path
  H-->>A: connectors/linear/
  A->>H: connector.build {name:"linear"}
  H->>S: cargo build --target wasm32-wasi
  S-->>H: build OK / warnings
  A->>H: connector.test {name:"linear", fixtures:["issues-page-1.json"]}
  H->>S: run WIT-conformance + replay
  S-->>H: tests pass
  A->>U: "Connector ready. Diff: …. Install?"
  U->>A: yes
  A->>H: connector.install {name:"linear", version:"0.1.0"}
  H->>C: register
  A->>H: pipeline.create {id:"linear-issues", source:{connector:"linear", ...}, tables:["issues"], schedule:"every 30m"}
  A->>H: pipeline.dry_run {id:"linear-issues", sample_size:10}
  H-->>A: 10 sample rows, schema check OK
  A->>U: "Dry-run looks good. Run for real?"
```

Note the symmetry between human and agent: every step is also a CLI verb (hakiri connector scaffold, hakiri pipeline dry-run, …). The MCP surface is just a wire format for the same operations.
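The two examples given suggest a mechanical mapping from MCP tool name to CLI verb:

- `connector.scaffold` corresponds to `hakiri connector scaffold`.
- `pipeline.dry_run` corresponds to `hakiri pipeline dry-run`.

The rule sketched below splits on the dot and turns underscores into hyphens. It is inferred from those two examples only. How tool arguments map to CLI flags is deliberately left out.

```python
def cli_argv(tool: str) -> list[str]:
    """Map an MCP tool name to the equivalent hakiri CLI verb (inferred rule)."""
    noun, sep, verb = tool.partition(".")
    if not sep or not noun or not verb:
        raise ValueError(f"expected '<noun>.<verb>', got {tool!r}")
    # e.g. "pipeline.dry_run" -> ["hakiri", "pipeline", "dry-run"]
    return ["hakiri", noun.replace("_", "-"), verb.replace("_", "-")]
```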

Provenance

Every agent-driven action records:

- actor: `agent:claude-opus-4-7`, `human:<id>`, or `system`
- intent: the tool call’s input
- effect: a diff of catalog/manifest/state
- trace_id: the OTel trace covering the whole call

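These are stored in `meta.sqlite`:

```sql
CREATE TABLE actor_event (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  actor TEXT NOT NULL,        -- agent:<model> / human:<id> / system
  tool TEXT NOT NULL,         -- tool that was invoked, e.g. pipeline.edit
  input_json TEXT NOT NULL,   -- intent: the call's input
  output_json TEXT,           -- the call's result, if any
  diff_json TEXT,             -- effect: diff of catalog/manifest/state
  trace_id TEXT,              -- OTel trace covering the whole call
  at TEXT NOT NULL            -- when the call happened
);
```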

hakiri provenance --since 1h prints a human-readable timeline. This is the answer to “what did the agent change while I was at lunch?”
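A timeline like the one `hakiri provenance --since 1h` prints is a time-windowed scan of `actor_event`. The sketch below runs that scan against an in-memory SQLite copy of the schema above.

It assumes `at` holds ISO-8601 UTC timestamps in one fixed format, so that string comparison orders them correctly. The schema itself does not specify this. The output layout is also invented.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

SCHEMA = """
CREATE TABLE actor_event (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  actor TEXT NOT NULL, tool TEXT NOT NULL, input_json TEXT NOT NULL,
  output_json TEXT, diff_json TEXT, trace_id TEXT, at TEXT NOT NULL
);
"""

def ts(t: datetime) -> str:
    # Assumed storage format: fixed-width ISO-8601 UTC, e.g. 2025-01-01T12:00:00+00:00
    return t.astimezone(timezone.utc).isoformat(timespec="seconds")

def record(db, actor, tool, input_json, at, diff_json=None, trace_id=None):
    db.execute(
        "INSERT INTO actor_event (actor, tool, input_json, diff_json, trace_id, at) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (actor, tool, input_json, diff_json, trace_id, ts(at)),
    )

def timeline(db, since: timedelta, now: datetime) -> list[str]:
    """Human-readable lines for every event at or after now - since, oldest first."""
    cutoff = ts(now - since)
    rows = db.execute(
        "SELECT at, actor, tool FROM actor_event WHERE at >= ? ORDER BY at, id",
        (cutoff,),
    ).fetchall()
    return [f"{at}  {actor:<24} {tool}" for at, actor, tool in rows]
```

An index on `at` would keep this scan cheap as the table grows. That index is a suggestion, not part of the schema above.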

Sandboxing the scaffolded code

The connector authoring loop runs inside two safety nets:

- Build-time: `cargo build --target wasm32-wasi` against a curated set of dependencies. Build scripts (`build.rs`) are not allowed to run arbitrary code, and builds that try to escape into native code fail.
- Test-time: contract tests run the produced `.wasm` under wasmtime with the same capability grants the connector will receive in production. The host denies any capability the manifest didn’t declare.

An agent-generated connector that tries to read `~/.aws/credentials` simply fails to load, both at test time and at runtime.
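In production the enforcement lives in wasmtime and the WASI host; for example, a component can only open files in directories the host preopens for it. The deny-by-default idea is easier to see in isolation, so the sketch below checks requests against a declared grant list.

The `Grants` structure, its fields, and the HTTPS-only rule are all hypothetical. They are not the real `connector.toml` format.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from urllib.parse import urlsplit

@dataclass(frozen=True)
class Grants:
    """Hypothetical capabilities a connector manifest declares; empty means none."""
    http_hosts: frozenset[str] = frozenset()
    read_dirs: frozenset[str] = frozenset()

def allow_http(g: Grants, url: str) -> bool:
    parts = urlsplit(url)
    # Only declared hosts; this sketch additionally requires HTTPS.
    return parts.scheme == "https" and parts.hostname in g.http_hosts

def allow_read(g: Grants, path: str) -> bool:
    p = PurePosixPath(path)
    if not p.is_absolute() or ".." in p.parts:
        return False  # refuse relative paths (including "~/...") and parent traversal
    return any(p.is_relative_to(d) for d in g.read_dirs)
```

With only `api.linear.app` granted, a request to any other host is refused. So is a read of `/home/dev/.aws/credentials`, because no directory was declared at all.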

Eval harness

For Hakiri the project, we keep a small benchmark of connector-authoring tasks (“write a Linear source from this OpenAPI”, “write a Stripe destination”, …) that we run against new model releases. Each task is scored by whether the generated connector passes the contract tests on the first try, on the second try after one round of agent feedback, or never. This is how we measure whether the connector model is actually agent-friendly.
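Scoring the benchmark amounts to bucketing each task by the attempt on which its connector first passed the contract tests. Here is a minimal sketch.

The input format is an assumption: a list of pass/fail results per task, one entry per attempt. Reporting fractions rather than counts is also our choice.

```python
from collections import Counter

BUCKETS = ("first_try", "second_try", "never")

def bucket(attempts: list[bool]) -> str:
    """Classify one task by the first attempt whose connector passed contract tests."""
    if len(attempts) >= 1 and attempts[0]:
        return "first_try"
    if len(attempts) >= 2 and attempts[1]:
        return "second_try"  # passed after one round of agent feedback
    return "never"  # anything beyond the second try also counts as never

def score(results: dict[str, list[bool]]) -> dict[str, float]:
    """Fraction of tasks in each bucket (all zeros for an empty benchmark)."""
    counts = Counter(bucket(a) for a in results.values())
    total = max(len(results), 1)
    return {k: counts[k] / total for k in BUCKETS}
```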

Open questions

- Multi-agent collaboration. Two agents editing the same manifest concurrently face the same conflict story as humans, but scripted resolution patterns are worth providing.
- Cost accounting. An MCP tool call that triggers a `pipeline.run` could move gigabytes of data. It is worth adding a `--budget` flag and a `cost_estimate` tool that previews the cost before anything runs.
- Tool-call truncation. Large schemas don’t fit in a 25K-token tool result. We’ll return summaries by default and require explicit pagination tools.
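Returning summaries and paging the rest means the server must split a large result into pages that each stay under the tool-result budget. The sketch below packs schema entries greedily into pages using a rough estimate of four characters per token.

The heuristic, the 25,000-token default, and the `schemas`/`next_cursor` response fields are assumptions. A production server would use a real tokenizer and also account for the response envelope.

```python
import json

TOKEN_BUDGET = 25_000
CHARS_PER_TOKEN = 4  # crude heuristic, not a tokenizer

def est_tokens(item) -> int:
    return len(json.dumps(item)) // CHARS_PER_TOKEN + 1

def paginate(items: list, budget: int = TOKEN_BUDGET) -> list[list]:
    """Greedily pack items into pages whose estimated size stays within budget."""
    pages, page, used = [], [], 0
    for item in items:
        cost = est_tokens(item)
        if cost > budget:
            raise ValueError("single item exceeds the budget; summarize it first")
        if page and used + cost > budget:
            pages.append(page)  # close the current page before it overflows
            page, used = [], 0
        page.append(item)
        used += cost
    if page:
        pages.append(page)
    return pages

def page_response(pages: list[list], index: int) -> dict:
    """Shape one page as a tool result with a hypothetical next_cursor."""
    has_next = index + 1 < len(pages)
    return {"schemas": pages[index], "next_cursor": str(index + 1) if has_next else None}
```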