Connector Model
The connector model is the load-bearing decision of the project. It determines:
- whether agents can reliably author connectors,
- whether user-authored code is safe to run,
- whether connectors can ship out-of-band from the binary,
- whether the same connector code can run on a laptop and on Cloudflare Workers.
The contract: WIT + WASM Component Model
All connectors, whether shipped in-tree or authored later, conform to a single WIT (WebAssembly Interface Type) world. Built-in connectors are still compiled as native Rust for speed, but they implement the same Rust trait that the WIT bindings generate, so the runtime treats them uniformly.
Rationale, alternatives considered, and trade-offs: see ADR-0001.
The WIT world (sketch)
```wit
package hakiri:connector@0.1.0;

interface types {
  record schema-field {
    name: string,
    data-type: data-type,
    nullable: bool,
  }

  variant data-type {
    boolean,
    int32,
    int64,
    float64,
    string,
    bytes,
    timestamp-millis,
    list(data-type),
    struct(list<schema-field>),
    %json, // opaque JSON, deferred typing
  }

  record schema {
    name: string,
    fields: list<schema-field>,
    primary-key: list<string>,
  }

  // Arrow IPC-encoded RecordBatch (host and guest agree on Arrow schema separately)
  type record-batch = list<u8>;

  type cursor = list<u8>; // opaque; connector owns the format

  variant error {
    transient(string),
    permanent(string),
    auth-expired(string),
    rate-limited(u32), // retry-after seconds
    schema-incompatible(string),
  }
}

interface source {
  use types.{schema, record-batch, cursor, error};

  /// Return the schemas this source can produce.
  discover: func() -> result<list<schema>, error>;

  /// Open a read stream against one table, optionally resuming.
  open: func(table: string, cursor: option<cursor>) -> result<read-handle, error>;

  resource read-handle {
    /// Pull the next batch. Returns none when exhausted.
    next: func() -> result<option<record-batch>, error>;
    /// Current cursor (callable any time; advances as next() yields batches).
    cursor: func() -> cursor;
  }
}

interface destination {
  use types.{schema, record-batch, error};

  prepare: func(table: string, schema: schema) -> result<_, error>;
  write: func(table: string, batch: record-batch) -> result<_, error>;
  commit: func() -> result<_, error>;
  abort: func() -> result<_, error>;
}

world connector {
  // Host capabilities granted to the component (each is opt-in per connector)
  import wasi:http/outgoing-handler@0.2.0;
  import wasi:logging/logging@0.1.0;
  import wasi:clocks/wall-clock@0.2.0;
  // No wasi:filesystem, no wasi:sockets; connectors talk to the world via host-mediated HTTP.

  export source;
  export destination;
}
```

The connector world is split in practice into two narrower worlds, source and destination, so a connector that’s only a source doesn’t have to stub out destination methods. Both worlds share the types interface.
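To make the read path concrete, here is a minimal sketch of how a host might drive the `source` interface: open a handle (optionally from a saved cursor), pull batches until `next` reports exhaustion, then checkpoint the cursor. The Rust trait names, the `Counter` toy source, and the `drain` helper are all assumptions made for illustration; they are not the bindings that `wit-bindgen` would generate, and the error enum is trimmed to two cases.

```rust
// Hypothetical Rust mirror of the WIT `source` interface; names are illustrative only.
#[derive(Debug, Clone, PartialEq)]
pub enum Error {
    Transient(String),
    Permanent(String),
}

pub type RecordBatch = Vec<u8>; // Arrow IPC bytes in the real contract
pub type Cursor = Vec<u8>; // opaque to the host; the connector owns the format

pub trait ReadHandle {
    /// Pull the next batch; `Ok(None)` means the stream is exhausted.
    fn next(&mut self) -> Result<Option<RecordBatch>, Error>;
    /// Cursor covering every batch yielded so far.
    fn cursor(&self) -> Cursor;
}

pub trait Source {
    type Handle: ReadHandle;
    fn open(&self, table: &str, cursor: Option<Cursor>) -> Result<Self::Handle, Error>;
}

/// Toy source (assumption): each "batch" is one byte; the cursor is a big-endian offset.
pub struct Counter {
    pub rows: Vec<u8>,
}

pub struct CounterHandle {
    rows: Vec<u8>,
    pos: usize,
}

impl ReadHandle for CounterHandle {
    fn next(&mut self) -> Result<Option<RecordBatch>, Error> {
        match self.rows.get(self.pos) {
            Some(&b) => {
                self.pos += 1;
                Ok(Some(vec![b]))
            }
            None => Ok(None),
        }
    }

    fn cursor(&self) -> Cursor {
        (self.pos as u64).to_be_bytes().to_vec()
    }
}

impl Source for Counter {
    type Handle = CounterHandle;

    fn open(&self, _table: &str, cursor: Option<Cursor>) -> Result<CounterHandle, Error> {
        let pos = match cursor {
            None => 0,
            Some(bytes) => {
                // The connector decodes its own cursor format and rejects anything else.
                if bytes.len() != 8 {
                    return Err(Error::Permanent("malformed cursor".to_string()));
                }
                let mut arr = [0u8; 8];
                arr.copy_from_slice(&bytes);
                u64::from_be_bytes(arr) as usize
            }
        };
        Ok(CounterHandle { rows: self.rows.clone(), pos })
    }
}

/// Host-side loop: read up to `max_batches`, returning the batches and the cursor to checkpoint.
pub fn drain<S: Source>(
    src: &S,
    table: &str,
    resume: Option<Cursor>,
    max_batches: usize,
) -> Result<(Vec<RecordBatch>, Cursor), Error> {
    let mut handle = src.open(table, resume)?;
    let mut out = Vec::new();
    while out.len() < max_batches {
        match handle.next()? {
            Some(batch) => out.push(batch),
            None => break,
        }
    }
    Ok((out, handle.cursor()))
}
```

Calling `drain` twice, passing the cursor from the first call into the second, resumes where the first call stopped. The host never inspects the cursor bytes, which matches the "opaque; connector owns the format" comment in the WIT sketch.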
Capability grants
A connector’s manifest declares what host capabilities it needs; the host refuses to load it without explicit approval.

```toml
[connector]
name = "github"
version = "0.3.1"
kind = "source"
wasm = "./github.wasm"

[capabilities]
http = ["api.github.com"]  # outbound HTTP allowlist
env = ["GITHUB_TOKEN"]     # env vars the connector can read
clock = true
```

Wildcards (*.github.com) are supported. There is no escape hatch: a connector that needs unlisted access fails to load with a clear diagnostic.
The same TOML/JSON duality from pipeline manifests applies (see 03-pipelines.md). connector.toml is the canonical hand-edited form; connector.json (validated against hakiri schema export connector) is the agent-authoring path. Both deserialize into the same ConnectorManifest struct.
Capability grants are themselves declarative: the manifest describes what access the connector needs; the host decides whether to grant it. There is no runtime request_permission() call, no imperative escalation path. This keeps capability review tractable for both humans (read the TOML) and agents (validate against the JSON Schema before install).
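As an illustration of the outbound HTTP allowlist, the sketch below checks a request host against manifest patterns, including the `*.github.com` wildcard form. The function names are hypothetical, and the wildcard semantics are an assumption: here `*.example.com` matches any subdomain but not the bare apex. The real host may define this differently.

```rust
/// Returns true when `host` is permitted by one allowlist `pattern`.
/// Assumed semantics: `*.example.com` matches subdomains of example.com, not the apex.
pub fn host_matches(pattern: &str, host: &str) -> bool {
    let pattern = pattern.to_ascii_lowercase();
    let host = host.to_ascii_lowercase();
    match pattern.strip_prefix("*.") {
        Some(suffix) => {
            // Require a literal '.' immediately before the suffix so that
            // "evilgithub.com" does not match "*.github.com".
            host.len() > suffix.len() + 1
                && host.ends_with(suffix)
                && host.as_bytes()[host.len() - suffix.len() - 1] == b'.'
        }
        None => host == pattern,
    }
}

/// Deny by default: a host is allowed only if some pattern matches it.
pub fn http_allowed(allowlist: &[&str], host: &str) -> bool {
    allowlist.iter().any(|p| host_matches(p, host))
}
```

Because the check is a pure function of the manifest and the requested host, the same logic could run at install time (to validate the manifest) and at request time (to enforce it), with no imperative escalation path in between.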
Resource limits
Capability allowlists prevent connectors from reaching the wrong things. Resource limits prevent them from consuming unbounded host resources: infinite loops, megabyte-per-second log floods, regex catastrophic backtracking, runaway allocations. wasmtime exposes the knobs (Store::limiter, fuel metering, async deadlines); the host configures defaults per call.
| Limit | Default | Override |
|---|---|---|
| Linear memory | 256 MiB | per-connector via manifest, max 2 GiB |
| Fuel (instructions) | 10⁹ units per next() / discover() call | per-call via runtime config |
| Wall-clock deadline | 30s per next(), 60s per discover() | per-pipeline via manifest |
| Log volume | 1 MiB/sec per connector via wasi:logging | hard cap; no override |
| Outbound HTTP | 100 in-flight requests per connector | per-connector via manifest |
A connector hitting a limit returns a transient error to the runtime, which may retry with backoff. Persistent limit hits surface as Permanent::ResourceExhausted and fail the run fast. Limit decisions are recorded in OTel spans so agents and operators see why a run failed without grepping logs.
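The retry-then-escalate behaviour can be sketched as a small policy function. Everything here is an assumption for illustration: the `Outcome` type, the exponential schedule, and the `max_attempts` threshold are not taken from the runtime, and `FailPermanent` merely stands in for `Permanent::ResourceExhausted`.

```rust
use std::time::Duration;

#[derive(Debug, PartialEq)]
pub enum Outcome {
    /// Retry the call after waiting this long.
    RetryAfter(Duration),
    /// Stand-in for Permanent::ResourceExhausted: stop retrying and fail the run.
    FailPermanent,
}

/// Hypothetical policy: exponential backoff from `base`, capped at `cap`,
/// escalating to a permanent failure once `attempt` reaches `max_attempts`.
/// `attempt` is 1-based and counts consecutive limit hits.
pub fn on_limit_hit(attempt: u32, max_attempts: u32, base: Duration, cap: Duration) -> Outcome {
    if attempt >= max_attempts {
        return Outcome::FailPermanent;
    }
    // Doubling saturates instead of overflowing for large attempt counts.
    let factor = 2u32.saturating_pow(attempt.saturating_sub(1));
    let delay = base.checked_mul(factor).unwrap_or(cap);
    Outcome::RetryAfter(delay.min(cap))
}
```

Keeping the decision in a pure function makes it straightforward to record the chosen outcome on a span, since the inputs (attempt count, limits) fully determine the result.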
The dep allowlist for in-tree connector authoring bans known footguns: fancy-regex (catastrophic backtracking) in favor of regex (linear-time); raw serde_json::Value for arbitrary input (depth-DoS) in favor of serde_json::de::from_slice with MaxDepth.
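To show what a depth guard protects against, here is a standard-library-only sketch that pre-scans JSON bytes and rejects input nested deeper than a limit before any parser sees it. This is not the `serde_json` API and not necessarily how the project enforces depth; `check_depth` is a hypothetical helper, and it only tracks brackets and string boundaries rather than validating JSON.

```rust
/// Returns the maximum bracket nesting depth of `input`, or `Err(depth)` as soon
/// as the depth exceeds `limit`. Not a JSON validator: it only tracks strings
/// (so brackets inside string literals are ignored) and bracket nesting.
pub fn check_depth(input: &[u8], limit: usize) -> Result<usize, usize> {
    let (mut depth, mut max_depth) = (0usize, 0usize);
    let (mut in_string, mut escaped) = (false, false);
    for &b in input {
        if in_string {
            if escaped {
                escaped = false; // the byte after a backslash is consumed verbatim
            } else if b == b'\\' {
                escaped = true;
            } else if b == b'"' {
                in_string = false;
            }
            continue;
        }
        match b {
            b'"' => in_string = true,
            b'{' | b'[' => {
                depth += 1;
                if depth > limit {
                    return Err(depth); // bail out early; no need to scan the rest
                }
                max_depth = max_depth.max(depth);
            }
            b'}' | b']' => depth = depth.saturating_sub(1),
            _ => {}
        }
    }
    Ok(max_depth)
}
```

The scan is a single pass with constant memory, so it stays cheap even on hostile input, which is the property a depth-DoS guard needs.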
Authoring a connector
Three personas, same WIT contract:
As an agent (primary path)
```sh
hakiri agent scaffold-connector \
  --spec ./openapi.json \
  --kind source \
  --name shopify
```

This walks an OpenAPI/AsyncAPI/SQL spec and emits:
- A WIT-compliant Rust crate under `connectors/shopify/`
- A `connector.toml` with inferred capabilities
- A round-trip test that exercises `discover()` against a recorded fixture
The MCP server exposes this as a tool; an agent calls it, reviews the diff, runs hakiri connector build && hakiri connector test, and commits.
The host owns the WIT; the agent fills only the Rust impl. Scaffold emits WIT bindings from a fixed template (parameterized on spec + connector name); the agent’s surface area is the Rust trait methods, not the contract. This eliminates the most common failure mode in current frontier models — producing syntactically-plausible but semantically-wrong WIT (invented types, wrong package versions, mixed pre/post-0.2 import paths). WIT is immutable from the agent’s perspective; the host upgrades it across releases.
As a Rust author (power path)
Direct use of hakiri-connector-sdk:

```rust
use hakiri_connector_sdk::{source, Schema, RecordBatch, Cursor, Error};

#[source]
struct GithubIssues { token: String }

impl Source for GithubIssues {
    fn discover(&self) -> Result<Vec<Schema>, Error> { ... }
    fn open(&self, table: &str, cursor: Option<Cursor>) -> Result<Stream, Error> { ... }
}
```

The macro emits wit-bindgen glue and a cdylib target.
As a non-Rust author (escape hatch)
Use componentize-py, componentize-js, or TinyGo against the same WIT file. We ship example skeletons but won’t maintain first-class non-Rust SDKs.
Built-in connectors (M0/M1)
Compiled into the binary (no WASM round-trip) because they’re hot paths:
- Sources: `postgres` (snapshot + logical decoding), `http` (REST, OpenAPI-driven), `file` (CSV/JSON/Parquet/NDJSON on local fs or S3), `github` (Issues/PRs/repos/comments), `s3` (object listing + content)
- Destinations: `context` (the local context store; this is the default), `parquet` (write Parquet to a path or S3), `duckdb` (write into an external DuckDB file), `webhook` (POST to a URL)
- Transforms: kept minimal in M1: `select`, `rename`, `cast`, `filter`. Anything more interesting belongs in DuckDB SQL post-landing.
Built-in connectors implement the same Rust trait as the WIT bindings emit; the runtime is agnostic to which one it’s calling.
Target connector inventory (agent-authored, M2+)
Per Challenge 2 in PRD.md: the connector-count race against Airbyte (300+) and Fivetran (500+) is unwinnable on person-hours and largely misleading, since most catalog entries are mediocre or unmaintained. Hakiri’s win condition is the 25–35 sources a team actually queries all being authorable, drift-detectable, and maintainable via the M2 agent loop, not a 300-tile marketplace.
This is the working target list. Each row is a candidate for the M2 agent-authoring eval — given the public OpenAPI or API documentation URL, can an agent produce a working WASM connector with passing dry-runs in one MCP conversation? Selection criteria: high adoption among the agent-builder + small-team audience (PRD § Target audience); public API docs; OAuth or token auth; no SOAP, no proprietary binary protocols requiring vendor SDKs.
Inventory
| Category | Source | Auth | OpenAPI | Pagination | Tier |
|---|---|---|---|---|---|
| Engineering / observability | GitHub | PAT / OAuth / App | ✓ | cursor + link-header | built-in (M1) |
| | GitLab | PAT / OAuth | ✓ | page + cursor | M2 |
| | Jira | PAT / OAuth (Atlassian) | ✓ (REST v3) | offset + cursor | M2 |
| | Linear | PAT / OAuth | partial (GraphQL SDL) | cursor | M2 |
| | PagerDuty | API key | ✓ | offset | M2 |
| | Datadog | API key + app key | ✓ | cursor + time-window | M2 |
| | Sentry | PAT / DSN | ✓ | link-header | M2 |
| Collaboration / docs | Slack | OAuth | partial (Web API method docs) | cursor | M2 |
| | Notion | OAuth | ✓ | cursor | M2 |
| | Google Drive | OAuth | ✓ (via Discovery) | page-token | M2 |
| | Confluence | PAT / OAuth (Atlassian) | ✓ | offset + cursor | M2 |
| Customer ops / support | Zendesk | OAuth / API token | ✓ | offset + cursor | M2 |
| | Intercom | OAuth | ✓ | cursor | M2 |
| Sales / CRM | Salesforce | OAuth (REST) | partial (REST only; SOQL custom) | offset + query locator | M2 |
| | HubSpot | OAuth / Private app | ✓ | cursor | M2 |
| | Pipedrive | API token | ✓ | offset | M2 |
| Product analytics | PostHog | API key | ✓ | offset + cursor | M2 |
| | Mixpanel | service account | ✓ partial | time-window export | M2 |
| | Amplitude | API key + secret | ✓ partial | time-window export | M2 |
| | Segment | API token | ✓ partial | cursor | M2 |
| Payments / billing | Stripe | API key | ✓ | cursor | M2 |
| | Chargebee | API key | ✓ | offset + cursor | M2 |
| | QuickBooks | OAuth (Intuit) | partial | offset | M2.5 (per-realm OAuth complexity) |
| Marketing | Mailchimp | OAuth / API key | ✓ | offset | M2 |
| | Marketo | OAuth | ✓ | offset + paging token | M2.5 |
| Recruiting / HR | Greenhouse | API key | ✓ | offset + link-header | M2 |
| | BambooHR | API key | partial | offset | M2 |
| Files / data | Airtable | PAT / OAuth | ✓ | cursor token | M2 |
| | Shopify | OAuth | ✓ | cursor + link-header | M2 |
| | Google Sheets | OAuth | ✓ (via Discovery) | range-read | M2 |
| Databases (as source) | Postgres | password / cert | n/a (wire protocol) | snapshot + logical decoding | built-in (M1) |
| | MySQL | password / cert | n/a | snapshot + binlog | M2 |
| | MongoDB | connection string | n/a | snapshot + change stream | M2 |
| | Snowflake | OAuth / key-pair | n/a (SQL API + JDBC) | snapshot + stream | M2.5 |
| | BigQuery | service account | n/a (REST Storage API) | partition + stream | M2.5 |
| Object storage (as source) | S3 | AWS creds | n/a | listing | built-in (M1) |
| | GCS | service account | n/a (XML/JSON) | listing | M2 |
| | Azure Blob | shared key / SAS | n/a | listing | M2 |
Five built-in (M1), ~25 agent-authored REST + OpenAPI (M2), ~5 partial-OpenAPI or auth-complex (M2.5), ~5 binary-protocol databases / object stores (M2 or M2.5 depending on driver maturity).
Scaffolding patterns
The inventory falls into four scaffolding shapes; the M2 agent scaffolder branches on shape, not on source:
- REST + OpenAPI (Stripe, GitHub, Notion, Datadog, PostHog, Jira, Zendesk, …): primary path. `hakiri agent scaffold-connector --spec <openapi.json>` produces ~90% of the connector. The agent fills the auth flow, pagination quirks, and the `cursor-kind` declaration.
- REST without machine-readable OpenAPI (Slack Web API, parts of Mailchimp, BambooHR): agent scaffolds from API documentation HTML / Markdown; lower automation, more iteration. M2.5 target with a lower per-call success rate in the eval harness.
- GraphQL (Linear primarily, Shopify alt-path): scaffolded from a SDL schema dump. Same WIT contract; different fetch shape, different pagination idioms (relay-style cursors).
- Database / binary protocols (Postgres, MySQL, MongoDB): hand-written Rust against established drivers (`tokio-postgres`, `mysql_async`, `mongodb`). Not agent-scaffolded; ship as built-ins or as community Rust crates.
The M2 eval target — 60% reach discover() working, 30% reach full contract conformance (pm/roadmap.md M2) — is calibrated against the REST + OpenAPI subset. The other three shapes are tracked separately so a single weak shape doesn’t poison the headline metric.
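The "branch on shape, not on source" idea and the metric split can be written down as a small lookup. The `Shape` and `Plan` types and the `plan_for` function are hypothetical names for illustration; the values in each arm restate the four shapes and the headline-metric rule described above.

```rust
/// The four scaffolding shapes from the inventory (type name is illustrative).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Shape {
    RestOpenApi,
    RestDocsOnly,
    GraphQl,
    DatabaseProtocol,
}

/// What the scaffolder needs to know for a shape (hypothetical struct).
#[derive(Debug, PartialEq)]
pub struct Plan {
    /// Artifact the scaffold starts from.
    pub input: &'static str,
    /// Whether the agent scaffolder handles this shape at all.
    pub agent_scaffolded: bool,
    /// Whether eval results count toward the headline M2 metric.
    pub in_headline_metric: bool,
}

pub fn plan_for(shape: Shape) -> Plan {
    match shape {
        Shape::RestOpenApi => Plan {
            input: "OpenAPI document",
            agent_scaffolded: true,
            in_headline_metric: true,
        },
        Shape::RestDocsOnly => Plan {
            input: "API documentation (HTML / Markdown)",
            agent_scaffolded: true,
            in_headline_metric: false, // tracked separately
        },
        Shape::GraphQl => Plan {
            input: "GraphQL SDL schema dump",
            agent_scaffolded: true,
            in_headline_metric: false, // tracked separately
        },
        Shape::DatabaseProtocol => Plan {
            input: "hand-written Rust against an established driver",
            agent_scaffolded: false,
            in_headline_metric: false,
        },
    }
}
```

An exhaustive `match` means adding a fifth shape forces a decision about its input artifact and its place in the eval, rather than letting it silently inherit a default.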
Distribution of authoring effort
| Shape | Count | Authoring path | Maintenance |
|---|---|---|---|
| Built-in Rust (hot paths, native trait) | 5–7 | Hakiri team | Code review on every change |
| Agent-authored REST + OpenAPI | ~25 | Agent scaffolds; human reviews diff | hakiri connector check-drift on schedule |
| Agent-authored without OpenAPI | ~5 | Agent scaffolds from docs / samples | Same; lower automation |
| Community-contributed Rust / WASM | open-ended | Third-party authors against hakiri-connector-sdk | Maintainer’s responsibility; provenance recorded in catalog |
Hakiri does not aim to be a connector marketplace (Challenge 2 § The trap to avoid). The inventory above is the operational target — what should exist by the end of M2 for the agent-builder + small-team audience to be productive. Anything beyond is community-contributed under the same WIT contract, with provenance and capability declarations recorded in the catalog like any other connector.
Distribution
- In-tree: built into the binary; resolved by name.
- Local file: `wasm = "./path/to/connector.wasm"` in the project.
- HTTPS URL: `wasm = "https://hakiri.dev/connectors/shopify-0.3.1.wasm"`, with a SHA-256 pin.
- OCI registry: `wasm = "oci://ghcr.io/owner/shopify:0.3.1"`. Components are valid OCI artifacts.
Pin-by-hash is enforced; an unpinned URL fails CI.
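A minimal sketch of how a manifest loader might classify the `wasm` value and reject unpinned URLs follows. Several things are assumptions here: the pin is passed as a separate `sha256` argument (the manifest syntax for the pin is not specified above), plain `http://` is rejected outright, and OCI references are accepted without a hash check. The type and function names are hypothetical, and the in-tree case (resolved by name, with no `wasm` value) is left out.

```rust
#[derive(Debug, PartialEq)]
pub enum WasmRef {
    LocalFile(String),
    Https { url: String, sha256: String },
    Oci(String),
}

/// A SHA-256 digest rendered as hex is exactly 64 hex digits.
fn is_sha256_hex(s: &str) -> bool {
    s.len() == 64 && s.bytes().all(|b| b.is_ascii_hexdigit())
}

/// Classify a manifest `wasm` value. `sha256` is a hypothetical separate pin field.
pub fn parse_wasm_ref(wasm: &str, sha256: Option<&str>) -> Result<WasmRef, String> {
    if wasm.starts_with("http://") {
        // Assumption: only HTTPS URLs are acceptable for remote components.
        return Err(format!("plain HTTP is not accepted: {}", wasm));
    }
    if wasm.starts_with("https://") {
        return match sha256 {
            Some(pin) if is_sha256_hex(pin) => Ok(WasmRef::Https {
                url: wasm.to_string(),
                sha256: pin.to_ascii_lowercase(),
            }),
            Some(_) => Err(format!("malformed SHA-256 pin for {}", wasm)),
            None => Err(format!("unpinned URL: {}", wasm)),
        };
    }
    if let Some(rest) = wasm.strip_prefix("oci://") {
        return Ok(WasmRef::Oci(rest.to_string()));
    }
    // Anything else is treated as a path relative to the project.
    Ok(WasmRef::LocalFile(wasm.to_string()))
}
```

This only checks that a pin is present and well-formed. Verifying the downloaded bytes against the pin would need a SHA-256 implementation, which is outside the Rust standard library and is omitted from the sketch.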
Testing connectors
- Contract tests ship with the SDK: feed every connector a synthetic WIT-conformance test (discover returns valid schemas, open returns a finite stream, cursor round-trips).
- Fixture replay: record real HTTP interactions to `.hakiri/cassettes/` (VCR-style), replay them in CI.
- Property tests via `proptest` for cursor monotonicity and schema invariants.
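The "cursor round-trips" contract check can be sketched without any test framework: read some batches, take the cursor, reopen from it, and confirm that the combined output equals an uninterrupted read. The `Handle` trait, the `Toy` handle, and the `honor_cursor` switch (used to simulate a buggy connector) are assumptions for illustration, not the SDK's actual test harness. A `proptest` version would generate the data and the split point instead of fixing them.

```rust
/// Simplified read handle (hypothetical): infallible, batches are raw bytes.
pub trait Handle {
    fn next(&mut self) -> Option<Vec<u8>>;
    fn cursor(&self) -> Vec<u8>;
}

fn read_all<H: Handle>(h: &mut H) -> Vec<Vec<u8>> {
    let mut out = Vec::new();
    while let Some(b) = h.next() {
        out.push(b);
    }
    out
}

/// Contract check: stopping after `split` batches and resuming from the cursor
/// must yield exactly the batches an uninterrupted read would have produced.
pub fn cursor_round_trips<H, F>(open: F, split: usize) -> bool
where
    H: Handle,
    F: Fn(Option<Vec<u8>>) -> H,
{
    let full = read_all(&mut open(None));

    let mut first = open(None);
    let mut seen = Vec::new();
    for _ in 0..split {
        match first.next() {
            Some(b) => seen.push(b),
            None => break,
        }
    }

    let mut resumed = open(Some(first.cursor()));
    seen.extend(read_all(&mut resumed));
    seen == full
}

/// Toy handle over a byte slice; the cursor is a single-byte offset.
pub struct Toy {
    data: Vec<u8>,
    pos: usize,
}

impl Handle for Toy {
    fn next(&mut self) -> Option<Vec<u8>> {
        let b = *self.data.get(self.pos)?;
        self.pos += 1;
        Some(vec![b])
    }

    fn cursor(&self) -> Vec<u8> {
        vec![self.pos as u8]
    }
}

/// `honor_cursor = false` simulates a connector that ignores its resume cursor.
pub fn open_toy(data: &[u8], cursor: Option<Vec<u8>>, honor_cursor: bool) -> Toy {
    let pos = match cursor {
        Some(ref c) if honor_cursor && !c.is_empty() => c[0] as usize,
        _ => 0,
    };
    Toy { data: data.to_vec(), pos }
}
```

A connector that ignores the cursor re-emits the batches already seen, so the comparison fails. This is the class of bug the conformance test is meant to catch before a pipeline silently duplicates rows.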
Open questions
- WASI 0.2 vs 0.3. As of writing, `wasmtime` ships solid 0.2; 0.3 (with native async) is landing. Default to 0.2 with a migration plan.
- Component pooling. Cold start is on the order of milliseconds; for fast-firing pipelines we may want to pool component instances. Defer until measurement says it matters.
- Connector signing. Sigstore-style transparency log for community-published connectors. Worth considering for the marketplace narrative.