
Runtime

pluck.runtime({...}) returns an AgentRuntime that orchestrates N heterogeneous agents. Each agent drives its own LLM provider, consumes its own subset of Pluck verbs as tools, runs under its own budget cap, and emits signed receipts for every turn + tool call into the v0.11 Substrate.


Why a runtime that's NOT just one provider per fleet

Pluck's runtime is heterogeneous on purpose. MCP sampling is request-response-serial, so "1000 agents through one host LLM" is a serialised bottleneck dressed as parallelism. Instead, each agent supplies its own provider + credentials and runs its own loop. Need a Sonnet 4.6 routing agent handing off to ten Haiku 4.5 specialists? Each declares its own LlmProviderAdapter – the runtime sequences them through a handoff graph.

The runtime is intentionally minimal. The canonical multi-agent orchestrator lives in @directive-run/ai; Pluck's contribution layered on top is:

  1. Pluck verb surface as auto-registered tools – connect, extract, shape, act, sense, probe, context, dowse – available to every agent via its tools manifest, with JSON schemas and per-field validation.
  2. Substrate-backed signed trace – every turn + every tool call lands in the event log; receipts chain via parentSig so the run is a verifiable record.
  3. Fleet integration – fleet members can each own a runtime instance, scaling agent execution to N members × M targets.

The factory call

TypeScript
import {
  createPluck,
  openaiProvider,
  anthropicProvider,
} from "@sizls/pluck";

const pluck = createPluck();

const runtime = pluck.runtime({
  agents: [
    {
      id: "researcher",
      systemPrompt: "Read the URL and summarise the key facts.",
      provider: openaiProvider({ model: "gpt-5", apiKey: process.env.OPENAI_API_KEY! }),
      tools: ["extract", "context", "sense"],   // read-only manifest
      budget: { maxTurns: 10, maxToolCalls: 30, maxTokens: 50_000 },
    },
    {
      id: "actor",
      systemPrompt: "Take the researcher's notes and post a summary.",
      provider: anthropicProvider({ model: "claude-sonnet-4-6-20260201", apiKey: process.env.ANTHROPIC_API_KEY! }),
      tools: ["act"],                            // mutating – opt-in
      budget: { maxTurns: 5, maxToolCalls: 5, maxCostUsd: 1.0 },
    },
  ],
  signingKey: process.env.PLUCK_SIGNING_KEY!,
});

const result = await runtime.run({
  goal: "Read https://news.example.com/post and post a summary to slack.",
  plan: {
    entry: "researcher",
    edges: [{ from: "researcher", to: "actor" }],
    exits: ["actor"],
  },
});

Providers

Five provider adapters ship with the runtime. The first two are deterministic and meant for tests; the rest speak vendor wire protocols natively, including tool calls.

| Provider | Use for | Cost reporting |
| --- | --- | --- |
| echoProvider() | tests, smoke runs – replies "received: <last user prompt>" | none |
| scriptedProvider(steps) | unit tests – caller supplies a step-by-step response script | none |
| openaiProvider({ apiKey, model }) | OpenAI / Azure / Together / Fireworks / OpenRouter (anything OpenAI-compatible) | costUsd from the OPENAI_PRICING table |
| anthropicProvider({ apiKey, model }) | Anthropic Messages API | costUsd from the ANTHROPIC_PRICING table |
| ollamaProvider({ model, baseURL }) | local Ollama (default http://localhost:11434) | none – local models are free |

All three real adapters:

  • require an explicit model and a non-empty apiKey at construction – both throw at factory time so misconfiguration surfaces before the first turn rather than as a 401 on first call. (Ollama requires only model – a local server takes no API key.)
  • accept a fetch override for tests + custom transport (Azure header injection, Bedrock signing, etc.)
  • reject baseURL that isn't https:// (HTTP only permitted on localhost / 127.0.0.1 / [::1]) so a misconfigured base URL can't accidentally ship an API key in the clear
  • respect request.signal – when the runtime is aborted, the in-flight LLM call is torn down
  • accept a per-request timeoutMs that aborts the fetch independently of the run-level signal
  • map vendor-specific tool-call shapes to Pluck's LlmToolCall[] so the runtime loop is identical regardless of which provider drove the turn
  • throw a typed LlmProviderHttpError on non-2xx responses, with status, bodySnippet (≤ 500 chars, capped read at 8 KiB), providerName, a status-aware hint (e.g. "auth rejected – verify apiKey is set + valid for this baseURL", "rate-limited – back off and retry"), and retryAfterMs parsed from Retry-After for 429 responses
TypeScript
import {
  openaiProvider,
  anthropicProvider,
  ollamaProvider,
} from "@sizls/pluck";

// OpenAI
const gpt = openaiProvider({
  apiKey: process.env.OPENAI_API_KEY!,
  model: "gpt-4o",
  timeoutMs: 30_000,
});

// Anthropic
const claude = anthropicProvider({
  apiKey: process.env.ANTHROPIC_API_KEY!,
  model: "claude-sonnet-4-6-20260201",
  maxTokens: 4096,
});

// Self-hosted Ollama (note: localhost http is permitted; remote http is not)
const llama = ollamaProvider({
  model: "llama3.1",
  // baseURL: "http://localhost:11434",  // default
});

// OpenAI-compatible gateway (e.g. OpenRouter, Together, Fireworks)
const router = openaiProvider({
  apiKey: process.env.OPENROUTER_API_KEY!,
  baseURL: "https://openrouter.ai/api/v1",
  model: "anthropic/claude-sonnet-4-6",
});
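
The last bullet above is actionable in code. A hedged sketch of catching the typed HTTP error – assuming LlmProviderHttpError is exported from the package root; status, retryAfterMs, providerName, and bodySnippet are the fields documented above:

TypeScript
import { LlmProviderHttpError } from "@sizls/pluck";  // export location assumed

try {
  await runtime.run({ goal });
} catch (err) {
  if (err instanceof LlmProviderHttpError && err.status === 429) {
    // back off using the parsed Retry-After (fallback when the header was absent)
    await new Promise((r) => setTimeout(r, err.retryAfterMs ?? 30_000));
    await runtime.run({ goal });  // one retry
  } else {
    throw err;
  }
}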

OPENAI_PRICING and ANTHROPIC_PRICING are exported maps of model → { input, output } USD-per-million-tokens rates. Lookup is exact-match first, then longest-prefix – date-suffix variants like gpt-4o-2024-11-20 resolve to the gpt-4o rate without an explicit entry. Pass a custom table via pricing: { ... } when you're hitting a model neither covers; when no rate matches, a one-shot dev warning fires (once per provider+model), costUsd is reported as undefined, and maxCostUsd budget caps will not trigger for that agent.
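
A sketch of a custom table for a gateway model neither built-in map covers – the { input, output } USD-per-million-tokens shape matches the exported tables; the model id and rates are illustrative, and the provider-level placement of pricing is an assumption:

TypeScript
const gateway = openaiProvider({
  apiKey: process.env.OPENROUTER_API_KEY!,
  baseURL: "https://openrouter.ai/api/v1",
  model: "mistralai/mistral-large",  // illustrative model id
  pricing: {
    // USD per million tokens, same shape as OPENAI_PRICING entries
    "mistralai/mistral-large": { input: 2.0, output: 6.0 },
  },
});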

API-key destination warning. The configured apiKey is sent verbatim as Authorization: Bearer … (OpenAI / Ollama-compat) or x-api-key: … (Anthropic) to whatever host baseURL resolves to. Mistyping a domain ships your key to it. Only point at trusted vendor / gateway hosts (the https:// check rules out plaintext, not "wrong destination").

Need Gemini, Bedrock, Vertex, or another vendor? LlmProviderAdapter is a one-method interface – turn(request): Promise<LlmTurnResponse>. Wrapping any chat-completions API in ~80 LOC is the supported extension path.
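
A minimal sketch of that extension path – a hypothetical Gemini wrapper. Only turn(request): Promise<LlmTurnResponse> is the published contract; the request.messages and response field names below are assumptions for illustration:

TypeScript
import type { LlmProviderAdapter, LlmTurnResponse } from "@sizls/pluck";

// HYPOTHETICAL adapter – field names beyond turn() are assumed, not published.
function geminiProvider(opts: { apiKey: string; model: string }): LlmProviderAdapter {
  return {
    async turn(request): Promise<LlmTurnResponse> {
      const res = await fetch(
        `https://generativelanguage.googleapis.com/v1beta/models/${opts.model}:generateContent?key=${opts.apiKey}`,
        {
          method: "POST",
          headers: { "Content-Type": "application/json" },
          body: JSON.stringify({
            contents: request.messages.map((m) => ({
              role: m.role === "assistant" ? "model" : "user",
              parts: [{ text: m.content }],
            })),
          }),
          ...(request.signal ? { signal: request.signal } : {}),
        },
      );
      if (!res.ok) throw new Error(`gemini: HTTP ${res.status}`);
      const body = await res.json();
      // Map the vendor shape back to Pluck's turn response (assumed shape).
      return {
        message: { role: "assistant", content: body.candidates[0].content.parts[0].text },
      };
    },
  };
}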


The default tool surface is read-only

When you omit tools on an AgentDefinition, the agent gets the safe-default subset: connect, extract, shape, sense, probe, context, dowse – read everything, mutate nothing. Mutating agents must opt into "act" explicitly. Same discipline as the v0.4 browser-agent actor's response-policy gate: trust to read != trust to write.
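
For instance (echoProvider from the provider table keeps this deterministic):

TypeScript
const auditor = {
  id: "auditor",
  systemPrompt: "Inspect the page and report anomalies.",
  provider: echoProvider(),
  // no `tools` field → effective manifest is the read-only default:
  // connect, extract, shape, sense, probe, context, dowse
};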

Unknown tool names fail at runtime() construction so an agent never gets handed a dangling reference. An LLM that emits a tool call NOT in its agent's manifest gets back a generic "tool unavailable" message – the runtime deliberately doesn't echo the requested name back so a compromised provider can't enumerate the host's manifest by probing.


Custom tools

Agents can register caller-supplied tools alongside Pluck verbs via customTools:

TypeScript
import type { CustomTool, CustomToolContext } from "@sizls/pluck";

const slackPost: CustomTool = {
  name: "slack_post",
  description: "Post a message to a Slack channel.",
  schema: {
    type: "object",
    required: ["channel", "text"],
    properties: {
      channel: { type: "string" },
      text: { type: "string" },
    },
  },
  async invoke(args, ctx: CustomToolContext) {
    // ctx.signal – torn down on runtime.destroy() or run({signal}) abort
    // ctx.pluck  – the bound PluckInstance, so a custom tool can chain
    //              into Pluck verbs (e.g. ctx.pluck.act(...))
    // ctx.agentId – the agent that emitted the call
    const res = await fetch("https://slack.com/api/chat.postMessage", {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${process.env.SLACK_BOT_TOKEN}`,
      },
      body: JSON.stringify(args),
      ...(ctx.signal ? { signal: ctx.signal } : {}),
    });

    return res.json();
  },
};

const runtime = pluck.runtime({
  agents: [
    {
      id: "poster",
      systemPrompt: "Read the URL and post a summary to #general.",
      provider: anthropicProvider({ apiKey, model: "claude-sonnet-4-6-20260201" }),
      tools: ["extract"],          // Pluck verbs in scope
      customTools: [slackPost],    // caller tools layered on top
    },
  ],
});

Same hardening applies as for Pluck verbs:

  • Names: must match /^[a-zA-Z_][a-zA-Z0-9_-]*$/ and may not collide with a Pluck verb (connect, extract, shape, act, sense, probe, context, dowse). Bad manifests fail at runtime() construction. Pick a domain prefix like slack_* / db_* / _my_* to keep room for new Pluck verbs in future minor releases – a customTools: [{name: "deduce", …}] would silently pass on Pluck v0.14 and reject on a future Pluck that adds a deduce verb.
  • Prototype-pollution scrub: __proto__ / constructor / prototype keys are stripped from plain-object args before invoke sees them. Args nested deeper than 64 levels reject as a generic "tool unavailable" so a hostile model can't stack-overflow the host process.
  • Abort signal: ctx.signal is the run-level signal; long-running custom tools should pass it into their own fetch / spawn calls so abort tears them down mid-flight.
  • Error mapping: throwing ToolArgError from invoke surfaces a generic "tool unavailable" to the LLM (matches the discipline applied to built-in verbs); throwing any other Error forwards the error message verbatim to the LLM so domain failures stay debuggable in the agent's conversation.
  • Return values: must be JSON-serialisable for the LLM tool message. BigInt is coerced to a string and circular references render as "[Circular]"; other non-JSON values (functions, symbols) are dropped. Return null / undefined when there's nothing to say.
  • Tool surface: Pluck verbs and custom tool names share the same flat namespace the model sees. Two custom tools with the same name reject at construction time.

Trust model. A custom tool's invoke is trusted code, not LLM-supplied. The LLM picks tool names + arguments; your code decides what those mean. Treat the LLM as if it could emit anything.

  • ctx.pluck is unscoped. A custom tool receives the full PluckInstance and can call ctx.pluck.act(...) even when the agent's tools allowlist excludes "act". The tools: [...] allowlist gates which Pluck verbs the LLM can ask for directly – it does not gate what your custom-tool code does behind the scenes. If a custom tool proxies an LLM-supplied URI into ctx.pluck.act(), validate the URI yourself first.
  • Error messages reach the LLM verbatim (when not wrapped in ToolArgError). Don't throw new Error(`DB rejected password '${secret}'`) – the model's host (which may be a third-party provider) will see the secret. Wrap secrets in ToolArgError to mask the detail behind the generic "tool unavailable" message; the substrate event still records the full diagnostic for operators.
  • description and schema are sent to the model verbatim as part of the tools-list system context. Treat them like prompt fragments – never derive them from untrusted input or a third party's npm package without reading what they put there.

Pass tools: [] + a populated customTools to run an agent that has only caller tools – useful when the agent's job is unrelated to the Pluck verb surface.
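
A sketch of that caller-tools-only shape, reusing slackPost from above:

TypeScript
const notifier = pluck.runtime({
  agents: [
    {
      id: "notifier",
      systemPrompt: "Relay the incident summary to #ops.",
      provider: anthropicProvider({ apiKey, model: "claude-sonnet-4-6-20260201" }),
      tools: [],                   // no Pluck verbs in scope
      customTools: [slackPost],    // caller tools only
    },
  ],
});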


Argument validation

LLM-supplied tool args run through hardened validation before any Pluck verb sees them:

  • Prototype-pollution scrub – __proto__ / constructor / prototype keys stripped recursively from plain-object args (conceptual sketch after this list). Non-plain objects (Date, Map, Set, Buffer, class instances) pass through untouched so a Buffer legitimately routed through act.input doesn't get mangled into {}.
  • Strict type checks – shape.template and act.action require literal strings; an LLM emitting { action: { toString: () => "delete" } } fails fast with a typed ToolArgError.
  • Generic LLM error – when validation fails, the agent's tool-result message says only "tool unavailable". The substrate event keeps the full detail (which tool, which field) for operators inspecting the trace.
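
The scrub is conceptually simple. A sketch of the idea – NOT Pluck's actual implementation, just the shape of the technique:

TypeScript
// Recursively drop prototype-polluting keys from plain objects only.
const POLLUTING = new Set(["__proto__", "constructor", "prototype"]);

function scrub(value: unknown, depth = 0): unknown {
  if (depth > 64) throw new Error("args nested too deep");  // Pluck rejects as "tool unavailable"
  if (Array.isArray(value)) return value.map((v) => scrub(v, depth + 1));
  // Non-plain objects (Date, Map, Buffer, class instances) pass through untouched.
  if (value === null || typeof value !== "object" || Object.getPrototypeOf(value) !== Object.prototype) {
    return value;
  }
  const out: Record<string, unknown> = {};
  for (const [key, v] of Object.entries(value as Record<string, unknown>)) {
    if (!POLLUTING.has(key)) out[key] = scrub(v, depth + 1);
  }
  return out;
}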

Budget caps

Every agent has a budget tracker. Default limits:

| Cap | Default | Purpose |
| --- | --- | --- |
| maxTurns | 50 | Stop runaway loops. |
| maxToolCalls | 200 | Stop tool-call thrashing. |
| maxTokens | unlimited | Token-budget gate when set. |
| maxCostUsd | unlimited | Hard $ ceiling per agent per run. |

Exhaustion exits the agent loop with a typed budgetExhausted: "turns" | "toolCalls" | "tokens" | "costUsd" reason. The trace records agent.budget.exhausted. The handoff graph sees the agent as having "completed" so flow continues normally to the next handoff (or terminates) – no special crash path.


Handoff graph

TypeScript
const plan = {
  entry: "router",
  edges: [
    { from: "router", to: "specialist", when: (a) => a.includes("specialist") },
    { from: "router", to: "fallback" },
    { from: "specialist", to: "actor" },
    { from: "fallback", to: "actor" },
  ],
  exits: ["actor"],
};

Each edge has an optional when(finalAnswer) predicate – the first matching edge fires. Cycles are detected per-agent: an agent can be re-entered up to maxAgentVisits times (default 8, configurable). Diamond patterns (A→B→D + A→C→D) work without false-positive cycles because cycle detection counts visits, not bare entry. Genuine cycles (A→B→A→B…) hit the cap and emit agent.handoff.cycle.
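
A sketch of a deliberate refinement loop riding on that cap – the plan-level placement of maxAgentVisits is an assumption; only the default of 8 is documented:

TypeScript
const plan = {
  entry: "writer",
  edges: [
    { from: "writer", to: "critic" },
    { from: "critic", to: "writer", when: (a) => a.includes("REVISE") },
  ],
  exits: ["critic"],
  maxAgentVisits: 12,  // ASSUMED placement – raises the per-agent re-entry cap from 8
};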

Skipping the plan entirely runs the first registered agent in single-agent mode – no graph required for the common one-agent case.
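
For example:

TypeScript
const solo = pluck.runtime({
  agents: [
    {
      id: "summariser",
      systemPrompt: "Summarise the page in five bullet points.",
      provider: claude,  // the anthropicProvider constructed earlier
      tools: ["extract"],
    },
  ],
});

const answer = await solo.run({ goal: "Summarise https://example.com" });
// no `plan` – the single registered agent runs to completion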


Trace events

Every event flows through the Substrate's event log:

| Event | When | Capability |
| --- | --- | --- |
| agent.llm.turn | Every provider turn | agent:llm |
| agent.llm.error | Provider throws | agent:llm |
| agent.tool.invoke | Tool call succeeds | agent:tool |
| agent.tool.failed | Tool call throws (incl. validation) | agent:tool |
| agent.tool.rejected | Tool name not in manifest | agent:tool |
| agent.handoff.transition | Edge fires | agent:control |
| agent.handoff.cycle | Visit cap hit | agent:control |
| agent.budget.exhausted | Any cap hit | agent:policy |
| agent.provider.destroy.failed | Provider destroy throws | agent |

Every event carries an HLC timestamp. When a signingKey is configured, every turn + tool call is also written to a chained Ed25519 receipt – the chain is serialised behind a chainTail mutex (same pattern as v0.11 fleet) so future parallel-handoff workloads can't fork the chain.


Memory

The default in-memory adapter (createInMemoryRuntimeMemory()) is Map-backed with three scope conventions:

  • "agent:{id}" – per-agent scratch.
  • "shared" – readable + writable across agents within one run. The runtime auto-writes each agent's finalAnswer here under agent:{id}:answer.
  • "global" – persisted across runs (requires a durable adapter).

Swap in a SQLite/Redis/etc. adapter when state needs to outlive a process.
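
A shape sketch only – the real RuntimeMemory interface isn't reproduced on this page, so the get/set surface and (scope, key) addressing below are assumptions chosen to illustrate the scope conventions:

TypeScript
import Database from "better-sqlite3";

// HYPOTHETICAL adapter shape – check the RuntimeMemory type before copying.
function createSqliteRuntimeMemory(path: string) {
  const db = new Database(path);
  db.exec(
    "CREATE TABLE IF NOT EXISTS mem (scope TEXT, key TEXT, value TEXT, PRIMARY KEY (scope, key))",
  );
  return {
    async get(scope: string, key: string): Promise<unknown> {
      const row = db
        .prepare("SELECT value FROM mem WHERE scope = ? AND key = ?")
        .get(scope, key) as { value: string } | undefined;
      return row ? JSON.parse(row.value) : undefined;  // "global" scope now survives restarts
    },
    async set(scope: string, key: string, value: unknown): Promise<void> {
      db.prepare("INSERT OR REPLACE INTO mem (scope, key, value) VALUES (?, ?, ?)").run(
        scope,
        key,
        JSON.stringify(value),
      );
    },
  };
}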


Lifecycle

TypeScript
const runtime = pluck.runtime({ ... });

try {
  const result = await runtime.run({ goal });
} finally {
  await runtime.destroy();   // calls each provider's destroy(), drains the chain
}

Provider destroy errors emit agent.provider.destroy.failed events instead of disappearing silently – operators get a diagnostic trace for misbehaving real-world adapters (token-pool clients, model loaders).

runtime.run({ signal }) propagates the signal into every Pluck verb invocation through the tools layer. A pre-flight signal.aborted check throws before starting a doomed verb call so long-running pluck.act / pluck.sense operations don't waste budget after a cancel.
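
A wall-clock cancel, for example:

TypeScript
const ac = new AbortController();
const timer = setTimeout(() => ac.abort(), 120_000);  // hard two-minute cap

try {
  const result = await runtime.run({ goal, signal: ac.signal });
} finally {
  clearTimeout(timer);
  await runtime.destroy();
}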


Cassette replay (regression-testing LLM behaviour)

Pin a known-good run, swap providers, verify the new provider produces the same tool-call sequence. Useful when upgrading models (gpt-4o → gpt-5, claude-sonnet-4-5 → claude-sonnet-4-6) or comparing alternate providers (Anthropic vs OpenAI on the same goal).

TypeScript
import fs from "node:fs";
import {
  cassetteProvider,
  diffCassettes,
  recordingProvider,
  openaiProvider,
} from "@sizls/pluck";

// 1. Record – wrap the real provider in `recordingProvider`.
const recorder = recordingProvider(openaiProvider({
  apiKey: process.env.OPENAI_API_KEY!,
  model: "gpt-4o",
}));

const runtime1 = pluck.runtime({
  agents: [{ id: "a", systemPrompt: "...", provider: recorder, tools: ["extract"] }],
});
await runtime1.run({ goal: "summarise the homepage" });
const cassette = recorder.toCassette({ agentId: "a" });
fs.writeFileSync("./cassettes/homepage.json", JSON.stringify(cassette, null, 2));

// 2. Replay – feed the cassette back in place of the real provider.
const replay = cassetteProvider(cassette);
const runtime2 = pluck.runtime({
  agents: [{ id: "a", systemPrompt: "...", provider: replay, tools: ["extract"] }],
});
await runtime2.run({ goal: "summarise the homepage" });
//    ↑ identical tool-call sequence, no API spend, deterministic.

// 3. Drift detection – record a NEW run against gpt-5, diff vs the gpt-4o cassette.
const newRecorder = recordingProvider(openaiProvider({
  apiKey: process.env.OPENAI_API_KEY!,
  model: "gpt-5",
}));
const runtime3 = pluck.runtime({
  agents: [{ id: "a", systemPrompt: "...", provider: newRecorder, tools: ["extract"] }],
});
await runtime3.run({ goal: "summarise the homepage" });
const newCassette = newRecorder.toCassette({ agentId: "a" });

const drift = diffCassettes(cassette, newCassette, { ignoreContent: true });
if (drift.length > 0) {
  throw new Error(`Tool-call drift between gpt-4o and gpt-5: ${JSON.stringify(drift)}`);
}

Cassette format: JSON-serialisable list of {turnIndex, promptHash, responseHash?, response} entries inside an envelope of {version, agentId?, recordedAt, recordedProvider, cassetteId?, envelopeHash?}. promptHash is a SHA-256 of the canonicalised conversation messages – cassetteProvider enforces it strictly by default so a drift in upstream tool dispatch (e.g. extract returning a different result) raises a typed error mid-replay rather than silently producing the cached response. Set strict: false for "best-effort" replay during exploration. turnIndex cross-checks against the replay cursor so a hand-edited or out-of-order cassette is detected at load time, not at end-of-run.
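
During exploration:

TypeScript
// Best-effort replay – promptHash mismatches are tolerated and the cached
// responses are served in recorded order.
const loose = cassetteProvider(cassette, { strict: false });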

Cassette integrity (v0.35-R2+)

Cassettes flow through PR review, shared S3 buckets, npm packages – anywhere a JSON file can be hand-edited between record and replay. From v0.35-R2 onward, recordings emit version: 2 cassettes with three layered integrity controls:

  • responseHash per entry – SHA-256 over canonical {cassetteId, turnIndex, promptHash, response}. Detects per-entry edits, cross-cassette splice (binding to cassetteId), and same-turn poisoning (binding to turnIndex + promptHash). Replay recomputes and refuses on mismatch.
  • cassetteId – random 16-byte (32-hex) identity generated by recordingProvider at first turn. Bound into every responseHash, so an entry from cassette B cannot be spliced into cassette A.
  • envelopeHash – SHA-256 over canonical {version, agentId, recordedAt, recordedProvider, cassetteId, entryDigests}. Detects metadata tampering (recordedProvider swap, agentId rebind, recordedAt forgery) that per-entry hashes do not cover.
TypeScript
const replay = cassetteProvider(cassette);  // v2 always validates integrity
const strictReplay = cassetteProvider(cassette, { requireIntegrity: true });  // additionally refuses v1

requireIntegrity: true refuses v1 cassettes wholesale – recommended for CI / production pipelines where a v1 fixture without integrity guarantees should not flow through. Default false for backward compatibility with v0.34 fixtures; v1 cassettes still load with a one-time console.warn reminding operators to re-record.

Downgrade-attack defence. A real v1 cassette never carries cassetteId, envelopeHash, or per-entry responseHash – those fields ship only on v2. v0.35-R3 added a downgrade detector: a v1-tagged cassette carrying any v2-only field is refused with "downgrade tampering suspected". This closes the attack where someone strips version: 2 → 1 and forges per-entry hashes via a v1-compat fallback.

Re-record to upgrade. v0.34/R1 cassettes (version 1) replay correctly under v0.35-R2+ unless they carry an old-format responseHash, which the downgrade detector refuses. To upgrade, rerun the recording against the live provider – recordingProvider always emits v2 from v0.35-R2 onward.

Integrity vs. authenticity. envelopeHash is SHA-256, not Ed25519-signed – it detects post-record edits but not authorship. For non-repudiation (a regulator asking "prove this cassette came from this CI pipeline at this time"), pair the cassette load with a pluck.act receipt of the recording event. A v0.36 follow-up will add recordingProvider({ sign: "env:PLUCK_SIGNING_KEY" }) to mirror the receipt-chain signing model.

Tool-call equality in diffCassettes ignores id (provider-assigned, it differs on every run) and compares name + canonicalised arguments – the behaviour, not the bookkeeping, is what regression tests need to lock down.

Single-use replay. cassetteProvider(cassette) keeps a closure-scoped cursor – replaying the same cassette through one provider instance twice throws an exhausted error. Either construct a fresh provider per runtime.run() call, or invoke replay.reset() between runs:

TypeScript
const replay = cassetteProvider(cassette);
await runtime.run({ goal: "first" });
replay.reset();
await runtime.run({ goal: "first" });   // succeeds

Dynamic prompt content. A system prompt that legitimately includes a date / session id / UUID would trip strict mode every time. Pass the same hashFilter to BOTH sides – recordingProvider and cassetteProvider – to normalise the prompt before hashing:

TypeScript
const stripSessionId = (msgs) =>
  msgs.map((m) =>
    m.role === "system"
      ? { ...m, content: m.content.replace(/session=[a-f0-9-]+/, "session=<id>") }
      : m,
  );

// Record:
const recorder = recordingProvider(realProvider, { hashFilter: stripSessionId });
// Replay:
const replay = cassetteProvider(cassette, { hashFilter: stripSessionId });

Asymmetric filtering (filter on one side only) intentionally throws – a one-sided filter would silently mask drift instead of detecting it.

recordingProvider is a pure observer – it forwards every call to the wrapped provider verbatim, only watching. Costs / latency / errors / tool calls all behave exactly as if the wrapper weren't there. Production code can leave it on; in tests it's the recording mechanism.

Treat cassettes like sensitive fixtures.

Cassettes capture model output verbatim – assistant content, tool-call arguments, finalAnswer. If a customer email, API key, JWT, or other secret appears in the model's response, it lands in the cassette JSON. Recommended practice:

  • Default to gitignored, e.g. __cassettes__/ in .gitignore. Promote a cassette to a committed cassettes/ directory only after manual review.

  • Pass a redact hook to recordingProvider to scrub known-sensitive patterns at record time:

    TypeScript
    const recorder = recordingProvider(realProvider, {
      redact: (response) => ({
        ...response,
        message: { ...response.message, content: response.message.content.replace(EMAIL_RE, "<EMAIL>") },
      }),
    });
    
  • Recording fails fast on BigInt / functions / symbols in tool-call arguments – those values aren't JSON-encodable, so the recorder throws at capture time rather than producing a cassette that breaks at file-write or load.


What's next

  • Concepts: Fleet – pair the runtime with pluck.fleet({...}) to scale agent execution across N identities.
  • Concepts: Act – the phase mutating tools dispatch through, with signed receipts.
  • Reference: Sensors – every signal an agent can perceive via the sense tool.
