Runtime
`pluck.runtime({...})` returns an `AgentRuntime` that orchestrates N heterogeneous agents. Each agent drives its own LLM provider, consumes its own subset of Pluck verbs as tools, runs under its own budget cap, and emits signed receipts for every turn + tool call into the v0.11 Substrate.
Why a runtime that's NOT just one provider per fleet
Pluck's runtime is heterogeneous on purpose. MCP sampling is request-response serial, so "1000 agents through one host LLM" is a serialised bottleneck dressed as parallelism. Instead, each agent supplies its own provider + credentials and runs its own loop. Need a Sonnet 4.6 routing agent handing off to ten Haiku 4.5 specialists? Each declares its own `LlmProviderAdapter` – the runtime sequences them through a handoff graph.
The runtime is intentionally minimal. The canonical multi-agent orchestrator lives in `@directive-run/ai`; Pluck's contribution layered on top is:
- Pluck verb surface as auto-registered tools – `connect`, `extract`, `shape`, `act`, `sense`, `probe`, `context`, `dowse` available to every agent via its `tools` manifest, with JSON schemas and per-field validation.
- Substrate-backed signed trace – every turn + every tool call lands in the event log; receipts chain via `parentSig` so the run is a verifiable record.
- Fleet integration – fleet members can each own a runtime instance, scaling agent execution to N members × M targets.
The factory call
```ts
import {
  createPluck,
  openaiProvider,
  anthropicProvider,
} from "@sizls/pluck";

const pluck = createPluck();

const runtime = pluck.runtime({
  agents: [
    {
      id: "researcher",
      systemPrompt: "Read the URL and summarise the key facts.",
      provider: openaiProvider({ model: "gpt-5", apiKey: process.env.OPENAI_API_KEY! }),
      tools: ["extract", "context", "sense"], // read-only manifest
      budget: { maxTurns: 10, maxToolCalls: 30, maxTokens: 50_000 },
    },
    {
      id: "actor",
      systemPrompt: "Take the researcher's notes and post a summary.",
      provider: anthropicProvider({ model: "claude-sonnet-4-6-20260201", apiKey: process.env.ANTHROPIC_API_KEY! }),
      tools: ["act"], // mutating – opt-in
      budget: { maxTurns: 5, maxToolCalls: 5, maxCostUsd: 1.0 },
    },
  ],
  signingKey: process.env.PLUCK_SIGNING_KEY!,
});

const result = await runtime.run({
  goal: "Read https://news.example.com/post and post a summary to slack.",
  plan: {
    entry: "researcher",
    edges: [{ from: "researcher", to: "actor" }],
    exits: ["actor"],
  },
});
```
Providers
Five provider adapters ship with the runtime. The first two are deterministic and meant for tests; the rest speak vendor wire protocols natively, including tool calls.
| Provider | Use for | Cost reporting |
|---|---|---|
| `echoProvider()` | tests, smoke runs – replies `"received: <last user prompt>"` | none |
| `scriptedProvider(steps)` | unit tests – caller supplies a step-by-step response script | none |
| `openaiProvider({ apiKey, model })` | OpenAI / Azure / Together / Fireworks / OpenRouter (anything OpenAI-compatible) | `costUsd` from `OPENAI_PRICING` table |
| `anthropicProvider({ apiKey, model })` | Anthropic Messages API | `costUsd` from `ANTHROPIC_PRICING` table |
| `ollamaProvider({ model, baseURL })` | local Ollama (`http://localhost:11434` default) | none – local models are free |
All three real adapters:
- require an explicit `model` and a non-empty `apiKey` at construction – both throw at factory time so misconfiguration surfaces before the first turn rather than as a 401 on the first call (Ollama is `model`-only – local models are free)
- accept a `fetch` override for tests + custom transport (Azure header injection, Bedrock signing, etc.)
- reject a `baseURL` that isn't `https://` (HTTP permitted only on `localhost` / `127.0.0.1` / `[::1]`) so a misconfigured base URL can't accidentally ship an API key in the clear
- respect `request.signal` – when the runtime is aborted, the in-flight LLM call is torn down
- accept a per-request `timeoutMs` that aborts the fetch independently of the run-level signal
- map vendor-specific tool-call shapes to Pluck's `LlmToolCall[]` so the runtime loop is identical regardless of which provider drove the turn
- throw a typed `LlmProviderHttpError` on non-2xx responses, with `status`, `bodySnippet` (≤ 500 chars, capped read at 8 KiB), `providerName`, a status-aware `hint` (e.g. "auth rejected – verify apiKey is set + valid for this baseURL", "rate-limited – back off and retry"), and `retryAfterMs` parsed from `Retry-After` for 429 responses
```ts
import {
  openaiProvider,
  anthropicProvider,
  ollamaProvider,
} from "@sizls/pluck";

// OpenAI
const gpt = openaiProvider({
  apiKey: process.env.OPENAI_API_KEY!,
  model: "gpt-4o",
  timeoutMs: 30_000,
});

// Anthropic
const claude = anthropicProvider({
  apiKey: process.env.ANTHROPIC_API_KEY!,
  model: "claude-sonnet-4-6-20260201",
  maxTokens: 4096,
});

// Self-hosted Ollama (note: localhost http is permitted; remote http is not)
const llama = ollamaProvider({
  model: "llama3.1",
  // baseURL: "http://localhost:11434", // default
});

// OpenAI-compatible gateway (e.g. OpenRouter, Together, Fireworks)
const router = openaiProvider({
  apiKey: process.env.OPENROUTER_API_KEY!,
  baseURL: "https://openrouter.ai/api/v1",
  model: "anthropic/claude-sonnet-4-6",
});
```
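The typed `LlmProviderHttpError` surface lends itself to a retry wrapper. A sketch under stated assumptions: `FakeProviderHttpError` below is a local stand-in mirroring the documented fields (`status`, `providerName`, `hint`, `retryAfterMs`) so the example runs standalone – the real class is exported by the package.

```typescript
// Local stand-in for LlmProviderHttpError, so this sketch runs without the package.
class FakeProviderHttpError extends Error {
  constructor(
    public status: number,
    public providerName: string,
    public hint: string,
    public retryAfterMs?: number,
  ) {
    super(`${providerName} HTTP ${status}: ${hint}`);
  }
}

// Back off on 429 using the parsed retryAfterMs; rethrow everything else.
async function withRetry<T>(fn: () => Promise<T>, maxAttempts = 3): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (!(err instanceof FakeProviderHttpError) || err.status !== 429 || attempt >= maxAttempts) {
        throw err; // auth errors, 5xx, or retries exhausted
      }
      await new Promise((r) => setTimeout(r, err.retryAfterMs ?? 1_000));
    }
  }
}
```

Non-429 errors (e.g. the "auth rejected" hint on a 401) are rethrown immediately – retrying a bad key only burns time.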
`OPENAI_PRICING` and `ANTHROPIC_PRICING` are exported maps of model → `{ input, output }` USD-per-million-tokens rates. Lookup is exact-match first, then longest-prefix – date-suffix variants like `gpt-4o-2024-11-20` resolve to the `gpt-4o` rate without an explicit entry. Pass a custom table via `pricing: { ... }` when you're hitting a model neither covers; when no rate matches, a one-shot dev warning fires (per provider+model), `costUsd` is reported `undefined`, and `maxCostUsd` budget caps will not trigger for that agent.
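The lookup order (exact match, then longest prefix) can be sketched in a few lines. The rates below are illustrative placeholders, not the shipped pricing tables:

```typescript
type Rate = { input: number; output: number }; // USD per million tokens

// Exact match first, then longest matching prefix, so dated model variants
// resolve without their own entry.
function lookupRate(pricing: Record<string, Rate>, model: string): Rate | undefined {
  if (pricing[model]) return pricing[model];
  const prefixes = Object.keys(pricing)
    .filter((k) => model.startsWith(k))
    .sort((a, b) => b.length - a.length); // longest prefix wins
  return prefixes.length > 0 ? pricing[prefixes[0]] : undefined;
}

function costUsd(rate: Rate, inputTokens: number, outputTokens: number): number {
  return (inputTokens * rate.input + outputTokens * rate.output) / 1_000_000;
}
```

Longest-prefix matters: `gpt-4o-mini-…` must resolve to the `gpt-4o-mini` rate, not `gpt-4o`.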
API-key destination warning. The configured `apiKey` is sent verbatim as `Authorization: Bearer …` (OpenAI / Ollama-compat) or `x-api-key: …` (Anthropic) to whatever host `baseURL` resolves to. Mistyping a domain ships your key to it. Only point at trusted vendor / gateway hosts (the `https://` check rules out plaintext, not "wrong destination").
Need Gemini, Bedrock, Vertex, or another vendor? `LlmProviderAdapter` is a one-method interface – `turn(request): Promise<LlmTurnResponse>`. Wrapping any chat-completions API in ~80 LOC is the supported extension path.
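A minimal sketch of that extension path. The `TurnRequest` / `TurnResponse` shapes below are hypothetical stand-ins for the real `LlmTurnRequest` / `LlmTurnResponse` types; the injectable `fetchImpl` mirrors the documented `fetch` override:

```typescript
// Hypothetical shapes standing in for Pluck's request/response types.
type Msg = { role: "system" | "user" | "assistant"; content: string };
type TurnRequest = { messages: Msg[]; signal?: AbortSignal };
type TurnResponse = { content: string; toolCalls: unknown[] };

interface ProviderAdapter {
  turn(request: TurnRequest): Promise<TurnResponse>;
}

// Adapter wrapping any OpenAI-style chat-completions endpoint.
function chatCompletionsAdapter(opts: {
  baseURL: string;
  apiKey: string;
  model: string;
  fetchImpl?: typeof fetch; // injectable for tests / custom transport
}): ProviderAdapter {
  const doFetch = opts.fetchImpl ?? fetch;
  return {
    async turn(request) {
      const res = await doFetch(`${opts.baseURL}/chat/completions`, {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
          Authorization: `Bearer ${opts.apiKey}`,
        },
        body: JSON.stringify({ model: opts.model, messages: request.messages }),
        ...(request.signal ? { signal: request.signal } : {}), // abort propagation
      });
      if (!res.ok) throw new Error(`provider HTTP ${res.status}`);
      const body = await res.json();
      const choice = body.choices[0].message;
      return { content: choice.content ?? "", toolCalls: choice.tool_calls ?? [] };
    },
  };
}
```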
The default tool surface is read-only
When you omit `tools` on an `AgentDefinition`, the agent gets the safe-default subset: `connect`, `extract`, `shape`, `sense`, `probe`, `context`, `dowse` – read everything, mutate nothing. Mutating agents must opt into `"act"` explicitly. Same discipline as the v0.4 browser-agent actor's response-policy gate: trust to read != trust to write.
Unknown tool names fail at `runtime()` construction so an agent never gets handed a dangling reference. An LLM that emits a tool call NOT in its agent's manifest gets back a generic "tool unavailable" message – the runtime deliberately doesn't echo the requested name back, so a compromised provider can't enumerate the host's manifest by probing.
Custom tools
Agents can register caller-supplied tools alongside Pluck verbs via `customTools`:
```ts
import type { CustomTool, CustomToolContext } from "@sizls/pluck";

const slackPost: CustomTool = {
  name: "slack_post",
  description: "Post a message to a Slack channel.",
  schema: {
    type: "object",
    required: ["channel", "text"],
    properties: {
      channel: { type: "string" },
      text: { type: "string" },
    },
  },
  async invoke(args, ctx: CustomToolContext) {
    // ctx.signal – torn down on runtime.destroy() or run({signal}) abort
    // ctx.pluck – the bound PluckInstance, so a custom tool can chain
    //             into Pluck verbs (e.g. ctx.pluck.act(...))
    // ctx.agentId – the agent that emitted the call
    const res = await fetch("https://slack.com/api/chat.postMessage", {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${process.env.SLACK_BOT_TOKEN}`,
      },
      body: JSON.stringify(args),
      ...(ctx.signal ? { signal: ctx.signal } : {}),
    });
    return res.json();
  },
};

const runtime = pluck.runtime({
  agents: [
    {
      id: "poster",
      systemPrompt: "Read the URL and post a summary to #general.",
      provider: anthropicProvider({ apiKey, model: "claude-sonnet-4-6-20260201" }),
      tools: ["extract"],       // Pluck verbs in scope
      customTools: [slackPost], // caller tools layered on top
    },
  ],
});
```
Same hardening applies as for Pluck verbs:
- Names: must match `/^[a-zA-Z_][a-zA-Z0-9_-]*$/` and may not collide with a Pluck verb (`connect`, `extract`, `shape`, `act`, `sense`, `probe`, `context`, `dowse`). Bad manifests fail at `runtime()` construction. Pick a domain prefix like `slack_*` / `db_*` / `_my_*` to keep room for new Pluck verbs in future minor releases – a `customTools: [{ name: "deduce", … }]` would silently pass on Pluck v0.14 and reject on a future Pluck that adds a `deduce` verb.
- Prototype-pollution scrub: `__proto__` / `constructor` / `prototype` keys are stripped from plain-object args before `invoke` sees them. Args nested deeper than 64 levels reject as a generic `"tool unavailable"` so a hostile model can't stack-overflow the host process.
- Abort signal: `ctx.signal` is the run-level signal; long-running custom tools should pass it into their own fetch / spawn calls so abort tears them down mid-flight.
- Error mapping: throwing `ToolArgError` from `invoke` surfaces a generic `"tool unavailable"` to the LLM (matching the discipline applied to built-in verbs); throwing any other `Error` forwards the error message verbatim to the LLM so domain failures stay debuggable in the agent's conversation.
- Return values: must be JSON-serialisable for the LLM tool message. `BigInt` is coerced to a string and circular references render as `"[Circular]"`; other non-JSON values (functions, symbols) are dropped. Return `null` / `undefined` when there's nothing to say.
- Tool surface: Pluck verbs and custom tool names share the same flat namespace the model sees. Two custom tools with the same name reject at construction time.
Trust model. A custom tool's `invoke` is trusted code, not LLM-supplied. The LLM picks tool names + arguments; your code decides what those mean. Treat the LLM as if it could emit anything.
- `ctx.pluck` is unscoped. A custom tool receives the full `PluckInstance` and can call `ctx.pluck.act(...)` even when the agent's `tools` allowlist excludes `"act"`. The `tools: [...]` allowlist gates which Pluck verbs the LLM can ask for directly – it does not gate what your custom-tool code does behind the scenes. If a custom tool proxies an LLM-supplied URI into `ctx.pluck.act()`, validate the URI yourself first.
- Error messages reach the LLM verbatim (when not wrapped in `ToolArgError`). Don't ``throw new Error(`DB rejected password '${secret}'`)`` – the model's host (which may be a third-party provider) will see the secret. Wrap secrets in `ToolArgError` to mask the detail behind the generic `"tool unavailable"` message; the substrate event still records the full diagnostic for operators.
- `description` and `schema` are sent to the model verbatim as part of the tools-list system context. Treat them like prompt fragments – never derive them from untrusted input or a third party's npm package without reading what they put there.
Pass `tools: []` + a populated `customTools` to run an agent that has only caller tools – useful when the agent's job is unrelated to the Pluck verb surface.
Argument validation
LLM-supplied tool args run through hardened validation before any Pluck verb sees them:
- Prototype-pollution scrub – `__proto__` / `constructor` / `prototype` keys stripped recursively from plain-object args. Non-plain objects (`Date`, `Map`, `Set`, `Buffer`, class instances) pass through untouched so a `Buffer` legitimately routed through `act.input` doesn't get mangled into `{}`.
- Strict type checks – `shape.template` and `act.action` require literal strings; an LLM emitting `{ action: { toString: () => "delete" } }` fails fast with a typed `ToolArgError`.
- Generic LLM error – when validation fails, the agent's tool-result message says only `"tool unavailable"`. The substrate event keeps the full detail (which tool, which field) for operators inspecting the trace.
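The scrub-and-depth-cap discipline described above can be sketched as follows – `scrubArgs` and `MAX_DEPTH` are illustrative names, not Pluck exports:

```typescript
const FORBIDDEN = new Set(["__proto__", "constructor", "prototype"]);
const MAX_DEPTH = 64;

// Recursively strip pollution keys from *plain* objects only; other values
// (Date, Buffer, class instances, primitives) pass through untouched.
// Exceeding the depth cap throws, which the runtime would map to "tool unavailable".
function scrubArgs(value: unknown, depth = 0): unknown {
  if (depth > MAX_DEPTH) throw new Error("args nested too deep");
  if (Array.isArray(value)) return value.map((v) => scrubArgs(v, depth + 1));
  if (
    value !== null &&
    typeof value === "object" &&
    Object.getPrototypeOf(value) === Object.prototype
  ) {
    const out: Record<string, unknown> = {};
    for (const [k, v] of Object.entries(value)) {
      if (!FORBIDDEN.has(k)) out[k] = scrubArgs(v, depth + 1);
    }
    return out;
  }
  return value;
}
```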
Budget caps
Every agent has a budget tracker. Default limits:
| Cap | Default | Purpose |
|---|---|---|
| `maxTurns` | 50 | Stop runaway loops. |
| `maxToolCalls` | 200 | Stop tool-call thrashing. |
| `maxTokens` | unlimited | Token-budget gate when set. |
| `maxCostUsd` | unlimited | Hard $ ceiling per agent per run. |
Exhaustion exits the agent loop with a typed `budgetExhausted: "turns" | "toolCalls" | "tokens" | "costUsd"` reason. The trace records `agent.budget.exhausted`. The handoff graph sees the agent as having "completed", so flow continues normally to the next handoff (or terminates) – no special crash path.
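The cap semantics can be sketched as a small per-agent tracker (illustrative, not the runtime's internal implementation; defaults per the table above):

```typescript
type ExhaustReason = "turns" | "toolCalls" | "tokens" | "costUsd";

class BudgetTracker {
  private turns = 0;
  private toolCalls = 0;
  private tokens = 0;
  private cost = 0;

  constructor(
    private caps: { maxTurns?: number; maxToolCalls?: number; maxTokens?: number; maxCostUsd?: number } = {},
  ) {}

  // Record usage; return the first exhausted cap, or undefined while within budget.
  record(ev: { turns?: number; toolCalls?: number; tokens?: number; costUsd?: number }): ExhaustReason | undefined {
    this.turns += ev.turns ?? 0;
    this.toolCalls += ev.toolCalls ?? 0;
    this.tokens += ev.tokens ?? 0;
    this.cost += ev.costUsd ?? 0;
    // Turn/tool-call caps default per the table; token/cost caps gate only when set.
    const { maxTurns = 50, maxToolCalls = 200, maxTokens, maxCostUsd } = this.caps;
    if (this.turns >= maxTurns) return "turns";
    if (this.toolCalls >= maxToolCalls) return "toolCalls";
    if (maxTokens !== undefined && this.tokens >= maxTokens) return "tokens";
    if (maxCostUsd !== undefined && this.cost >= maxCostUsd) return "costUsd";
    return undefined;
  }
}
```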
Handoff graph
```ts
const plan = {
  entry: "router",
  edges: [
    { from: "router", to: "specialist", when: (a) => a.includes("specialist") },
    { from: "router", to: "fallback" },
    { from: "specialist", to: "actor" },
    { from: "fallback", to: "actor" },
  ],
  exits: ["actor"],
};
```
Each edge has an optional `when(finalAnswer)` predicate – the first matching edge fires. Cycles are detected per-agent: an agent can be re-entered up to `maxAgentVisits` times (default 8, configurable). Diamond patterns (A→B→D + A→C→D) work without false-positive cycles because cycle detection counts visits, not bare entry. Genuine cycles (A→B→A→B…) hit the cap and emit `agent.handoff.cycle`.
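Visit-counting edge selection can be sketched like this (names are illustrative, not Pluck exports):

```typescript
type Edge = { from: string; to: string; when?: (answer: string) => boolean };

// Pick the first matching outgoing edge, capping per-agent visit counts.
// Diamonds re-enter a node legitimately, so we count visits instead of
// forbidding re-entry outright.
function nextAgent(
  edges: Edge[],
  from: string,
  finalAnswer: string,
  visits: Map<string, number>,
  maxAgentVisits = 8,
): string | "cycle" | undefined {
  const edge = edges.find((e) => e.from === from && (e.when?.(finalAnswer) ?? true));
  if (!edge) return undefined; // no outgoing edge -> agent is an exit
  const count = (visits.get(edge.to) ?? 0) + 1;
  if (count > maxAgentVisits) return "cycle"; // would emit agent.handoff.cycle
  visits.set(edge.to, count);
  return edge.to;
}
```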
Skipping the plan entirely runs the first registered agent in single-agent mode – no graph required for the common one-agent case.
Trace events
Every event flows through the Substrate's event log:
| Event | When | Capability |
|---|---|---|
| `agent.llm.turn` | Every provider turn | `agent:llm` |
| `agent.llm.error` | Provider throws | `agent:llm` |
| `agent.tool.invoke` | Tool call succeeds | `agent:tool` |
| `agent.tool.failed` | Tool call throws (incl. validation) | `agent:tool` |
| `agent.tool.rejected` | Tool name not in manifest | `agent:tool` |
| `agent.handoff.transition` | Edge fires | `agent:control` |
| `agent.handoff.cycle` | Visit cap hit | `agent:control` |
| `agent.budget.exhausted` | Any cap hit | `agent:policy` |
| `agent.provider.destroy.failed` | Provider destroy throws | `agent` |
Every event carries an HLC timestamp. When a `signingKey` is configured, every turn + tool call is also written to a chained Ed25519 receipt – the chain is serialised behind a `chainTail` mutex (same pattern as v0.11 fleet) so future parallel-handoff workloads can't fork the chain.
Memory
The default in-memory adapter (`createInMemoryRuntimeMemory()`) is Map-backed with three scope conventions:
- `"agent:{id}"` – per-agent scratch.
- `"shared"` – readable + writable across agents within one run. The runtime auto-writes each agent's `finalAnswer` here under `agent:{id}:answer`.
- `"global"` – persisted across runs (requires a durable adapter).
Swap in a SQLite/Redis/etc. adapter when state needs to outlive a process.
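A Map-backed adapter following those scope conventions might look like this – the `get` / `set` method names are assumptions for illustration, not the documented adapter interface:

```typescript
// Scoped key-value store: one Map per scope ("agent:{id}", "shared", "global").
function createMemory() {
  const store = new Map<string, Map<string, unknown>>();
  const scopeMap = (scope: string) => {
    let m = store.get(scope);
    if (!m) {
      m = new Map();
      store.set(scope, m);
    }
    return m;
  };
  return {
    set(scope: string, key: string, value: unknown) {
      scopeMap(scope).set(key, value);
    },
    get(scope: string, key: string) {
      return scopeMap(scope).get(key);
    },
  };
}
```

A durable adapter keeps the same shape but backs `store` with SQLite/Redis so the `"global"` scope survives the process.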
Lifecycle
```ts
const runtime = pluck.runtime({ ... });
try {
  const result = await runtime.run({ goal });
} finally {
  await runtime.destroy(); // calls each provider's destroy(), drains the chain
}
```
Provider destroy errors emit `agent.provider.destroy.failed` events instead of disappearing silently – operators get a diagnostic trace for misbehaving real-world adapters (token-pool clients, model loaders).
`runtime.run({ signal })` propagates the signal into every Pluck verb invocation through the tools layer. A pre-flight `signal.aborted` check throws before starting a doomed verb call so long-running `pluck.act` / `pluck.sense` operations don't waste budget after a cancel.
Cassette replay (regression-testing LLM behaviour)
Pin a known-good run, swap providers, verify the new provider produces the same tool-call sequence. Useful when upgrading models (gpt-4o → gpt-5, claude-sonnet-4-5 → claude-sonnet-4-6) or comparing alternate providers (Anthropic vs OpenAI on the same goal).
```ts
import fs from "node:fs";
import {
  cassetteProvider,
  diffCassettes,
  recordingProvider,
  openaiProvider,
} from "@sizls/pluck";

// 1. Record – wrap the real provider in `recordingProvider`.
const recorder = recordingProvider(openaiProvider({
  apiKey: process.env.OPENAI_API_KEY!,
  model: "gpt-4o",
}));
const runtime1 = pluck.runtime({
  agents: [{ id: "a", systemPrompt: "...", provider: recorder, tools: ["extract"] }],
});
await runtime1.run({ goal: "summarise the homepage" });

const cassette = recorder.toCassette({ agentId: "a" });
fs.writeFileSync("./cassettes/homepage.json", JSON.stringify(cassette, null, 2));

// 2. Replay – feed the cassette back in place of the real provider.
const replay = cassetteProvider(cassette);
const runtime2 = pluck.runtime({
  agents: [{ id: "a", systemPrompt: "...", provider: replay, tools: ["extract"] }],
});
await runtime2.run({ goal: "summarise the homepage" });
// ↑ identical tool-call sequence, no API spend, deterministic.

// 3. Drift detection – record a NEW run against gpt-5, diff vs the gpt-4o cassette.
const newRecorder = recordingProvider(openaiProvider({
  apiKey: process.env.OPENAI_API_KEY!,
  model: "gpt-5",
}));
const runtime3 = pluck.runtime({
  agents: [{ id: "a", systemPrompt: "...", provider: newRecorder, tools: ["extract"] }],
});
await runtime3.run({ goal: "summarise the homepage" });

const newCassette = newRecorder.toCassette({ agentId: "a" });
const drift = diffCassettes(cassette, newCassette, { ignoreContent: true });
if (drift.length > 0) {
  throw new Error(`Tool-call drift between gpt-4o and gpt-5: ${JSON.stringify(drift)}`);
}
```
Cassette format: a JSON-serialisable list of `{turnIndex, promptHash, responseHash?, response}` entries inside an envelope of `{version, agentId?, recordedAt, recordedProvider, cassetteId?, envelopeHash?}`. `promptHash` is a SHA-256 of the canonicalised conversation messages – `cassetteProvider` enforces it strictly by default, so a drift in upstream tool dispatch (e.g. `extract` returning a different result) raises a typed error mid-replay rather than silently producing the cached response. Set `strict: false` for best-effort replay during exploration. `turnIndex` cross-checks against the replay cursor so a hand-edited or out-of-order cassette is detected at load time, not at end-of-run.
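The general shape of a canonicalise-then-hash prompt digest (the exact canonicalisation Pluck uses isn't specified here, so treat this as illustrative):

```typescript
import { createHash } from "node:crypto";

// Serialise with object keys sorted so semantically identical conversations
// hash identically regardless of key insertion order.
function canonicalise(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonicalise).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const keys = Object.keys(value as object).sort();
    return `{${keys
      .map((k) => `${JSON.stringify(k)}:${canonicalise((value as Record<string, unknown>)[k])}`)
      .join(",")}}`;
  }
  return JSON.stringify(value);
}

function promptHash(messages: { role: string; content: string }[]): string {
  return createHash("sha256").update(canonicalise(messages)).digest("hex");
}
```

Any drift in the messages (one changed character in a tool result) yields a different digest, which is what lets strict replay refuse a stale cassette.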
Cassette integrity (v0.35-R2+)
Cassettes flow through PR review, shared S3 buckets, npm packages – anywhere a JSON file can be hand-edited between record and replay. From v0.35-R2 onward, recordings emit `version: 2` cassettes with three layered integrity controls:
- `responseHash` per entry – SHA-256 over canonical `{cassetteId, turnIndex, promptHash, response}`. Detects per-entry edits, cross-cassette splice (binding to `cassetteId`), and same-turn poisoning (binding to `turnIndex + promptHash`). Replay recomputes and refuses on mismatch.
- `cassetteId` – a random 16-byte (32-hex) identity generated by `recordingProvider` at the first turn. Bound into every `responseHash`, so an entry from cassette B cannot be spliced into cassette A.
- `envelopeHash` – SHA-256 over canonical `{version, agentId, recordedAt, recordedProvider, cassetteId, entryDigests}`. Detects metadata tampering (`recordedProvider` swap, `agentId` rebind, `recordedAt` forgery) that per-entry hashes do not cover.
```ts
const replay = cassetteProvider(cassette); // v2 always validates integrity
const strictReplay = cassetteProvider(cassette, { requireIntegrity: true }); // also refuses v1
```
`requireIntegrity: true` refuses v1 cassettes wholesale – recommended for CI / production pipelines where a v1 fixture without integrity guarantees should not flow through. The default is `false` for backward compatibility with v0.34 fixtures; v1 cassettes still load with a one-time `console.warn` reminding operators to re-record.
Downgrade-attack defence. A real v1 cassette never carries `cassetteId`, `envelopeHash`, or per-entry `responseHash` – those fields ship only on v2. v0.35-R3 added a downgrade detector: a v1-tagged cassette carrying any v2-only field is refused with "downgrade tampering suspected". This closes the attack where someone strips `version: 2 → 1` and forges per-entry hashes via a v1-compat fallback.
Re-record to upgrade. v0.34/R1 cassettes (version 1, with or without `responseHash`) replay correctly under v0.35-R2+ EXCEPT when they carry an old-format `responseHash` (which the downgrade detector refuses). To upgrade, rerun the recording against the live provider – `recordingProvider` always emits v2 from v0.35-R2 onward.
Integrity vs. authenticity. `envelopeHash` is SHA-256, not Ed25519-signed – it detects post-record edits but not authorship. For non-repudiation (a regulator asking "prove this cassette came from this CI pipeline at this time"), pair the cassette load with a `pluck.act` receipt of the recording event. A v0.36 follow-up will add `recordingProvider({ sign: "env:PLUCK_SIGNING_KEY" })` to mirror the receipt-chain signing model.
Tool-call equality in `diffCassettes` ignores `id` (provider-assigned, varies between runs) and compares `name` + canonicalised arguments. Tool-call IDs differ on every run; the behaviour is what regression tests need to lock down.
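That equality rule can be sketched as follows (illustrative, not the library's implementation; argument canonicalisation here is shallow for brevity):

```typescript
type ToolCall = { id: string; name: string; args: Record<string, unknown> };

// Compare name + key-sorted arguments; ignore the provider-assigned id,
// which legitimately differs on every run.
function sameToolCall(a: ToolCall, b: ToolCall): boolean {
  const canon = (v: Record<string, unknown>) =>
    JSON.stringify(Object.keys(v).sort().map((k) => [k, v[k]]));
  return a.name === b.name && canon(a.args) === canon(b.args);
}
```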
Single-use replay. `cassetteProvider(cassette)` keeps a closure-scoped cursor – replaying the same cassette through one provider instance twice will throw `exhausted`. Either construct a fresh provider per `runtime.run()` call, or invoke `replay.reset()` between runs:
```ts
const replay = cassetteProvider(cassette);
await runtime.run({ goal: "first" });
replay.reset();
await runtime.run({ goal: "first" }); // succeeds
```
Dynamic prompt content. A system prompt that legitimately includes a date / session id / UUID would trip strict mode every time. Pass the same `hashFilter` to BOTH sides – `recordingProvider` and `cassetteProvider` – to normalise the prompt before hashing:
```ts
const stripSessionId = (msgs) =>
  msgs.map((m) =>
    m.role === "system"
      ? { ...m, content: m.content.replace(/session=[a-f0-9-]+/, "session=<id>") }
      : m,
  );

// Record:
const recorder = recordingProvider(realProvider, { hashFilter: stripSessionId });
// Replay:
const replay = cassetteProvider(cassette, { hashFilter: stripSessionId });
```
Asymmetric filtering (filter on one side only) intentionally throws – a one-sided filter would silently mask drift instead of detecting it.
`recordingProvider` is a pure observer – it forwards every call to the wrapped provider verbatim, only watching. Costs, latency, errors, and tool calls all behave exactly as if the wrapper weren't there. Production code can leave it on; in tests it's the recording mechanism.
Treat cassettes like sensitive fixtures.
Cassettes capture model output verbatim – assistant content, tool-call arguments, `finalAnswer`. If a customer email, API key, JWT, or other secret appears in the model's response, it lands in the cassette JSON. Recommended practice:
- Default to gitignored, e.g. `__cassettes__/` in `.gitignore`. Promote a cassette to a committed `cassettes/` directory only after manual review.
- Pass a `redact` hook to `recordingProvider` to scrub known-sensitive patterns at record time:

```ts
const recorder = recordingProvider(realProvider, {
  redact: (response) => ({
    ...response,
    message: {
      ...response.message,
      content: response.message.content.replace(EMAIL_RE, "<EMAIL>"),
    },
  }),
});
```

- Recording fails fast on `BigInt` / functions / symbols in tool-call arguments – those values aren't JSON-encodable, so the recorder throws at capture time rather than producing a cassette that breaks at file-write or load.
What's next
- Concepts: Fleet – pair the runtime with `pluck.fleet({...})` to scale agent execution across N identities.
- Concepts: Act – the phase mutating tools dispatch through, with signed receipts.
- Reference: Sensors – every signal an agent can perceive via the `sense` tool.