# MCP-First Pipeline
Pluck is built MCP-first. Every pipeline phase is exposed as an MCP (Model Context Protocol) tool, spec-compliant against MCP 2024-11-05, mutations are dry-run by default, and every act produces an Ed25519-signed receipt. Add one line to your agent config and your Claude / Cursor / Continue session can connect to 30 sources, extract structured data, shape it against Zod schemas, mutate with reversibility, and sense signals humans can't perceive.
## Why MCP-first
The 2026 agent ecosystem runs on MCP. Claude Desktop, Cursor, Continue, the Claude Code CLI, and every serious agent runtime now speak the Model Context Protocol. Pluck is not a data pipeline library with "MCP support tacked on" – the MCP surface is the primary distribution story, and every piece of the pipeline is designed to compose through it.
What that means in practice:
- **Every phase is an MCP tool.** `pluck_extract`, `pluck_act`, `pluck_sense`, `pluck_snitch`, `pluck_probe`, `pluck_context`, `pluck_dowse`, `pluck_speak`, `pluck_radio` – 9 tools covering the full pipeline surface.
- **Spec-compliant.** Pluck's MCP server targets spec version 2024-11-05. Notifications return no response (JSON-RPC 2.0 §4.1). The `ping` keepalive is implemented. Stdio transport handles CRLF line endings and empty lines correctly.
- **Safe by default.** `pluck_act` defaults to `dryRun: true` so agents cannot accidentally mutate state. An agent must pass `dryRun: false` explicitly to execute.
- **Signed.** Every mutation the agent performs produces an Ed25519-signed receipt. The receipt is verifiable without Pluck installed – just the public key.
If you are writing an agent today, adding Pluck means adding one stdio server to your MCP config. That's it.
## Install
```sh
npm install @sizls/pluck-mcp
```
The package ships a pluck-mcp bin that runs the MCP server over stdio. Point your agent's MCP config at it.
## Wire up an agent
### Claude Desktop
Add to your ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows):
```json
{
  "mcpServers": {
    "pluck": {
      "command": "npx",
      "args": ["-y", "@sizls/pluck-mcp"]
    }
  }
}
```
Restart Claude Desktop. The 9 Pluck tools appear automatically.
### Cursor
In .cursor/mcp.json at your project root (or ~/.cursor/mcp.json globally):
```json
{
  "mcpServers": {
    "pluck": {
      "command": "npx",
      "args": ["-y", "@sizls/pluck-mcp"]
    }
  }
}
```
### Claude Code
```sh
claude mcp add pluck -- npx -y @sizls/pluck-mcp
```
### Continue / other clients
Any MCP-spec client that supports stdio transport works the same way – run npx -y @sizls/pluck-mcp and the server handshakes over stdin/stdout.
## Scoping the tool surface
Agents that only need a subset of Pluck's 9 tools can narrow the visible surface. Less tool-list noise means smaller system prompts and cheaper per-turn cost:
```json
{
  "mcpServers": {
    "pluck": {
      "command": "npx",
      "args": ["-y", "@sizls/pluck-mcp"],
      "env": {
        "PLUCK_MCP_TOOLS": "extract,act"
      }
    }
  }
}
```
`PLUCK_MCP_TOOLS` is an allowlist (comma-separated, with or without the `pluck_` prefix). `PLUCK_MCP_EXCLUDE` is a denylist. Exclude wins when the same tool appears in both.
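The precedence can be sketched as a small filter. This is an illustrative reconstruction of the documented behaviour (allowlist, denylist, exclude wins), not Pluck's actual source:

```typescript
// Illustrative sketch of the documented allow/exclude precedence.
const ALL_TOOLS = [
  "pluck_extract", "pluck_act", "pluck_sense", "pluck_snitch", "pluck_probe",
  "pluck_context", "pluck_dowse", "pluck_speak", "pluck_radio",
];

// Names may arrive with or without the pluck_ prefix.
const normalize = (name: string): string =>
  name.trim().startsWith("pluck_") ? name.trim() : `pluck_${name.trim()}`;

function visibleTools(allow?: string, exclude?: string): string[] {
  const allowSet = allow ? new Set(allow.split(",").map(normalize)) : null;
  const denySet = new Set((exclude ?? "").split(",").filter(Boolean).map(normalize));
  return ALL_TOOLS.filter(
    (t) => (allowSet ? allowSet.has(t) : true) && !denySet.has(t), // exclude wins
  );
}

console.log(visibleTools("extract,act"));              // → ["pluck_extract", "pluck_act"]
console.log(visibleTools("extract,act", "pluck_act")); // → ["pluck_extract"]
```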
## The 9 tools
| Tool | What your agent can do | Default safety |
|---|---|---|
| `pluck_extract` | Pull structured content from any URL / DB / file / API. Returns typed text in any of 12 output formats. | Read-only. |
| `pluck_act` | Perform one of 27 actions across 9 actors: HTTP (post/put/patch/delete), GraphQL (mutate/query), browser (6 Playwright actions), browser-agent (LLM-driven agent-navigate), shell-write (`fs:*`, `exec:command`), email (send / send-with-attachment), AWS (s3/dynamodb/sqs/sns/lambda), GCP (storage/pubsub/firestore/functions), Azure (blob/servicebus/cosmos/functions). Every call produces a signed receipt. | `dryRun: true` default; browser-agent adds a 4-layer response-policy gate (allowedDomains, allowedActions, actionBudget, humanInLoop); cloud function URLs pass through a strict `safeHttpsUrl` host-suffix allowlist. |
| `pluck_sense` | Analyse an audio / video / text / image source for signals below human perception. 37 sensors across spectral (fft / spectrogram / pitch / tempo / chromagram / mfcc), decoded (dtmf / morse / fsk / psk / am-demod / fm-demod / ssb-demod), band (ultrasonic / infrasonic), diagnostic (noise-floor), identity (birdsong / rppg / animalsong), physiological (heartbeat / breathing), periodicity + anomaly, text-domain (cipher-classify / cipher-crack-caesar / cipher-crack-vigenere / steganography-text), image-domain (ela / heatmap / moire / flicker / rolling-shutter), plus CV-domain (faces / scene / ocr-text-regions / thermal / ground-anomaly). Three optional peers: `sharp`, `face-api.js`, `@xenova/transformers`. | Read-only. |
| `pluck_snitch` | Privacy audit any URL. Returns a forensic report covering trackers, fingerprinting, ultrasonic beacons, and PII leaks. Sign it by passing a signing key in the environment. | Read-only. |
| `pluck_probe` | Pre-flight introspection – source type, content type, estimated cost, recommended format. Zero full-pipeline cost. | Read-only. |
| `pluck_context` | "Where am I?" – schema.org, OG tags, robots.txt, sitemaps, known connectors that match, PII likelihood. | Read-only. |
| `pluck_dowse` | Zero-config signal reconnaissance – runs all sensors in fast mode against a signal source and ranks findings. | Read-only. |
| `pluck_speak` | Inverse of sense – encode JSON as ultrasonic, DTMF, or Morse. | Producer-only (returns base64 WAV). |
| `pluck_radio` | Decode RF protocols from SDR captures. Ships with the ADS-B aircraft decoder today; FM/AM/SSB demodulators available via `pluck_sense`. | Read-only. |
Every tool's inputSchema is valid JSON Schema draft-7 and is what the agent sees when Pluck joins the session. Agents discover the tools dynamically – you don't have to document anything for Claude or Cursor; the server describes itself.
Shape is deliberately absent from the MCP tool list. Shape runs in-process between extract and act; it's a type-level contract, not an agent-facing tool. See Concepts: Shape for the pattern.
## Zero-key LLM extraction (MCP sampling)
pluck_extract accepts a strategy argument – "css" | "regex" | "llm" | "hybrid". When the agent asks for "llm" or "hybrid" and no PLUCK_LLM_API_KEY is set in the server's environment, Pluck routes the extraction through the MCP host's own sampling/createMessage endpoint. The host's LLM answers the prompt; Pluck structures the call and parses the JSON response.
The practical effect:
- No `OPENAI_API_KEY` / `ANTHROPIC_API_KEY` in the server env.
- No rate-limit or cost on Pluck's side – the host's quota pays.
- Model hints respected – pass `llm: { model: "claude-opus-4-7" }` and the host sees it as a `modelPreferences.hints[]` entry per the sampling spec.
```json
{
  "tool": "pluck_extract",
  "arguments": {
    "uri": "https://news.ycombinator.com/item?id=42",
    "strategy": "llm",
    "prompt": "Extract the title, author, and number of comments.",
    "schema": {
      "type": "object",
      "properties": {
        "title": { "type": "string" },
        "author": { "type": "string" },
        "comments": { "type": "number" }
      },
      "required": ["title", "author", "comments"]
    }
  }
}
```
When the server responds to initialize, its capability set includes `sampling: {}` – Claude Desktop / Cursor / Continue all see that the server is sampling-aware and will answer the outbound request from their own model. Hosts that don't support sampling get a clean error back instead of a hang – SamplingClient enforces a 60-second timeout on every outbound request.
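The timeout behaviour can be sketched like this. `withTimeout` is a hypothetical helper standing in for SamplingClient's internal deadline, not Pluck's actual implementation:

```typescript
// Sketch: a sampling request that outlives its deadline rejects instead of hanging.
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`sampling timed out after ${ms}ms`)),
      ms,
    );
    p.then(
      (v) => { clearTimeout(timer); resolve(v); },
      (e) => { clearTimeout(timer); reject(e); },
    );
  });
}

// A host that never answers the sampling/createMessage request:
const neverAnswers = new Promise<string>(() => {});
withTimeout(neverAnswers, 100).catch((e) => console.log(e.message));
// → "sampling timed out after 100ms"
```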
This is the deepest "MCP-first" wedge in the tool. No other MCP server in the ecosystem inverts the server → client flow for LLM calls; every other LLM-aware server requires the user to bring a second API key.
## Direct JSON-RPC (driving the server yourself)
Every MCP tool is reachable via vanilla JSON-RPC 2.0 over stdio. This is what Claude Desktop / Cursor / Continue send internally, and it's the easiest way to debug tool behaviour without an agent in the loop.
```sh
# Start the server over stdio. Easiest way to drive it by hand: a single
# echo-pipe for a one-shot call, or a `tee` wrapper for live debugging.
npx -y @sizls/pluck-mcp
```
Once running, send frames on stdin:
```jsonc
// 1) Handshake – every session starts here.
{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05"}}
// Server responds with `capabilities: { tools: {}, sampling: {} }`.

// 2) Discover the tools – the JSON Schemas the agent reasons about.
{"jsonrpc":"2.0","id":2,"method":"tools/list"}

// 3) Call a tool. Arguments match the inputSchema from tools/list.
{
  "jsonrpc": "2.0",
  "id": 3,
  "method": "tools/call",
  "params": {
    "name": "pluck_extract",
    "arguments": {
      "uri": "https://news.ycombinator.com",
      "format": "markdown"
    }
  }
}
```
Every response is a single line of JSON. Errors follow JSON-RPC conventions ({ error: { code, message } }); tool invocations that throw are wrapped as isError: true content blocks per MCP spec.
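A client therefore has to distinguish two failure channels. This sketch follows the JSON-RPC 2.0 error object and the MCP `tools/call` result shape described above; the `classify` helper is illustrative, not part of Pluck:

```typescript
// Two failure channels: protocol-level JSON-RPC errors vs. tool-level isError results.
type RpcResponse =
  | { jsonrpc: "2.0"; id: number; result: { content: { type: string; text?: string }[]; isError?: boolean } }
  | { jsonrpc: "2.0"; id: number; error: { code: number; message: string } };

function classify(resp: RpcResponse): "protocol-error" | "tool-error" | "ok" {
  if ("error" in resp) return "protocol-error"; // malformed request, unknown method, ...
  if (resp.result.isError) return "tool-error"; // the tool ran and threw
  return "ok";
}

console.log(classify({ jsonrpc: "2.0", id: 1, error: { code: -32601, message: "Method not found" } }));
// → "protocol-error"
console.log(classify({ jsonrpc: "2.0", id: 2, result: { content: [{ type: "text", text: "boom" }], isError: true } }));
// → "tool-error"
```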
## Concrete curl-style smoke test
Drop this into a one-off script when you want to confirm a server binary is healthy without wiring a full agent:
```ts
import { spawn } from "node:child_process";

const server = spawn("npx", ["-y", "@sizls/pluck-mcp"], {
  stdio: ["pipe", "pipe", "inherit"],
});

const send = (frame: Record<string, unknown>) => {
  server.stdin.write(JSON.stringify(frame) + "\n");
};

server.stdout.on("data", (buf) => {
  for (const line of buf.toString().split("\n").filter(Boolean)) {
    console.log("←", JSON.parse(line));
  }
});

send({ jsonrpc: "2.0", id: 1, method: "initialize", params: { protocolVersion: "2024-11-05" } });
send({ jsonrpc: "2.0", id: 2, method: "tools/list" });
send({
  jsonrpc: "2.0",
  id: 3,
  method: "tools/call",
  params: {
    name: "pluck_probe",
    arguments: { uri: "https://news.ycombinator.com" },
  },
});

setTimeout(() => server.kill(), 3000);
```
Expected output: three responses (initialize handshake, tools/list with 9 entries, probe result with sourceType, contentType, and estimatedCost).
## Example prompts
Once Pluck is wired in, natural-language prompts just work:
- *"Extract the top 10 posts from https://news.ycombinator.com and summarise the themes."* Agent calls `pluck_extract("https://news.ycombinator.com")`, gets structured HN data, summarises.
- *"Audit https://example.com for tracker leaks and produce a signed report."* Agent calls `pluck_snitch("https://example.com")`, returns signed forensic findings.
- *"What's in this WAV file? Check for hidden touch-tones and anything above 18 kHz."* Agent calls `pluck_dowse("./mystery.wav")` first, or `pluck_sense({ uri: "./mystery.wav", detect: ["dtmf", "ultrasonic"] })` directly.
- *"Dry-run a DELETE on https://api.example.com/users/42 – what would happen?"* Agent calls `pluck_act({ uri, action: "delete", dryRun: true })`, shows the preview receipt.
- *"Actually delete it now – here's my signing key."* Agent calls `pluck_act({ uri, action: "delete", dryRun: false })`, returns the real signed receipt.
## Safety model
Pluck's MCP surface is designed around three assumptions about agent behaviour:
- **Agents make mistakes.** Default everything to dry-run. Force the agent (or the human in the loop) to opt in to real mutations.
- **Agents should produce observations.** Every `pluck_act` call produces a `SignedReceipt` with a canonical, verifiable signature. Reviewers verify receipts offline with the public key alone.
- **Agents shouldn't exceed your policy.** The underlying `createPluck({ policy: "./.pluckpolicy.yaml" })` config gates every action; `pluck_act` honours it. An agent attempting to `delete` a production URL against a `deny` rule gets `POLICY_DENIED` and cannot override.
The stdio transport trusts the host OS – Pluck's MCP server does not ship its own auth. Access control is whatever your agent client enforces. Configure Pluck credentials (API keys, SSH keys, signing keys) in the same environment where you launch the server.
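Offline verification of a receipt needs nothing beyond Node's built-in Ed25519 support. The receipt shape below is a hypothetical stand-in – Pluck's actual `SignedReceipt` fields may differ – but the sign/verify mechanics are standard:

```typescript
import { generateKeyPairSync, sign, verify, type KeyObject } from "node:crypto";

// Hypothetical receipt shape -- Pluck's actual SignedReceipt fields may differ.
interface Receipt {
  payload: string;   // canonical JSON describing the act
  signature: Buffer; // Ed25519 signature over the payload bytes
}

// Verify offline: only the public key is needed, no Pluck install.
function verifyReceipt(receipt: Receipt, publicKey: KeyObject): boolean {
  // Ed25519 takes a null digest -- the algorithm hashes internally.
  return verify(null, Buffer.from(receipt.payload), publicKey, receipt.signature);
}

// Demo with a throwaway keypair standing in for Pluck's signing key.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");
const payload = JSON.stringify({ action: "delete", uri: "https://api.example.com/users/42" });
const receipt: Receipt = { payload, signature: sign(null, Buffer.from(payload), privateKey) };

console.log(verifyReceipt(receipt, publicKey)); // true

// Any tampering with the payload breaks the signature.
const tampered: Receipt = { ...receipt, payload: payload.replace("42", "43") };
console.log(verifyReceipt(tampered, publicKey)); // false
```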
## The one-line pitch
A Claude Code user opens a new session. They type:
"Add Pluck to my MCP config."
Pluck joins the session. The agent now has:
- **Eyes** – 30+ connectors for reading any URL / DB / file / API.
- **Hands** – signed, reversible, policy-gated mutations with dry-run by default.
- **Ears + eyes** – 37 sensors including ultrasonic, rPPG heart-rate from video, heartbeat/breathing from audio, DTMF, Morse, FSK, PSK, chromagram, MFCC, classical cipher classification + cracking, invisible-character steganography detection (including the Unicode tag block used in ASCII-smuggling prompt-injection), image forensics (ELA tampering, rolling-shutter deepfake signature, moiré screen-recording detection, AC-light flicker banding), and ML-backed CV (face detection + single-frame liveness heuristic, scene classification, pre-OCR text-region detection, thermal hotspots, satellite ground-anomaly change-detection, broader bioacoustic ID beyond birds). Plus live-streaming via `createSensorStream` for mic / SDR / SIP feeds.
- **Conscience** – every act produces an Ed25519-signed receipt. `pluck.undo(receipt)` reverses it.
No other tool in the ecosystem offers that bundle through a single MCP server.
## What's next
The six concepts Pluck exposes over MCP:
- Concepts: Connect – URI → connector → typed bytes. 30 connectors ship.
- Concepts: Navigate – prepare bytes between connect and extract. Readability, Playwright, agent-driven.
- Concepts: Extract – bytes → text, segments, data.
- Concepts: Shape – loose data → Zod contract. Runs in-process; no MCP tool.
- Concepts: Act – signed receipts + undo + policy + idempotency, in depth.
- Concepts: Sense – 37 sensors across audio / video / text / image / CV; three optional peers (`sharp`, `face-api.js`, `@xenova/transformers`).
Reference and recipes:
- Reference: Connectors – full list of the URI schemes `pluck_extract` understands.
- Reference: API – REST API for agents that prefer HTTP over stdio.
- Recipe: Snitch Privacy – the signed forensic audit demo.
- Getting Started – install + first pluck in 5 minutes.
## Full runnable example
The smallest no-agent MCP integration – spawns @sizls/pluck-mcp as a subprocess, issues the initialize handshake, asks for tools/list, and calls the pluck_extract tool against a live URL. Exactly the three frames Claude Desktop / Cursor / Continue send internally. Opens in a fresh StackBlitz sandbox.