
Why Pluck

Why connect → extract → shape → act → sense is the right abstraction for the agent era.


The shape of the problem

Every working agent I've seen – whether it's a Claude Code plugin, a Cursor assistant, a homegrown Python loop with a vector store, or a Zapier "AI action" – does the same five things, roughly in the same order:

  1. It touches something. A URL, a file, a database, a Slack channel, an SSH host, an S3 bucket.
  2. It pulls information out. Lists, fields, tables, transcripts, images, tones.
  3. It coerces the result into a shape the downstream code expects. JSON with the right keys, a typed row, a Zod-validated object.
  4. It does something with it. Posts, writes, sends, updates, deletes, pays.
  5. It observes. It waits, it listens, it polls, it watches for drift.

Every agent framework in circulation eventually grows its own half-built version of these five verbs. LangChain has loaders, parsers, output parsers, tools, and memory. The Vercel AI SDK has tools and function calls. Firecrawl is a scraper that wants to be a pipeline. Zapier is a pipeline that hates developers. Airbyte and Fivetran are pipelines that hate everyone who isn't an enterprise data team.

Nobody has offered a clean, typed, composable runtime where those five verbs are the primitives and everything else is built from them.

Pluck is the attempt.


The five verbs

Connect

connect() is the part that reaches into the outside world. Thirty-plus built-in connectors: HTTP, databases (Postgres, MongoDB, MySQL, SQLite, Redis), message queues (Kafka), filesystems (S3, FTP, local), fleet protocols (SSH), streaming sockets (WebSocket), and public-facing APIs for Reddit, Hacker News, RSS, Google Drive, Dropbox, Spotify, Twitch, Instagram, TikTok, Twitter syndication, Vimeo, oEmbed providers, Telegram.

Every connector takes a URI. You don't learn thirty APIs. You learn one:

TypeScript
const rows = await pluck("reddit://r/typescript/hot");
const traces = await pluck("postgres://db/traces?limit=100");
const stream = await pluck("ssh://web-01/var/log/nginx.log");

Connectors share utilities – bounded body reads, Retry-After handling, host validation against private ranges, peer-dep loading – so the surprising-behaviour count stays low. The connector layer is where most of the sprint-over-sprint security work lives, and it's worth it: a data pipeline that forwards an SSRF to internal metadata endpoints is a liability, not a tool.
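
To make the host-validation point concrete, here's the general technique (a sketch of the standard SSRF guard, not Pluck's internal code): resolve the hostname and refuse anything that lands in a private, loopback, or link-local range.

TypeScript
import { lookup } from "node:dns/promises";
import { isIP } from "node:net";

// SSRF guard sketch: refuse hosts that resolve into private, loopback,
// or link-local IPv4 ranges (IPv6 omitted for brevity).
const PRIVATE_V4 = [
  /^10\./, /^127\./, /^169\.254\./, /^192\.168\./,
  /^172\.(1[6-9]|2\d|3[01])\./,
];

async function assertPublicHost(host: string): Promise<void> {
  const addr = isIP(host) ? host : (await lookup(host)).address;
  if (PRIVATE_V4.some((range) => range.test(addr))) {
    throw new Error(`refusing to connect to private address ${addr}`);
  }
}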

Extract

Once you have bytes, you want structure. extract is the layer that turns HTML into rows, PDFs into text, audio into transcripts, and arbitrary documents into LLM-structured output.

The strategy is pluggable – CSS selector, regex, Zod schema, Whisper, Tesseract, hybrid – but the return shape is always the same: typed rows plus phase metadata. The pipeline doesn't care whether you extracted with a selector or an LLM; downstream phases see identical shapes.
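
Sketched as types, the invariant looks something like this (the names are illustrative, not Pluck's documented API):

TypeScript
// Illustrative names, not the documented API. The point: every strategy
// funnels into one result shape that downstream phases can rely on.
type Strategy = "css" | "regex" | "zod" | "whisper" | "tesseract" | "llm";
type PhaseMeta = { strategy: Strategy; warnings: string[] };
type ExtractResult<Row> = { rows: Row[]; meta: PhaseMeta };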

Shape

Extraction gives you rows that approximately match what you wanted. Shape is the phase that makes them exactly what you wanted.

Two modes:

  • Schema-only. Pass a Zod schema; anything not in the schema gets stripped, types are enforced, bad rows become structured errors instead of runtime crashes halfway through a Supabase upsert.
  • Schema + field map. Rename, compute, coerce. game.homeTeam.score → home_score, total = home_score + away_score, all deterministic, all unit-testable.
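
Both modes sketch cleanly in plain Zod (the wiring into Pluck's shape phase is implied, not shown):

TypeScript
import { z } from "zod";

// Mode 1: schema-only. z.object() strips unknown keys by default, and
// safeParse returns a structured error instead of throwing mid-pipeline.
const GameRow = z.object({ home_score: z.number(), away_score: z.number() });
const result = GameRow.safeParse({ home_score: 3, away_score: 1, _reasoning: "..." });
// result.success === true; result.data has no _reasoning key.

// Mode 2: schema + field map. Rename, compute, coerce, deterministically,
// so the transform is a plain function you can unit-test.
type Game = { homeTeam: { score: number }; awayTeam: { score: number } };
const mapGame = (g: Game) => ({
  home_score: g.homeTeam.score,
  away_score: g.awayTeam.score,
  total: g.homeTeam.score + g.awayTeam.score,
});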

Shape is the fix for the most annoying recurring bug in LLM-assisted extraction: your LLM hallucinates an extra _reasoning field and your database upsert fails at 2 a.m. Shape strips it and logs a drift signal instead.

The drift signal is the interesting part. When the upstream source changes – a new field, a renamed key, a missing value – you find out through onDrift, not through production burning down. It's a contract test that runs on real data, every time.
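
One way to picture that contract test (illustrative, not Pluck's internals): diff the keys the source actually sent against the keys the schema expects.

TypeScript
// Illustrative drift check: surface keys the source sent that the schema
// doesn't know about, and expected keys that went missing.
function diffKeys(incoming: Record<string, unknown>, expected: string[]) {
  return {
    unexpected: Object.keys(incoming).filter((k) => !expected.includes(k)),
    missing: expected.filter((k) => !(k in incoming)),
  };
}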

Act

Reading is safe. Writing is not.

Every action in Pluck – every HTTP POST, every SQL write, every file creation, every Slack message – produces an Ed25519-signed, DSSE-enveloped receipt. The receipt captures who/what/when/why, it's tamper-evident, and it's the thing the undo operation consumes to invert what you just did.
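
The mechanics sketch out with Node's built-in crypto. The envelope fields and payload type below are illustrative; this shows the DSSE-over-Ed25519 idea, not Pluck's receipt code.

TypeScript
import { generateKeyPairSync, sign } from "node:crypto";

// DSSE pre-authentication encoding: the byte string that actually gets signed.
function pae(payloadType: string, payload: Buffer): Buffer {
  return Buffer.from(
    `DSSEv1 ${Buffer.byteLength(payloadType)} ${payloadType} ${payload.length} ${payload}`,
  );
}

const { privateKey } = generateKeyPairSync("ed25519");

// Field names are illustrative; the contract is who/what/when/why.
const receipt = Buffer.from(JSON.stringify({
  who: "agent-42",
  what: "slack.post",
  when: new Date().toISOString(),
  why: "daily summary",
}));

const type = "application/vnd.example.receipt+json";  // hypothetical payload type
const envelope = {
  payload: receipt.toString("base64"),
  payloadType: type,
  signatures: [{ sig: sign(null, pae(type, receipt), privateKey).toString("base64") }],
};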

This is where Pluck gets opinionated, because the AI agent era has a trust problem and nobody is fixing it. Your agent posts twelve wrong GitHub comments at 3 a.m.? You want to undo them, not chase them with more agents. Your agent pays the wrong invoice? You want a receipt you can take to the bank and an inverse transaction, not a "sorry, it's deterministic, we can't undo it."

Receipts compose with policy (coming in 0.2): .pluckpolicy.yaml deny-lists that block actions before they fire. Think OPA rules, but for agent side-effects: "never delete from the production DB", "any action over $10 requires human confirmation", "redact PII before publishing."
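
Speculatively (the format isn't shipped, so treat this as a guess at shape, not a spec), those rules might deserialize into something like:

TypeScript
// Hypothetical shape for a parsed .pluckpolicy.yaml. A guess, not a spec.
const policy = {
  deny: [
    { action: "sql.delete", target: "postgres://prod/*" },
    { action: "*", when: { amountUsd: { gt: 10 } }, unless: "human-confirmed" },
    { action: "publish.*", require: "pii-redaction" },
  ],
};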

This is the part that makes Pluck not a scraper.

Sense

Sense is the part that surprises people.

Fourteen zero-dependency DSP sensors in pure TypeScript: FFT, spectrogram, DTMF, pitch, tempo, ultrasonic and infrasonic carrier detection, anomaly detection, Morse, AM/FM/SSB demodulation, rPPG (heart rate from face video), bioacoustic birdsong ID. All running without native modules. All emitting typed findings with confidence scores.

"Why?" is the fair question.

Because the world signals things humans can't hear. Retail stores drop ultrasonic IDs into autoplay video to cross-device-track shoppers. Deepfakes are detectable by the absence of rolling-shutter artifacts and the absence of a pulse on the face. Aircraft transponders broadcast their own position on 1090 MHz. Bird species map to spectrograms with more signal than any photo.

None of this requires a model. It requires a library that treats signal analysis as a first-class primitive alongside "fetch a URL."
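
For a taste of what "signal analysis as a primitive" means, here is the textbook Goertzel detector (energy at one target frequency, such as an 18.5 kHz ultrasonic beacon) in plain TypeScript. This is the standard algorithm, not Pluck's sensor code.

TypeScript
// Goertzel: energy of one target frequency in a block of samples.
// Standard DSP, shown for flavor; not Pluck's sensor implementation.
function goertzelPower(samples: Float32Array, targetHz: number, sampleRate: number): number {
  const k = Math.round((samples.length * targetHz) / sampleRate);
  const w = (2 * Math.PI * k) / samples.length;
  const coeff = 2 * Math.cos(w);
  let s1 = 0, s2 = 0;
  for (const x of samples) {
    const s0 = x + coeff * s1 - s2;
    s2 = s1;
    s1 = s0;
  }
  return s1 * s1 + s2 * s2 - coeff * s1 * s2;  // squared magnitude
}

// e.g. goertzelPower(block, 18500, 44100) > threshold flags an 18.5 kHz beacon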

pluck snitch <url> runs connect + extract + sense, signs the result as a forensic receipt, and hands you a portable audit of what a page is doing behind your back. That one-line command composes from the same verbs as pluck("postgres://..."). That's the point.


Why MCP-first

Every one of those verbs is also an MCP tool, exposed by the @sizls/pluck-mcp server.

If you're building an agent in Claude Code, Cursor, Continue, or any other MCP-compatible runtime, you don't write glue code. You add Pluck to your MCP config, and your agent can now:

  • Extract from any of 30+ sources.
  • Shape the result against a schema you control.
  • Act with signed, reversible, audit-trailed side-effects.
  • Sense signals below human perception.
  • See a URL's context before touching it.
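
In Claude Code, for instance, that's one command (assuming the package exposes an npx-runnable entry point):

shell
claude mcp add pluck -- npx -y @sizls/pluck-mcp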

This is not "a library with MCP adapters tacked on." The MCP surface is the primary interface; the library is how you extend it. We wrote the MCP server before we wrote half of the CLI.

For the launch, that positioning is everything: agents already exist. They already have tools. What they don't have is a pipeline primitive that signs what it touches, reverses when it's wrong, and can hear above 20 kHz. Pluck is that primitive.


What Pluck is not

  • Not a scraper. Firecrawl is a better scraper. Scrapy is a better scraper. If all you need is HTML-to-markdown, use one of those.
  • Not a RAG framework. LangChain and LlamaIndex have a head start on RAG. Pluck is a layer underneath – you can build RAG on it, but that's not the opinionated goal.
  • Not a Zapier replacement for non-developers. Pluck is a TypeScript library first. The hosted dashboard will get friendlier, but the primary audience is people who write code.
  • Not a model. Pluck runs models when they're useful (Whisper for transcribe, Tesseract for OCR, Claude for structured extraction), but it's not a model and it has no opinion about which model you use.
  • Not feature-complete. There are thirty-plus connectors, not three hundred. There are three actors, not thirty. The backlog is long and deliberate. Priority goes by adoption signal, not by internal enthusiasm.

Why now

Two things changed in 2025-2026.

MCP became the agent-tool lingua franca. Not a standard, exactly, but close enough that writing MCP-first is no longer a risky bet. Agents can now consume tools without a custom adapter, and the best tools are the ones that expose themselves through MCP without a middle layer.

Agent safety became a first-order problem. It used to be a theoretical concern. Now every week there's another story about an agent that deleted a production table, paid the wrong invoice, or posted something embarrassing. The industry is asking for a primitive that solves this. Receipts, undo, and policy are that primitive. Pluck is betting the OSS launch on the idea that developers know they need this even if they haven't named it yet.


How to start

shell
npm install @sizls/pluck

Then run a first pipeline. A minimal sketch, assuming the package's root export is the pluck() function from the examples above:
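
TypeScript
import { pluck } from "@sizls/pluck";  // assumed named root export

// Same one-URI convention as every connector above.
const posts = await pluck("reddit://r/typescript/hot");
console.log(posts);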

The repo is sizls/pluck. The packages are @sizls/pluck, @sizls/pluck-cli, @sizls/pluck-api, @sizls/pluck-mcp. Everything is MIT.


The one-line summary

Pluck is the first TypeScript pipeline where an agent's side-effects are signed, reversible, and perceptible below the human threshold.

If that sentence makes you curious, clone the repo. If it doesn't, send it to the skeptic on your team who's been asking "but how do we audit what the agent actually did?" for the last six months.

We're going to be very glad we built this.


– Jason, maintainer of @sizls/pluck

