Core Concepts
Output
The seventh phase of the Pluck pipeline. Twelve built-in formats, six presets, and arbitrary template strings, rendered through three sibling methods (.output(), .preset(), and .template()) that work the same on every PluckResult, no matter which upstream phase produced it.
The mental model
Whichever phases run before output (connect, navigate, extract, shape, act, sense), the pipeline hands you a PluckResult. The result carries the raw fields (text, data, segments, receipt, sensed, etc.), but consumers rarely want raw data. They want markdown for their docs site, JSON for their database, SRT for their subtitles, embeddings for their vector store.
The output phase is where that rendering lives. For the built-in formats it surfaces as one method, .output(), with the same signature regardless of which upstream phase produced the result:
```typescript
import { pluck } from "@sizls/pluck";

const result = await pluck("https://news.ycombinator.com");

result.output("markdown"); // "# Top | Hacker News\n\n…"
result.output("json"); // "{\n \"items\": [ … ]\n}"
result.output("csv"); // "title,url,score,…"
result.output("text"); // plain text
```
Output is a phase, not a standalone verb. There is nothing extra to import: you call .output() on the PluckResult the pipeline already returned.
The 12 formats
| Format | What you get | Typical use |
|---|---|---|
| markdown | Clean markdown – title, body, metadata. | Docs sites, GitHub issues, LLM input. |
| json | Full PluckResult as pretty-printed JSON. | Database upsert, inspection, logs. |
| text | Plain text, no markup. | Email, SMS, terminal. |
| html | HTML body with metadata table. | Email, dashboards. |
| xml | Structured XML. | Legacy integrations, RSS exporters. |
| yaml | YAML dump. | Config files, CI artifacts. |
| csv | Rows with header. Uses result.data when present, else best-effort columns from segments / text. | Spreadsheets, BI tools. |
| sql | INSERT INTO ... statements. | Database seeds. |
| embeddings | Text chunks ready to hand to an embedding model. | Vector stores, RAG. |
| srt | SubRip subtitle format from result.segments. | Video pipelines. |
| vtt | WebVTT subtitle format from result.segments. | Video pipelines. |
| template | Render via result.template(string), with Handlebars-style {{ }} interpolation against the result. | Custom reports, newsletters. |
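The srt and vtt rows differ mostly in small syntax details. The sketch below shows what a segments-to-subtitles formatter has to get right, under an assumed segment shape of { start, end, text } with times in seconds; Pluck's real segment fields may differ, and toSrt / toVtt are hypothetical names. SRT numbers each cue and puts a comma before the milliseconds, while WebVTT opens with a WEBVTT header and uses a period.

```typescript
// Hypothetical segment shape (times in seconds); not Pluck's actual type.
interface Segment {
  start: number;
  end: number;
  text: string;
}

// Format seconds as HH:MM:SS plus milliseconds, using "," (SRT) or "." (WebVTT).
function timestamp(seconds: number, msSeparator: "," | "."): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSeconds = Math.floor(totalMs / 1000);
  const s = totalSeconds % 60;
  const m = Math.floor(totalSeconds / 60) % 60;
  const h = Math.floor(totalSeconds / 3600);
  const pad = (n: number, width = 2) => String(n).padStart(width, "0");
  return `${pad(h)}:${pad(m)}:${pad(s)}${msSeparator}${pad(ms, 3)}`;
}

// SubRip: numbered cues separated by blank lines.
function toSrt(segments: Segment[]): string {
  return segments
    .map(
      (seg, i) =>
        `${i + 1}\n${timestamp(seg.start, ",")} --> ${timestamp(seg.end, ",")}\n${seg.text}\n`,
    )
    .join("\n");
}

// WebVTT: "WEBVTT" header, a blank line, then unnumbered cues.
function toVtt(segments: Segment[]): string {
  const cues = segments.map(
    (seg) => `${timestamp(seg.start, ".")} --> ${timestamp(seg.end, ".")}\n${seg.text}\n`,
  );
  return ["WEBVTT\n", ...cues].join("\n");
}
```

Milliseconds are derived once from the rounded total, so a time like 59.9996 s becomes 00:01:00,000 instead of a malformed ",1000" field.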
The exact union is exported as OutputFormat:
```typescript
type OutputFormat =
  | "json" | "markdown" | "text"
  | "srt" | "vtt" | "csv"
  | "html" | "xml" | "yaml"
  | "sql" | "embeddings" | "template"
  | (string & {}); // escape hatch for custom formatters
```
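The (string & {}) escape hatch is a general TypeScript idiom rather than something specific to Pluck. Writing a bare | string would absorb the literal members and widen the union to plain string, so editors would stop suggesting "json", "srt" and the rest; intersecting with {} keeps the literals visible to autocomplete while still accepting any string. A trimmed-down illustration, with a hypothetical describeFormat helper:

```typescript
// A trimmed-down copy of the union, for illustration only.
type Format = "json" | "markdown" | "srt" | (string & {});

// Hypothetical helper: accepts every built-in literal and any custom key.
function describeFormat(format: Format): string {
  return `rendering as ${format}`;
}

describeFormat("json"); // editors still suggest the built-in literals here
describeFormat("slack-blocks"); // custom keys still type-check
```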
Every format is implemented as a Formatter – a pure function that takes a PluckResult and returns a string. Built-ins are registered in the formatter registry at instance creation; custom formats are one createPluck({ formatters: [defineOutput({ … })] }) call away (see Custom output formats below).
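A minimal model of that registry, to make "formatter" and "registry" concrete. Everything here is hypothetical (MiniResult, Formatter and createRegistry are illustrative names, not Pluck's exported types); it only mirrors the behaviour this page describes: each formatter is a pure result-to-string function, and custom formatters go in front of the built-ins, so a custom formatter with the same format key wins.

```typescript
// Hypothetical, trimmed-down result shape for illustration.
interface MiniResult {
  text: string;
  metadata: { title: string };
}

// A formatter: a format key plus a pure render function.
interface Formatter {
  format: string;
  render(result: MiniResult): string;
}

const builtIns: Formatter[] = [
  { format: "text", render: (r) => r.text },
  { format: "json", render: (r) => JSON.stringify(r, null, 2) },
  { format: "markdown", render: (r) => `# ${r.metadata.title}\n\n${r.text}` },
];

// Custom formatters are placed first, so lookup finds them before built-ins.
function createRegistry(custom: Formatter[] = []) {
  const formatters = [...custom, ...builtIns];
  return {
    output(result: MiniResult, format: string): string {
      const formatter = formatters.find((f) => f.format === format);
      if (!formatter) throw new Error(`No formatter registered for "${format}"`);
      return formatter.render(result);
    },
  };
}
```

For example, createRegistry([{ format: "text", render: (r) => r.text.toUpperCase() }]) shadows the built-in text formatter while json and markdown keep working.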
Presets live on a sibling surface (result.preset(name)) and templates on a third (result.template(source)). See "Presets and templates – same engine, different surface" below for how they relate.
Presets and templates – same engine, different surface
Presets and templates both render through the same Handlebars-style template engine. The only difference is discoverability:
- Preset – a named template. The name is part of a string-literal union (PresetName), so IDEs autocomplete it and the CLI lists it in --help. The template body lives in the Pluck source tree.
- Template – an inline template string you pass at the call site. No registration, no name.
Under the hood, result.preset("blog") is literally result.template(BUILTIN_BLOG_TEMPLATE) – both paths go through createTemplateFormatter(source).render(result).
```typescript
// Preset – discoverable, typed, reusable.
result.preset("blog");

// Equivalent template – inline, one-off, unnamed.
result.template(`
# {{ metadata.title }}
{{#each segments}}
- {{ text }}
{{/each}}
`);
```
The six built-in preset names are typed as PresetName:
```typescript
type PresetName = "blog" | "notes" | "social" | "rag" | "dataset" | "report";
```
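Because PresetName is a closed union, a preset table keyed by it can be checked at compile time. The sketch below models the "a preset is a named template" relationship described earlier on this page (result.preset("blog") being result.template(BUILTIN_BLOG_TEMPLATE)). PRESET_TEMPLATES and preset() are hypothetical, the template bodies are placeholders, and the real bodies live in the Pluck source tree.

```typescript
type PresetName = "blog" | "notes" | "social" | "rag" | "dataset" | "report";

// Placeholder bodies for illustration only. Because the type is
// Record<PresetName, string>, leaving out any preset is a compile error.
const PRESET_TEMPLATES: Record<PresetName, string> = {
  blog: "# {{ metadata.title }}\n\n{{ text }}",
  notes: "- {{ metadata.title }}",
  social: "{{ metadata.title }}",
  rag: "{{ text }}",
  dataset: "{{ data }}",
  report: "## {{ metadata.title }}",
};

// A preset is just a named template: look up the body, then hand it to
// whatever template renderer the caller supplies.
function preset(name: PresetName, renderTemplate: (source: string) => string): string {
  return renderTemplate(PRESET_TEMPLATES[name]);
}
```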
Rule of thumb: reach for a template when the shape is one-off; reach for a preset when the shape is going to be reused enough that it deserves a name. Presets live in the Pluck source tree today; the registration surface for user-defined presets is on the backlog.
Templates in detail
Templates are Handlebars-style strings with {{ interpolation }} and basic loops:
```typescript
import { pluck } from "@sizls/pluck";

const result = await pluck("https://news.ycombinator.com");

result.template(`
# {{ metadata.title }}
{{#each segments}}
- [{{ text }}]({{ meta.url }}) – {{ meta.score }} points
{{/each}}
Pulled at {{ metadata.fetchedAt }}.
`);
```
The template sees the full PluckResult – access any field by name. Missing fields render as empty strings; undefined lookups don't throw.
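As a rough model of how interpolation, loops and the missing-field rule fit together, here is a deliberately small renderer. It is a hypothetical sketch, not Pluck's engine: it supports only {{ dotted.path }} lookups and non-nested {{#each path}}...{{/each}} blocks, resolves paths inside a loop against the current item first and then the root result, and renders anything missing as an empty string instead of throwing.

```typescript
type Scope = Record<string, unknown>;

// Resolve a dotted path such as "meta.url"; any missing step yields undefined.
function lookup(scope: unknown, path: string): unknown {
  let current: unknown = scope;
  for (const key of path.split(".")) {
    if (current === null || typeof current !== "object") return undefined;
    current = (current as Scope)[key];
  }
  return current;
}

// Replace {{ path }} tags, trying the loop item first, then the root.
function interpolate(source: string, item: unknown, root: unknown): string {
  return source.replace(/\{\{\s*([\w.]+)\s*\}\}/g, (_match, path: string) => {
    const fromItem = lookup(item, path);
    const value = fromItem !== undefined ? fromItem : lookup(root, path);
    return value === undefined || value === null ? "" : String(value);
  });
}

// Expand non-nested {{#each path}}...{{/each}} blocks, then interpolate the rest.
function renderTemplate(source: string, root: unknown): string {
  const expanded = source.replace(
    /\{\{#each\s+([\w.]+)\s*\}\}([\s\S]*?)\{\{\/each\}\}/g,
    (_match, path: string, body: string) => {
      const list = lookup(root, path);
      if (!Array.isArray(list)) return ""; // missing or non-array: render nothing
      return list.map((item) => interpolate(body, item, root)).join("");
    },
  );
  return interpolate(expanded, undefined, root);
}
```

Falling back to the root inside a loop lets a segment row reference {{ metadata.title }}; the trade-off is that a field missing on an item silently picks up a root field of the same name.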
Custom output formats – defineOutput
The formatter registry is open. Register your own for any format your team cares about – a Slack Block Kit JSON blob, a ServiceNow payload, your company's internal XML flavour. The typed helper matches the rest of the pipeline's define* family (defineConnector, defineExtractor, defineActor, defineSensor):
```typescript
import { defineOutput, createPluck } from "@sizls/pluck";

const slackBlocks = defineOutput({
  name: "slack-blocks",
  format: "slack-blocks",
  render(result) {
    return JSON.stringify({
      blocks: [
        { type: "header", text: { type: "plain_text", text: result.metadata.title } },
        { type: "section", text: { type: "mrkdwn", text: result.text.slice(0, 300) } },
      ],
    });
  },
});

const pluck = createPluck({ formatters: [slackBlocks] });
const result = await pluck("https://news.example.com/post");

result.output("slack-blocks"); // string
```
Custom formatters are prepended to the registry and win over built-ins with the same format key. The format field is the argument you pass to result.output(format) – keep it short, lowercase, kebab-case.
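One way to hold custom keys to that convention is a small guard run before the key reaches defineOutput. assertFormatKey is a hypothetical helper, not a Pluck export; its pattern accepts lowercase ASCII letters and digits in hyphen-separated groups, starting with a letter, and it does not check length.

```typescript
// Hypothetical guard for custom format keys: lowercase kebab-case only,
// e.g. "slack-blocks" or "servicenow-v2". Length is not checked.
const FORMAT_KEY = /^[a-z][a-z0-9]*(?:-[a-z0-9]+)*$/;

function assertFormatKey(key: string): string {
  if (!FORMAT_KEY.test(key)) {
    throw new Error(`Format key "${key}" should be lowercase kebab-case`);
  }
  return key;
}
```

For example, format: assertFormatKey("slack-blocks") passes the key through unchanged, while "SlackBlocks" or "slack_blocks" throws.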
Output in the CLI
The CLI exposes the same surface through --format:
```shell
pluck https://news.ycombinator.com --format markdown
pluck https://api.example.com/items --format json | jq .
pluck ./interview.mp3 --format srt -o subs.srt
pluck run workflow.yaml --format preset:rag
```
Pipes work – every format is a string, and stdout is the default sink unless -o <path> is passed.
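The preset:rag form shows that --format accepts two kinds of target. A hypothetical way to tell them apart is to check for a preset: prefix and return a discriminated union, so calling code can route to .preset() or .output(). This is a sketch of the idea, not the CLI's implementation, and it does not validate names against the real format or preset lists.

```typescript
// Hypothetical result of parsing a --format value.
type FormatTarget =
  | { kind: "format"; format: string }
  | { kind: "preset"; preset: string };

function parseFormatFlag(value: string): FormatTarget {
  const prefix = "preset:";
  if (value.startsWith(prefix)) {
    const preset = value.slice(prefix.length);
    if (preset === "") throw new Error("--format preset: needs a preset name");
    return { kind: "preset", preset };
  }
  // Anything without the prefix is treated as a plain format key.
  return { kind: "format", format: value };
}
```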
Why output is a separate phase
It isn't, strictly – rendering a format is pure (result in, string out), and the pipeline itself finishes at act / sense / extract / etc. Output is a phase in the type system (PluckPhase includes "output") because treating it as one gives the system three properties:
- Formatters are swappable. Registering a custom formatter via defineOutput is the same shape of API as defineConnector, defineExtractor, defineActor, defineSensor.
- Errors are attributable. If a format throws ("this source has no segments, can't render VTT"), the pipeline error carries phase: "output", so traces, monitors, and replay all know where the failure happened.
- The .output() API is uniform. No matter which upstream phase ran, the user-facing method signature is the same. That's a real DX win when the pipeline has six other phases that each produce different result shapes.
The seven phases at a glance
Output is one of seven phases. The table is compact on purpose – if you've ever asked "wait, don't navigate and act both drive Playwright?" or "isn't a preset just a template with a name?", this table answers it.
| # | Phase | Input | Output | One-line role | What makes it distinct |
|---|---|---|---|---|---|
| 1 | Connect | URI | ConnectResult (raw bytes + metadata) | Pull raw bytes from any source. | Matches URI schemes. No interpretation – just "here are the bytes." |
| 2 | Navigate | ConnectResult | NavigateResult (cleaner bytes, same shape) | Prepare content so extract can read it. | Passive. Reads pages, dismisses modals, waits for SPA renders. No side effects on the source. |
| 3 | Extract | NavigateResult | ExtractResult (loose structured data) | Pull structured data out of content. | Five strategies: auto / css / regex / llm / hybrid. Outputs are "loose" – any shape. |
| 4 | Shape | ExtractResult | PluckResult<T> (typed data) | Pin loose data to a Zod schema. | Validates + narrows. Drift detection on schema mismatch. |
| 5 | Act | URI + action + input | PluckResult + signed receipt | Perform a mutation on the source. | Active. Every call signs a receipt, runs policy, honours idempotency. Side effects required. |
| 6 | Sense | NavigateResult | PluckResult + sensed.features | Extract signal features below human perception. | DSP. Runs against audio / video / text / image content, not DOM. |
| 7 | Output | PluckResult | string | Render the result into a consumer-ready format. | Pure. No side effects; the same result renders to a different string for each format. |
The "wait, isn't this the same as…?" checklist
- Navigate vs. Act. Both have browser-backed modes that load URLs and click things. Navigate uses them to read a page (the SPA-rendering / interact / agent modes return bytes for extract). Act uses them to write (the browser and browser-agent actors produce signed receipts). If there's no mutation and no receipt, it's navigate. If the operation is one you'd want to undo, it's act. See the Navigate page for details.
- Preset vs. Template. Presets are named templates. Both render through the same engine. Use a preset when the shape is reusable enough to earn a name ("blog", "rag"); use a template when it's one-off.
- Output format vs. Preset. Formats (markdown, json, srt) are code-based renderers – they inspect result fields and emit a string according to a spec. Presets are template strings – they substitute {{ field }} placeholders into a pre-baked skeleton. Reach for a format when you need a standard (JSON, CSV, SRT). Reach for a preset when you need a shape of prose (blog post, RAG chunks, executive report).
What's next
- Getting Started – install + first pluck + your first .output() call.
- Reference: Connectors – which URIs feed the pipeline.
- Reference: CLI – every --format and preset on the CLI.