Shape
The fourth phase of the Pluck pipeline. Zod in, contract-validated data out. Drift caught automatically.
The mental model
Connect gives you bytes. Extract gives you text and loosely-typed data. Shape's job is to pin that loose data to the exact schema your database, your API, your agent expects – and to fail loudly when the upstream source changes.
The pipeline looks like this:
Connect → Navigate → Extract → Shape → Output
Shape solves three problems that crash production at 2am:
- "The LLM added a _reasoning field and our Supabase upsert failed." Strict mode strips every field that's not in the schema.
- "The third-party API renamed game.home.score to game.homeTeam.score and we didn't notice for a week." The onDrift callback fires as soon as a required field goes missing or a previously-present field disappears.
- "We need total_score = home + away in the database, not the input." compute functions add derived fields deterministically, before validation.
Shape is a phase, not a standalone verb. You run it via the shape() function exported from @sizls/pluck. It doesn't fetch anything; it transforms data you already have.
import { shape } from "@sizls/pluck";
import { z } from "zod";

const gameSchema = z.object({
  home_score: z.number(),
  away_score: z.number(),
  total: z.number(),
});

const result = shape(rawGameData, {
  schema: gameSchema,
  map: {
    "game.homeTeam.score": "home_score",
    "game.awayTeam.score": "away_score",
  },
  compute: {
    total: (data) => Number(data.home_score) + Number(data.away_score),
  },
});

if (result.valid) {
  // result.data: { home_score: 3, away_score: 1, total: 4 }
  await supabase.from("scores").upsert(result.data);
}
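To make the "compute before validation" ordering concrete, here is a minimal sketch in plain TypeScript. It is not Pluck's implementation: `applyCompute` and `allNumbers` are hypothetical helpers, and `allNumbers` is only a stand-in for real schema validation.

```typescript
// Hypothetical sketch: derive fields first, validate afterwards.
type Loose = Record<string, unknown>;
type ComputeMap = Record<string, (data: Loose) => unknown>;

// Runs each compute function against a copy of the data (including fields
// derived earlier in the loop) and stores the result under the field name.
function applyCompute(data: Loose, compute: ComputeMap): Loose {
  const out: Loose = { ...data };
  for (const [field, fn] of Object.entries(compute)) {
    out[field] = fn(out);
  }
  return out;
}

// Stand-in for schema validation: every listed field must be a finite number.
function allNumbers(data: Loose, fields: string[]): boolean {
  return fields.every(
    (f) => typeof data[f] === "number" && Number.isFinite(data[f] as number),
  );
}

const mappedGame: Loose = { home_score: 3, away_score: 1 };
const withTotal = applyCompute(mappedGame, {
  total: (d) => Number(d.home_score) + Number(d.away_score),
});
const gameIsValid = allNumbers(withTotal, ["home_score", "away_score", "total"]);
```

Because the derived field is added before the check runs, a schema that requires total can pass even though the input never contained it.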
Two modes
Shape runs in one of two modes, decided by whether you pass a map:
Mode 1 – Schema only (validate LLM output)
No mapping. Pluck takes the data you pass in, validates it against the schema, strips anything the schema doesn't know about, and returns the clean result:
import { shape } from "@sizls/pluck";
import { z } from "zod";

const productSchema = z.object({
  title: z.string(),
  price: z.number(),
});

// LLM returned { title: "Widget", price: 29.99, _reasoning: "..." }
const result = shape(llmOutput, { schema: productSchema });
// result.data: { title: "Widget", price: 29.99 }
// result.stripped: ["_reasoning"]
This is the common path for LLM-based extraction. The LLM spits out whatever it feels like; Shape makes sure only what you asked for reaches the database.
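For intuition, strict-mode stripping can be pictured as a filter over top-level keys. The sketch below is an illustration under that assumption only; `stripUnknownKeys` is a hypothetical helper and says nothing about how Pluck handles nested objects.

```typescript
// Hypothetical sketch of strict-mode stripping; not Pluck's actual implementation.
interface StripOutcome {
  data: Record<string, unknown>;
  stripped: string[];
}

// Keeps only top-level keys listed in `allowed` and reports the rest.
function stripUnknownKeys(
  input: Record<string, unknown>,
  allowed: readonly string[],
): StripOutcome {
  const data: Record<string, unknown> = {};
  const stripped: string[] = [];
  for (const [key, value] of Object.entries(input)) {
    if (allowed.includes(key)) {
      data[key] = value;
    } else {
      stripped.push(key);
    }
  }
  return { data, stripped };
}

const llmReply = { title: "Widget", price: 29.99, _reasoning: "..." };
const cleaned = stripUnknownKeys(llmReply, ["title", "price"]);
// cleaned.data     → { title: "Widget", price: 29.99 }
// cleaned.stripped → ["_reasoning"]
```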
Mode 2 – Schema + field map (deterministic API mapping)
When the source API has a known shape, you can deterministically rename and transform fields before validation – no LLM involved. This is the fastest and cheapest mode:
const result = shape(apiResponse, {
  schema: z.object({
    id: z.string(),
    title: z.string(),
    minutes: z.number(),
  }),
  map: {
    "data.id": "id",
    "data.attributes.title": "title",
    "data.attributes.duration_ms": {
      to: "minutes",
      transform: (ms) => Math.floor(Number(ms) / 60_000),
    },
  },
});
Dot-path access works for any depth of nesting. The transform hook receives the raw value and can coerce / normalize before validation.
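The dot-path lookup and transform hook described above can be sketched in a few lines of plain TypeScript. This is an illustration, not Pluck's internals: `getByPath`, `applyFieldMap`, and the sample payload are all hypothetical.

```typescript
// Hypothetical sketch of dot-path lookup plus a field map; not Pluck's internals.
type Transform = (value: unknown) => unknown;
type MapTarget = string | { to: string; transform?: Transform };

// Walks "a.b.c" through nested objects; returns undefined if any segment is missing.
function getByPath(source: unknown, path: string): unknown {
  let cursor: unknown = source;
  for (const segment of path.split(".")) {
    if (typeof cursor !== "object" || cursor === null) return undefined;
    cursor = (cursor as Record<string, unknown>)[segment];
  }
  return cursor;
}

// Copies each mapped source path to its target key, applying the transform if present.
function applyFieldMap(
  source: unknown,
  fieldMap: Record<string, MapTarget>,
): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [sourcePath, target] of Object.entries(fieldMap)) {
    const raw = getByPath(source, sourcePath);
    if (typeof target === "string") {
      out[target] = raw;
    } else {
      out[target.to] = target.transform ? target.transform(raw) : raw;
    }
  }
  return out;
}

// Made-up payload for illustration only.
const sampleResponse = {
  data: { id: "ep-1", attributes: { title: "Pilot", duration_ms: 1_325_000 } },
};
const mappedEpisode = applyFieldMap(sampleResponse, {
  "data.id": "id",
  "data.attributes.title": "title",
  "data.attributes.duration_ms": {
    to: "minutes",
    transform: (ms) => Math.floor(Number(ms) / 60_000),
  },
});
// mappedEpisode → { id: "ep-1", title: "Pilot", minutes: 22 }
```

Returning undefined for a missing segment, rather than throwing, leaves it to the validation step to report the absent field.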
defineShape() for reusable configs
When a shape is worth committing to disk (or importing across files), defineShape is the typed authoring helper:
import { defineShape, shape } from "@sizls/pluck";
import { z } from "zod";

export const gameShape = defineShape({
  schema: z.object({
    home_score: z.number(),
    away_score: z.number(),
    total: z.number(),
  }),
  map: {
    "game.homeTeam.score": "home_score",
    "game.awayTeam.score": "away_score",
  },
  compute: {
    total: (d) => Number(d.home_score) + Number(d.away_score),
  },
});

// Later:
const result = shape(apiResponse, gameShape);
// result.data narrows to { home_score: number; away_score: number; total: number }
defineShape is generic-forwarded, so the Zod schema's inferred type flows through to result.data without any manual casting.
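As a rough picture of what generic forwarding means, the sketch below uses an identity helper whose only job is to capture a type parameter. `defineConfig`, `runConfig`, and `ParserConfig` are hypothetical names invented for this illustration; how defineShape is actually typed is not shown in this page.

```typescript
// Hypothetical sketch of a generic-forwarding helper; not part of Pluck.
interface ParserConfig<T> {
  parse: (input: unknown) => T;
}

// Identity at runtime; it exists so the compiler can infer and carry T.
function defineConfig<T>(config: ParserConfig<T>): ParserConfig<T> {
  return config;
}

// The consumer returns T, so callers get the inferred type without a cast.
function runConfig<T>(input: unknown, config: ParserConfig<T>): T {
  return config.parse(input);
}

const scoreConfig = defineConfig({
  parse: (input) => {
    const obj = input as { home?: unknown; away?: unknown };
    return { home_score: Number(obj.home), away_score: Number(obj.away) };
  },
});

// parsedScore is inferred as { home_score: number; away_score: number }.
const parsedScore = runConfig({ home: 3, away: 1 }, scoreConfig);
const scoreSum: number = parsedScore.home_score + parsedScore.away_score;
```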
Typed pluck<T>(uri, { shape }) at the top level
The same inference works one level up – call pluck() with a Zod-backed shape.schema and result.data is typed as z.infer<typeof schema> without a separate shape() step. No as casts, no non-null assertions.
import { pluck } from "@sizls/pluck";
import { z } from "zod";

const Post = z.object({
  title: z.string(),
  author: z.string(),
  publishedAt: z.string(),
});

const { data, shape } = await pluck("https://blog.example.com/posts/42", {
  shape: { schema: Post },
});

if (shape?.valid) {
  // data is z.infer<typeof Post> – typed end-to-end, no cast
  console.log(data.title, data.author);
}
When shape is omitted, data stays Record<string, unknown> – old callers see no change.
Drift detection
The onDrift callback is the thing no other pipeline library ships. It fires in two cases:
- Success-drift – validation passed but strict mode stripped fields. You get a list of keys that were present in the input but NOT in the schema. Useful for spotting a new field a vendor added (maybe a PII column you want to cover) before someone uses it in production.
- Failure-drift – validation failed. Some required field went missing, a type flipped, a new required field appeared upstream. You get the Zod-derived paths and messages.
shape(input, {
  schema: productSchema,
  onDrift: (stripped, errors) => {
    if (errors) {
      // Upstream broke – page oncall.
      slack.post("#oncall", `Shape failed: ${errors.map((e) => e.path).join(", ")}`);
    } else {
      // Upstream added new fields – we're stripping them silently.
      slack.post("#data-drift", `Stripped: ${stripped.join(", ")}`);
    }
  },
});
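If the branching inside the callback grows, it can help to separate classification from alerting so the logic is testable without a Slack client. The sketch below assumes the two callback arguments look like the ones used above; `DriftError` and `classifyDrift` are hypothetical names, and the error shape is a guess based on the path and message fields mentioned in this section.

```typescript
// Hypothetical sketch: classify a drift event from the two onDrift arguments.
interface DriftError {
  path: string;
  message: string;
}
type DriftKind = "failure" | "success" | "none";

function classifyDrift(stripped: string[], errors?: DriftError[]): DriftKind {
  if (errors && errors.length > 0) return "failure"; // validation failed
  if (stripped.length > 0) return "success"; // validation passed, extra keys removed
  return "none"; // nothing to report
}

// Example: route each kind to a different (made-up) channel name.
function driftChannel(kind: DriftKind): string | undefined {
  if (kind === "failure") return "#oncall";
  if (kind === "success") return "#data-drift";
  return undefined;
}
```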
Prefer a stderr warning over wiring a callback? Set warnOnDrift: true:
[Pluck] Shape stripped 2 fields: [_reasoning, _confidence]. Add to schema or set strict: false.
[Pluck] Shape validation failed with 1 error: [game.total]. Schema drift or upstream API change?
This pair – Extract for loose pulls, Shape for strict contracts – is what the community has historically stapled together with Zod + MSW + custom scripts. Pluck makes it one line.
Schema inference from a live API
Writing Zod schemas by hand for a 40-field API payload is tedious. The CLI can do it for you:
pluck shape --from-api https://api.github.com/repos/vercel/next.js -o github-repo.shape.ts
That emits ready-to-commit TypeScript:
import { z } from "zod";

export const githubRepo = z.object({
  id: z.number(),
  name: z.string(),
  full_name: z.string(),
  owner: z.object({
    login: z.string(),
    id: z.number(),
    avatar_url: z.string().url(),
  }),
  created_at: z.string().datetime(),
  // …
}).partial();

export type GithubRepo = z.infer<typeof githubRepo>;
Sample multiple live responses for better optionality/nullability detection by calling inferZodSchema({ samples: [resp1, resp2, resp3] }) programmatically – the CLI currently fetches a single response. The generated file is a starting point – commit it, edit it, tighten the constraints.
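To see why several samples help, consider a sketch that profiles keys across responses: a key missing from any sample is a candidate for optional, and a key that is null in any sample is a candidate for nullable. This is an illustration of the idea only, not how inferZodSchema works internally; `profileKeys` and the sample objects are hypothetical.

```typescript
// Hypothetical sketch: detect optional and nullable keys across several samples.
type Sample = Record<string, unknown>;

interface KeyProfile {
  optional: string[]; // missing from at least one sample
  nullable: string[]; // null in at least one sample
}

function profileKeys(samples: Sample[]): KeyProfile {
  // Collect every top-level key seen in any sample.
  const allKeys = new Set<string>();
  for (const sample of samples) {
    for (const key of Object.keys(sample)) allKeys.add(key);
  }
  const optional: string[] = [];
  const nullable: string[] = [];
  for (const key of Array.from(allKeys).sort()) {
    if (samples.some((s) => !(key in s))) optional.push(key);
    if (samples.some((s) => s[key] === null)) nullable.push(key);
  }
  return { optional, nullable };
}

// Made-up samples for illustration only.
const keyProfile = profileKeys([
  { id: 1, label: "a", homepage: "https://example.com" },
  { id: 2, label: "b", homepage: null },
  { id: 3, label: "c" },
]);
// keyProfile → { optional: ["homepage"], nullable: ["homepage"] }
```

With a single sample, both lists come back empty, which is why a single fetched response cannot reveal optionality on its own.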
Social ETL templates
Pluck ships pre-built Zod schemas for common social APIs so the first 90% of the shape work is already done:
import { pluck, shape, spotifyTrack } from "@sizls/pluck";
const raw = await pluck("spotify://track/3n3Ppam7vgaVa1iaRUc9Lp");
const typed = shape(raw.data!, { schema: spotifyTrack });
// typed.data is z.infer<typeof spotifyTrack> – fully typed for upsert
Built-in templates:
- spotifyTrack – from the Spotify connector
- twitchClip – from the Twitch connector
- instagramPost – Instagram oEmbed
- tiktokPost – TikTok oEmbed
- vimeoVideo – Vimeo oEmbed
- twitterTweet – Twitter syndication
Every template is just a plain Zod schema – extend it with .extend({ … }), narrow it with .pick({ … }), whatever you need.
The --diff flag
When you're iterating on a shape config, you want to see what it actually does to your data. formatShapeDiff(result) returns a coloured summary:
pluck shape --diff ./game.shape.ts ./api-response.json
Shape diff
  kept      3  home_score · away_score · total
  renamed   2  game.homeTeam.score → home_score · game.awayTeam.score → away_score
  computed  1  total
  stripped  2  _llm · _confidence
Perfect for "why did my upsert shrink from 40 fields to 3?" debugging.
ShapeResult
Every call returns the same shape:
interface ShapeResult<T> {
  data: T; // Validated, narrowed data
  valid: boolean;
  stripped?: string[]; // Strict-mode removed keys
  errors?: ShapeError[]; // Zod errors on failure
  renamed?: { from: string; to: string }[];
  computed?: string[];
  provenance?: Record<string, FieldProvenance>; // Per-field lineage
}

interface FieldProvenance {
  via: "map" | "compute" | "passthrough";
  from?: string; // source dot-path for "map", source key for "passthrough"
}
At runtime, data is undefined when valid is false (the TypeScript type stays T to keep the happy path clean – always branch on result.valid before reading).
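Since the type does not force the check, one option is a small wrapper that does the branching once. The sketch below is a hypothetical helper, not something Pluck exports; `ShapeResultLike` and `ShapeErrorLike` are trimmed local copies of the interfaces above so the example stands alone, and the message field on errors is an assumption.

```typescript
// Hypothetical helper around the ShapeResult contract described above.
interface ShapeErrorLike {
  path: string;
  message: string;
}
interface ShapeResultLike<T> {
  data: T; // undefined at runtime when valid is false
  valid: boolean;
  errors?: ShapeErrorLike[];
}

// Returns data only after checking valid; throws with the failing paths otherwise.
function unwrapShape<T>(result: ShapeResultLike<T>): T {
  if (!result.valid) {
    const paths = (result.errors ?? []).map((e) => e.path).join(", ");
    throw new Error(`Shape validation failed: ${paths || "unknown path"}`);
  }
  return result.data;
}
```

Callers that prefer explicit branching can keep using if (result.valid); the wrapper only suits code paths where an invalid result should abort.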
Per-field provenance
provenance answers "where did this value come from?" for every field that made it into result.data:
const result = shape(
  { game: { homeTeam: { score: 3 }, awayTeam: { score: 1 } } },
  {
    schema: z.object({
      home_score: z.number(),
      away_score: z.number(),
      total: z.number(),
    }),
    map: {
      "game.homeTeam.score": "home_score",
      "game.awayTeam.score": "away_score",
    },
    compute: { total: (d) => Number(d.home_score) + Number(d.away_score) },
  },
);

result.provenance;
// {
//   home_score: { via: "map", from: "game.homeTeam.score" },
//   away_score: { via: "map", from: "game.awayTeam.score" },
//   total: { via: "compute" },
// }
Use it to populate traces, to answer "why is this field 3?" when you're debugging a bad upsert, or to feed telemetry that tracks which input paths your pipeline actually reads. provenance is present only when valid is true.
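As one example of the telemetry use case, the sketch below collects the distinct input paths recorded in a provenance object. `inputPathsRead` is a hypothetical helper; the FieldProvenance interface is copied from the ShapeResult section so the block stands alone.

```typescript
// Sketch using the FieldProvenance type from the ShapeResult section.
interface FieldProvenance {
  via: "map" | "compute" | "passthrough";
  from?: string;
}

// Collects the distinct input paths that contributed to the shaped output.
// Computed fields carry no `from`, so they are skipped.
function inputPathsRead(provenance: Record<string, FieldProvenance>): string[] {
  const paths = new Set<string>();
  for (const entry of Object.values(provenance)) {
    if (entry.from !== undefined) paths.add(entry.from);
  }
  return Array.from(paths).sort();
}

const gameProvenance: Record<string, FieldProvenance> = {
  home_score: { via: "map", from: "game.homeTeam.score" },
  away_score: { via: "map", from: "game.awayTeam.score" },
  total: { via: "compute" },
};
const pathsRead = inputPathsRead(gameProvenance);
// pathsRead → ["game.awayTeam.score", "game.homeTeam.score"]
```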
Full runnable example
The smallest end-to-end shape program – a typed pluck<T>({ shape }) call against a Zod schema, inferred return type, drift detection on the strict-mode stripped keys. Opens in a fresh StackBlitz sandbox.
What's next
- Act – once data is shaped, take action with a signed receipt.
- Recipe: Shape Spotify – a full worked example.
- Sense – the DSP side of the pipeline.