Core Concepts

Shape

The fourth phase of the Pluck pipeline. Zod in, contract-validated data out. Drift caught automatically.


The mental model

Connect gives you bytes. Extract gives you text and loosely-typed data. Shape's job is to pin that loose data to the exact schema your database, your API, your agent expects – and to fail loudly when the upstream source changes.

The pipeline looks like this:

Connect → Navigate → Extract → Shape → Output

Shape solves three problems that crash production at 2am:

  1. "The LLM added a _reasoning field and our Supabase upsert failed." Strict mode strips every field that's not in the schema.
  2. "The third-party API renamed game.home.score to game.homeTeam.score and we didn't notice for a week." The onDrift callback fires as soon as a required field goes missing or an unexpected new field appears.
  3. "We need total_score = home + away in the database, not the input." compute functions add derived fields deterministically, before validation.

Shape is a phase, not a standalone verb. You run it via the shape() function exported from @sizls/pluck. It doesn't fetch anything; it transforms data you already have.

TypeScript
import { shape } from "@sizls/pluck";
import { z } from "zod";

const gameSchema = z.object({
  home_score: z.number(),
  away_score: z.number(),
  total: z.number(),
});

const result = shape(rawGameData, {
  schema: gameSchema,
  map: {
    "game.homeTeam.score": "home_score",
    "game.awayTeam.score": "away_score",
  },
  compute: {
    total: (data) => Number(data.home_score) + Number(data.away_score),
  },
});

if (result.valid) {
  // result.data: { home_score: 3, away_score: 1, total: 4 }
  await supabase.from("scores").upsert(result.data);
}

Two modes

Shape runs in one of two modes, decided by whether you pass a map:

Mode 1 – Schema only (validate LLM output)

No mapping. Pluck takes the data you pass in, validates it against the schema, strips anything the schema doesn't know about, and returns the clean result:

TypeScript
import { shape } from "@sizls/pluck";
import { z } from "zod";

const productSchema = z.object({
  title: z.string(),
  price: z.number(),
});

// LLM returned { title: "Widget", price: 29.99, _reasoning: "..." }
const result = shape(llmOutput, { schema: productSchema });
// result.data: { title: "Widget", price: 29.99 }
// result.stripped: ["_reasoning"]

This is the common path for LLM-based extraction. The LLM spits out whatever it feels like; Shape makes sure only what you asked for reaches the database.
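Under the hood, schema-only mode is conceptually a key-set filter followed by validation. A minimal standalone sketch of just the stripping step (illustrative only – `stripUnknownKeys` is a hypothetical name, not Pluck's implementation):

```typescript
// Sketch of strict-mode stripping: keep only the keys the schema declares,
// and report everything that was removed.
type StripResult = {
  data: Record<string, unknown>;
  stripped: string[];
};

function stripUnknownKeys(
  input: Record<string, unknown>,
  allowedKeys: readonly string[],
): StripResult {
  const allowed = new Set(allowedKeys);
  const data: Record<string, unknown> = {};
  const stripped: string[] = [];
  for (const [key, value] of Object.entries(input)) {
    if (allowed.has(key)) {
      data[key] = value; // field the schema knows about – keep it
    } else {
      stripped.push(key); // extra field (e.g. an LLM's _reasoning) – drop it
    }
  }
  return { data, stripped };
}

// LLM output with an extra field the schema never asked for:
const { data, stripped } = stripUnknownKeys(
  { title: "Widget", price: 29.99, _reasoning: "..." },
  ["title", "price"],
);
// data → { title: "Widget", price: 29.99 }, stripped → ["_reasoning"]
```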

Mode 2 – Schema + field map (deterministic API mapping)

When the source API has a known shape, you can deterministically rename and transform fields before validation – no LLM involved. This is the fastest and cheapest mode:

TypeScript
const result = shape(apiResponse, {
  schema: z.object({
    id: z.string(),
    title: z.string(),
    minutes: z.number(),
  }),
  map: {
    "data.id": "id",
    "data.attributes.title": "title",
    "data.attributes.duration_ms": {
      to: "minutes",
      transform: (ms) => Math.floor(Number(ms) / 60_000),
    },
  },
});

Dot-path access works for any depth of nesting. The transform hook receives the raw value and can coerce / normalize before validation.
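Dot-path resolution itself is a small, well-understood mechanism. A self-contained sketch of how such a getter can work (illustrative – `getPath` is a hypothetical name, not Pluck's implementation):

```typescript
// Walk the object one key at a time, bailing out with undefined as soon as
// a path segment is missing or the current value isn't an object.
function getPath(obj: unknown, path: string): unknown {
  let current: unknown = obj;
  for (const segment of path.split(".")) {
    if (current === null || typeof current !== "object") return undefined;
    current = (current as Record<string, unknown>)[segment];
  }
  return current;
}

const apiResponse = {
  data: { attributes: { title: "Ep. 1", duration_ms: 183_000 } },
};

const title = getPath(apiResponse, "data.attributes.title");         // "Ep. 1"
const ms = getPath(apiResponse, "data.attributes.duration_ms");      // 183000
const minutes = Math.floor(Number(ms) / 60_000);                     // 3
const missing = getPath(apiResponse, "data.attributes.nope.deeper"); // undefined
```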

defineShape() for reusable configs

When a shape is worth committing to disk (or importing across files), defineShape is the typed authoring helper:

TypeScript
import { defineShape, shape } from "@sizls/pluck";
import { z } from "zod";

export const gameShape = defineShape({
  schema: z.object({
    home_score: z.number(),
    away_score: z.number(),
    total: z.number(),
  }),
  map: {
    "game.homeTeam.score": "home_score",
    "game.awayTeam.score": "away_score",
  },
  compute: {
    total: (d) => Number(d.home_score) + Number(d.away_score),
  },
});

// Later:
const result = shape(apiResponse, gameShape);
// result.data narrows to { home_score: number; away_score: number; total: number }

defineShape is generic-forwarded, so the Zod schema's inferred type flows through to result.data without any manual casting.

Typed pluck<T>(uri, { shape }) at the top level

The same inference works one level up – call pluck() with a Zod-backed shape.schema and result.data is typed as z.infer<typeof schema> without a separate shape() step. No as casts, no non-null assertions.

TypeScript
import { pluck } from "@sizls/pluck";
import { z } from "zod";

const Post = z.object({
  title: z.string(),
  author: z.string(),
  publishedAt: z.string(),
});

const { data, shape } = await pluck("https://blog.example.com/posts/42", {
  shape: { schema: Post },
});

if (shape?.valid) {
  // data is z.infer<typeof Post> – typed end-to-end, no cast
  console.log(data.title, data.author);
}

When shape is omitted, data stays Record<string, unknown> – old callers see no change.


Drift detection

The onDrift callback is the thing no other pipeline library ships. It fires in two cases:

  • Success-drift – validation passed but strict mode stripped fields. You get a list of keys that were present in the input but NOT in the schema. Useful for spotting a new field a vendor added (maybe a PII column you want to catch) before someone uses it in production.
  • Failure-drift – validation failed. Some required field went missing, a type flipped, a new required field appeared upstream. You get the Zod-derived paths and messages.

TypeScript
shape(input, {
  schema: productSchema,
  onDrift: (stripped, errors) => {
    if (errors) {
      // Upstream broke – page oncall.
      slack.post("#oncall", `Shape failed: ${errors.map((e) => e.path).join(", ")}`);
    } else {
      // Upstream added new fields – we're stripping them silently.
      slack.post("#data-drift", `Stripped: ${stripped.join(", ")}`);
    }
  },
});

Prefer a stderr warning over wiring a callback? Set warnOnDrift: true:

[Pluck] Shape stripped 2 fields: [_reasoning, _confidence]. Add to schema or set strict: false.
[Pluck] Shape validation failed with 1 error: [game.total]. Schema drift or upstream API change?
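The branch inside onDrift above – errors mean failure-drift, otherwise success-drift – can be factored into a tiny pure function that's easy to unit-test. A sketch (the `driftReport` helper and channel names are illustrative, not part of Pluck's API):

```typescript
type DriftError = { path: string; message: string };

// Route a drift event to the right channel with a formatted message.
function driftReport(
  stripped: string[],
  errors?: DriftError[],
): { channel: string; text: string } {
  if (errors && errors.length > 0) {
    // Failure-drift: validation broke – page oncall.
    return {
      channel: "#oncall",
      text: `Shape failed: ${errors.map((e) => e.path).join(", ")}`,
    };
  }
  // Success-drift: new fields were silently stripped – log for review.
  return { channel: "#data-drift", text: `Stripped: ${stripped.join(", ")}` };
}
```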

This pair – Extract for loose pulls, Shape for strict contracts – covers what teams have historically stapled together from Zod, MSW, and custom scripts. Pluck makes it one line.


Schema inference from a live API

Writing Zod schemas by hand for a 40-field API payload is tedious. The CLI can do it for you:

Shell
pluck shape --from-api https://api.github.com/repos/vercel/next.js -o github-repo.shape.ts

That emits ready-to-commit TypeScript:

TypeScript
import { z } from "zod";

export const githubRepo = z.object({
  id: z.number(),
  name: z.string(),
  full_name: z.string(),
  owner: z.object({
    login: z.string(),
    id: z.number(),
    avatar_url: z.string().url(),
  }),
  created_at: z.string().datetime(),
  // …
}).partial();

export type GithubRepo = z.infer<typeof githubRepo>;

The CLI currently fetches a single response. For better optionality/nullability detection, sample multiple live responses programmatically with inferZodSchema({ samples: [resp1, resp2, resp3] }). Either way, the generated file is a starting point – commit it, edit it, tighten the constraints.
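Single-sample inference is conceptually simple: map each JSON value to a Zod expression string and recurse into objects. A rough sketch of that idea (illustrative only, not the CLI's actual algorithm):

```typescript
// Emit Zod source text for a single sample value. Objects recurse with
// increasing indentation; arrays infer from their first element.
function inferZodSource(value: unknown, indent = ""): string {
  if (typeof value === "string") return "z.string()";
  if (typeof value === "number") return "z.number()";
  if (typeof value === "boolean") return "z.boolean()";
  if (value === null) return "z.null()";
  if (Array.isArray(value)) {
    const inner = value.length > 0 ? inferZodSource(value[0], indent) : "z.unknown()";
    return `z.array(${inner})`;
  }
  const next = indent + "  ";
  const fields = Object.entries(value as Record<string, unknown>)
    .map(([k, v]) => `${next}${k}: ${inferZodSource(v, next)},`)
    .join("\n");
  return `z.object({\n${fields}\n${indent}})`;
}

console.log(inferZodSource({ id: 1, owner: { login: "vercel" } }));
// z.object({
//   id: z.number(),
//   owner: z.object({
//     login: z.string(),
//   }),
// })
```

One sample can't see optional or nullable fields, which is exactly why multi-sample inference gives better results.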


Social ETL templates

Pluck ships pre-built Zod schemas for common social APIs so the first 90% of the shape work is already done:

TypeScript
import { pluck, shape, spotifyTrack } from "@sizls/pluck";

const raw = await pluck("spotify://track/3n3Ppam7vgaVa1iaRUc9Lp");
const typed = shape(raw.data!, { schema: spotifyTrack });
// typed.data is z.infer<typeof spotifyTrack> – fully typed for upsert

Built-in templates:

  • spotifyTrack – from the Spotify connector
  • twitchClip – from the Twitch connector
  • instagramPost – Instagram oEmbed
  • tiktokPost – TikTok oEmbed
  • vimeoVideo – Vimeo oEmbed
  • twitterTweet – Twitter syndication

Every template is just a plain Zod schema – extend it with .extend({ … }), narrow it with .pick({ … }), whatever you need.


The --diff flag

When you're iterating on a shape config, you want to see what it actually does to your data. formatShapeDiff(result) returns a coloured summary:

Shell
pluck shape --diff ./game.shape.ts ./api-response.json
 Shape diff
   kept     3  home_score · away_score · total
   renamed  2  game.homeTeam.score → home_score · game.awayTeam.score → away_score
   computed 1  total
   stripped 2  _llm · _confidence

Perfect for "why did my upsert shrink from 40 fields to 3?" debugging.
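The same summary can be built from any ShapeResult-like value. A sketch of the counting logic (`summarizeDiff` is a hypothetical helper, not Pluck's formatShapeDiff, and skips the colouring):

```typescript
// Count and list what the shape did: fields kept in the output, renames
// applied by the map, compute-derived fields, and strict-mode strips.
function summarizeDiff(result: {
  data: Record<string, unknown>;
  renamed?: { from: string; to: string }[];
  computed?: string[];
  stripped?: string[];
}): string {
  const renamed = result.renamed ?? [];
  const computed = result.computed ?? [];
  const stripped = result.stripped ?? [];
  return [
    `kept     ${Object.keys(result.data).length}  ${Object.keys(result.data).join(" · ")}`,
    `renamed  ${renamed.length}  ${renamed.map((r) => `${r.from} → ${r.to}`).join(" · ")}`,
    `computed ${computed.length}  ${computed.join(" · ")}`,
    `stripped ${stripped.length}  ${stripped.join(" · ")}`,
  ].join("\n");
}
```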


ShapeResult

Every call returns the same shape:

TypeScript
interface ShapeResult<T> {
  data: T;                                 // Validated, narrowed data
  valid: boolean;
  stripped?: string[];                     // Strict-mode removed keys
  errors?: ShapeError[];                   // Zod errors on failure
  renamed?: { from: string; to: string }[];
  computed?: string[];
  provenance?: Record<string, FieldProvenance>; // Per-field lineage
}

interface FieldProvenance {
  via: "map" | "compute" | "passthrough";
  from?: string; // source dot-path for "map", source key for "passthrough"
}

At runtime, data is undefined when valid is false (the TypeScript type stays T to keep the happy path clean – always branch on result.valid before reading).

Per-field provenance

provenance answers "where did this value come from?" for every field that made it into result.data:

TypeScript
const result = shape(
  { game: { homeTeam: { score: 3 }, awayTeam: { score: 1 } } },
  {
    schema: z.object({
      home_score: z.number(),
      away_score: z.number(),
      total: z.number(),
    }),
    map: {
      "game.homeTeam.score": "home_score",
      "game.awayTeam.score": "away_score",
    },
    compute: { total: (d) => Number(d.home_score) + Number(d.away_score) },
  },
);

result.provenance;
// {
//   home_score: { via: "map", from: "game.homeTeam.score" },
//   away_score: { via: "map", from: "game.awayTeam.score" },
//   total:      { via: "compute" },
// }

Use it to populate traces, to answer "why is this field 3?" when you're debugging a bad upsert, or to feed telemetry that tracks which input paths your pipeline actually reads. Present only when valid is true.
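Conceptually, provenance is just a record written down as each output field is produced. A standalone sketch of that bookkeeping (illustrative; `buildProvenance` is a hypothetical helper, not part of Pluck's API):

```typescript
type FieldProvenance = {
  via: "map" | "compute" | "passthrough";
  from?: string; // source dot-path for "map", source key for "passthrough"
};

// Record where each output field came from: mapped fields carry their
// source dot-path, computed fields have no source, passthrough fields
// point back at the input key of the same name.
function buildProvenance(
  mapped: Record<string, string>, // output key → source dot-path
  computed: string[],             // output keys produced by compute
  passthrough: string[],          // output keys copied through as-is
): Record<string, FieldProvenance> {
  const prov: Record<string, FieldProvenance> = {};
  for (const [to, from] of Object.entries(mapped)) prov[to] = { via: "map", from };
  for (const key of computed) prov[key] = { via: "compute" };
  for (const key of passthrough) prov[key] = { via: "passthrough", from: key };
  return prov;
}
```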


Full runnable example

The smallest end-to-end shape program: a typed pluck<T>({ shape }) call against a Zod schema, an inferred return type, and drift detection on the strict-mode stripped keys.


What's next

  • Act – once data is shaped, take action with a signed receipt.
  • Recipe: Shape Spotify – a full worked example.
  • Sense – the DSP side of the pipeline.
