Recipe: Shape Spotify
One URL in. One typed Zod-validated row out. Drift caught automatically.
The demo
import { pluck, spotifyTrack, shape } from "@sizls/pluck";
const raw = await pluck("https://open.spotify.com/track/3n3Ppam7vgaVa1iaRUc9Lp");
const result = shape(raw.data!, { schema: spotifyTrack });
if (result.valid) {
console.log(result.data);
// {
// sourceType: "spotify",
// url: "https://open.spotify.com/track/3n3Ppam7vgaVa1iaRUc9Lp",
// title: "Mr. Brightside",
// artist: "The Killers",
// thumbnailUrl: "https://i.scdn.co/image/...",
// embedHtml: "<iframe …",
// // … every field typed as `z.infer<typeof spotifyTrack>`
// }
}
That's the whole demo. The extract phase (pluck()) returns loose data. shape validates it against the built-in spotifyTrack Zod schema, strips anything not in the schema (no more mystery _llm fields breaking your Supabase upsert), and hands back typed data ready to persist.
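To make the stripping step concrete, here is a minimal dependency-free sketch of the idea: split a loose object into the keys a schema knows about and the keys it would drop. The helper name splitByAllowedKeys and the contents of the _llm field are made up for illustration; this is not how shape is implemented, and it does no type validation.

```typescript
// Hypothetical helper, not part of @sizls/pluck: separates known keys from
// unknown ones. Real validation (types, literals) is out of scope here.
type SplitResult = {
  kept: Record<string, unknown>;
  stripped: string[];
};

function splitByAllowedKeys(
  raw: Record<string, unknown>,
  allowed: readonly string[],
): SplitResult {
  const kept: Record<string, unknown> = {};
  const stripped: string[] = [];
  for (const key of Object.keys(raw)) {
    if (allowed.indexOf(key) !== -1) {
      kept[key] = raw[key]; // key is covered by the schema
    } else {
      stripped.push(key); // key would be dropped and reported
    }
  }
  return { kept, stripped };
}

const loose: Record<string, unknown> = {
  sourceType: "spotify",
  title: "Mr. Brightside",
  artist: "The Killers",
  _llm: { model: "example" }, // made-up extra field
};

const { kept, stripped } = splitByAllowedKeys(loose, ["sourceType", "title", "artist"]);
console.log(kept);
console.log(stripped);
```

The list of stripped keys is the interesting part: it is the same kind of signal the recipe later uses to detect drift.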
Why shape + Spotify?
The Spotify connector uses Spotify's public oEmbed endpoint and falls back to Open Graph tags when the oEmbed payload is thin. That means the extract phase gives you:
- A loosely-typed object of whatever Spotify happened to return today
- With fields that can shift over time (Spotify adds / removes / renames / reorders)
- And occasional _llm metadata fields when the extractor uses LLM hybrid mode
Shape is what turns that into a contract. The spotifyTrack schema is a frozen shape – if Spotify changes their response, your downstream code doesn't silently break; shape's onDrift callback fires and you find out.
The built-in schema
spotifyTrack ships in @sizls/pluck:
import { z } from "zod";
export const spotifyTrack = z.object({
sourceType: z.literal("spotify"),
url: z.string().optional(),
title: z.string().optional(),
artist: z.string().optional(),
album: z.string().optional(),
durationMs: z.number().optional(),
thumbnailUrl: z.string().optional(),
providerUrl: z.string().optional(),
thumbnailWidth: z.number().optional(),
thumbnailHeight: z.number().optional(),
embedHtml: z.string().optional(),
width: z.number().optional(),
height: z.number().optional(),
});
export type SpotifyTrack = z.infer<typeof spotifyTrack>;
Every field is optional because Spotify's response is opportunistic – a track might have an album or not depending on whether it's a single, an EP, or an album track. sourceType: "spotify" is the one literal – it's Pluck's tag for discriminating downstream.
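As a rough illustration of why the literal tag is useful downstream, the sketch below narrows a union of rows by sourceType. The row types are trimmed-down stand-ins, and the "vimeo" tag value and its author field are assumptions made for the example, not taken from the Pluck schemas.

```typescript
// Trimmed-down stand-in for a spotifyTrack row; only the tag mirrors the schema.
type SpotifyRow = { sourceType: "spotify"; title?: string; artist?: string };
// Hypothetical second source: the tag value and fields are assumed.
type VimeoRow = { sourceType: "vimeo"; title?: string; author?: string };
type Row = SpotifyRow | VimeoRow;

function describeRow(row: Row): string {
  // Switching on the literal tag lets TypeScript narrow to one row type per case.
  switch (row.sourceType) {
    case "spotify":
      return `${row.title ?? "Untitled"} by ${row.artist ?? "unknown artist"}`;
    case "vimeo":
      return `${row.title ?? "Untitled"} uploaded by ${row.author ?? "unknown author"}`;
  }
}

const label = describeRow({
  sourceType: "spotify",
  title: "Mr. Brightside",
  artist: "The Killers",
});
console.log(label);
```

Because every other field is optional, the fallbacks (?? "Untitled") matter: the tag is the only property a consumer can rely on being present.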
Pipeline integration
Skip the manual shape call – pass the schema straight to pluck() and the pipeline runs shape for you:
import { pluck, spotifyTrack } from "@sizls/pluck";
const track = await pluck("https://open.spotify.com/track/3n3Ppam7vgaVa1iaRUc9Lp", {
shape: { schema: spotifyTrack, onDrift: handleDrift },
});
// track.shape.data is typed, validated, stripped.
// track.shape.valid is true | false.
// track.shape.stripped lists any extra fields Spotify added.
The shape option on pluck() is applied after the extract phase. It runs in-process, with no extra network requests.
Custom field mapping + compute
For sources where the extracted shape doesn't perfectly match what you want to store, add a map and compute:
import { defineShape, shape } from "@sizls/pluck";
import { z } from "zod";
const spotifyRow = defineShape({
schema: z.object({
track_id: z.string(),
title: z.string(),
artist: z.string(),
album: z.string().nullable(),
duration_seconds: z.number(),
thumbnail_url: z.string().url().nullable(),
pulled_at: z.string().datetime(),
}),
map: {
"url": {
to: "track_id",
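// Drop any query string (e.g. ?si=...), fragment, or trailing path after the ID.
transform: (u) => String(u).split("/track/")[1]?.split(/[?#/]/)[0] ?? "",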
},
"title": "title",
"artist": "artist",
"album": {
to: "album",
transform: (a) => (a === undefined ? null : String(a)),
},
"durationMs": {
to: "duration_seconds",
transform: (ms) => Math.round(Number(ms) / 1000),
},
"thumbnailUrl": "thumbnail_url",
},
compute: {
pulled_at: () => new Date().toISOString(),
},
});
// raw is the pluck() result from the demo above.
const result = shape(raw.data!, spotifyRow);
// result.data is the database-ready row.
defineShape<T> preserves the Zod inference, so result.data narrows to the exact schema without manual casting.
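Transforms like the url-to-track_id mapping are plain functions, so they can be pulled out and checked in isolation before they go into a map. The sketch below is one way to write that extraction; the helper name trackIdFromUrl is hypothetical, and it assumes track IDs are purely alphanumeric, so a share link's query string is excluded.

```typescript
// Hypothetical helper: pull the track ID out of an open.spotify.com URL.
// Assumes IDs are alphanumeric, so "?si=..." and trailing segments are excluded.
function trackIdFromUrl(url: unknown): string {
  const match = /\/track\/([A-Za-z0-9]+)/.exec(String(url));
  // Fall back to an empty string when the URL has no /track/ segment.
  return match?.[1] ?? "";
}

const plainId = trackIdFromUrl("https://open.spotify.com/track/3n3Ppam7vgaVa1iaRUc9Lp");
const sharedId = trackIdFromUrl("https://open.spotify.com/track/3n3Ppam7vgaVa1iaRUc9Lp?si=abc123");
const missingId = trackIdFromUrl(undefined);

console.log(plainId, sharedId, missingId === "");
```

Note that an empty string still satisfies z.string(), so a URL without a track segment would pass validation with a blank track_id unless the schema is tightened.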
Upsert to Postgres
Point Pluck at your database through the act phase:
import { pluck, act, spotifyTrack } from "@sizls/pluck";
const track = await pluck("https://open.spotify.com/track/3n3Ppam7vgaVa1iaRUc9Lp", {
shape: { schema: spotifyTrack },
});
if (track.shape.valid) {
await act("postgres://localhost/music", {
action: "upsert",
input: {
table: "spotify_tracks",
rows: [track.shape.data],
conflict: ["url"],
},
});
// Signed receipt returned. Idempotent by default.
}
Pluck's act phase handles the idempotency + audit log – running the same pluck twice doesn't duplicate the row. See Concepts: Act for the full story.
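The "running it twice doesn't duplicate the row" property comes from keying writes on a conflict column. The sketch below models that with an in-memory object keyed by url; it only illustrates upsert semantics and says nothing about how the act phase is implemented. The TrackRow type and upsertByUrl helper are made up for the example.

```typescript
// In-memory stand-in for a table with a unique key on `url`.
type TrackRow = { url: string; title?: string; artist?: string };

function upsertByUrl(table: Record<string, TrackRow>, row: TrackRow): void {
  // Same key: merge over the existing row instead of adding a second one.
  table[row.url] = { ...table[row.url], ...row };
}

const table: Record<string, TrackRow> = {};
const row: TrackRow = {
  url: "https://open.spotify.com/track/3n3Ppam7vgaVa1iaRUc9Lp",
  title: "Mr. Brightside",
};

upsertByUrl(table, row); // first run inserts
upsertByUrl(table, { ...row, artist: "The Killers" }); // second run updates in place

console.log(Object.keys(table).length);
console.log(table[row.url]);
```

In Postgres itself, the equivalent behavior is usually expressed with INSERT ... ON CONFLICT (url) DO UPDATE, which requires a unique constraint on the conflict column.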
Batch + drift alerting
Shape shines when you run it across many URLs:
import { pluck, spotifyTrack } from "@sizls/pluck";
// slack and upsertTrack stand in for your own alerting and persistence helpers.
const urls = [
"https://open.spotify.com/track/3n3Ppam7vgaVa1iaRUc9Lp",
"https://open.spotify.com/track/0VjIjW4GlUZAMYd2vXMi3b",
"https://open.spotify.com/track/4iV5W9uYEdYUVa79Axb7Rh",
// …
];
for await (const batchResult of pluck.batch(urls, {
concurrency: 5,
shape: {
schema: spotifyTrack,
onDrift: (stripped, errors) => {
if (errors) {
slack.post("#data-alerts", `Spotify schema failed: ${errors.map((e) => e.path).join(", ")}`);
} else {
slack.post("#data-drift", `Spotify added: ${stripped.join(", ")}`);
}
},
},
})) {
if (batchResult.result?.shape?.valid) {
await upsertTrack(batchResult.result.shape.data);
}
}
onDrift fires in two cases:
- Success-drift: validation passed but shape stripped fields. Spotify added a new field (e.g. explicit or popularity, picked up by the extractor) and your schema didn't cover it yet. Time to decide: do you want that field?
- Failure-drift: validation failed. Spotify removed or renamed a required field. Your pipeline is broken; oncall gets paged.
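The two cases can be told apart with a small classifier, which is handy when routing alerts. The sketch below is hypothetical: the parameter shapes (a list of stripped field names and an optional list of errors) are inferred from how onDrift is called in the batch example, and the "none" case is added only so the function is total.

```typescript
type DriftKind = "none" | "success-drift" | "failure-drift";

// Hypothetical classifier mirroring the two onDrift cases described above.
function classifyDrift(
  stripped: readonly string[],
  errors?: readonly unknown[],
): DriftKind {
  // Validation errors take priority: the row is unusable.
  if (errors && errors.length > 0) return "failure-drift";
  // No errors, but extra fields were dropped: the source grew new fields.
  if (stripped.length > 0) return "success-drift";
  return "none";
}

console.log(classifyDrift(["explicit"]));
console.log(classifyDrift([], [{ path: ["title"] }]));
console.log(classifyDrift([]));
```

Checking errors first matters: a response can both add new fields and break a required one, and the failure is the case that needs paging.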
Sibling templates
The Spotify schema is one of six built-in social ETL templates, all exported from the @sizls/pluck package root:
import {
spotifyTrack,
twitchClip,
instagramPost,
tiktokPost,
vimeoVideo,
twitterTweet,
} from "@sizls/pluck";
Each one follows the same pattern: every opportunistic field is optional, and a sourceType literal tags the source. Use them as-is for quick wins, or call .extend() / .pick() on them when you need custom column shapes.
What's next
- Concepts: Shape – the shape phase in depth, including provenance + drift detection + formatShapeDiff.
- Concepts: Connect – the Spotify connector's implementation notes.
- Reference: Connectors – every built-in social connector.
- Recipe: Snitch Privacy – compose extract + shape for audit artifacts.