Recipe: Shape Spotify

One URL in. One typed Zod-validated row out. Drift caught automatically.


The demo

TypeScript
import { pluck, spotifyTrack, shape } from "@sizls/pluck";

const raw = await pluck("https://open.spotify.com/track/3n3Ppam7vgaVa1iaRUc9Lp");
const result = shape(raw.data!, { schema: spotifyTrack });

if (result.valid) {
  console.log(result.data);
  // {
  //   sourceType: "spotify",
  //   url: "https://open.spotify.com/track/3n3Ppam7vgaVa1iaRUc9Lp",
  //   title: "Mr. Brightside",
  //   artist: "The Killers",
  //   thumbnailUrl: "https://i.scdn.co/image/...",
  //   embedHtml: "<iframe …",
  //   // … every field typed as `z.infer<typeof spotifyTrack>`
  // }
}

That's the whole demo. pluck() extracts loosely typed data. shape() validates it against the built-in spotifyTrack Zod schema, strips anything not in the schema (no more mystery _llm fields breaking your Supabase upsert), and hands back typed data ready to persist.
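
To make the stripping step concrete, here is a minimal, dependency-free sketch of the idea. It is not Pluck's implementation: stripUnknown, ALLOWED_KEYS, and the sample payload are hypothetical names invented for illustration, and real validation (type checks against the Zod schema) is left out.

```typescript
// Hypothetical illustration only: not part of @sizls/pluck.
// Keeps the keys a schema knows about and reports everything else.
const ALLOWED_KEYS = ["sourceType", "url", "title", "artist"] as const;

type AllowedKey = (typeof ALLOWED_KEYS)[number];

interface StripResult {
  data: Partial<Record<AllowedKey, unknown>>;
  stripped: string[];
}

function stripUnknown(raw: Record<string, unknown>): StripResult {
  const allowed = new Set<string>(ALLOWED_KEYS);
  const data: Partial<Record<AllowedKey, unknown>> = {};
  const stripped: string[] = [];

  for (const [key, value] of Object.entries(raw)) {
    if (allowed.has(key)) {
      data[key as AllowedKey] = value;
    } else {
      stripped.push(key); // remember what was dropped so it can be reported
    }
  }
  return { data, stripped };
}

const sample = stripUnknown({
  sourceType: "spotify",
  title: "Mr. Brightside",
  _llm: { model: "example" }, // made-up extra field
});
console.log(sample.stripped); // the keys that were dropped
```

Reporting the dropped keys, rather than discarding them silently, is what makes drift visible later on.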


Why shape + Spotify?

The Spotify connector uses Spotify's public oEmbed endpoint and falls back to Open Graph tags when the oEmbed payload is thin. That means the extract phase gives you:

  • A loosely-typed object of whatever Spotify happened to return today
  • With fields that can shift over time (Spotify adds / removes / renames / reorders)
  • And occasional _llm metadata fields when the extractor uses LLM hybrid mode

Shape is what turns that into a contract. The spotifyTrack schema is a frozen shape – if Spotify changes their response, your downstream code doesn't silently break; shape's onDrift callback fires and you find out.


The built-in schema

spotifyTrack ships in @sizls/pluck:

TypeScript
import { z } from "zod";

export const spotifyTrack = z.object({
  sourceType: z.literal("spotify"),
  url: z.string().optional(),
  title: z.string().optional(),
  artist: z.string().optional(),
  album: z.string().optional(),
  durationMs: z.number().optional(),
  thumbnailUrl: z.string().optional(),
  providerUrl: z.string().optional(),
  thumbnailWidth: z.number().optional(),
  thumbnailHeight: z.number().optional(),
  embedHtml: z.string().optional(),
  width: z.number().optional(),
  height: z.number().optional(),
});

export type SpotifyTrack = z.infer<typeof spotifyTrack>;

Every field except sourceType is optional because Spotify's response is opportunistic: a track might have an album or not depending on whether it's a single, an EP, or an album track. sourceType: "spotify" is the one required literal. It's Pluck's tag for discriminating between sources downstream.
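
The sourceType literal is what lets TypeScript narrow a mixed stream of rows. A minimal sketch follows, using hand-written types in place of the Zod-inferred ones. TwitchClipRow, its "twitch" tag, and its broadcaster field are hypothetical stand-ins, not the real twitchClip schema.

```typescript
// Hand-written stand-ins for the inferred types; the twitch shape is assumed.
type SpotifyRow = { sourceType: "spotify"; title?: string; artist?: string };
type TwitchClipRow = { sourceType: "twitch"; title?: string; broadcaster?: string };
type Row = SpotifyRow | TwitchClipRow;

function describeRow(row: Row): string {
  switch (row.sourceType) {
    case "spotify":
      // Narrowed to SpotifyRow: `artist` is available here.
      return `${row.title ?? "Unknown title"} by ${row.artist ?? "unknown artist"}`;
    case "twitch":
      // Narrowed to TwitchClipRow: `broadcaster` is available here.
      return `${row.title ?? "Untitled clip"} from ${row.broadcaster ?? "unknown channel"}`;
  }
}

const label = describeRow({
  sourceType: "spotify",
  title: "Mr. Brightside",
  artist: "The Killers",
});
console.log(label);
```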


Pipeline integration

Skip the manual shape call – pass the schema straight to pluck() and the pipeline runs shape for you:

TypeScript
import { pluck, spotifyTrack } from "@sizls/pluck";

// handleDrift is your own callback; see "Batch + drift alerting" for an example.
const track = await pluck("https://open.spotify.com/track/3n3Ppam7vgaVa1iaRUc9Lp", {
  shape: { schema: spotifyTrack, onDrift: handleDrift },
});

// track.shape.data is typed, validated, stripped.
// track.shape.valid is true | false.
// track.shape.stripped lists any extra fields Spotify added.

pluck() applies the shape option after the extract phase. Shaping runs in-process, with no extra network requests.


Custom field mapping + compute

For sources where the extracted shape doesn't perfectly match what you want to store, add a map and compute:

TypeScript
import { defineShape, pluck, shape } from "@sizls/pluck";
import { z } from "zod";

const spotifyRow = defineShape({
  // Target row: the columns you actually want to store.
  schema: z.object({
    track_id: z.string(),
    title: z.string(),
    artist: z.string(),
    album: z.string().nullable(),
    duration_seconds: z.number(),
    thumbnail_url: z.string().url().nullable(),
    pulled_at: z.string().datetime(),
  }),
  // Source field -> target column, with an optional transform.
  map: {
    "url": {
      to: "track_id",
      transform: (u) => String(u).split("/track/")[1] ?? "",
    },
    "title": "title",
    "artist": "artist",
    "album": {
      to: "album",
      // Treat undefined and null alike so String() never yields "null".
      transform: (a) => (a == null ? null : String(a)),
    },
    "durationMs": {
      to: "duration_seconds",
      transform: (ms) => Math.round(Number(ms) / 1000),
    },
    "thumbnailUrl": "thumbnail_url",
  },
  // Columns derived at shape time rather than read from the source.
  compute: {
    pulled_at: () => new Date().toISOString(),
  },
});

const raw = await pluck("https://open.spotify.com/track/3n3Ppam7vgaVa1iaRUc9Lp");
const result = shape(raw.data!, spotifyRow);
// result.data is the database-ready row.

defineShape<T> preserves the Zod inference, so result.data narrows to the exact schema without manual casting.
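
A transform that splits on "/track/" keeps whatever follows the ID, such as a query string on a share link. A slightly more defensive pair of helpers is sketched below. trackIdFromUrl and msToSeconds are hypothetical names, and the alphanumeric ID pattern is an assumption made for illustration.

```typescript
// Hypothetical helpers: names and the ID pattern are assumptions for illustration.

// Returns the ID segment after "/track/", ignoring query strings and fragments.
function trackIdFromUrl(url: string): string | null {
  const match = /\/track\/([A-Za-z0-9]+)/.exec(url);
  return match?.[1] ?? null;
}

// Converts milliseconds to whole seconds; returns null for missing or non-finite input.
function msToSeconds(ms: unknown): number | null {
  const n = typeof ms === "number" ? ms : Number.NaN;
  return Number.isFinite(n) ? Math.round(n / 1000) : null;
}

console.log(trackIdFromUrl("https://open.spotify.com/track/3n3Ppam7vgaVa1iaRUc9Lp?si=abc123"));
console.log(msToSeconds(215000));
```

Returning null instead of an empty string means a plain z.string() column would reject the row rather than accept a blank key.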


Upsert to Postgres

Point Pluck at your database through the act phase:

TypeScript
import { pluck, act, spotifyTrack } from "@sizls/pluck";

const track = await pluck("https://open.spotify.com/track/3n3Ppam7vgaVa1iaRUc9Lp", {
  shape: { schema: spotifyTrack },
});

if (track.shape.valid) {
  await act("postgres://localhost/music", {
    action: "upsert",
    input: {
      table: "spotify_tracks",
      rows: [track.shape.data],
      conflict: ["url"],
    },
  });
  // Signed receipt returned. Idempotent by default.
}

Pluck's act phase handles the idempotency + audit log – running the same pluck twice doesn't duplicate the row. See Concepts: Act for the full story.
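
For intuition, an upsert keyed on url is the kind of operation Postgres expresses with INSERT and an ON CONFLICT clause. The sketch below builds such a statement by hand. It is an assumption about what an idempotent upsert looks like in general, not a description of the SQL that act sends, and buildUpsert and quoteIdent are hypothetical helpers.

```typescript
// Hypothetical helpers, for intuition only: the source does not show the SQL act generates.
interface Upsert {
  text: string;
  values: unknown[];
}

// Quote an identifier for Postgres, doubling any embedded double quotes.
function quoteIdent(name: string): string {
  return `"${name.replace(/"/g, '""')}"`;
}

function buildUpsert(table: string, row: Record<string, unknown>, conflict: string[]): Upsert {
  const columns = Object.keys(row);
  if (columns.length === 0) throw new Error("row has no columns");

  const placeholders = columns.map((_, i) => `$${i + 1}`);
  // On conflict, overwrite every non-key column with the incoming value.
  const updates = columns
    .filter((c) => !conflict.includes(c))
    .map((c) => `${quoteIdent(c)} = EXCLUDED.${quoteIdent(c)}`);
  const action = updates.length > 0 ? `DO UPDATE SET ${updates.join(", ")}` : "DO NOTHING";

  const text =
    `INSERT INTO ${quoteIdent(table)} (${columns.map((c) => quoteIdent(c)).join(", ")}) ` +
    `VALUES (${placeholders.join(", ")}) ` +
    `ON CONFLICT (${conflict.map((c) => quoteIdent(c)).join(", ")}) ${action}`;
  return { text, values: columns.map((c) => row[c]) };
}

const upsert = buildUpsert(
  "spotify_tracks",
  { url: "https://open.spotify.com/track/3n3Ppam7vgaVa1iaRUc9Lp", title: "Mr. Brightside" },
  ["url"],
);
console.log(upsert.text);
```

In Postgres, ON CONFLICT (url) only works when url has a unique constraint or unique index. Since url is optional in spotifyTrack, rows without one would need separate handling.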


Batch + drift alerting

Shape shines when you run it across many URLs:

TypeScript
import { pluck, spotifyTrack } from "@sizls/pluck";

// `slack` and `upsertTrack` are your own application helpers, not Pluck exports.

const urls = [
  "https://open.spotify.com/track/3n3Ppam7vgaVa1iaRUc9Lp",
  "https://open.spotify.com/track/0VjIjW4GlUZAMYd2vXMi3b",
  "https://open.spotify.com/track/4iV5W9uYEdYUVa79Axb7Rh",
  // …
];

for await (const batchResult of pluck.batch(urls, {
  concurrency: 5,
  shape: {
    schema: spotifyTrack,
    onDrift: (stripped, errors) => {
      if (errors) {
        // Assumes Zod-style issues, where `path` is an array of keys.
        slack.post("#data-alerts", `Spotify schema failed: ${errors.map((e) => e.path.join(".")).join(", ")}`);
      } else {
        slack.post("#data-drift", `Spotify added: ${stripped.join(", ")}`);
      }
    },
  },
})) {
  if (batchResult.result?.shape?.valid) {
    await upsertTrack(batchResult.result.shape.data);
  }
}

onDrift fires in two cases:

  • Success-drift – validation passed but shape stripped fields. Spotify added a new field (e.g. explicit or popularity the extractor picked up) and your schema didn't cover it yet. Time to decide: do you want that field?
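  • Failure-drift – validation failed. A field changed type, or Spotify removed or renamed a field your schema requires. The built-in spotifyTrack only requires sourceType, so missing fields mostly affect stricter custom schemas. Your pipeline is broken; oncall gets paged.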
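
In a large batch, a callback that posts directly would fire once per URL. One way to keep alerts readable is to collect drift during the run and report once at the end. The sketch below assumes the (stripped, errors) callback signature used in the batch example and assumes errors resemble Zod issues with a path array. DriftCollector and DriftIssue are hypothetical names.

```typescript
// Hypothetical helper; the error shape is an assumption modelled on Zod issues.
interface DriftIssue {
  path: (string | number)[];
  message: string;
}

class DriftCollector {
  private added = new Set<string>();
  private failed = new Set<string>();

  // Matches the (stripped, errors) callback signature used in the batch example.
  onDrift = (stripped: string[], errors?: DriftIssue[] | null): void => {
    if (errors && errors.length > 0) {
      // Failure-drift: record which paths failed validation.
      for (const e of errors) this.failed.add(e.path.join("."));
    } else {
      // Success-drift: record which extra fields were stripped.
      for (const field of stripped) this.added.add(field);
    }
  };

  // One summary line per drift kind, or an empty array when nothing drifted.
  summary(): string[] {
    const lines: string[] = [];
    if (this.failed.size > 0) {
      lines.push(`Schema failed at: ${Array.from(this.failed).sort().join(", ")}`);
    }
    if (this.added.size > 0) {
      lines.push(`New fields: ${Array.from(this.added).sort().join(", ")}`);
    }
    return lines;
  }
}

const collector = new DriftCollector();
collector.onDrift(["popularity"]);
collector.onDrift(["popularity", "explicit"]);
collector.onDrift([], [{ path: ["durationMs"], message: "Expected number" }]);
console.log(collector.summary());
```

Pass collector.onDrift as the onDrift option, then post each line of collector.summary() after the loop finishes.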

Sibling templates

The Spotify schema is one of six built-in social ETL templates – all imported from @sizls/pluck root:

TypeScript
import {
  spotifyTrack,
  twitchClip,
  instagramPost,
  tiktokPost,
  vimeoVideo,
  twitterTweet,
} from "@sizls/pluck";

Each one follows the same pattern: every opportunistic field is optional, and a sourceType literal tags the source. Use them as-is for quick wins, or call .extend() / .pick() on them when you need custom column shapes.

