
Connectors Reference

All 33 built-in connectors: the URI schemes each claims, whether it streams or returns a one-shot result, and the optional npm peers each connector loads on demand.


How to read this page

Every URI passed to pluck(uri) is matched against every connector's match() function in a deterministic order. The first match wins. Specific connectors register before generic ones so reddit://r/typescript routes to Reddit and not the HTTP catch-all.

  • One-shot connectors return a ConnectResult with a full body.
  • Streaming connectors return an AsyncIterable<StreamChunk> you can iterate until the source ends or you abort.
  • Peer dep lists the optional npm package loaded via loadOptionalPeer on first use. Pluck's bundle stays lean – connectors with heavy peers only pull them when you actually hit their URIs.
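The first-match-wins dispatch can be sketched in a few lines. The `MiniConnector` interface and two-entry registry here are illustrative, not Pluck's internals:

```typescript
// Illustrative sketch of first-match-wins routing (names are assumptions,
// not Pluck's internal API).
interface MiniConnector {
  name: string;
  match: (uri: string) => boolean;
}

// Specific connectors register before generic ones.
const registry: MiniConnector[] = [
  { name: "reddit", match: (u) => u.startsWith("reddit://") || u.includes("reddit.com/r/") },
  { name: "http", match: (u) => u.startsWith("http://") || u.startsWith("https://") },
];

function findConnector(uri: string): string | undefined {
  return registry.find((c) => c.match(uri))?.name; // first match wins
}

findConnector("reddit://r/typescript"); // "reddit", not "http"
findConnector("https://example.com");   // "http"
```

Because order decides ties, registering a catch-all first would shadow every specific connector – which is why `http` is deliberately last.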

Every connector also honours the shared safety guarantees from the Connect concept page: SSRF validation on outbound hosts, bounded reads (MAX_BODY_BYTES, MAX_LINE_BYTES, MAX_SESSION_BYTES), fleet-expansion caps, and AbortSignal propagation.


Social APIs

Platform-specific connectors that understand each service's URL patterns and public API quirks.

| # | Connector | URI schemes matched | Auth | Notes |
|---|-----------|---------------------|------|-------|
| 1 | twitter | twitter://, twitter.com/*, x.com/* | Token (syndication) | Uses Twitter's public syndication endpoint; no paid API key. |
| 2 | vimeo | vimeo.com/* | None | oEmbed + metadata scrape. |
| 3 | youtube | youtube.com/*, youtu.be/* | None | Video metadata + transcript (when public). |
| 4 | instagram | instagram.com/p/*, instagram.com/reel/* | None | oEmbed. |
| 5 | tiktok | tiktok.com/* | None | oEmbed. |
| 6 | spotify | spotify://, open.spotify.com/* | None | oEmbed + public metadata. |
| 7 | twitch | twitch.tv/*, clips.twitch.tv/* | None | oEmbed. |
| 8 | reddit | reddit://r/\<name\>, reddit.com/r/* | None | .json endpoint; comment tree normalised. |
| 9 | hackernews | hn://, hn://item/\<id\>, news.ycombinator.com/* | None | Firebase API; fleet-friendly hn://top?limit=N. |
| 10 | telegram | telegram://, t.me/* | Bot token (env) | Public channel messages. |

File storage + cloud

Storage-backed connectors with their own URL structures.

| # | Connector | URI schemes matched | Auth | Notes |
|---|-----------|---------------------|------|-------|
| 11 | s3 | s3://bucket/key | AWS env vars or IAM | Uses the AWS SDK signing path. |
| 12 | gdrive | gdrive://, drive.google.com/file/d/* | OAuth 2.0 | Shared Drives, shortcut following, export MIME override. |
| 13 | dropbox | dropbox:///path, dropbox.com/s/* | DROPBOX_ACCESS_TOKEN env | Shared links + password-protected support. |
| 14 | ftp | ftp://, sftp:// | User/pass or SSH key | Peer: basic-ftp. |
14ftpftp://, sftp://User/pass or SSH keyPeer: basic-ftp.

Databases

Query-as-URI connectors. Options on the URI (?limit=..., ?since=...) become query params.

| # | Connector | URI schemes matched | Auth | Peer dep |
|---|-----------|---------------------|------|----------|
| 15 | postgres | postgres://, postgresql:// | Connection string | pg (module-scoped pool) |
| 16 | mysql | mysql://, mariadb:// | Connection string | mysql2 |
| 17 | sqlite | sqlite:///path.db | File path | better-sqlite3 |
| 18 | mongodb | mongodb://, mongodb+srv:// | Connection string | mongodb |
| 19 | redis | redis://, rediss:// | Connection string | ioredis |

Pool keys include user + host + port + auth fingerprint, so swapping credentials opens a fresh pool. Destroying the PluckInstance closes every pool cleanly.
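That pool-key recipe can be sketched like this. Only the user + host + port + auth-fingerprint composition comes from the docs; the hashing scheme and key format below are illustrative assumptions:

```typescript
import { createHash } from "node:crypto";

// Sketch of a pool key: user + host + port + a fingerprint of the secret,
// so rotating the password opens a fresh pool. (Field choice per the docs;
// the exact hashing and format are assumptions.)
function poolKey(connString: string): string {
  const u = new URL(connString);
  const authFp = createHash("sha256")
    .update(u.password)
    .digest("hex")
    .slice(0, 12); // never embed the raw secret in the key itself
  return `${u.username}@${u.hostname}:${u.port || "5432"}#${authFp}`;
}

poolKey("postgres://app:s3cret@db.internal:5432/orders");
// Same user/host/port with a new password yields a different key,
// so the old pool is never reused with stale credentials.
```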


Streaming + network

Long-lived connections that yield chunks.

| # | Connector | URI schemes matched | Type | Auth | Peer dep |
|---|-----------|---------------------|------|------|----------|
| 20 | kafka | kafka://broker/topic | Streaming | SASL / mTLS / MSK IAM | kafkajs (+ aws-msk-iam-sasl-signer-js for IAM) |
| 21 | ssh | ssh://user@host/path | Streaming | SSH key / agent | ssh2 |
| 22 | websocket | ws://, wss:// | Streaming | Subprotocol | ws |
| 23 | imap | imap://, imaps:// | One-shot | URI userinfo | imapflow |
| 24 | mqtt | mqtt://, mqtts:// | One-shot + Streaming (extract: { streaming: true }) | URI userinfo | mqtt |
| 25 | grpc | grpc://, grpcs:// | One-shot (unary) | TLS / mTLS | @grpc/grpc-js + @grpc/proto-loader (+ grpc-js-reflection-client when ?proto= omitted) |

SSH streaming powers pluck tail (multiplexed fleet tail -F), pluck driftwatch, and pluck query. Fleet expansion (ssh://web-{01..40}) is handled at the URI level before the SSH connector opens any sockets.
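Brace-range fleet expansion of the ssh://web-{01..40} form could look like this – an illustrative reimplementation, not Pluck's actual expander:

```typescript
// Expands ssh://web-{01..40}-style fleet URIs into individual host URIs.
// (Reimplementation sketch; Pluck's real expander also enforces fleet caps.)
function expandFleet(uri: string): string[] {
  const m = uri.match(/\{(\d+)\.\.(\d+)\}/);
  if (!m) return [uri]; // no range: a single host
  const width = m[1].length; // {01..40} keeps its zero-padding
  const out: string[] = [];
  for (let i = Number(m[1]); i <= Number(m[2]); i++) {
    out.push(uri.replace(m[0], String(i).padStart(width, "0")));
  }
  return out;
}

expandFleet("ssh://web-{01..40}/var/log/app.log"); // 40 URIs, web-01 … web-40
```

Expanding before any socket opens is what lets the fleet-expansion cap reject an oversized range without touching the network.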

MQTT has both modes: the default pluck("mqtt://broker/topic") returns a one-shot snapshot (drains up to limit messages or until waitMs elapses); pass extract: { streaming: true } to switch to a long-running NDJSON stream. When the consumer can't keep up, the streaming path drops messages and signals the loss with __pluckDropped heartbeat chunks.

gRPC server reflection: when the ?proto= query parameter is omitted, the connector queries the server's grpc.reflection.v1alpha.ServerReflection service to discover the schema at runtime. Servers that don't expose reflection surface a typed REFLECTION_UNAVAILABLE error with an actionable hint.

IMAP / MQTT / Kafka all have write companions. See the Actors reference for the publish / mark-read / move / etc surfaces.

Kafka stream() consumer-lag emission. During stream() the connector polls the admin API every x-kafka-lag-interval-ms (default 10000, 0 disables) and emits a control chunk with meta.kind: "kafka:lag" carrying { partition, lag, high, consumed } per partition. Lag is the high-water mark minus the last consumed offset. Admin failures emit a sibling kafka:lag-error chunk; the message stream itself keeps delivering.

Kafka:lag chunk schema (v0.35-R2+):

```typescript
{ data: "", meta: {
    kind: "kafka:lag",
    topic: "<topic>",
    partitions: [{ partition: 0, lag: "<bigint-string>",
                   high: "<bigint-string>", consumed: "<bigint-string>" | null }],
    emittedAt: 1717172800000
} }
```

The lag/high/consumed fields are stringified BigInts – Kafka offsets routinely exceed 2^53, so casting via Number(x) loses precision. Use BigInt(meta.partitions[i].lag) downstream. Per-tick timeout (3× the interval, max 15s) ensures a wedged admin call doesn't stall the stream; on timeout the connector abandons the kafkajs admin instance and rebuilds on the next tick.
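A consumer might aggregate lag like this. The types mirror the chunk schema above, and BigInt() preserves offsets beyond 2^53 that Number() would silently round:

```typescript
// Summing per-partition lag from a kafka:lag control chunk. Offsets arrive
// as stringified BigInts, so convert with BigInt(), never Number().
type LagPartition = { partition: number; lag: string; high: string; consumed: string | null };
type LagMeta = { kind: "kafka:lag"; topic: string; partitions: LagPartition[]; emittedAt: number };

function totalLag(meta: LagMeta): bigint {
  return meta.partitions.reduce((sum, p) => sum + BigInt(p.lag), 0n);
}

totalLag({
  kind: "kafka:lag",
  topic: "events",
  partitions: [
    { partition: 0, lag: "9007199254740995", high: "9007199254741000", consumed: "5" },
    { partition: 1, lag: "3", high: "10", consumed: "7" },
  ],
  emittedAt: 1717172800000,
}); // 9007199254740998n – a total that Number() could not represent exactly
```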

SSH pooled exec – runPooledSshCommand library API (v0.35-R2+).

```typescript
import { runPooledSshCommand, setSshPoolMaxSize } from "@sizls/pluck";

setSshPoolMaxSize(256);  // raise the default 32 cap for fan-out runs
const { stdout, stderr, exitCode, signal } = await runPooledSshCommand(
  "ssh://user@host:22",
  "uname -a",
  { timeout: 30_000, maxBodyBytes: 16 * 1024 * 1024, signal: ac.signal },
);
```

Routes a one-shot exec through the same pool the SFTP-read path uses. The SSRF guard runs via validateHost, bastion hops go through headers["x-ssh-bastion"], and signal both aborts the command and poisons the pool entry so a leaked channel is never reused. Return shape: { stdout, stderr, exitCode: number | null, signal: string | null } – when the remote process was killed by a signal, exitCode is null and signal carries the name (e.g. "KILL"), so callers checking exitCode === 0 ? success : failure MUST also check signal === null. maxBodyBytes (default 16 MB) caps combined stdout + stderr accumulation; overflow rejects with BODY_TOO_LARGE. Marked @experimental for v0.35; the surface may evolve before v1.0.
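That success rule (check exitCode and signal together) can be captured in a tiny helper – a sketch, not part of the published API:

```typescript
// Encodes the documented success rule: exit code 0 AND no terminating signal.
type SshExecResult = {
  stdout: string;
  stderr: string;
  exitCode: number | null;
  signal: string | null;
};

function sshExecSucceeded(r: SshExecResult): boolean {
  return r.exitCode === 0 && r.signal === null;
}

sshExecSucceeded({ stdout: "ok", stderr: "", exitCode: 0, signal: null });    // true
sshExecSucceeded({ stdout: "", stderr: "", exitCode: null, signal: "KILL" }); // false
```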


Feeds + syndication

| # | Connector | URI schemes matched | Auth | Notes |
|---|-----------|---------------------|------|-------|
| 26 | rss | rss://, atom://, feed://, *.rss, *.atom | None | Charset detection from BOM + XML decl + Content-Type. Podcast namespaces (itunes:*, content:encoded). Atom + RSS 2.0 dual path. |

Local files + media

Fixed-format consumers – the extract phase picks the right extractor by content type, and these connectors just read bytes.

| # | Connector | URI schemes matched | Auth | Notes |
|---|-----------|---------------------|------|-------|
| 27 | pdf | *.pdf paths / application/pdf URIs | None | Built-in PDF stream scanner (install unpdf for richer extraction). |
| 28 | audio | *.wav, *.mp3, *.m4a, *.ogg, *.flac | None | Routes to the transcribe + sense extractors. |
| 29 | image | *.png, *.jpg, *.jpeg, *.webp, *.gif | None | Routes to the OCR extractor. |
| 30 | file | file://absolute/path | None | Path-traversal guarded; symlinks resolved; max-size capped. |

Generic HTTP

| # | Connector | URI schemes matched | Auth | Notes |
|---|-----------|---------------------|------|-------|
| 31 | rest | rest:// | Authorization header | Typed REST helper – infers method from URI path + options. |
| 32 | graphql | graphql://, */graphql paths | Authorization header | Parses query / mutation → HTTP verb. |
| 33 | http | http://, https:// | Any header | Catch-all for everything the specific connectors don't claim. Last in priority order. |

The HTTP connector re-validates the resolved URL after redirect so a 302 to 169.254.169.254 cannot smuggle past the initial SSRF guard.
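A minimal sketch of that post-redirect re-validation, assuming a hand-rolled blocklist far smaller than a real SSRF guard:

```typescript
// Tiny illustrative blocklist – a real SSRF guard also resolves DNS,
// checks IPv6 ranges, and rejects decimal/octal IP encodings.
function isBlockedHost(hostname: string): boolean {
  return (
    hostname === "localhost" ||
    hostname === "169.254.169.254" ||           // cloud metadata endpoint
    /^127\./.test(hostname) ||
    /^10\./.test(hostname) ||
    /^192\.168\./.test(hostname) ||
    /^172\.(1[6-9]|2\d|3[01])\./.test(hostname) // 172.16.0.0/12
  );
}

// Follow redirects manually so every hop gets re-validated before fetching.
async function fetchWithRevalidation(url: string): Promise<Response> {
  let current = url;
  for (let hop = 0; hop < 5; hop++) {
    if (isBlockedHost(new URL(current).hostname)) {
      throw new Error(`SSRF guard: blocked host in ${current}`);
    }
    const res = await fetch(current, { redirect: "manual" });
    const loc = res.headers.get("location");
    if (res.status < 300 || res.status >= 400 || !loc) return res;
    current = new URL(loc, current).toString(); // re-checked on the next pass
  }
  throw new Error("too many redirects");
}
```

Because the loop validates before each fetch rather than only once up front, a 302 from a friendly host to 169.254.169.254 fails the check on the second pass.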


Introspection

Every live PluckInstance exposes the registry as a read/register surface:

```typescript
import { createPluck } from "@sizls/pluck";

const pluck = createPluck();

pluck.connectors.list();
// ["twitter", "vimeo", "youtube", "s3", ..., "http"]

pluck.connectors.find("reddit://r/typescript"); // "reddit"
pluck.connectors.find("xyz://nowhere");         // undefined
```

Registering your own connector prepends it – see Concepts: Connect → Custom connectors for the full pattern plus the defineConnector() helper.


Streaming connectors – iteration

Four connectors today implement StreamingConnector – kafka, ssh, websocket, and mqtt (with extract: { streaming: true }). Iterate them with for await:

```typescript
import { createWebSocketConnector } from "@sizls/pluck";

const conn = createWebSocketConnector();
const stream = await conn.stream(
  "wss://stream.binance.com:9443/ws/btcusdt@trade",
  {
    subprotocols: ["binance"],
    heartbeatIntervalMs: 30_000,
    reconnect: { maxAttempts: 5, minDelayMs: 1000, maxDelayMs: 60_000 },
  },
);

for await (const chunk of stream) {
  if (chunk.data) process.stdout.write(chunk.data);
  if (chunk.error) console.error(chunk.error.message); // non-fatal
  if (chunk.end) break;
}
```

Every StreamChunk carries data, a non-fatal error, a terminal end: true, or meta (per-chunk metadata: line number, timestamp, source host for multiplexed fleet streams).
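A type sketch consistent with the iteration example – the field shapes are assumptions beyond the data / error / end / meta names used above:

```typescript
// Illustrative StreamChunk shape; the exact type in @sizls/pluck may differ.
type StreamChunk = {
  data?: string;
  error?: { message: string };    // non-fatal: the stream continues
  end?: boolean;                  // terminal: no further chunks follow
  meta?: Record<string, unknown>; // line number, timestamp, source host, …
};

// Classifies a chunk in the same precedence order the loop above checks them.
function describeChunk(c: StreamChunk): "data" | "error" | "end" | "meta" {
  if (c.end) return "end";
  if (c.error) return "error";
  if (c.data !== undefined && c.data !== "") return "data";
  return "meta"; // control chunks (e.g. kafka:lag) ship empty data + meta
}

describeChunk({ data: "payload\n" }); // "data"
describeChunk({ end: true });         // "end"
```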


Public vs authenticated access

Not every connector reaches the full surface of the service it points at. Three tiers exist today:

Public-only today. Instagram, Twitter, TikTok, Spotify read public content through oEmbed and syndication endpoints. Private timelines, DMs, gated feeds, and liked-post archives all require authentication Pluck does not yet ship.

Env-var auth today. Google Drive (GOOGLE_DRIVE_TOKEN) and Dropbox (DROPBOX_ACCESS_TOKEN) accept a single environment variable holding an access token. Mint the token through the provider's existing UI, set the env var, and every matching URI uses it.

Coming in v0.5. A full OAuth2 flow via pluck oauth login <service> unlocks private Instagram, private Twitter/X, authenticated LinkedIn, authenticated Reddit, authenticated Slack, and the rest of the gated surfaces. Tokens land in an encrypted on-disk store; connectors pick them up transparently.


What's next


Ready to build?

Install Pluck and follow the Quick Start guide to wire MCP-first data pipelines into your agents and fleets in minutes.
