Connectors Reference
All 33 built-in connectors, the URI schemes each claims, whether each streams or returns a one-shot result, and the optional npm peers each connector loads on demand.
How to read this page
Every URI passed to pluck(uri) is matched against every connector's match() function in a deterministic order. The first match wins. Specific connectors register before generic ones so reddit://r/typescript routes to Reddit and not the HTTP catch-all.
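The first-match routing can be sketched as a tiny registry (the connector names and match shapes below are illustrative, not Pluck's real internals):

```typescript
// Hypothetical mini-registry illustrating first-match-wins routing.
type Connector = { name: string; match: (uri: string) => boolean };

const registry: Connector[] = [
  // Specific connectors register first…
  { name: "reddit", match: (u) => u.startsWith("reddit://") || u.includes("reddit.com/r/") },
  // …the HTTP catch-all registers last.
  { name: "http", match: (u) => u.startsWith("http://") || u.startsWith("https://") },
];

function findConnector(uri: string): string | undefined {
  // Deterministic order: the first connector whose match() returns true wins.
  return registry.find((c) => c.match(uri))?.name;
}

findConnector("reddit://r/typescript");           // "reddit"
findConnector("https://reddit.com/r/typescript"); // "reddit", not "http"
findConnector("xyz://nowhere");                   // undefined
```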
- One-shot connectors return a `ConnectResult` with a full body.
- Streaming connectors return an `AsyncIterable<StreamChunk>` you can iterate until the source ends or you abort.
- Peer dep lists the optional npm package loaded via `loadOptionalPeer` on first use. Pluck's bundle stays lean – connectors with heavy peers only pull them when you actually hit their URIs.
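The lazy-peer pattern amounts to a cached dynamic import. A minimal sketch (the helper name mirrors loadOptionalPeer, but the body is illustrative, not Pluck's implementation):

```typescript
// Sketch: load an optional npm peer once, cache the promise, and fail
// with an install hint when the package isn't present.
const peerCache = new Map<string, Promise<unknown>>();

function loadOptionalPeer(pkg: string): Promise<unknown> {
  let cached = peerCache.get(pkg);
  if (!cached) {
    cached = import(pkg).catch(() => {
      peerCache.delete(pkg); // allow a retry after the user installs it
      throw new Error(`Optional peer "${pkg}" is not installed. Try: npm i ${pkg}`);
    });
    peerCache.set(pkg, cached);
  }
  return cached;
}
```

Caching the promise (not the module) means concurrent first uses share one import, and a failed load can be retried after the package is installed.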
Every connector also honours the shared safety guarantees from the Connect concept page: SSRF validation on outbound hosts, bounded reads (MAX_BODY_BYTES, MAX_LINE_BYTES, MAX_SESSION_BYTES), fleet-expansion caps, and AbortSignal propagation.
Social APIs
Platform-specific connectors that understand each service's URL patterns and public API quirks.
| # | Connector | URI schemes matched | Auth | Notes |
|---|---|---|---|---|
| 1 | twitter | twitter://, twitter.com/*, x.com/* | Token (syndication) | Uses Twitter's public syndication endpoint; no paid API key. |
| 2 | vimeo | vimeo.com/* | None | oEmbed + metadata scrape. |
| 3 | youtube | youtube.com/*, youtu.be/* | None | Video metadata + transcript (when public). |
| 4 | instagram | instagram.com/p/*, instagram.com/reel/* | None | oEmbed. |
| 5 | tiktok | tiktok.com/* | None | oEmbed. |
| 6 | spotify | spotify://, open.spotify.com/* | None | oEmbed + public metadata. |
| 7 | twitch | twitch.tv/*, clips.twitch.tv/* | None | oEmbed. |
| 8 | reddit | reddit://r/<name>, reddit.com/r/* | None | .json endpoint; comment tree normalised. |
| 9 | hackernews | hn://, hn://item/<id>, news.ycombinator.com/* | None | Firebase API; fleet-friendly hn://top?limit=N. |
| 10 | telegram | telegram://, t.me/* | Bot token (env) | Public channel messages. |
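As an example of how a scheme row maps onto a concrete endpoint, the reddit connector's .json path can be sketched like this (the exact rewrite rules are an assumption for illustration):

```typescript
// Sketch: rewrite a reddit:// URI (or a reddit.com URL) to the public
// .json endpoint the connector reads. Illustrative, not Pluck's code.
function redditJsonUrl(uri: string): string {
  // reddit://r/typescript → https://www.reddit.com/r/typescript.json
  const m = uri.match(/^reddit:\/\/(r\/[^/?#]+)/);
  if (m) return `https://www.reddit.com/${m[1]}.json`;
  // reddit.com/r/... URLs: append .json to the path.
  const u = new URL(uri);
  return `https://www.reddit.com${u.pathname.replace(/\/$/, "")}.json`;
}

redditJsonUrl("reddit://r/typescript");
// → "https://www.reddit.com/r/typescript.json"
```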
File storage + cloud
Storage-backed connectors with their own URL structures.
| # | Connector | URI schemes matched | Auth | Notes |
|---|---|---|---|---|
| 11 | s3 | s3://bucket/key | AWS env vars or IAM | Uses the AWS SDK signing path. |
| 12 | gdrive | gdrive://, drive.google.com/file/d/* | OAuth 2.0 | Shared Drives, shortcut following, export MIME override. |
| 13 | dropbox | dropbox:///path, dropbox.com/s/* | DROPBOX_ACCESS_TOKEN env | Shared links + password-protected support. |
| 14 | ftp | ftp://, sftp:// | User/pass or SSH key | Peer: basic-ftp. |
Databases
Query-as-URI connectors. Options on the URI (?limit=..., ?since=...) are applied to the generated query.
| # | Connector | URI schemes matched | Auth | Peer dep |
|---|---|---|---|---|
| 15 | postgres | postgres://, postgresql:// | Connection string | pg (module-scoped pool) |
| 16 | mysql | mysql://, mariadb:// | Connection string | mysql2 |
| 17 | sqlite | sqlite:///path.db | File path | better-sqlite3 |
| 18 | mongodb | mongodb://, mongodb+srv:// | Connection string | mongodb |
| 19 | redis | redis://, rediss:// | Connection string | ioredis |
Pool keys include user + host + port + auth fingerprint, so swapping credentials opens a fresh pool. Destroying the PluckInstance closes every pool cleanly.
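Assuming the pool-key recipe above, deriving a key from a connection string might look like this (the hashing and formatting choices are illustrative):

```typescript
import { createHash } from "node:crypto";

// Sketch: derive a pool key from user + host + port + an auth fingerprint,
// so changing credentials yields a different key (and a fresh pool).
function poolKey(connectionString: string): string {
  const u = new URL(connectionString);
  // Never embed the raw password in the key – fingerprint it instead.
  const authFingerprint = createHash("sha256")
    .update(`${u.username}:${u.password}`)
    .digest("hex")
    .slice(0, 12);
  return `${u.username}@${u.hostname}:${u.port || "default"}/${authFingerprint}`;
}
```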
Streaming + network
Long-lived connections that yield chunks.
| # | Connector | URI schemes matched | Type | Auth | Peer dep |
|---|---|---|---|---|---|
| 20 | kafka | kafka://broker/topic | Streaming | SASL / mTLS / MSK IAM | kafkajs (+ aws-msk-iam-sasl-signer-js for IAM) |
| 21 | ssh | ssh://user@host/path | Streaming | SSH key / agent | ssh2 |
| 22 | websocket | ws://, wss:// | Streaming | Subprotocol | ws |
| 23 | imap | imap://, imaps:// | One-shot | URI userinfo | imapflow |
| 24 | mqtt | mqtt://, mqtts:// | One-shot + Streaming (extract: { streaming: true }) | URI userinfo | mqtt |
| 25 | grpc | grpc://, grpcs:// | One-shot (unary) | TLS / mTLS | @grpc/grpc-js + @grpc/proto-loader (+ grpc-js-reflection-client when ?proto= omitted) |
SSH streaming powers pluck tail (multiplexed fleet tail -F), pluck driftwatch, and pluck query. Fleet expansion (ssh://web-{01..40}) is handled at the URI level before the SSH connector opens any sockets.
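The {01..40} range syntax amounts to a pre-connect expansion step. A simplified sketch (Pluck's real expander also enforces the fleet-expansion caps mentioned earlier):

```typescript
// Sketch: expand ssh://web-{01..04} into concrete per-host URIs before
// any socket opens. Zero-padding follows the width of the range bounds.
function expandFleet(uri: string, cap = 1024): string[] {
  const m = uri.match(/\{(\d+)\.\.(\d+)\}/);
  if (!m) return [uri];
  const [, lo, hi] = m;
  const width = lo.length;
  const out: string[] = [];
  for (let i = Number(lo); i <= Number(hi); i++) {
    if (out.length >= cap) throw new Error("fleet expansion cap exceeded");
    out.push(uri.replace(m[0], String(i).padStart(width, "0")));
  }
  return out;
}

expandFleet("ssh://web-{01..03}/var/log/syslog");
// → ["ssh://web-01/var/log/syslog", "ssh://web-02/var/log/syslog", "ssh://web-03/var/log/syslog"]
```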
MQTT has both modes: the default pluck("mqtt://broker/topic") returns a one-shot snapshot (drains up to limit messages or until waitMs elapses); pass extract: { streaming: true } to switch to a long-running NDJSON stream. When the consumer can't keep up, the streaming path drops messages and emits heartbeat chunks flagged __pluckDropped.
gRPC server reflection: when the ?proto= query parameter is omitted, the connector queries the server's grpc.reflection.v1alpha.ServerReflection service to discover the schema at runtime. Servers that don't expose reflection surface a typed REFLECTION_UNAVAILABLE error with an actionable hint.
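The ?proto= fallback reduces to a small decision on the URI. A sketch (names and shapes are illustrative):

```typescript
// Sketch: decide where the gRPC schema comes from. When ?proto= names a
// .proto file, load it; otherwise fall back to server reflection.
type SchemaSource =
  | { kind: "proto-file"; path: string }
  | { kind: "reflection" };

function schemaSource(uri: string): SchemaSource {
  const u = new URL(uri);
  const proto = u.searchParams.get("proto");
  return proto ? { kind: "proto-file", path: proto } : { kind: "reflection" };
}
```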
IMAP / MQTT / Kafka all have write companions. See the Actors reference for the publish / mark-read / move / etc surfaces.
Kafka stream() consumer-lag emission. During stream() the connector polls the admin API every x-kafka-lag-interval-ms milliseconds (default 10000; 0 disables) and emits a control chunk with meta.kind: "kafka:lag" carrying { partition, lag, high, consumed } per partition. Lag is the high-water mark minus the last consumed offset. Admin failures emit a sibling kafka:lag-error chunk; the message stream itself keeps delivering.
Kafka:lag chunk schema (v0.35-R2+):

```typescript
{
  data: "",
  meta: {
    kind: "kafka:lag",
    topic: "<topic>",
    partitions: [
      {
        partition: 0,
        lag: "<bigint-string>",
        high: "<bigint-string>",
        consumed: "<bigint-string>" | null,
      },
    ],
    emittedAt: 1717172800000,
  },
}
```
The lag/high/consumed fields are stringified BigInts – Kafka offsets routinely exceed 2^53, so casting via Number(x) loses precision. Use BigInt(meta.partitions[i].lag) downstream. Per-tick timeout (3× the interval, max 15s) ensures a wedged admin call doesn't stall the stream; on timeout the connector abandons the kafkajs admin instance and rebuilds on the next tick.
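Consuming a kafka:lag chunk with BigInt-safe arithmetic might look like this (the sample chunk is hand-built to match the documented schema; the type names are illustrative):

```typescript
// Sketch: sum per-partition lag from a kafka:lag control chunk without
// losing precision – offsets can exceed Number.MAX_SAFE_INTEGER.
type LagPartition = { partition: number; lag: string; high: string; consumed: string | null };
type KafkaLagMeta = { kind: "kafka:lag"; topic: string; partitions: LagPartition[]; emittedAt: number };

function totalLag(meta: KafkaLagMeta): bigint {
  return meta.partitions.reduce((sum, p) => sum + BigInt(p.lag), 0n);
}

const sample: KafkaLagMeta = {
  kind: "kafka:lag",
  topic: "orders",
  partitions: [
    { partition: 0, lag: "9007199254740995", high: "9007199254741000", consumed: "5" },
    { partition: 1, lag: "7", high: "7", consumed: null },
  ],
  emittedAt: 1717172800000,
};

totalLag(sample); // 9007199254741002n – exact, where Number() would round
```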
SSH pooled exec – runPooledSshCommand library API (v0.35-R2+):

```typescript
import { runPooledSshCommand, setSshPoolMaxSize } from "@sizls/pluck";

setSshPoolMaxSize(256); // raise the default 32 cap for fan-out runs

const ac = new AbortController();
const { stdout, stderr, exitCode, signal } = await runPooledSshCommand(
  "ssh://user@host:22",
  "uname -a",
  { timeout: 30_000, maxBodyBytes: 16 * 1024 * 1024, signal: ac.signal },
);
```
Routes a one-shot exec through the same pool the SFTP-read path uses. SSRF is guarded via validateHost; a bastion hop is configured via headers["x-ssh-bastion"]; signal aborts AND poisons the pool entry so a leaked channel can't be reused. Return shape: { stdout, stderr, exitCode: number | null, signal: string | null } – when the remote process was killed by a signal, exitCode is null and signal carries the name (e.g. "KILL"), so callers treating exitCode === 0 as success MUST also check signal === null. maxBodyBytes (default 16 MB) caps combined stdout + stderr accumulation; overflow rejects with BODY_TOO_LARGE. Marked @experimental for v0.35; the surface may evolve before v1.0.
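The exit-code caveat can be captured in one guard (a sketch; the result type mirrors the documented return shape):

```typescript
// Sketch: a remote command only truly succeeded when it exited 0 AND was
// not killed by a signal (exitCode is null in the signal case).
type ExecResult = { stdout: string; stderr: string; exitCode: number | null; signal: string | null };

function execSucceeded(r: ExecResult): boolean {
  return r.exitCode === 0 && r.signal === null;
}

execSucceeded({ stdout: "ok", stderr: "", exitCode: 0, signal: null });    // true
execSucceeded({ stdout: "", stderr: "", exitCode: null, signal: "KILL" }); // false
```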
Feeds + syndication
| # | Connector | URI schemes matched | Auth | Notes |
|---|---|---|---|---|
| 26 | rss | rss://, atom://, feed://, *.rss, *.atom | None | Charset detection from BOM + XML decl + Content-Type. Podcast namespaces (itunes:*, content:encoded). Atom + RSS 2.0 dual path. |
Local files + media
Fixed-format consumers – the extract phase picks the right extractor by content type, and these connectors just read bytes.
| # | Connector | URI schemes matched | Auth | Notes |
|---|---|---|---|---|
| 27 | pdf | *.pdf paths / application/pdf URIs | None | Built-in PDF stream scanner (install unpdf for richer extraction). |
| 28 | audio | *.wav, *.mp3, *.m4a, *.ogg, *.flac | None | Routes to the transcribe + sense extractors. |
| 29 | image | *.png, *.jpg, *.jpeg, *.webp, *.gif | None | Routes to the OCR extractor. |
| 30 | file | file://absolute/path | None | Path-traversal guarded; symlinks resolved; max-size capped. |
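The file connector's path-traversal guard can be sketched as a resolve-then-prefix check (a simplified sketch; the real connector also enforces the max-size cap):

```typescript
import { realpathSync } from "node:fs";
import { resolve, sep } from "node:path";

// Sketch: confine a file:// read to a root directory. Resolving the root's
// symlinks first (realpath) keeps the prefix comparison honest.
function guardPath(root: string, requested: string): string {
  const resolvedRoot = realpathSync(resolve(root));
  const candidate = resolve(resolvedRoot, "." + sep + requested);
  if (candidate !== resolvedRoot && !candidate.startsWith(resolvedRoot + sep)) {
    throw new Error("path escapes the allowed root");
  }
  return candidate;
}
```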
Generic HTTP
| # | Connector | URI schemes matched | Auth | Notes |
|---|---|---|---|---|
| 31 | rest | rest:// | Authorization header | Typed REST helper – infers method from URI path + options. |
| 32 | graphql | graphql://, */graphql paths | Authorization header | Parses query / mutation → HTTP verb. |
| 33 | http | http://, https:// | Any header | Catch-all for everything the specific connectors don't claim. Last in priority order. |
The HTTP connector re-validates the resolved URL after redirect so a 302 to 169.254.169.254 cannot smuggle past the initial SSRF guard.
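A sketch of that post-redirect check (the blocklist below covers only a few well-known ranges; Pluck's real SSRF guard is broader and also resolves DNS):

```typescript
// Sketch: reject redirect targets whose host is a link-local or private
// literal, so a 302 can't smuggle past the initial guard.
function isBlockedHost(hostname: string): boolean {
  if (hostname === "localhost" || hostname === "169.254.169.254") return true;
  if (/^127\./.test(hostname) || /^10\./.test(hostname)) return true;
  if (/^192\.168\./.test(hostname)) return true;
  if (/^172\.(1[6-9]|2\d|3[01])\./.test(hostname)) return true;
  return false;
}

function validateRedirect(location: string): URL {
  const target = new URL(location);
  if (isBlockedHost(target.hostname)) {
    throw new Error(`redirect to blocked host: ${target.hostname}`);
  }
  return target;
}
```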
Introspection
Every live PluckInstance exposes the registry as a read/register surface:
```typescript
import { createPluck } from "@sizls/pluck";

const pluck = createPluck();

pluck.connectors.list();
// ["twitter", "vimeo", "youtube", "s3", ..., "http"]

pluck.connectors.find("reddit://r/typescript"); // "reddit"
pluck.connectors.find("xyz://nowhere");         // undefined
```
Registering your own connector prepends it – see Concepts: Connect → Custom connectors for the full pattern plus the defineConnector() helper.
Streaming connectors – iteration
Three connectors today implement StreamingConnector (kafka, ssh, websocket; MQTT gains a streaming mode via extract: { streaming: true }). Iterate them with for await:
```typescript
import { createWebSocketConnector } from "@sizls/pluck";

const conn = createWebSocketConnector();
const stream = await conn.stream(
  "wss://stream.binance.com:9443/ws/btcusdt@trade",
  {
    subprotocols: ["binance"],
    heartbeatIntervalMs: 30_000,
    reconnect: { maxAttempts: 5, minDelayMs: 1000, maxDelayMs: 60_000 },
  },
);

for await (const chunk of stream) {
  if (chunk.data) process.stdout.write(chunk.data);
  if (chunk.error) console.error(chunk.error.message); // non-fatal
  if (chunk.end) break;
}
```
Every StreamChunk is either data, a non-fatal error, a terminal end: true, or carries meta (per-chunk metadata – line number, timestamp, source host for multiplexed fleet streams).
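The chunk union the loop above distinguishes can be typed and dispatched like this (a sketch of the shape the text describes, not Pluck's exported type):

```typescript
// Sketch: the four chunk cases the iteration loop distinguishes.
type StreamChunk = {
  data?: string;
  error?: { message: string };    // non-fatal – the stream keeps going
  end?: boolean;                  // terminal
  meta?: Record<string, unknown>; // line number, timestamp, source host…
};

function classify(chunk: StreamChunk): "data" | "error" | "end" | "meta" {
  if (chunk.end) return "end";
  if (chunk.error) return "error";
  if (chunk.data !== undefined) return "data";
  return "meta";
}
```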
Public vs authenticated access
Not every connector reaches the full surface of the service it points at. Three tiers exist today:
Public-only today. Instagram, Twitter, TikTok, Spotify read public content through oEmbed and syndication endpoints. Private timelines, DMs, gated feeds, and liked-post archives all require authentication flows Pluck does not yet ship.
Env-var auth today. Google Drive (GOOGLE_DRIVE_TOKEN) and Dropbox (DROPBOX_ACCESS_TOKEN) accept a single environment variable holding an access token. Mint the token through the provider's existing UI, set the env var, and every matching URI uses it.
Coming in v0.5. A full OAuth2 flow via pluck oauth login <service> unlocks private Instagram, private Twitter/X, authenticated LinkedIn, authenticated Reddit, authenticated Slack, and the rest of the gated surfaces. Tokens land in an encrypted on-disk store; connectors pick them up transparently.
What's next
- Concepts: Connect – the phase + registry mental model.
- Reference: CLI – commands that drive these connectors from the shell.
- Reference: API – REST API wrapper exposing connect/extract over HTTP.