
Core Concepts

Sense

The sixth phase of the Pluck pipeline. Thirty-seven sensors – zero-dependency for the audio and text domains, optional peers for image and CV – that read signals humans can't perceive. Ultrasonic beacons. Heart rate from a Zoom call. DTMF (Dual-Tone Multi-Frequency) tones in 30-year-old answering-machine tapes. Birdsong. Heartbeat from a stethoscope. MFCC (Mel-Frequency Cepstral Coefficients) vectors for speaker matching. FSK / PSK demodulation. Classical cipher classification + cracking. Invisible-character steganography detection (including the Unicode tag block used in ASCII-smuggling prompt injection). Plus live streaming via createSensorStream for mic / SDR / SIP feeds.


The mental model

Reading is safe. Writing is audited. Sensing is perceiving what humans can't.

The other five phases – connect, navigate, extract, shape, act – work at the semantic layer: URIs, text, rows, mutations. The sense phase works a layer below. It takes a signal source (a WAV file, a video frame, an MP3), runs DSP on it, and returns typed findings: a spectrogram, the decoded DTMF digits, the rPPG (remote photoplethysmography) heart rate, the detected birdsong, the AM/FM/SSB-demodulated payload of a software-defined radio capture.

Sense is what makes Pluck not another scraper.

TypeScript
import { pluck } from "@sizls/pluck";

const result = await pluck.sense("./call-recording.wav", {
  detect: ["dtmf", "ultrasonic", "anomaly"],
});

console.log(result.sensed?.features);
// {
//   dtmf:       { digits: "212-555-0199", decodedAt: [0.1, 0.3, 0.5, ...] },
//   ultrasonic: { present: true, carrierHz: 19_200, decodedBytes: 42 },
//   anomaly:    { regions: [{ start: 42.1, end: 43.8, kind: "burst" }] },
// }

Sense is a phase, not a verb you call directly. The top-level sense() convenience wrapper and pluck.sense() instance method both run the pipeline – connect → navigate → sense – and return a PluckResult with kind: "sense" and a sensed: SenseResult payload.


The 37 sensors

Pluck ships with 37 sensors, each paired one-to-one with a SenseFeature string. The default contract is WAV audio in, typed findings out, but several domains break that mould. The rPPG sensor takes video or image input. The four text-domain sensors (cipher-classify, cipher-crack-caesar, cipher-crack-vigenere, steganography-text) accept text/* sources. The five image-domain sensors (ela, heatmap, moire, flicker, rolling-shutter) accept image/* sources via the optional sharp peer. The v0.10 CV sensors (faces, scene, ocr-text-regions, thermal, ground-anomaly) extend that surface with ML-backed analysis (optional face-api.js + @xenova/transformers peers). The animalsong sensor extends birdsong to mammals, amphibians, and insects via the shared audio DSP.

Audio – spectral & perceptual (7)

Feature – what it reads
fft – Frequency spectrum via Cooley-Tukey FFT (Fast Fourier Transform). The workhorse primitive.
spectrogram – Time/frequency map via STFT (Short-Time Fourier Transform). Feeds every visualisation downstream.
pitch – Fundamental frequency via autocorrelation + parabolic interpolation.
tempo – BPM via onset detection + autocorrelation. Useful for music ETL.
chromagram – 12-band pitch-class energy – key / chord detection primitive. Floor 55 Hz.
mfcc – MFCC (Mel-Frequency Cepstral Coefficients) – 13-coefficient vector via 26-filter mel bank + DCT. Speaker / genre matching primitive.
anomaly – Statistical outlier regions – "something weird starts at 0:42".
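To ground the table, here is what the fft primitive amounts to – a minimal recursive radix-2 Cooley-Tukey sketch, independent of Pluck's actual implementation (input length must be a power of two):

```typescript
// Illustrative radix-2 Cooley-Tukey FFT -- a sketch of the primitive, not Pluck source.
type Complex = { re: number; im: number };

function fft(input: Complex[]): Complex[] {
  const n = input.length;
  if (n === 1) return [input[0]];
  // Split into even- and odd-indexed halves, recurse on each.
  const even = fft(input.filter((_, i) => i % 2 === 0));
  const odd = fft(input.filter((_, i) => i % 2 === 1));
  const out: Complex[] = new Array(n);
  for (let k = 0; k < n / 2; k++) {
    const ang = (-2 * Math.PI * k) / n; // twiddle factor e^{-2πik/n}
    const tre = Math.cos(ang) * odd[k].re - Math.sin(ang) * odd[k].im;
    const tim = Math.cos(ang) * odd[k].im + Math.sin(ang) * odd[k].re;
    out[k] = { re: even[k].re + tre, im: even[k].im + tim };
    out[k + n / 2] = { re: even[k].re - tre, im: even[k].im - tim };
  }
  return out;
}
```

Everything above it in the table – STFT, chromagram, MFCC – is layered on repeated applications of this butterfly.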

Audio – decoded payloads (4)

Feature – what it decodes
dtmf – Touch-tone digits – "1 2 3 #". The classic party-trick sensor.
morse – Morse code – dit-dah timing → text.
fsk – FSK (Frequency-Shift Keying) – Bell 103 by default (mark = 1270 Hz, space = 1070 Hz, 300 baud). Optional UART 8N1 → ASCII text.
psk – PSK (Phase-Shift Keying) – BPSK demodulator. carrierHz required; default BPSK31 baud. Tolerates ±5 Hz carrier drift.
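A DTMF decoder doesn't need a full FFT – measuring energy at each of the eight touch-tone frequencies via the Goertzel algorithm is the standard trick. A self-contained sketch of that measurement (not Pluck's dtmf source):

```typescript
// Goertzel power at one target frequency -- the per-tone energy measurement
// a DTMF detector runs eight times per frame. Illustrative sketch only.
function goertzelPower(samples: Float32Array, sampleRate: number, targetHz: number): number {
  const coeff = 2 * Math.cos((2 * Math.PI * targetHz) / sampleRate);
  let s1 = 0, s2 = 0;
  for (const x of samples) {
    const s0 = x + coeff * s1 - s2; // second-order resonator update
    s2 = s1;
    s1 = s0;
  }
  return s1 * s1 + s2 * s2 - coeff * s1 * s2; // squared magnitude at targetHz
}
```

Digit "1" is 697 Hz + 1209 Hz; a detector declares a digit when exactly one row tone and one column tone dominate their groups.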

Audio – band scans + diagnostic (3)

Feature – what it scans
ultrasonic – >18 kHz content – the cross-device tracking-beacon detector.
infrasonic – <20 Hz content – earthquake precursors, HVAC faults, seismic monitoring.
noise-floor – RMS + spectral flatness – "is the mic live?" fail-fast check before expensive downstream work.
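The two numbers noise-floor reports are cheap to compute. A sketch of both, under the usual definitions (flatness near 1 reads as noise-like, near 0 as tonal); this is the textbook shape, not Pluck's sensor:

```typescript
// RMS level of a sample buffer -- "is the mic live?"
function rms(samples: Float32Array): number {
  let sum = 0;
  for (const x of samples) sum += x * x;
  return Math.sqrt(sum / samples.length);
}

// Spectral flatness over precomputed magnitude bins:
// geometric mean / arithmetic mean, in (0, 1].
function spectralFlatness(magnitudes: number[]): number {
  const eps = 1e-12; // guard against log(0)
  let logSum = 0, linSum = 0;
  for (const m of magnitudes) {
    logSum += Math.log(m + eps);
    linSum += m + eps;
  }
  const geoMean = Math.exp(logSum / magnitudes.length);
  const arithMean = linSum / magnitudes.length;
  return geoMean / arithMean;
}
```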

Radio demodulation (3)

Feature – what it demodulates
am-demod – AM envelope demodulation for recorded SDR captures.
fm-demod – FM frequency discrimination.
ssb-demod – Single-sideband (USB / LSB) demodulation – ham radio archives.
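AM envelope demodulation is the simplest of the three: full-wave rectify, then low-pass to strip the carrier. A moving-average sketch of that textbook shape (not Pluck's am-demod sensor):

```typescript
// Envelope detector: rectify, moving-average over one carrier period,
// scale by π/2 to undo the mean of |sin| (which is 2/π). Sketch only.
function amEnvelope(samples: Float32Array, windowLen: number): Float32Array {
  const out = new Float32Array(samples.length);
  let acc = 0;
  for (let i = 0; i < samples.length; i++) {
    acc += Math.abs(samples[i]);
    if (i >= windowLen) acc -= Math.abs(samples[i - windowLen]); // slide the window
    out[i] = (acc / Math.min(i + 1, windowLen)) * (Math.PI / 2);
  }
  return out;
}
```

Pick windowLen as one carrier period in samples (e.g. 48 at a 48 kHz sample rate for a 1 kHz carrier) so the average spans exactly one cycle.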

Identity (2)

Feature – what it reads
rppg – Heart rate from face video – pulse from skin-pixel colour shifts.
birdsong – Bioacoustic species identification from spectrogram signatures (birds only today).

Physiological + periodicity (3)

Feature – what it reads
heartbeat – PCG (phonocardiogram) / stethoscope audio – peak-picked BPM + rhythm classification (regular / irregular). Audio only; for heart rate from video use rppg.
breathing – Respiration rate – very-low-frequency envelope modulation (2–8 s per breath).
periodicity – General-purpose autocorrelation – fundamental period, harmonic ratio, and an isPeriodic boolean.
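All three rows above reduce to the same primitive: find the lag at which a signal best correlates with itself. A minimal brute-force sketch of that autocorrelation search (illustrative, not Pluck's periodicity sensor):

```typescript
// Return the lag (in samples) with maximum autocorrelation in [minLag, maxLag].
// That lag is the fundamental period estimate; BPM follows from sampleRate / lag.
function estimatePeriod(samples: Float32Array, minLag: number, maxLag: number): number {
  let bestLag = minLag;
  let bestScore = -Infinity;
  for (let lag = minLag; lag <= maxLag; lag++) {
    let score = 0;
    for (let i = 0; i + lag < samples.length; i++) {
      score += samples[i] * samples[i + lag]; // unnormalised autocorrelation
    }
    if (score > bestScore) {
      bestScore = score;
      bestLag = lag;
    }
  }
  return bestLag;
}
```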

Text domain – cipher + steganography (4)

All four accept text/* sources (string or UTF-8 Buffer), enforce a 1 MB pre-DSP input cap (TEXT_TOO_LARGE), and honour AbortSignal inside hot loops.

Feature – what it reads
cipher-classify – Fingerprints unknown ciphertext – Caesar / Vigenère / base64 / base32 / hex / URL-encoded / JWT. Returns family + confidence + the metrics (IoC, bigram entropy) that drove the classification.
cipher-crack-caesar – Brute-forces all 26 shifts, chi-squares each candidate against English letter frequencies, returns the winning shift + decrypted plaintext.
cipher-crack-vigenere – Two-stage attack – Kasiski examination → candidate key lengths, then column-by-column Caesar crack. Recovers key + plaintext for ciphertexts ≥ 80 chars with key ≤ 24.
steganography-text – Invisible-character detection: trailing whitespace, 14 zero-width code points + the full Unicode tag block (U+E0000–U+E007F – the "ASCII smuggling" prompt-injection vector), and homoglyph substitution across Cyrillic / Greek / Armenian / Latin-extended families. Hit offsets index the original text so operators can surgically strip the payload.
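The chi-squared Caesar attack fits in a few lines. A standalone sketch of the technique – the frequency table and helper names are illustrative, not Pluck's API:

```typescript
// Relative English letter frequencies, a..z (approximate published values).
const ENGLISH_FREQ = [
  0.082, 0.015, 0.028, 0.043, 0.127, 0.022, 0.020, 0.061, 0.070, 0.002,
  0.008, 0.040, 0.024, 0.067, 0.075, 0.019, 0.001, 0.060, 0.063, 0.091,
  0.028, 0.010, 0.024, 0.002, 0.020, 0.001,
];

// Shift letters forward by `shift`, preserving case and non-letters.
function shiftText(text: string, shift: number): string {
  return text.replace(/[a-z]/gi, (ch) => {
    const base = ch === ch.toLowerCase() ? 97 : 65;
    return String.fromCharCode(base + ((ch.charCodeAt(0) - base + shift) % 26));
  });
}

// Try all 26 shifts; score each candidate plaintext's letter histogram
// against English frequencies; lowest chi-squared wins.
function crackCaesar(ciphertext: string): { shift: number; plaintext: string } {
  let best = { shift: 0, plaintext: ciphertext, score: Infinity };
  for (let shift = 0; shift < 26; shift++) {
    const candidate = shiftText(ciphertext, (26 - shift) % 26); // undo this shift
    const letters = candidate.toLowerCase().replace(/[^a-z]/g, "");
    const counts = new Array(26).fill(0);
    for (const ch of letters) counts[ch.charCodeAt(0) - 97]++;
    let chi2 = 0;
    for (let i = 0; i < 26; i++) {
      const expected = ENGLISH_FREQ[i] * letters.length;
      chi2 += (counts[i] - expected) ** 2 / (expected || 1);
    }
    if (chi2 < best.score) best = { shift, plaintext: candidate, score: chi2 };
  }
  return { shift: best.shift, plaintext: best.plaintext };
}
```

The Vigenère crack is this same scorer applied column by column once Kasiski examination has proposed a key length.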

Image domain – forensics + screen-capture detection (5)

All five accept image/* sources via the optional sharp peer (lazy-loaded on first image call – if it's missing, a MISSING_PEER_DEP error points operators at pnpm add sharp). Every image decode goes through a two-phase cap: sharp.metadata() checks dimensions against MAX_IMAGE_PIXELS (8192²) before .raw().toBuffer(), defending against "billion-laughs" image bombs.

Feature – what it reads
ela – Error-Level Analysis – re-compress at JPEG q=90, diff per-pixel luminance, report tamperingScore, meanError, p99Error, maxError. The classic "does this image look edited?" forensic primitive.
heatmap – 8×8 Sobel-magnitude energy grid + maxCell coordinates. Generic "where is the signal concentrated?" chainable primitive for downstream reports.
moire – 2D-FFT high-pass peak-to-mean ratio on an aspect-preserving 256-max resize. Flags screen-recorded / CRT-photographed / cloth-re-shoot content. Reports periodPixels + dominant frequency.
flicker – 1D FFT over row-mean luminance – catches horizontal banding from AC-lit scenes captured by rolling-shutter cameras (100 Hz EU, 120 Hz US).
rolling-shutter – Sobel-based slant consistency on "vertical-enough" edges – the CMOS slant signature useful for deepfake / composite detection. Reports meanSlantDeg + slantStddevDeg.
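Two of these rows (heatmap, rolling-shutter) lean on the same Sobel gradient-magnitude primitive. A minimal sketch over a raw grayscale buffer, independent of Pluck's image module:

```typescript
// 3x3 Sobel gradient magnitude on a row-major grayscale buffer.
// Border pixels are left at 0 for simplicity. Illustrative sketch only.
function sobelMagnitude(pixels: Float32Array, width: number, height: number): Float32Array {
  const out = new Float32Array(width * height);
  for (let y = 1; y < height - 1; y++) {
    for (let x = 1; x < width - 1; x++) {
      const p = (dx: number, dy: number) => pixels[(y + dy) * width + (x + dx)];
      // Horizontal and vertical Sobel kernels.
      const gx = -p(-1, -1) - 2 * p(-1, 0) - p(-1, 1) + p(1, -1) + 2 * p(1, 0) + p(1, 1);
      const gy = -p(-1, -1) - 2 * p(0, -1) - p(1, -1) + p(-1, 1) + 2 * p(0, 1) + p(1, 1);
      out[y * width + x] = Math.hypot(gx, gy);
    }
  }
  return out;
}
```

The heatmap sensor's 8×8 grid is just this magnitude field summed per cell; rolling-shutter instead inspects the gradient direction of near-vertical edges.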

CV domain – ML-backed vision (5) + broader bioacoustics (1)

The v0.10 CV sensors wrap optional face-api.js / @xenova/transformers peers – lazy-loaded so consumers who don't touch images don't pay for the 40 MB install. Every CV sensor that runs Sobel / CCL pre-downscales the input to a 1024-longest-edge analysis cap; detection coordinates + reported dimensions are scaled back to the ORIGINAL image space so consumer overlays always align with the input bytes.

Feature – what it reads
faces – Face bounding boxes + 68-point landmarks via face-api.js. A single-frame liveness heuristic cross-references Sobel edge density against face-api's confidence – a lightweight "maybe look twice" gate before a full multi-frame liveness pipeline.
scene – Image classification via @xenova/transformers (default Xenova/vit-base-patch16-224, overridable). Returns top-5 predictions – chains naturally with the image-forensic sensors for a stacked "indoor screenshot + moiré detected + flicker detected" signal.
ocr-text-regions – Pre-OCR text-region detection. Sobel horizontal gradient + morphological closing + connected-components labelling → ranked bounding boxes. No peer dep. Output chains with Pluck's existing OCR extractor for "OCR only the regions that matter."
animalsong – Bioacoustic ID beyond birds – frogs, crickets, cicadas, bats (ultrasonic – skipped below Nyquist), generic mammal calls. Goertzel carrier detection + envelope-autocorrelation pulse-rate estimation over 8 starter signatures; community species via defineSensor.
thermal – IR / FLIR hotspot detection via p95-threshold segmentation + a stddev-based dynamic-range gate (flat images return zero hotspots cleanly). No peer dep. Pluck doesn't parse FLIR radiometric metadata – feed it decoded thermal imagery.
ground-anomaly – Satellite / aerial two-mode sensor. Single-image: a visible-only NDVI surrogate flags bare-earth regions. Change detection (via SenseOptions.reference): luminance-diffs two frames and reports the top N changed regions. No peer dep.
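The change-detection mode described for ground-anomaly boils down to a per-pixel luminance diff binned into a grid. A toy sketch of that idea (grid size, threshold, and function name are illustrative, not Pluck's implementation):

```typescript
// Diff two same-sized grayscale frames, threshold the per-pixel difference,
// and report the grid cell containing the most changed pixels.
function topChangedCell(
  a: Float32Array, b: Float32Array,
  width: number, height: number,
  grid = 4, threshold = 0.1,
): { cellX: number; cellY: number; changed: number } {
  const counts = new Array(grid * grid).fill(0);
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      if (Math.abs(a[y * width + x] - b[y * width + x]) > threshold) {
        // Map pixel coordinates into grid-cell coordinates.
        const cx = Math.min(grid - 1, Math.floor((x * grid) / width));
        const cy = Math.min(grid - 1, Math.floor((y * grid) / height));
        counts[cy * grid + cx]++;
      }
    }
  }
  let best = 0;
  for (let i = 1; i < counts.length; i++) if (counts[i] > counts[best]) best = i;
  return { cellX: best % grid, cellY: Math.floor(best / grid), changed: counts[best] };
}
```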

All 37 sensors share a consistent surface: accepts(source) + sense(source, options) → SenseResult { features, signal, confidence, method }. The DSP primitives live in packages/core/src/sense/dsp/ (audio), packages/core/src/sense/text-dsp.ts (text: charFrequency, bigramEntropy, trigramEntropy, indexOfCoincidence, kasiski, chi-squared, Caesar shift), and packages/core/src/sense/image/ (image: sourceToImage, resizeImage, resizeToAnalysisMax, recompressJpeg, toLuminance, applyKernel, convolve, sobelMagnitude, fft2d, fft1d, connectedComponents, MAX_IMAGE_PIXELS, MAX_FFT_PADDED_PIXELS, CV_ANALYSIS_MAX_DIM). Everything composes from those primitives. That makes Pluck the only signal-analysis library in the JS ecosystem whose audio + text + video + NDVI + thermal sensors run on Cloudflare Workers, Vercel edge, or any other runtime without native bindings (image decode + CV model inference aside – those need sharp / face-api.js / @xenova/transformers).


Live streaming – createSensorStream

File-based sensing is pluck.sense(url, { detect }). Live sensing is createSensorStream(source, options) – takes a ReadableStream<Float32Array> (mic, SDR capture, SIP tap, live file tail) and emits a ReadableStream<SensorStreamEvent> with one event per window per sensor.

TypeScript
import { createSensorStream } from "@sizls/pluck";

const stream = createSensorStream(micAudioStream, {
  sampleRate: 44_100,
  detect: ["fft", "heartbeat", "breathing"],
  windowSize: 4096,  // default
  hop: 2048,         // default = 50% overlap
  signal: abortController.signal,
});

for await (const event of stream) {
  if (event.feature === "heartbeat" && event.result.heartbeat) {
    console.log(`${event.time.toFixed(1)}s → ${event.result.heartbeat.bpm} bpm`);
  }
}

The primitive is memory-bounded (rolling buffer + windowSize + pending chunk – ~40 KB peak regardless of stream length), backpressured (highWaterMark: 1 on the event stream so a slow consumer pauses the sensor loop), and abort-aware (cancelling the signal closes both directions). Non-audio sensors like rppg are silently skipped at resolve time – you can pass detect: ["fft", "rppg"] on an audio stream and only fft emits.
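The windowSize/hop framing described above can be sketched as a rolling buffer: accumulate arbitrary-sized chunks, emit a fixed window whenever enough samples are buffered, then advance by hop so consecutive windows overlap. A toy synchronous version (not createSensorStream's source, which adds backpressure and abort handling):

```typescript
// Turn a stream of arbitrary-sized chunks into fixed windows with overlap.
// hop < windowSize yields overlapping windows (hop = windowSize/2 -> 50%).
function* windows(chunks: Iterable<Float32Array>, windowSize: number, hop: number) {
  let buffer = new Float32Array(0);
  for (const chunk of chunks) {
    // Append the new chunk to the rolling buffer.
    const next = new Float32Array(buffer.length + chunk.length);
    next.set(buffer);
    next.set(chunk, buffer.length);
    buffer = next;
    // Emit every complete window, advancing by hop each time.
    while (buffer.length >= windowSize) {
      yield buffer.slice(0, windowSize);
      buffer = buffer.slice(hop);
    }
  }
}
```

Memory stays bounded at roughly windowSize plus one pending chunk, which is the property the paragraph above claims for the real primitive.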



Reading signals

The happy path is one line. Specify what you want to detect; get back typed findings:

TypeScript
const result = await pluck.sense("./call-recording.wav", {
  detect: ["dtmf", "pitch", "tempo"],
});

// Every decoded signal comes through result.sensed.decoded
for (const signal of result.sensed?.decoded ?? []) {
  console.log(`${signal.kind} @ ${signal.startTime}s: ${signal.data}`);
}
// dtmf @ 0.10s: 1
// dtmf @ 0.30s: 2
// dtmf @ 0.50s: 3

SenseOptions gives you time windows, frequency windows, and a resolution knob:

TypeScript
interface SenseOptions {
  start?: number;           // Start time in seconds
  end?: number;             // End time in seconds
  minHz?: number;           // Lower frequency bound
  maxHz?: number;           // Upper frequency bound
  resolution?: "fast" | "standard" | "deep"; // Tradeoff knob
  detect?: SenseFeature[];  // Which sensors to run
  signal?: AbortSignal;     // Abort mid-analysis
}

The resolution knob is Pluck's recognition that "one sensor, many quality levels" is the right ergonomic shape. fast favours throughput for CI / real-time use. deep favours confidence for forensic work.


pluck.dowse() – zero-config reconnaissance

When you don't yet know what is in a signal – or don't care to enumerate features – call dowse(). It runs every sensor at resolution: "fast", ranks the findings by confidence, and hands back both the ranked list and the single most confident finding so "what is this?" is a one-liner.

TypeScript
import { pluck } from "@sizls/pluck";

const scan = await pluck.dowse("./mystery.wav");

console.log(scan.topFinding?.summary);
// → 'Decoded dtmf: "123"'

for (const f of scan.findings) {
  console.log(`${f.sensor}  ${f.confidence.toFixed(2)}  ${f.summary}`);
}
// dtmf       0.95  Decoded dtmf: "123"
// pitch      0.71  Feature: pitch
// fft        0.50  Spectrum analysis: 512 bins

The DowseResult is deliberately thin:

TypeScript
interface DowseResult {
  uri: string;
  findings: DowseFinding[]; // sorted by confidence desc
  topFinding?: DowseFinding; // findings[0], or undefined on empty
  duration: number;
}

interface DowseFinding {
  sensor: string;
  confidence: number;
  summary: string;
  details: unknown; // sensor-specific payload
}

Think of it as file(1) for signals – a magic-wand entry point for users who haven't internalised the 37-sensor menu. Once topFinding.sensor tells you what's in the file, pluck.sense() with detect: [...] gets you the precision result.

dowse() is also exposed as a top-level export (import { dowse } from "@sizls/pluck"), as an MCP tool (pluck_dowse), and as a CLI subcommand (pluck dowse ./mystery.wav).


What you get back

Every sensor produces the same SenseResult shape:

TypeScript
interface SenseResult {
  features: Record<string, unknown>; // Per-sensor typed findings
  spectra?: SpectraData;             // FFT/spectrogram/chromagram arrays
  decoded?: DecodedSignal[];         // DTMF / Morse / etc. with timestamps
  anomalies?: Array<{                // Outlier regions
    start: number;
    end: number;
    kind: string;
    confidence: number;
  }>;
  signal: {                          // Source metadata
    sampleRate?: number;
    duration?: number;
    channels?: number;
    bitDepth?: number;
    format?: string;
  };
  confidence: number;                // 0..1 overall
  method: string;                    // "dtmf" / "rppg" / "birdsong-signature" / ...
}

features[sensorName] holds the sensor-specific payload. For ultrasonic it's { present, carrierHz, decodedBytes }. For rppg it's { present, bpm, confidence }. Each sensor documents its own return shape inline – see packages/core/src/sense/sensors/<name>.ts in the source.

decoded is the timeline view: every sensor that produces timed events (DTMF, Morse, birdsong calls, demodulator output) contributes here, sorted by startTime. One loop, many decoders.


Custom sensors

Same pattern as every other phase – defineSensor is a typed pass-through helper:

TypeScript
import { createPluck, defineSensor } from "@sizls/pluck";

const birdsong = defineSensor({
  name: "my-bird",
  accepts: (source) => source.metadata.sourceType === "audio",
  async sense(source, options) {
    const content = source.content as Buffer;
    // Run your own DSP pipeline on content
    const matches = detectPigeons(content, options.start, options.end);
    return {
      features: { "my-bird": { pigeonCount: matches.length } },
      decoded: matches.map((m) => ({
        kind: "pigeon-coo",
        startTime: m.t,
        endTime: m.t + 0.4,
        data: "rucoocoo",
        confidence: m.confidence,
      })),
      signal: { sampleRate: 44100 },
      confidence: 0.8,
      method: "my-bird:pattern-match",
    };
  },
});

const pluck = createPluck({ sensors: [birdsong] });

Custom sensors are prepended, so they win over built-ins that would accept the same source. Feature dispatch is by name – if your sensor's name matches a built-in, it replaces the built-in for that feature.

Introspection

Every live instance exposes the sensor registry via pluck.sensors:

TypeScript
pluck.sensors.list();                      // all registered sensor names
pluck.sensors.whichHandles("dtmf");        // "dtmf"
pluck.sensors.whichHandles("watermark");   // undefined – planned, not shipped
pluck.sensors.find("dtmf", navigateResult); // "dtmf" (or undefined if not accepted)
pluck.sensors.findAll(navigateResult);      // every sensor that accepts this source

whichHandles is the debug primitive the error path needs – call it before pluck.sense() to know whether a feature is actually shipping. Unshipped features throw NO_SENSOR.


Killer demos

The sense phase is where the "not another scraper" narrative earns its keep. Each of these is a one-line command today – using the --sense <feature> flag on pluck run, which can be repeated to request multiple sensors:

Shell
# "DTMF party trick – decode a 30-year-old tape"
pluck run ./answering-machine.wav --sense dtmf

# "Heart rate from a face video"
pluck run ./interview.mp4 --sense rppg

# "What bird was that?"
pluck run ./backyard.wav --sense birdsong

# "Decode the Morse code in this QRP capture"
pluck run ./cw.wav --sense morse

# "AM demodulate this SDR capture"
pluck run ./sdr-630m.wav --sense am-demod

# Multiple sensors in one run
pluck run ./suspicious.wav --sense ultrasonic --sense anomaly

These compose with the other phases – pluck snitch <url> runs connect + navigate + extract + sense in one command and produces an Ed25519-signed forensic report. Shape + Sense together give you typed-then-analysed pipelines. Act + Sense (future: pluck.listen(uri, { on: ultrasonic, then: act })) closes the loop into reactive perception.


Planned sensors

Four more sensor types are still named in the type system as PlannedSenseFeature – watermark detection, voiceprints, engine signatures, seismic P/S waves. These are marked NO_SENSOR if you try to use them today – kept on the type to document the roadmap without polluting autocomplete for working features.


Full runnable example

The magic-wand entry point – pluck.dowse() against a mystery audio URL, every sensor fired in fast mode, ranked findings returned. Opens in a fresh StackBlitz sandbox.


What's next


Ready to build?

Install Pluck and follow the Quick Start guide to wire MCP-first data pipelines into your agents and fleets in minutes.

Get started →