Core Concepts
Sense
The fifth phase of the Pluck pipeline. Thirty-seven zero-dependency sensors that read signals humans can't perceive. Ultrasonic beacons. Heart rate from a Zoom call. DTMF (Dual-Tone Multi-Frequency) tones in 30-year-old answering-machine tapes. Birdsong. Heartbeat from a stethoscope. MFCC (Mel-Frequency Cepstral Coefficients) vectors for speaker matching. FSK / PSK demodulation. Classical cipher classification + cracking. Invisible-character steganography detection (incl. the Unicode tag block used in ASCII-smuggling prompt-injection). Plus live streaming via createSensorStream for mic / SDR / SIP feeds.
The mental model
Reading is safe. Writing is audited. Sensing is perceiving what humans can't.
The other four phases – connect, extract, shape, act – work at the semantic layer: URIs, text, rows, mutations. The sense phase works a layer below. It takes a signal source (a WAV file, a video frame, an MP3), runs DSP on it, and returns typed findings: a spectrogram, the decoded DTMF digits, the rPPG (remote photoplethysmography) heart rate, the detected birdsong, the AM/FM/SSB-demodulated payload of a software-defined radio capture.
Sense is what makes Pluck not another scraper.
```typescript
import { pluck } from "@sizls/pluck";

const result = await pluck.sense("./call-recording.wav", {
  detect: ["dtmf", "ultrasonic", "anomaly"],
});

console.log(result.sensed?.features);
// {
//   dtmf: { digits: "212-555-0199", decodedAt: [0.1, 0.3, 0.5, ...] },
//   ultrasonic: { present: true, carrierHz: 19_200, decodedBytes: 42 },
//   anomaly: { regions: [{ start: 42.1, end: 43.8, kind: "burst" }] },
// }
```
Sense is a phase, not a verb you call directly. The top-level sense() convenience wrapper and pluck.sense() instance method both run the pipeline – connect → navigate → sense – and return a PluckResult with kind: "sense" and a sensed: SenseResult payload.
The 37 sensors
Pluck ships with 37 sensors, each paired one-to-one with a SenseFeature string. Default operation: WAV audio in, typed findings out. The rPPG sensor is the odd one – it takes video or image input. The four text-domain sensors (cipher-classify, cipher-crack-caesar, cipher-crack-vigenere, steganography-text) accept text/* sources. The five image-domain sensors (ela, heatmap, moire, flicker, rolling-shutter) accept image/* sources via the optional sharp peer. The v0.10 CV sensors (faces, scene, ocr-text-regions, thermal, ground-anomaly) extend that surface with ML-backed analysis (face-api.js + @xenova/transformers optional peers). The animalsong sensor extends birdsong to mammals, amphibians, and insects via the shared audio DSP.
Audio – spectral & perceptual (7)
| Feature | What it reads |
|---|---|
| fft | Frequency spectrum via Cooley-Tukey FFT (Fast Fourier Transform). The workhorse primitive. |
| spectrogram | Time/frequency map via STFT (Short-Time Fourier Transform). Feeds every visualisation downstream. |
| pitch | Fundamental frequency via autocorrelation + parabolic interpolation. |
| tempo | BPM via onset detection + autocorrelation. Useful for music ETL. |
| chromagram | 12-band pitch-class energy – key / chord detection primitive. Floor 55 Hz. |
| mfcc | MFCC (Mel-Frequency Cepstral Coefficients) – 13-coefficient vector via 26-filter mel bank + DCT. Speaker / genre matching primitive. |
| anomaly | Statistical outlier regions – "something weird starts at 0:42". |
Audio – decoded payloads (4)
| Feature | What it decodes |
|---|---|
| dtmf | Touch-tone digits – "1 2 3 #". The classic party-trick sensor. |
| morse | Morse code – dit-dah timing → text. |
| fsk | FSK (Frequency-Shift Keying) – Bell 103 by default (mark = 1270 Hz, space = 1070 Hz, 300 baud). Optional UART 8N1 → ASCII text. |
| psk | PSK (Phase-Shift Keying) – BPSK demodulator. carrierHz required; default BPSK31 baud. Tolerates ±5 Hz carrier drift. |
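Under the hood, tone decoding reduces to per-frequency energy measurement. As an illustrative sketch – not Pluck's internal implementation – a single-digit DTMF detector needs little more than the Goertzel algorithm:

```typescript
// Illustrative sketch of DTMF digit detection via the Goertzel algorithm.
// Not Pluck's internal code – the real dtmf sensor adds timing segmentation.
const ROWS = [697, 770, 852, 941];
const COLS = [1209, 1336, 1477, 1633];
const KEYS = ["123A", "456B", "789C", "*0#D"];

// Goertzel power of `freq` (Hz) in a mono Float32Array frame.
function goertzelPower(frame: Float32Array, freq: number, sampleRate: number): number {
  const coeff = 2 * Math.cos((2 * Math.PI * freq) / sampleRate);
  let s1 = 0, s2 = 0;
  for (const x of frame) {
    const s0 = x + coeff * s1 - s2;
    s2 = s1;
    s1 = s0;
  }
  return s1 * s1 + s2 * s2 - coeff * s1 * s2;
}

// Pick the strongest row tone and column tone; map to the keypad digit.
function detectDigit(frame: Float32Array, sampleRate: number): string {
  const best = (freqs: number[]) =>
    freqs.reduce((bi, _f, i, a) =>
      goertzelPower(frame, a[i], sampleRate) > goertzelPower(frame, a[bi], sampleRate) ? i : bi, 0);
  return KEYS[best(ROWS)][best(COLS)];
}

// Synthesize digit "5" (770 Hz + 1336 Hz) at 8 kHz and decode it back.
const sr = 8000;
const frame = new Float32Array(410); // ~51 ms – enough tone-bin separation
for (let n = 0; n < frame.length; n++) {
  frame[n] = 0.5 * Math.sin((2 * Math.PI * 770 * n) / sr)
           + 0.5 * Math.sin((2 * Math.PI * 1336 * n) / sr);
}
console.log(detectDigit(frame, sr)); // "5"
```

The shipped sensor layers digit on/off segmentation on top of this per-frame decision so it can report decodedAt timestamps.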
Audio – band scans + diagnostic (3)
| Feature | What it scans |
|---|---|
| ultrasonic | >18 kHz content – the cross-device tracking-beacon detector. |
| infrasonic | <20 Hz content – earthquake precursors, HVAC faults, seismic activity. |
| noise-floor | RMS + spectral flatness – the "is the mic live?" fail-fast check before expensive downstream work. |
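Both noise-floor metrics are cheap, self-contained math. An illustrative sketch (not Pluck's internals): RMS catches a dead mic, and spectral flatness – geometric mean over arithmetic mean of the power spectrum – separates tonal content (flatness near 0) from broadband noise (flatness near 1).

```typescript
// Illustrative sketch of the two checks behind a noise-floor gate.
// Not Pluck's internal code.
function rms(samples: Float32Array): number {
  let sum = 0;
  for (const x of samples) sum += x * x;
  return Math.sqrt(sum / samples.length);
}

// Spectral flatness of a magnitude spectrum: → 1.0 for white noise,
// → 0.0 for a pure tone.
function spectralFlatness(magnitudes: Float32Array): number {
  let logSum = 0, sum = 0;
  for (const m of magnitudes) {
    const p = m * m + 1e-12; // power; epsilon avoids log(0)
    logSum += Math.log(p);
    sum += p;
  }
  const geoMean = Math.exp(logSum / magnitudes.length);
  return geoMean / (sum / magnitudes.length);
}

// A dead mic: near-zero RMS. Tonal vs broadband spectra: flatness separates them.
const silence = new Float32Array(1024); // all zeros
console.log(rms(silence)); // 0
const tone = Float32Array.from({ length: 64 }, (_, i) => (i === 10 ? 1 : 0));
const noise = new Float32Array(64).fill(0.5);
console.log(spectralFlatness(tone) < spectralFlatness(noise)); // true
```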
Radio demodulation (3)
| Feature | What it demodulates |
|---|---|
| am-demod | AM envelope demodulation for recorded SDR captures. |
| fm-demod | FM frequency discrimination. |
| ssb-demod | Single-sideband (USB / LSB) demodulation – ham radio archives. |
Identity (2)
| Feature | What it reads |
|---|---|
| rppg | Heart rate from face video – pulse read from skin-pixel colour shifts. |
| birdsong | Bioacoustic bird-species identification from spectrogram signatures (birds only today – animalsong covers other taxa). |
Physiological + periodicity (3)
| Feature | What it reads |
|---|---|
| heartbeat | PCG (Phonocardiogram) / stethoscope audio – peak-picked BPM + rhythm classification (regular / irregular). Audio only; for heart rate from video use rppg. |
| breathing | Respiration rate – very-low-frequency envelope modulation (2–8 s per breath). |
| periodicity | General-purpose autocorrelation – fundamental period, harmonic ratio, and an isPeriodic boolean. |
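The periodicity primitive is plain autocorrelation: slide the signal against itself and keep the lag that lines up best. An illustrative sketch (not Pluck's internals):

```typescript
// Illustrative sketch of autocorrelation-based period detection – the
// primitive behind periodicity (and pitch/tempo at other time scales).
// Not Pluck's internal code.
function fundamentalPeriod(x: Float32Array, minLag: number, maxLag: number): number {
  let bestLag = minLag, bestScore = -Infinity;
  for (let lag = minLag; lag <= maxLag; lag++) {
    let score = 0;
    for (let i = 0; i + lag < x.length; i++) score += x[i] * x[i + lag];
    if (score > bestScore) { bestScore = score; bestLag = lag; }
  }
  return bestLag; // in samples; divide by sampleRate for seconds
}

// A 100 Hz sine at 8 kHz has a period of exactly 80 samples.
const sr = 8000;
const x = Float32Array.from({ length: 2048 }, (_, n) => Math.sin((2 * Math.PI * 100 * n) / sr));
console.log(fundamentalPeriod(x, 20, 400)); // 80
```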
Text domain – cipher + steganography (4)
All four accept text/* sources (string or UTF-8 Buffer), enforce a 1 MB pre-DSP input cap (TEXT_TOO_LARGE), and honour AbortSignal inside their hot loops.
| Feature | What it reads |
|---|---|
| cipher-classify | Fingerprints unknown ciphertext – Caesar / Vigenère / base64 / base32 / hex / URL-encoded / JWT. Returns family + confidence + the metrics (IoC, bigram entropy) that drove the classification. |
| cipher-crack-caesar | Brute-forces all 26 shifts, chi-squares each candidate against English letter frequencies, and returns the winning shift + decrypted plaintext. |
| cipher-crack-vigenere | Two-stage attack – Kasiski examination → candidate key lengths, then a column-by-column Caesar crack. Recovers key + plaintext for ciphertexts ≥ 80 chars with key length ≤ 24. |
| steganography-text | Invisible-character detection: trailing whitespace, 14 zero-width code points + the full Unicode tag block (U+E0000–U+E007F – the "ASCII smuggling" prompt-injection vector), and homoglyph substitution across Cyrillic / Greek / Armenian / Latin-extended families. Hit offsets index the original text so operators can surgically strip the payload. |
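To make the cipher mechanics concrete, here is a self-contained sketch of the chi-squared Caesar crack described above. Illustrative only – the shipped sensor additionally reports IoC and a confidence score:

```typescript
// Illustrative sketch of the chi-squared Caesar crack: try all 26 shifts,
// score each candidate's letter frequencies against English, keep the best.
// Not Pluck's internal code.
const ENGLISH_FREQ = [
  0.0817, 0.0149, 0.0278, 0.0425, 0.1270, 0.0223, 0.0202, 0.0609, 0.0697,
  0.0015, 0.0077, 0.0403, 0.0241, 0.0675, 0.0751, 0.0193, 0.0010, 0.0599,
  0.0633, 0.0906, 0.0276, 0.0098, 0.0236, 0.0015, 0.0197, 0.0007,
];

function shiftText(text: string, shift: number): string {
  return text.replace(/[a-z]/gi, (c) => {
    const base = c <= "Z" ? 65 : 97;
    return String.fromCharCode(((c.charCodeAt(0) - base + shift) % 26) + base);
  });
}

function crackCaesar(ciphertext: string): { shift: number; plaintext: string } {
  let best = { shift: 0, chi2: Infinity };
  for (let shift = 0; shift < 26; shift++) {
    // Decrypt assuming an encryption shift of `shift`.
    const candidate = shiftText(ciphertext, 26 - shift).toLowerCase();
    const counts = new Array(26).fill(0);
    let total = 0;
    for (const c of candidate) {
      const i = c.charCodeAt(0) - 97;
      if (i >= 0 && i < 26) { counts[i]++; total++; }
    }
    let chi2 = 0;
    for (let i = 0; i < 26; i++) {
      const expected = ENGLISH_FREQ[i] * total;
      chi2 += (counts[i] - expected) ** 2 / expected;
    }
    if (chi2 < best.chi2) best = { shift, chi2 };
  }
  return { shift: best.shift, plaintext: shiftText(ciphertext, 26 - best.shift) };
}

const cracked = crackCaesar("Wkh txlfn eurzq ira mxpsv ryhu wkh odcb grj");
console.log(cracked.shift, cracked.plaintext);
// 3 "The quick brown fox jumps over the lazy dog"
```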
Image domain – forensics + screen-capture detection (5)
All five accept image/* sources via the optional sharp peer (lazy-loaded on first image call – if it isn't installed, a MISSING_PEER_DEP error points operators at pnpm add sharp). Every image decode goes through a two-phase cap: sharp.metadata() checks dimensions against MAX_IMAGE_PIXELS (8192²) BEFORE .raw().toBuffer(), defending against "billion-laughs" image bombs.
| Feature | What it reads |
|---|---|
| ela | Error-Level Analysis – re-compress at JPEG q=90, diff per-pixel luminance, report tamperingScore, meanError, p99Error, maxError. Classical "does this image look edited?" forensic primitive. |
| heatmap | 8×8 Sobel-magnitude energy grid + maxCell coordinates. Generic "where is the signal concentrated" chainable primitive for downstream reports. |
| moire | 2D-FFT high-pass peak-to-mean ratio on an aspect-preserving 256-max resize. Flags screen-recorded / CRT-photographed / cloth-re-shoot content. Reports periodPixels + dominant frequency. |
| flicker | 1D FFT over row-mean luminance – catches horizontal banding from AC-lit scenes captured by rolling-shutter cameras (100 Hz EU, 120 Hz US). |
| rolling-shutter | Sobel-based slant consistency on "vertical-enough" edges – the CMOS slant signature useful for deepfake / composite detection. Reports meanSlantDeg + slantStddevDeg. |
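Several of these sensors lean on the same gradient primitive. An illustrative sketch of the idea behind sobelMagnitude – a minimal version, not Pluck's shipped implementation:

```typescript
// Illustrative sketch of Sobel gradient magnitude, the edge-energy map
// that heatmap and rolling-shutter build on. Input is a row-major
// grayscale image; borders are left at zero. Not Pluck's internal code.
function sobelMagnitude(lum: Float32Array, width: number, height: number): Float32Array {
  const out = new Float32Array(width * height);
  const at = (x: number, y: number) => lum[y * width + x];
  for (let y = 1; y < height - 1; y++) {
    for (let x = 1; x < width - 1; x++) {
      const gx =
        -at(x - 1, y - 1) + at(x + 1, y - 1)
        - 2 * at(x - 1, y) + 2 * at(x + 1, y)
        - at(x - 1, y + 1) + at(x + 1, y + 1);
      const gy =
        -at(x - 1, y - 1) - 2 * at(x, y - 1) - at(x + 1, y - 1)
        + at(x - 1, y + 1) + 2 * at(x, y + 1) + at(x + 1, y + 1);
      out[y * width + x] = Math.hypot(gx, gy);
    }
  }
  return out;
}

// A 4×4 image with a hard vertical edge down the middle:
// energy concentrates on the two centre columns.
const img = Float32Array.from([
  0, 0, 1, 1,
  0, 0, 1, 1,
  0, 0, 1, 1,
  0, 0, 1, 1,
]);
const energy = sobelMagnitude(img, 4, 4);
console.log(energy[1 * 4 + 1], energy[1 * 4 + 0]); // 4 0
```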
CV domain – ML-backed vision (5) + broader bioacoustics (1)
The v0.10 CV sensors wrap optional face-api.js / @xenova/transformers peers – lazy-loaded so consumers who don't touch images don't pay for the 40 MB install. Every CV sensor that runs Sobel / CCL pre-downscales the input to a 1024-longest-edge analysis cap; detection coordinates + reported dimensions are scaled back to the ORIGINAL image space so consumer overlays always align with the input bytes.
| Feature | What it reads |
|---|---|
| faces | Face bounding boxes + 68-point landmarks via face-api.js. Single-frame liveness heuristic cross-references Sobel edge density against face-api's confidence – a lightweight "maybe look twice" gate before a full multi-frame liveness pipeline. |
| scene | Image classification via @xenova/transformers (default Xenova/vit-base-patch16-224, overridable). Returns top-5 predictions – chains naturally with the image-forensic sensors for "indoor screenshot + moiré detected + flicker detected" stacked signal. |
| ocr-text-regions | Pre-OCR text-region detection. Sobel horizontal-gradient + morphological closing + connected-components labelling → ranked bounding boxes. No peer dep. Output chains with Pluck's existing OCR extractor for "OCR only the regions that matter." |
| animalsong | Bioacoustic ID beyond birds – frogs, crickets, cicadas, bats (ultrasonic – skipped below Nyquist), generic mammal calls. Goertzel carrier detection + envelope-autocorrelation pulse-rate estimation over 8 starter signatures; community species via defineSensor. |
| thermal | IR / FLIR hotspot detection via p95-threshold segmentation + stddev-based dynamic-range gate (flat images return zero hotspots cleanly). No peer dep. Pluck doesn't parse FLIR radiometric metadata – feed decoded thermal imagery through. |
| ground-anomaly | Satellite / aerial two-mode sensor. Single-image: visible-only NDVI surrogate flags bare-earth regions. Change-detection (via SenseOptions.reference): luminance-diffs two frames and reports top N changed regions. No peer dep. |
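The connected-components step behind ocr-text-regions is standard flood-fill labelling. An illustrative sketch (not Pluck's connectedComponents): group 4-connected foreground pixels into blobs and return their bounding boxes, biggest first.

```typescript
// Illustrative sketch of connected-components labelling over a binary
// mask, with bounding boxes ranked by area. Not Pluck's internal code.
interface Box { x: number; y: number; w: number; h: number; area: number }

function connectedComponents(mask: Uint8Array, width: number, height: number): Box[] {
  const labels = new Int32Array(width * height).fill(-1);
  const boxes: Box[] = [];
  for (let start = 0; start < mask.length; start++) {
    if (!mask[start] || labels[start] !== -1) continue;
    // Stack-based flood fill from this unlabelled foreground pixel.
    const label = boxes.length;
    let minX = width, minY = height, maxX = 0, maxY = 0, area = 0;
    const stack = [start];
    labels[start] = label;
    while (stack.length) {
      const p = stack.pop()!;
      const x = p % width, y = Math.floor(p / width);
      minX = Math.min(minX, x); maxX = Math.max(maxX, x);
      minY = Math.min(minY, y); maxY = Math.max(maxY, y);
      area++;
      // 4-connected neighbours; -1 sentinels mark off-grid horizontals.
      for (const q of [p - width, p + width, x > 0 ? p - 1 : -1, x < width - 1 ? p + 1 : -1]) {
        if (q >= 0 && q < mask.length && mask[q] && labels[q] === -1) {
          labels[q] = label;
          stack.push(q);
        }
      }
    }
    boxes.push({ x: minX, y: minY, w: maxX - minX + 1, h: maxY - minY + 1, area });
  }
  return boxes.sort((a, b) => b.area - a.area); // rank biggest first
}

// Two blobs: a 2×2 square top-left, a single pixel bottom-right.
const mask = Uint8Array.from([
  1, 1, 0, 0,
  1, 1, 0, 0,
  0, 0, 0, 0,
  0, 0, 0, 1,
]);
console.log(connectedComponents(mask, 4, 4));
// [{ x: 0, y: 0, w: 2, h: 2, area: 4 }, { x: 3, y: 3, w: 1, h: 1, area: 1 }]
```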
All 37 sensors share a consistent surface: accepts(source) + sense(source, options) → SenseResult { features, signal, confidence, method }. The DSP primitives live in packages/core/src/sense/dsp/ (audio), packages/core/src/sense/text-dsp.ts (text: charFrequency, bigramEntropy, trigramEntropy, indexOfCoincidence, kasiski, chi-squared, Caesar shift), and packages/core/src/sense/image/ (image: sourceToImage, resizeImage, resizeToAnalysisMax, recompressJpeg, toLuminance, applyKernel, convolve, sobelMagnitude, fft2d, fft1d, connectedComponents, MAX_IMAGE_PIXELS, MAX_FFT_PADDED_PIXELS, CV_ANALYSIS_MAX_DIM). Everything composes from those primitives. That makes Pluck the only signal-analysis library in the JS ecosystem whose audio + text + video + NDVI + thermal sensors run on Cloudflare Workers, Vercel edge, or any other runtime without native bindings (image decode + CV model inference aside – they need sharp / face-api.js / @xenova/transformers).
Live streaming – createSensorStream
File-based sensing is pluck.sense(url, { detect }). Live sensing is createSensorStream(source, options) – takes a ReadableStream<Float32Array> (mic, SDR capture, SIP tap, live file tail) and emits a ReadableStream<SensorStreamEvent> with one event per window per sensor.
```typescript
import { createSensorStream } from "@sizls/pluck";

const stream = createSensorStream(micAudioStream, {
  sampleRate: 44_100,
  detect: ["fft", "heartbeat", "breathing"],
  windowSize: 4096, // default
  hop: 2048,        // default = 50% overlap
  signal: abortController.signal,
});

for await (const event of stream) {
  if (event.feature === "heartbeat" && event.result.heartbeat) {
    console.log(`${event.time.toFixed(1)}s → ${event.result.heartbeat.bpm} bpm`);
  }
}
```
The primitive is memory-bounded (rolling buffer + windowSize + pending chunk – ~40 KB peak regardless of stream length), backpressured (highWaterMark: 1 on the event stream so a slow consumer pauses the sensor loop), and abort-aware (cancelling the signal closes both directions). Non-audio sensors like rppg are silently skipped at resolve time – you can pass detect: ["fft", "rppg"] on an audio stream and only fft emits.
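The window/hop arithmetic determines how often events fire. A sketch of the timing – illustrative, and it assumes each event's time is the window's start offset, as in the example above:

```typescript
// Illustrative sketch of the window/hop arithmetic behind the stream
// (not Pluck's internal code): with windowSize 4096 and hop 2048 at
// 44.1 kHz, each sensor fires roughly every 46 ms with 50% overlap.
function windowTimes(totalSamples: number, windowSize: number, hop: number, sampleRate: number): number[] {
  const times: number[] = [];
  for (let start = 0; start + windowSize <= totalSamples; start += hop) {
    times.push(start / sampleRate); // assumed: event time = window start in seconds
  }
  return times;
}

// One second of audio → 20 overlapping windows, ~46 ms apart.
const t = windowTimes(44_100, 4096, 2048, 44_100);
console.log(t.length, (t[1] - t[0]).toFixed(4)); // 20 "0.0464"
```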
Reading signals
The happy path is one line. Specify what you want to detect; get back typed findings:
```typescript
const result = await pluck.sense("./call-recording.wav", {
  detect: ["dtmf", "pitch", "tempo"],
});

// Every decoded signal comes through result.sensed.decoded
for (const signal of result.sensed?.decoded ?? []) {
  console.log(`${signal.kind} @ ${signal.startTime}s: ${signal.data}`);
}
// dtmf @ 0.10s: 1
// dtmf @ 0.30s: 2
// dtmf @ 0.50s: 3
```
SenseOptions gives you time windows, frequency windows, and a resolution knob:
```typescript
interface SenseOptions {
  start?: number;       // Start time in seconds
  end?: number;         // End time in seconds
  minHz?: number;       // Lower frequency bound
  maxHz?: number;       // Upper frequency bound
  resolution?: "fast" | "standard" | "deep"; // Tradeoff knob
  detect?: SenseFeature[]; // Which sensors to run
  signal?: AbortSignal;    // Abort mid-analysis
}
```
The resolution knob is Pluck's recognition that "one sensor, many quality levels" is the right ergonomic shape. fast favours throughput for CI / real-time use. deep favours confidence for forensic work.
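The knobs compose. A sketch of a forensic pass over one suspicious region of a long recording – the SenseOptions interface is copied from above (with detect widened to string[] so the sketch stands alone), and the commented call shows where the options plug in:

```typescript
// The SenseOptions shape from the docs above, reproduced locally so this
// sketch is self-contained. In real code import the type from @sizls/pluck.
interface SenseOptions {
  start?: number;
  end?: number;
  minHz?: number;
  maxHz?: number;
  resolution?: "fast" | "standard" | "deep";
  detect?: string[]; // SenseFeature[] in the real types
  signal?: AbortSignal;
}

// Deep forensic pass: 40–50 s window, telephony band, abortable.
const controller = new AbortController();
const opts: SenseOptions = {
  start: 40,
  end: 50,
  minHz: 300,
  maxHz: 3_400, // telephony band
  resolution: "deep",
  detect: ["anomaly", "dtmf"],
  signal: controller.signal,
};
// const result = await pluck.sense("./long-recording.wav", opts);
console.log(opts.end! - opts.start!, opts.resolution); // 10 "deep"
```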
pluck.dowse() – zero-config reconnaissance
When you don't yet know what is in a signal – or don't care to enumerate features – call dowse(). It runs every sensor at resolution: "fast", ranks the findings by confidence, and hands back both the ranked list and the single most confident finding so "what is this?" is a one-liner.
```typescript
import { pluck } from "@sizls/pluck";

const scan = await pluck.dowse("./mystery.wav");

console.log(scan.topFinding?.summary);
// → 'Decoded dtmf: "123"'

for (const f of scan.findings) {
  console.log(`${f.sensor} ${f.confidence.toFixed(2)} ${f.summary}`);
}
// dtmf 0.95 Decoded dtmf: "123"
// pitch 0.71 Feature: pitch
// fft 0.50 Spectrum analysis: 512 bins
```
The DowseResult is deliberately thin:
```typescript
interface DowseResult {
  uri: string;
  findings: DowseFinding[];  // sorted by confidence desc
  topFinding?: DowseFinding; // findings[0], or undefined on empty
  duration: number;
}

interface DowseFinding {
  sensor: string;
  confidence: number;
  summary: string;
  details: unknown; // sensor-specific payload
}
```
Think of it as file(1) for signals – a magic-wand entry point for users who haven't internalised the 37-sensor menu. Once topFinding.sensor tells you what's in the file, pluck.sense() with detect: [...] gets you the precision result.
dowse() is also exposed as a top-level export (import { dowse } from "@sizls/pluck"), as an MCP tool (pluck_dowse), and as a CLI subcommand (pluck dowse ./mystery.wav).
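The ranking contract is small enough to state in code. An illustrative sketch of the sort-and-pick behaviour the DowseResult fields describe – not Pluck's internals, just the documented invariant (findings sorted by confidence descending, topFinding = findings[0]):

```typescript
// Illustrative sketch of the DowseResult ordering contract.
// Not Pluck's internal code.
interface DowseFinding {
  sensor: string;
  confidence: number;
  summary: string;
  details: unknown;
}

function rankFindings(findings: DowseFinding[]): { findings: DowseFinding[]; topFinding?: DowseFinding } {
  const sorted = [...findings].sort((a, b) => b.confidence - a.confidence);
  return { findings: sorted, topFinding: sorted[0] }; // undefined when empty
}

const { topFinding } = rankFindings([
  { sensor: "fft", confidence: 0.5, summary: "Spectrum analysis: 512 bins", details: {} },
  { sensor: "dtmf", confidence: 0.95, summary: 'Decoded dtmf: "123"', details: {} },
]);
console.log(topFinding?.sensor); // "dtmf"
```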
What you get back
Every sensor produces the same SenseResult shape:
```typescript
interface SenseResult {
  features: Record<string, unknown>; // Per-sensor typed findings
  spectra?: SpectraData;             // FFT/spectrogram/chromagram arrays
  decoded?: DecodedSignal[];         // DTMF / Morse / etc. with timestamps
  anomalies?: Array<{                // Outlier regions
    start: number;
    end: number;
    kind: string;
    confidence: number;
  }>;
  signal: {                          // Source metadata
    sampleRate?: number;
    duration?: number;
    channels?: number;
    bitDepth?: number;
    format?: string;
  };
  confidence: number; // 0..1 overall
  method: string;     // "dtmf" / "rppg" / "birdsong-signature" / ...
}
```
features[sensorName] holds the sensor-specific payload. For ultrasonic it's { present, carrierHz, decodedBytes }. For rppg it's { present, bpm, confidence }. Each sensor documents its own return shape inline – see packages/core/src/sense/sensors/<name>.ts in the source.
decoded is the timeline view: every sensor that produces timed events (DTMF, Morse, birdsong calls, demodulator output) contributes here, sorted by startTime. One loop, many decoders.
Custom sensors
Same pattern as every other phase – defineSensor is a typed pass-through helper:
```typescript
import { createPluck, defineSensor } from "@sizls/pluck";

const birdsong = defineSensor({
  name: "my-bird",
  accepts: (source) => source.metadata.sourceType === "audio",
  async sense(source, options) {
    const content = source.content as Buffer;
    // Run your own DSP pipeline on content
    const matches = detectPigeons(content, options.start, options.end);
    return {
      features: { "my-bird": { pigeonCount: matches.length } },
      decoded: matches.map((m) => ({
        kind: "pigeon-coo",
        startTime: m.t,
        endTime: m.t + 0.4,
        data: "rucoocoo",
        confidence: m.confidence,
      })),
      signal: { sampleRate: 44100 },
      confidence: 0.8,
      method: "my-bird:pattern-match",
    };
  },
});

const pluck = createPluck({ sensors: [birdsong] });
```
Custom sensors are prepended, so they win over built-ins that would accept the same source. Feature dispatch is by name – if your sensor's name matches a built-in, it replaces the built-in for that feature.
Introspection
Every live instance exposes the sensor registry via pluck.sensors:
```typescript
pluck.sensors.list();                       // all registered sensor names
pluck.sensors.whichHandles("dtmf");         // "dtmf"
pluck.sensors.whichHandles("watermark");    // undefined – planned, not shipped
pluck.sensors.find("dtmf", navigateResult); // "dtmf" (or undefined if not accepted)
pluck.sensors.findAll(navigateResult);      // every sensor that accepts this source
```
whichHandles is the debug primitive the error path needs – call it before pluck.sense() to know whether a feature is actually shipping. Unshipped features throw NO_SENSOR.
Killer demos
The sense phase is where the "not another scraper" narrative earns its keep. Each of these is a one-line command today – using the --sense <feature> flag on pluck run, which can be repeated to request multiple sensors:
```bash
# "DTMF party trick – decode a 30-year-old tape"
pluck run ./answering-machine.wav --sense dtmf

# "Heart rate from a face video"
pluck run ./interview.mp4 --sense rppg

# "What bird was that?"
pluck run ./backyard.wav --sense birdsong

# "Decode the Morse code in this QRP capture"
pluck run ./cw.wav --sense morse

# "AM demodulate this SDR capture"
pluck run ./sdr-630m.wav --sense am-demod

# Multiple sensors in one run
pluck run ./suspicious.wav --sense ultrasonic --sense anomaly
```
These compose with the other phases – pluck snitch <url> runs connect + navigate + extract + sense in one command and produces an Ed25519-signed forensic report. Shape + Sense together give you typed-then-analysed pipelines. Act + Sense (future: pluck.listen(uri, { on: "ultrasonic", then: act })) closes the loop into reactive perception.
Planned sensors
A further set of sensor types – watermark detection, voiceprints, engine signatures, seismic P/S waves – is named in the type system as PlannedSenseFeature. These throw NO_SENSOR if you try to use them today – kept on the type to document the roadmap without polluting autocomplete for working features.
Full runnable example
The magic-wand entry point – pluck.dowse() against a mystery audio URL, every sensor fired in fast mode, ranked findings returned. Opens in a fresh StackBlitz sandbox.
What's next
- MCP-First Pipeline – every sensor is exposed as an MCP tool, giving agents perception humans can't match.
- Recipe: Snitch Privacy – signed forensic report composing connect + extract + sense.
- Recipe: DriftWatch Fleet – fleet-wide drift detection with signed receipts.