Fingerprint
A way to detect when the AI model behind an unchanging API URL has quietly been swapped for a different one.
Posture: 🔴 Red Team (offensive) · Status: alpha
What it does
When you call https://api.openai.com/v1/chat/completions with "model": "gpt-4o", you trust the vendor that the model on the other end is the one they advertised. Fingerprint checks that trust. It runs a small, fixed set of calibration questions against the model and hashes the answers. Two months later, it runs the same questions and checks if the hash still matches. If it doesn't, the model behind the API changed – silently, without a version bump.
Think of it as a periodic health check that asks "are you still the same model?" The output is a four-tier classification: stable (no change), minor (sampling noise), major (a checkpoint update), or swap (a wholly different model is now serving this endpoint). Every scan is signed. Every comparison cites two prior signed scans. No vendor can wave off the observations.
The same machinery also enumerates an MCP (Model Context Protocol) server's tool surface – what tools it exposes, what parameters they take – so MCP server drift gets the same treatment.
Who would use it
- A platform engineer at a regulated company pinning a known-good model fingerprint as the production baseline; CI fails any deploy where the live model has drifted.
- A journalist who suspects a vendor downgraded their flagship model to a cheaper variant overnight; Fingerprint lets them prove the swap with two signed Rekor entries.
- A research lab running longitudinal studies on a model's behavior – Fingerprint deltas turn "the model feels different lately" into hard data.
- An MCP server author wanting tamper-evident snapshots of their tool surface so downstream agents can detect when a tool's parameters change.
- A startup CTO whose customer is asking "how do you know the model you're paying for is the one that's actually serving us?" – Fingerprint is the receipt.
What you'll need
- Node.js 18+ and the Pluck CLI.
- An operator key: pluck bureau keys generate --out ./keys --name "alice".
- A responder module – a small JavaScript file that knows how to send a probe to the target model and return the response. Default-export a (probe, signal) => Promise<{responseText, tokens?}> function. (Fingerprint stays transport-agnostic so it doesn't ship a vendor adapter zoo.)
- Optional: internet access to publish scans to Sigstore Rekor.
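As a rough illustration of the responder contract described above, here is a minimal sketch of a responder that targets the OpenAI chat completions endpoint. The probe shape (an `id` plus a `prompt` string), the `makeOpenAiResponder` name, and the injectable fetch parameter are assumptions made for this example; only the `(probe, signal) => Promise<{responseText, tokens?}>` signature comes from the text.

```typescript
// responders/openai.ts: a sketch, not a shipped adapter.
// Assumption: a probe carries an `id` and a `prompt`; the real probe shape may differ.
interface Probe {
  id: string;
  prompt: string;
}

interface ResponderResult {
  responseText: string;
  tokens?: number;
}

type Responder = (probe: Probe, signal?: AbortSignal) => Promise<ResponderResult>;

// The fetch implementation is injectable so the responder can be exercised offline.
function makeOpenAiResponder(
  apiKey: string,
  model: string,
  fetchImpl: typeof fetch = fetch,
): Responder {
  return async (probe, signal) => {
    const res = await fetchImpl("https://api.openai.com/v1/chat/completions", {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({
        model,
        messages: [{ role: "user", content: probe.prompt }],
      }),
      signal,
    });
    if (!res.ok) {
      throw new Error(`vendor returned HTTP ${res.status} for probe ${probe.id}`);
    }
    const body = (await res.json()) as {
      choices: { message: { content: string | null } }[];
      usage?: { total_tokens?: number };
    };
    return {
      responseText: body.choices[0]?.message.content ?? "",
      tokens: body.usage?.total_tokens,
    };
  };
}

// In a responder file this would be the default export, for example:
// export default makeOpenAiResponder(process.env.OPENAI_API_KEY ?? "", "gpt-4o");
```

Passing `signal` through to fetch lets the caller cancel a slow probe, which is presumably why the responder signature includes it.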
Step-by-step
1. Scan the model and capture a baseline
pluck bureau fingerprint scan \
--vendor openai \
--model gpt-4o \
--keys ./keys \
--responder ./responders/openai.js \
--notarize --accept-public \
--out ./.fp
Output:
fingerprint/scan: openai/gpt-4o fingerprintHash=a1b2c3... envelopeHash=... rekorUuid=9f3a8b1c4d5e6f7a...
The fingerprintHash is the SHA-256 digest of the canonical-JSON probe-response set. The rekorUuid identifies the public-log entry, which anyone can re-fetch.
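To make the hashing step concrete, the sketch below shows one way a SHA-256 over canonical JSON could be computed with Node's built-in crypto module. The canonicalization rule (sorted object keys, probes ordered by id) and the `ProbeResponse` shape are assumptions for illustration; the exact canonical form Fingerprint uses is not specified here, so hashes from this sketch should not be expected to match real fingerprintHash values.

```typescript
import { createHash } from "node:crypto";

type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Serialize with object keys sorted so that key order never affects the hash.
function canonicalJson(value: Json): string {
  if (value === null || typeof value !== "object") return JSON.stringify(value);
  if (Array.isArray(value)) return `[${value.map(canonicalJson).join(",")}]`;
  const entries = Object.keys(value)
    .sort()
    .map((key) => `${JSON.stringify(key)}:${canonicalJson(value[key] as Json)}`);
  return `{${entries.join(",")}}`;
}

// Hypothetical record shape: one entry per calibration probe.
interface ProbeResponse {
  probeId: string;
  responseText: string;
}

function fingerprintHash(responses: ProbeResponse[]): string {
  // Sort by probe id so the order the probes ran in does not matter.
  const ordered = [...responses].sort((a, b) =>
    a.probeId < b.probeId ? -1 : a.probeId > b.probeId ? 1 : 0,
  );
  const payload = ordered.map((r) => ({ probeId: r.probeId, responseText: r.responseText }));
  return createHash("sha256").update(canonicalJson(payload), "utf8").digest("hex");
}
```

Because the serialization is deterministic, two scans with identical responses yield the same digest, and any changed response yields a different one.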
2. Pin the baseline
Save the Rekor uuid as the known-good reference for this target:
pluck bureau fingerprint baseline gpt-4o-2026-04 \
--rekor-uuid 9f3a8b1c4d5e6f7a... \
--out ./.fp
3. ... time passes ... rescan
A week or a month later, run the same scan command again. You'll get a new Rekor uuid and a new fingerprint hash.
4. Compare
Pass the two signed scan files to delta:
pluck bureau fingerprint delta \
./.fp/baseline.json \
./.fp/latest.json \
--keys ./keys --notarize --accept-public
Output:
fingerprint/delta: classification=swap envelopeHash=... rekorUuid=8c7b6a5d...
Exit code 1 on a swap classification – easy to wire into CI. The delta cassette anchors both source uuids inline, so a verifier can re-fetch and re-hash both sides without trusting the delta envelope alone.
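For pipelines that want a stricter gate than the built-in exit code, for example failing on major as well as swap, the output line above can be parsed directly. This is a sketch under the assumption that the line keeps the `classification=<tier>` format shown; the function names and the exit-code policy are hypothetical.

```typescript
type Classification = "stable" | "minor" | "major" | "swap";

// Parse a line such as:
//   fingerprint/delta: classification=swap envelopeHash=... rekorUuid=8c7b6a5d...
function parseClassification(line: string): Classification | undefined {
  const match = /\bclassification=(stable|minor|major|swap)\b/.exec(line);
  return match ? (match[1] as Classification) : undefined;
}

// Hypothetical policy: fail the pipeline on major or swap, pass otherwise.
function gateExitCode(line: string): number {
  const classification = parseClassification(line);
  if (classification === undefined) return 2; // unparseable output: treat as an error
  return classification === "major" || classification === "swap" ? 1 : 0;
}
```

A CI step could capture the CLI's stdout, pass the delta line to `gateExitCode`, and call `process.exit` with the result.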
5. (Bonus) Snapshot an MCP server
pluck bureau fingerprint mcp-enum http://localhost:8080
Output:
fingerprint/mcp-enum: http://localhost:8080 count=12 surfaceHash=4d5e6f7a...
Same shape, same drift detection.
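To illustrate what tool-surface drift might look like, the sketch below compares two tool lists and reports added, removed, and changed tools. The `McpTool` shape follows the `name` and `inputSchema` fields of an MCP tools/list result; the diffing logic and function names are assumptions for this example, not how mcp-enum computes surfaceHash.

```typescript
// Subset of the fields an MCP tools/list result carries per tool.
interface McpTool {
  name: string;
  description?: string;
  inputSchema: unknown;
}

// Key-order-independent serialization, so reordered schema keys do not count as drift.
function stableStringify(value: unknown): string {
  if (value === undefined) return "null";
  if (value === null || typeof value !== "object") return JSON.stringify(value);
  if (Array.isArray(value)) return `[${value.map(stableStringify).join(",")}]`;
  const obj = value as Record<string, unknown>;
  const parts = Object.keys(obj)
    .sort()
    .map((key) => `${JSON.stringify(key)}:${stableStringify(obj[key])}`);
  return `{${parts.join(",")}}`;
}

interface SurfaceDiff {
  added: string[];
  removed: string[];
  changed: string[];
}

function diffSurfaces(before: McpTool[], after: McpTool[]): SurfaceDiff {
  const prev = new Map(before.map((t) => [t.name, stableStringify(t.inputSchema)] as const));
  const next = new Map(after.map((t) => [t.name, stableStringify(t.inputSchema)] as const));
  return {
    added: [...next.keys()].filter((name) => !prev.has(name)).sort(),
    removed: [...prev.keys()].filter((name) => !next.has(name)).sort(),
    // A tool is "changed" when it exists on both sides but its parameters differ.
    changed: [...next.keys()]
      .filter((name) => prev.has(name) && prev.get(name) !== next.get(name))
      .sort(),
  };
}
```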
Run it yourself
Drop this into a Node 18+ project (npm install @sizls/pluck-bureau-fingerprint @sizls/pluck-bureau-core tsx):
// index.ts
import {
  CALIBRATION_PROBES,
  createFingerprintSystem,
} from "@sizls/pluck-bureau-fingerprint";
import type { FingerprintResponder } from "@sizls/pluck-bureau-fingerprint";
import { generateOperatorKey } from "@sizls/pluck-bureau-core";

async function main() {
  const operator = generateOperatorKey();

  // Stub responders – in production, these dial the vendor API.
  const baselineResponder: FingerprintResponder = async (probe) => ({
    responseText: `[baseline] ${probe.id}`,
    tokens: 8,
  });
  const driftedResponder: FingerprintResponder = async (probe) => ({
    responseText: probe.id === CALIBRATION_PROBES[0]?.id ? "[swapped]" : `[baseline] ${probe.id}`,
    tokens: 8,
  });

  const target = { vendor: "example", model: "gpt-baseline" };
  const system = createFingerprintSystem({
    signingKey: operator.privateKeyPem,
    disablePausePoll: true,
    disableLogging: true,
  });

  try {
    const baseline = await system.baseline(target, baselineResponder, { notarize: false });
    const latest = await system.scan(target, driftedResponder, { notarize: false });
    console.log(`baseline hash: ${baseline.fingerprint.fingerprintHash.slice(0, 16)}...`);
    console.log(`latest hash: ${latest.fingerprint.fingerprintHash.slice(0, 16)}...`);

    const delta = system.facts.deltas()[0]?.delta;
    if (delta) {
      const drifted = delta.probeDeltas.filter((p) => p.status !== "same").length;
      console.log(`classification: ${delta.classification} (${drifted} drifted)`);
    }
  } finally {
    await system.shutdown();
  }
}

main().catch((err) => { console.error(err); process.exit(1); });
Run with npx tsx index.ts. Expected output:
baseline hash: a1b2c3d4e5f6789a...
latest hash: 8c7b6a5d4e3f2109...
classification: minor (1 drifted)
What you get
- A signed baseline you can pin and re-verify months later.
- Signed delta cassettes that anchor two prior Rekor uuids – the ultimate before-and-after.
- A four-tier drift classification (stable/minor/major/swap) you can route into CI, alerts, or journalism.
- A noise-suppression layer – by default, scans run probes under multiple sampling temperatures so a normal sampling wobble doesn't escalate to a false swap classification.
What it can't do
- Fingerprint can't tell you why a model changed, only that it did. The vendor's release notes (or their absence) are the rest of the story.
- A vendor that ships different models to different identity tiers (free vs paid, US vs EU) can defeat a single-tier scan. Multi-tier coverage means running the scan from each tier.
- The drift classifier is deterministic but heuristic. A genuinely subtle prompt-formatting change that flips the same one or two probes every time will land as minor even if it's behavioral. Use major and swap as triggers, not minor.
- Today's alpha takes responder modules from disk. Hosted vendor adapters (OpenAI, Anthropic, Bedrock, Ollama) are on the roadmap.
A real-world example
In June 2026, a fintech compliance lead pins their production model as a Fingerprint baseline. The signed Rekor uuid lives in their compliance vault. CI runs a fingerprint delta weekly against the live API.
In August, the weekly delta returns classification=major with 5 of the 32 probes drifted. Nothing dramatic, but enough to trigger their model-change runbook. They escalate to the vendor: "what changed?" The vendor's first response is "we did not update gpt-4o this month."
The compliance lead replies with two Rekor uuids: the baseline from June, the latest from this week. Both signed, both publicly verifiable. The vendor checks internally and, six hours later, acknowledges that an A/B routing change had quietly shifted ~5% of paid-tier traffic to a fine-tuned variant for an enterprise customer trial. The fix is rolled back. The compliance team has a Rekor-anchored audit trail proving exactly when the drift started and stopped.
For developers
Predicate URIs
https://pluck.run/Fingerprint.Model/v1
https://pluck.run/Fingerprint.Delta/v1
Two distinct URIs because the bodies are semantically different – a baseline is a pinned reference; a delta is a paired comparison anchoring two prior Rekor uuids. Verifiers MUST discriminate by predicate-type, not by inner-body fields.
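A verifier that follows this rule might route on the predicate type before looking at the body at all. The sketch below assumes the signed payload decodes to an in-toto-style statement with a `predicateType` field; the `Statement` and `Routed` types and the `routeStatement` name are hypothetical.

```typescript
const MODEL_PREDICATE = "https://pluck.run/Fingerprint.Model/v1";
const DELTA_PREDICATE = "https://pluck.run/Fingerprint.Delta/v1";

// Minimal in-toto-style statement: only the fields this sketch needs.
interface Statement {
  predicateType: string;
  predicate: unknown;
}

type Routed =
  | { kind: "model"; predicate: unknown }
  | { kind: "delta"; predicate: unknown };

// Decide how to treat a statement from its predicate type alone,
// never from fields found inside the predicate body.
function routeStatement(statement: Statement): Routed {
  switch (statement.predicateType) {
    case MODEL_PREDICATE:
      return { kind: "model", predicate: statement.predicate };
    case DELTA_PREDICATE:
      return { kind: "delta", predicate: statement.predicate };
    default:
      throw new Error(`unsupported predicateType: ${statement.predicateType}`);
  }
}
```

Routing this way means a body that merely looks like a delta cannot be accepted as one unless it was signed under the delta predicate type.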
Programs composed
attest, notarize, contradict, mirror, dsseSign, fetchRekorEntry. The mirror verb is what Fingerprint uses to re-run the same probe set across multiple sampling temperatures and squash the noise floor – see Concepts: Act for the underlying receipt pipeline.
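One way to picture the noise floor is as a per-probe rule over samples taken at several temperatures. The sketch below is purely illustrative: the rule it encodes (a probe counts as drifted only if its response changed at every temperature the two scans share) is an assumption, and the actual mirror-based suppression logic may differ.

```typescript
// For one probe: response text keyed by the sampling temperature it was produced at.
type Samples = Record<string, string>;

// Hypothetical rule: a change at only some temperatures is treated as sampling noise;
// a change at every shared temperature is treated as real drift.
function probeDrifted(before: Samples, after: Samples): boolean {
  const shared = Object.keys(before).filter((temperature) => temperature in after);
  if (shared.length === 0) return true; // nothing comparable: conservatively report drift
  return shared.every((temperature) => before[temperature] !== after[temperature]);
}
```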
Drift classification
| Class | Meaning |
|---|---|
| stable | Fingerprint hash unchanged – no observable drift |
| minor | 1–2 probes drifted – likely a sampling-temperature wobble |
| major | 3+ probes drifted – likely a model checkpoint update |
| swap | Fingerprint hash entirely different – silent vendor model swap |
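The table can be read as a small decision function over the number of drifted probes. The sketch below follows the stated thresholds; interpreting "entirely different" as "every probe drifted" is an assumption on my part, and the real classifier may use a different signal for swap.

```typescript
type Classification = "stable" | "minor" | "major" | "swap";

function classify(driftedProbes: number, totalProbes: number): Classification {
  if (totalProbes <= 0 || driftedProbes < 0 || driftedProbes > totalProbes) {
    throw new RangeError("driftedProbes must be between 0 and totalProbes");
  }
  if (driftedProbes === 0) return "stable";
  // Assumption: "entirely different" means no probe response survived unchanged.
  if (driftedProbes === totalProbes) return "swap";
  if (driftedProbes <= 2) return "minor";
  return "major";
}
```

Under these rules, 5 drifted probes out of 32 would classify as major, and a single drifted probe would classify as minor.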
Threat model and limits
- Ed25519 only, signed over RAW 32-byte digest – cosign and sigstore-go interop.
- schemaVersion: 1 literal.
- 64-hex SPKI fingerprints, strict ISO 8601 UTC.
- Bounds. Probe-set size ≤ 32 probes; per-probe response excerpt ≤ 4 KiB; MCP tool list ≤ 1024 entries; cassette canonical-JSON ≤ 96 KiB.
- Subject digest cross-checked against canonical predicate digest on every verify path.
- Delta cassettes anchor both source uuids inline – a verifier can re-fetch and re-hash both sides without trusting the delta envelope alone.
- fingerprintScan runs the same probe set under multiple sampling temperatures by default to suppress legitimate noise; classification only escalates to minor/major/swap when the drift survives the noise floor.
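The size bounds in the list above lend themselves to a pre-flight check before signing. This is a minimal sketch: the `CassetteDraft` shape and function names are hypothetical, and measuring sizes in UTF-8 bytes is an assumption.

```typescript
const MAX_PROBES = 32;
const MAX_EXCERPT_BYTES = 4 * 1024; // 4 KiB per probe response excerpt
const MAX_MCP_TOOLS = 1024;
const MAX_CASSETTE_BYTES = 96 * 1024; // 96 KiB of canonical JSON

// Hypothetical draft shape, for illustration only.
interface CassetteDraft {
  excerpts: string[]; // one response excerpt per probe
  mcpToolNames: string[]; // empty for model scans
  canonicalJson: string; // the serialized cassette
}

const utf8Length = (text: string): number => new TextEncoder().encode(text).length;

// Returns a human-readable message for each bound the draft exceeds.
function boundsViolations(draft: CassetteDraft): string[] {
  const problems: string[] = [];
  if (draft.excerpts.length > MAX_PROBES) {
    problems.push(`probe count ${draft.excerpts.length} exceeds ${MAX_PROBES}`);
  }
  draft.excerpts.forEach((excerpt, index) => {
    if (utf8Length(excerpt) > MAX_EXCERPT_BYTES) {
      problems.push(`excerpt ${index} exceeds ${MAX_EXCERPT_BYTES} bytes`);
    }
  });
  if (draft.mcpToolNames.length > MAX_MCP_TOOLS) {
    problems.push(`tool count ${draft.mcpToolNames.length} exceeds ${MAX_MCP_TOOLS}`);
  }
  if (utf8Length(draft.canonicalJson) > MAX_CASSETTE_BYTES) {
    problems.push(`cassette exceeds ${MAX_CASSETTE_BYTES} bytes`);
  }
  return problems;
}
```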
Studio routes
- studio.pluck.run/bureau/fingerprint – global drift dashboard.
- studio.pluck.run/bureau/fingerprint/<vendor>/<model> – full timeline of baselines and deltas for one target.
- studio.pluck.run/bureau/fingerprint/mcp/<server> – MCP tool-surface history.
Library surface
import {
  scanModelFingerprint,
  computeFingerprintDelta,
} from "@sizls/pluck-bureau-fingerprint";

const baseline = await scanModelFingerprint({
  vendor: "openai",
  model: "gpt-4o",
  responder,
  signingKey,
  notarize: true,
  acceptPublic: true,
});

const delta = computeFingerprintDelta({ from, to, fromUuid, toUuid });
// delta.classification: "stable" | "minor" | "major" | "swap"
See also
- Oath – vendor commitments Fingerprint cross-checks against during a swap event.
- Dragnet – composes Fingerprint deltas into red dots for the dossier.
- Mole – sealed canaries are the canary-memorization variant; Fingerprint is the model-identity variant.
- Bureau Foundations → ProbePack – calibration packs follow the same shape.
- Concepts: Act – attest/mirror/contradict/dsseSign.