Bureau — Red Team (offensive)
Mole
A way to prove an AI model was trained on a specific document by sealing a unique passage in advance, then catching the model reproducing it later.
Posture: 🔴 Red Team (offensive) · Status: alpha
What it does
If you suspect an AI vendor trained their model on copyrighted material, the difficult part is not catching the model reproducing the text – it is proving you did not author the text after the model was trained. Mole addresses this by requiring the operator to commit to a passage before any probe runs. The operator authors a distinctive document (a "canary"), hashes it, signs the hash, and notarizes the seal to Sigstore Rekor. The result is a public, timestamped record that the text existed at a specific time under a specific signing key.
Later, the operator queries the target model with prompts and checks whether the response reproduces the canary. Mole scores the reproduction deterministically using n-gram overlap, edit distance, and verbatim phrase count. If the score crosses the threshold, the verdict is signed and notarized. The output is a citation bundle suitable for journalism or legal review: canary hash, sealed-at timestamp, Rekor uuid, verdict score, and a reproducible prompt.
Together, the sealed canary and the notarized verdict form a Rekor-anchored chain of observations linking the canary's sealed-at time to the model's later reproduction.
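The three signals named above (n-gram overlap, edit distance, verbatim phrase count) can be folded into one deterministic number using nothing but string operations. The sketch below is a standalone illustration, not Mole's scorer: the word-trigram containment measure, the normalized Levenshtein similarity, the case- and whitespace-insensitive phrase matching, and the 0.4/0.3/0.3 weights are all assumptions chosen for readability.

```typescript
// Hypothetical sketch of a deterministic, purely lexical memorization score.
// Not Mole's implementation: the measures and weights are illustrative assumptions.

// Lowercase and collapse whitespace so formatting noise does not affect the score.
function normalize(text: string): string {
  return text.toLowerCase().replace(/\s+/g, " ").trim();
}

// Set of word n-grams (trigrams by default) in a normalized text.
function wordNgrams(text: string, n = 3): Set<string> {
  const words = normalize(text).split(" ").filter((w) => w.length > 0);
  const grams = new Set<string>();
  for (let i = 0; i + n <= words.length; i++) {
    grams.add(words.slice(i, i + n).join(" "));
  }
  return grams;
}

// Fraction of the canary's n-grams that also occur in the response.
function ngramContainment(canary: string, response: string, n = 3): number {
  const canaryGrams = wordNgrams(canary, n);
  if (canaryGrams.size === 0) return 0;
  const responseGrams = wordNgrams(response, n);
  let shared = 0;
  for (const gram of canaryGrams) {
    if (responseGrams.has(gram)) shared++;
  }
  return shared / canaryGrams.size;
}

// Levenshtein distance, keeping only two rows of the DP table in memory.
function levenshtein(a: string, b: string): number {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// 1 minus the distance relative to the longer string, so identical texts score 1.
function editSimilarity(a: string, b: string): number {
  const x = normalize(a);
  const y = normalize(b);
  const longest = Math.max(x.length, y.length);
  return longest === 0 ? 1 : 1 - levenshtein(x, y) / longest;
}

interface LexicalVerdict {
  score: number; // 0..1
  verbatimPhraseCount: number;
  totalPhrases: number;
}

function scoreLexically(canary: string, phrases: string[], response: string): LexicalVerdict {
  const haystack = normalize(response);
  // Phrase matches ignore case and whitespace differences.
  const verbatimPhraseCount = phrases.filter((p) => haystack.includes(normalize(p))).length;
  const phraseRatio = phrases.length === 0 ? 0 : verbatimPhraseCount / phrases.length;
  // Illustrative weights; each term lies in [0, 1], so the weighted sum does too.
  const score =
    0.4 * phraseRatio +
    0.3 * ngramContainment(canary, response) +
    0.3 * editSimilarity(canary, response);
  return { score, verbatimPhraseCount, totalPhrases: phrases.length };
}
```

Every component compares surface strings, so the same (canary, response) pair always yields the same result, and a faithful paraphrase scores low. That is the trade-off Mole accepts in exchange for evidentiary clarity.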
Who would use it
- A novelist whose unpublished manuscript leaked to the internet, suspecting a vendor scraped it before the rights deal – Mole produces the receipt.
- A research nonprofit running a controlled study of memorization rates across foundation models for an academic paper.
- A journalist at The New York Times investigating whether a specific corpus made it into a training set without licensing – Mole produces the citation bundle.
- A copyright lawyer assembling observations for an early-stage motion in a training-data lawsuit (with the caveat that real evidentiary use needs Bureau operator review – see below).
- A startup whose internal API documentation appears verbatim in a vendor's chatbot answers, looking for tamper-evident proof.
What you'll need
- Node.js 18+ and the Pluck CLI.
- An operator key:
pluck bureau keys generate --out ./keys --name "alice"
- A canary file – a plaintext document containing your unique passage. The more distinctive (specific names, numbers, idioms), the better the memorization scoring works.
- Internet access to publish to Sigstore Rekor.
- Patience. The seal must predate the probe by enough time that it's plausible the vendor's training pipeline could have ingested the text. Publish the canary publicly somewhere first, wait, then probe.
Step-by-step
1. Author and seal the canary
Write your distinctive document, then seal:
pluck bureau mole init ./mole-run-1 \
--canary ./article.txt \
--canary-id nyt-2024-01-15 \
--keys ./keys \
--copyright-holder "Alice Researcher"
Output:
mole/init: sealed canary "nyt-2024-01-15" → ./mole-run-1/canary.json
hash: a1b2c3...
sealedAt: 2026-04-15T08:30:00Z
phrases: 6
fingerprint: b4d5e6...
Next: notarize the seal to Rekor with the operator's existing notarize tooling. The seal MUST be notarized BEFORE you run any probe.
Notarize the seal to Rekor immediately. The whole evidentiary value depends on the seal having a public timestamp earlier than every probe run.
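What a seal commits to can be shown with node:crypto alone. The sketch below hashes a canary, signs the 32-byte SHA-256 digest, records a sealedAt time, and computes a 64-hex SPKI key fingerprint. It is an assumption-laden illustration, not what pluck bureau mole init writes: the Ed25519 key type, the record's field names, and hashing the raw UTF-8 text (this page says the operator signs a digest of canonical(canary body)) are guesses, and notarization to Rekor is left out entirely.

```typescript
import { createHash, generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";

// Hypothetical seal record; field names are illustrative, not Mole's canary.json schema.
interface SealSketch {
  canaryId: string;
  canaryHash: string; // hex SHA-256 of the canary text (assumed: raw UTF-8, no canonicalization)
  sealedAt: string; // ISO 8601 UTC
  keyFingerprint: string; // hex SHA-256 of the DER-encoded SPKI public key
  signature: string; // base64 Ed25519 signature over the raw 32-byte digest
}

function spkiFingerprint(publicKey: KeyObject): string {
  const der = publicKey.export({ type: "spki", format: "der" });
  return createHash("sha256").update(der).digest("hex");
}

function sealSketch(canaryId: string, canaryBody: string, privateKey: KeyObject, publicKey: KeyObject): SealSketch {
  const digest = createHash("sha256").update(canaryBody, "utf8").digest(); // 32 bytes
  // Ed25519 has no separate hash step, so the algorithm argument is null
  // and the 32 digest bytes themselves are the signed message.
  const signature = sign(null, digest, privateKey).toString("base64");
  return {
    canaryId,
    canaryHash: digest.toString("hex"),
    sealedAt: new Date().toISOString(),
    keyFingerprint: spkiFingerprint(publicKey),
    signature,
  };
}

function verifySealSketch(seal: SealSketch, canaryBody: string, publicKey: KeyObject): boolean {
  const digest = createHash("sha256").update(canaryBody, "utf8").digest();
  if (digest.toString("hex") !== seal.canaryHash) return false;
  return verify(null, digest, publicKey, Buffer.from(seal.signature, "base64"));
}

// Example with a throwaway key pair.
const keys = generateKeyPairSync("ed25519");
const seal = sealSketch("physics-blog-2024-01-15", "A distinctive canary paragraph.", keys.privateKey, keys.publicKey);
console.log(seal.canaryHash.length, verifySealSketch(seal, "A distinctive canary paragraph.", keys.publicKey)); // 64 true
```

Anyone holding the public key and the canary text can rerun the verification. What a signature alone cannot provide is an independent timestamp, since sealedAt is self-reported; that is the gap the Rekor notarization closes.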
2. Publish the canary publicly, then wait.
For Mole to mean something, the canary text needs to be on the open internet (a blog, a paper, a news article) at a URL crawlable by training pipelines. Publish, then wait long enough that a plausible training run could have ingested it.
3. Validate a Mole probe-pack
Author or obtain a signed Mole probe-pack – the prompts that try to elicit reproduction. Pre-flight it:
pluck bureau mole run ./mole-run-1/pack.json --target openai/gpt-4o
Today's alpha validates the pack structure and prints what would run. The actual probe transport (sending to OpenAI / Anthropic / Ollama) is operator-supplied via the responder pattern from Fingerprint – Mole stays transport-agnostic so it doesn't ship a vendor adapter zoo.
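Because Mole leaves transport to the operator, a responder can be as small as an async function that turns a prompt into the model's reply text. The sketch below assumes exactly that shape (prompt string in, reply string out); the Responder type and the fixedResponder, ollamaResponder, and collectResponses names are hypothetical, and the signature Mole actually expects may differ. The Ollama variant calls Ollama's /api/generate endpoint with streaming turned off; substitute whichever vendor SDK you use.

```typescript
// Hypothetical responder shape: prompt in, model reply text out.
type Responder = (prompt: string) => Promise<string>;

// Fixed-reply responder, useful for dry runs and tests.
function fixedResponder(reply: string): Responder {
  return async () => reply;
}

// Responder backed by a local Ollama server (Node 18+ ships a global fetch).
function ollamaResponder(model: string, baseUrl = "http://localhost:11434"): Responder {
  return async (prompt) => {
    const res = await fetch(`${baseUrl}/api/generate`, {
      method: "POST",
      headers: { "content-type": "application/json" },
      body: JSON.stringify({ model, prompt, stream: false }),
    });
    if (!res.ok) throw new Error(`Ollama returned HTTP ${res.status}`);
    const body = (await res.json()) as { response: string };
    return body.response;
  };
}

// Send each probe prompt through one responder, in order, keeping prompt and reply paired.
async function collectResponses(
  responder: Responder,
  prompts: string[],
): Promise<Array<{ prompt: string; response: string }>> {
  const out: Array<{ prompt: string; response: string }> = [];
  for (const prompt of prompts) {
    out.push({ prompt, response: await responder(prompt) });
  }
  return out;
}
```

Prompts run one at a time rather than in parallel so the recorded order is stable and vendor rate limits are easier to respect.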
4. Score the response with scoreMemorization
Once you have the model's response, score it with the Mole library:
import { scoreMemorization } from "@sizls/pluck-bureau-mole";
const verdict = scoreMemorization({
  canary: sealed.canary,
  response: vendorResponse,
});
// verdict.score (0..1), verdict.fingerprintPhrasesFound, ...
Notarize the verdict body to Rekor.
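Notarizing a verdict means anchoring a digest of it, so the verdict body has to serialize to the same bytes every time it is hashed. Here is a minimal sketch of that step: canonical JSON with recursively sorted object keys and no whitespace, followed by SHA-256. Mole's own canonicalization rules are not documented on this page, so treat this as one reasonable choice rather than a compatible implementation; the verdict fields in the example are hypothetical.

```typescript
import { createHash } from "node:crypto";

type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Minimal canonical JSON: recursively sorted object keys, no insignificant whitespace.
function canonicalize(value: Json): string {
  if (Array.isArray(value)) return `[${value.map(canonicalize).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const keys = Object.keys(value).sort();
    return `{${keys.map((k) => `${JSON.stringify(k)}:${canonicalize(value[k])}`).join(",")}}`;
  }
  if (typeof value === "number" && !Number.isFinite(value)) {
    throw new Error("non-finite numbers have no JSON representation");
  }
  return JSON.stringify(value);
}

// Hex SHA-256 over the canonical UTF-8 bytes.
function digestHex(value: Json): string {
  return createHash("sha256").update(canonicalize(value), "utf8").digest("hex");
}

// Example: key order in the source object does not change the serialization.
const verdictBody = { score: 0.842, canaryId: "physics-blog-2024-01-15", verbatimPhraseCount: 3 };
console.log(canonicalize(verdictBody)); // {"canaryId":"physics-blog-2024-01-15","score":0.842,"verbatimPhraseCount":3}
```

Object key order stops mattering, while array order still does, which is usually what you want for an ordered list of matched phrases.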
5. Build a citation bundle
pluck bureau mole cite <verdict-rekor-uuid> \
--canary ./mole-run-1/canary.json \
--verdict ./verdict.json \
--prompt "Continue: The neutron-star quadrupole-moment correction..." \
--vendor-claim <oath-uuid-optional>
The output is a canonical-JSON citation bundle a journalist or lawyer can hand to a third party for verification.
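The bundle fields this page lists (canary hash, sealed-at timestamp, Rekor uuid, verdict score, reproducible prompt) can be modeled as a plain object and sanity-checked before the bundle is handed to anyone. In the sketch below, CitationSketch and assembleCitation are hypothetical names, and the checks are ones a reviewer might reasonably want; they do not describe what pluck bureau mole cite actually validates.

```typescript
// Hypothetical bundle shape assembled from the fields this page lists.
interface CitationSketch {
  canaryHash: string; // 64-hex SHA-256 of the sealed canary
  sealedAt: string; // ISO 8601 UTC
  canaryRekorUuid: string; // Rekor entry for the canary seal
  verdictRekorUuid: string; // Rekor entry for the signed verdict
  score: number; // 0..1
  reproduciblePrompt: string;
  vendorClaimUuid?: string; // optional vendor oath to contradict
}

// Returns a copy of the fields after basic checks; throws on the first problem found.
function assembleCitation(fields: CitationSketch): CitationSketch {
  if (!/^[0-9a-f]{64}$/.test(fields.canaryHash)) {
    throw new Error("canaryHash must be 64 lowercase hex characters");
  }
  // Written as a negated range test so NaN is rejected too.
  if (!(fields.score >= 0 && fields.score <= 1)) {
    throw new Error("score must lie in [0, 1]");
  }
  if (fields.canaryRekorUuid === fields.verdictRekorUuid) {
    throw new Error("the seal and the verdict must be separate Rekor entries");
  }
  if (fields.reproduciblePrompt.trim() === "") {
    throw new Error("a citation needs the exact prompt so others can rerun it");
  }
  if (Number.isNaN(Date.parse(fields.sealedAt))) {
    throw new Error("sealedAt must be a parseable timestamp");
  }
  // Copy so later mutation of the input cannot silently change the bundle.
  return { ...fields };
}
```

Keeping the reproducible prompt mandatory matters most: without it, a third party has the hashes and the score but no way to rerun the probe.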
Run it yourself
Drop this into a Node 18+ project (npm install @sizls/pluck-bureau-mole @sizls/pluck-bureau-core tsx):
// index.ts
import { createMoleSystem } from "@sizls/pluck-bureau-mole";
import { generateOperatorKey } from "@sizls/pluck-bureau-core";
async function main() {
  const operator = generateOperatorKey();
  const system = createMoleSystem({
    signingKey: operator.privateKeyPem,
    disablePausePoll: true,
    disableLogging: true,
  });
  try {
    // The canary – a deliberately quirky, distinctive document the author
    // commits to BEFORE any model could plausibly have trained on it.
    const canaryBody =
      "The neutron-star quadrupole-moment correction for spinning binaries " +
      "in the chi_eff = 0.7314 reference simulation produces a phase shift of 0.0042 rad.";
    const sealed = system.sealCanary({
      canaryId: "physics-blog-2024-01-15",
      canaryBody,
      fingerprintPhrases: ["chi_eff = 0.7314", "0.0042 rad", "quadrupole-moment correction"],
      metadata: { copyrightHolder: "Alice Researcher" },
    });
    // Months later, the vendor model returns this – verbatim recall of all three fingerprint phrases.
    // The async responder is a stub standing in for a real vendor call, and
    // "0".repeat(64) is a placeholder for the canary seal's Rekor uuid.
    const result = await system.probe(
      { vendor: "example", model: "gpt-test", canaryRekorUuid: "0".repeat(64) },
      sealed,
      { id: "completion-1", prompt: "Complete: The neutron-star..." },
      async () =>
        "The neutron-star quadrupole-moment correction at chi_eff = 0.7314 yields a 0.0042 rad shift.",
    );
    console.log(`canary sealed: id=${sealed.canaryId} sealedAt=${sealed.sealedAt}`);
    console.log(`  hash: ${sealed.canaryHash.slice(0, 16)}...`);
    console.log(
      `verdict: score=${result.verdict.score.toFixed(3)} phrases=${result.verdict.verbatimPhraseCount}/${result.verdict.totalPhrases} crossed=${result.crossedThreshold}`,
    );
    console.log(`  total verdicts: ${system.facts.verdicts().length}`);
  } finally {
    await system.shutdown();
  }
}
main().catch((err) => { console.error(err); process.exit(1); });
Run with npx tsx index.ts. Expected output (sealedAt reflects the time you run it):
canary sealed: id=physics-blog-2024-01-15 sealedAt=2026-04-27T18:22:11.314Z
hash: a1b2c3d4e5f6789a...
verdict: score=0.842 phrases=3/3 crossed=true
total verdicts: 1
What you get
- A sealed canary – a signed, timestamped commitment to a specific document, predating any probe.
- A deterministic memorization score – two runs on the same (canary, response) pair produce byte-identical verdicts. No LLM-graded fuzz, no irreproducibility.
- A Rekor-clock guard – verifyCanaryAgainstRekor rejects a canary whose seal is in the future relative to Rekor's own integrated time, and surfaces daysAheadOfRekor so reviewers can also spot gaps that pass the check but still look suspicious.
- A journalist citation bundle – canary hash, Rekor uuid, score, reproducible prompt, plus an optional contradict claim against the vendor's published oath.
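The Rekor-clock guard comes down to comparing the operator's self-reported sealedAt with the integratedTime Rekor recorded when it logged the entry (Rekor reports integratedTime in Unix seconds). The sketch below is a hypothetical standalone version of that comparison. The function name and result shape are assumptions, and its gapDays sign convention (positive when Rekor logged the entry after the claimed seal time) is this sketch's own choice, not a definition of daysAheadOfRekor.

```typescript
interface ClockCheck {
  ok: boolean;
  reason?: string;
  // Days from the claimed sealedAt to Rekor's integratedTime (positive = Rekor is later).
  gapDays: number;
}

function checkSealAgainstRekorTime(sealedAtIso: string, integratedTimeSec: number): ClockCheck {
  const sealedMs = Date.parse(sealedAtIso);
  if (Number.isNaN(sealedMs)) {
    return { ok: false, reason: "unparseable sealedAt", gapDays: NaN };
  }
  const rekorMs = integratedTimeSec * 1000; // Rekor time is in seconds
  const gapDays = (rekorMs - sealedMs) / 86_400_000;
  // A seal claiming a time after Rekor logged it cannot be genuine.
  if (sealedMs > rekorMs) {
    return { ok: false, reason: "sealedAt is later than Rekor integratedTime", gapDays };
  }
  return { ok: true, gapDays };
}
```

A large positive gap passes this check but still deserves a reviewer's attention: only Rekor's time is independently attested, so a seal that sat unnotarized for weeks has a weaker claim to its own sealedAt.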
What it can't do
- This is alpha. Do NOT cite a Mole bundle in court without a Bureau operator review. The canary-sealing primitive is sound, but real-world evidentiary admissibility depends on the canary's first-publication provenance – when and where the text first appeared in public, with what crawlable URL – not just the seal timestamp. For court-grade chain-of-custody, see Custody.
- Mole cannot prove WHEN a vendor trained on the canary, only THAT they did. The seal timestamp is a lower bound on plausible training; the upper bound is whatever the vendor's release date claims.
- The memorization scorer is deterministic and purely lexical (n-gram overlap and edit distance), which means a clever vendor who paraphrases canary content (training a model that retains the semantic content but never reproduces it verbatim) defeats the scoring. That's a feature for evidentiary clarity (no false positives from coincidental phrasing) but a limit on what Mole can detect.
- Bounds: ≤ 32 fingerprint phrases per canary (each 8–256 chars); metadata as canonical JSON ≤ 8 KiB. Long documents need multiple canaries.
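These bounds are cheap to check before sealing. The sketch below is a hypothetical pre-flight validator written from the stated limits. Two assumptions: "chars" is counted as JavaScript string length (UTF-16 code units), and metadata size is measured as the UTF-8 byte length of plain JSON.stringify output, which only approximates Mole's canonical stringify. It also adds one check of its own that this page does not state: every fingerprint phrase must occur in the canary body.

```typescript
const MAX_PHRASES = 32;
const MIN_PHRASE_CHARS = 8;
const MAX_PHRASE_CHARS = 256;
const MAX_METADATA_BYTES = 8 * 1024;

// Returns a list of problems; an empty list means the canary fits the stated bounds.
function checkCanaryBounds(canaryBody: string, phrases: string[], metadata: Record<string, unknown>): string[] {
  const problems: string[] = [];
  if (phrases.length > MAX_PHRASES) {
    problems.push(`${phrases.length} phrases exceeds the limit of ${MAX_PHRASES}`);
  }
  for (const p of phrases) {
    if (p.length < MIN_PHRASE_CHARS || p.length > MAX_PHRASE_CHARS) {
      problems.push(`phrase "${p}" is ${p.length} chars; allowed range is ${MIN_PHRASE_CHARS} to ${MAX_PHRASE_CHARS}`);
    }
    // Extra check added by this sketch: a phrase absent from the canary can never be "reproduced".
    if (!canaryBody.includes(p)) {
      problems.push(`phrase "${p}" does not occur in the canary body`);
    }
  }
  // Approximation: plain JSON.stringify stands in for canonical stringify.
  const metadataBytes = Buffer.byteLength(JSON.stringify(metadata), "utf8");
  if (metadataBytes > MAX_METADATA_BYTES) {
    problems.push(`metadata is ${metadataBytes} bytes; limit is ${MAX_METADATA_BYTES}`);
  }
  return problems;
}
```

Returning every problem at once, rather than throwing on the first, lets an author fix a long canary in a single pass.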
A real-world example
In January 2024, a science writer publishes a distinctive paragraph about neutron-star quadrupole-moment corrections on a physics blog. Before publishing, they seal the paragraph as a Mole canary and notarize the seal to Sigstore Rekor.
In April 2026, a new foundation model is released. The writer prompts the model with "complete this physics catalog entry: The neutron-star quadrupole-moment correction for spinning binaries..." and the model reproduces 4 of the 6 fingerprint phrases nearly verbatim, including the fabricated reference simulation number chi_eff = 0.7314. The verdict scores 0.94. The signed verdict is published to Rekor.
The writer retrieves both Rekor entries – the 2024 canary seal and the 2026 verdict – and provides the bundle to a journalist investigating training-data provenance. Two outcomes are consistent with the data: either the model independently produced the same fabricated simulation number the writer had published two years earlier, or the training corpus included the blog post. The journalist runs cosign verify-blob on both bundles to confirm the signatures and timestamps before publishing.
For developers
Predicate URIs
https://pluck.run/Mole.Canary/v1
https://pluck.run/Mole.MemorizationVerdict/v1
Mole.Canary/v1 is the sealed manifest. Mole.MemorizationVerdict/v1 is the per-probe scoring result. Verifiers MUST discriminate by predicateType.
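Discriminating by predicateType means a verifier inspects the statement's predicateType before interpreting its predicate, and refuses anything it does not recognize. The sketch below assumes the two predicates travel inside in-toto v1 Statements, as DSSE-signed attestations usually do; the Statement interface and the classify helper are hypothetical, and the predicate bodies are left opaque because their fields are not documented on this page.

```typescript
const CANARY_V1 = "https://pluck.run/Mole.Canary/v1";
const VERDICT_V1 = "https://pluck.run/Mole.MemorizationVerdict/v1";

// Minimal in-toto v1 Statement shape; predicate contents are left opaque here.
interface Statement {
  _type: string;
  subject: Array<{ name: string; digest: Record<string, string> }>;
  predicateType: string;
  predicate: unknown;
}

type Classified =
  | { kind: "canary"; statement: Statement }
  | { kind: "verdict"; statement: Statement };

function classify(statement: Statement): Classified {
  if (statement._type !== "https://in-toto.io/Statement/v1") {
    throw new Error(`unexpected statement type ${statement._type}`);
  }
  switch (statement.predicateType) {
    case CANARY_V1:
      return { kind: "canary", statement };
    case VERDICT_V1:
      return { kind: "verdict", statement };
    default:
      // Never guess from the predicate's shape: unknown types are rejected outright.
      throw new Error(`unsupported predicateType ${statement.predicateType}`);
  }
}
```

Rejecting unknown types, instead of trying whichever parser happens to accept the JSON, keeps a canary from being accepted where a verdict was expected, and the reverse.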
Programs composed
attest, notarize, contradict, dsseSign, subpoena. The subpoena verb is what citationBuilder calls to compose a journalist-ready evidentiary packet against the vendor's published oath.
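The dsseSign program presumably produces DSSE envelopes. In DSSE, the signature covers a pre-authentication encoding (PAE) of the payload type and payload rather than the bare payload, so a signature cannot be replayed under a different payloadType. The sketch below implements PAE as the DSSE specification defines it; the Ed25519 key, the in-toto payload type in the example, and the dsseSignSketch and dsseVerifySketch names are assumptions, not Mole's code.

```typescript
import { generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";

// DSSE pre-authentication encoding:
// "DSSEv1" SP LEN(type) SP type SP LEN(body) SP body, with LEN as ASCII decimal byte counts.
function pae(payloadType: string, payload: Buffer): Buffer {
  const typeBytes = Buffer.byteLength(payloadType, "utf8");
  const header = Buffer.from(`DSSEv1 ${typeBytes} ${payloadType} ${payload.length} `, "utf8");
  return Buffer.concat([header, payload]);
}

interface Envelope {
  payloadType: string;
  payload: string; // base64
  signatures: Array<{ keyid: string; sig: string }>;
}

function dsseSignSketch(payloadType: string, payload: Buffer, privateKey: KeyObject, keyid = ""): Envelope {
  const sig = sign(null, pae(payloadType, payload), privateKey).toString("base64");
  return { payloadType, payload: payload.toString("base64"), signatures: [{ keyid, sig }] };
}

// Accepts the envelope if any signature verifies over the PAE bytes.
function dsseVerifySketch(env: Envelope, publicKey: KeyObject): boolean {
  const message = pae(env.payloadType, Buffer.from(env.payload, "base64"));
  return env.signatures.some((s) => verify(null, message, publicKey, Buffer.from(s.sig, "base64")));
}

// Example: sign a tiny in-toto Statement with a throwaway key.
const demoKeys = generateKeyPairSync("ed25519");
const demoEnvelope = dsseSignSketch(
  "application/vnd.in-toto+json",
  Buffer.from('{"_type":"https://in-toto.io/Statement/v1"}'),
  demoKeys.privateKey,
);
console.log(dsseVerifySketch(demoEnvelope, demoKeys.publicKey)); // true
```

Changing payloadType alone, with the payload and signature untouched, makes verification fail, which is the property PAE exists to provide.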
Threat model and limits
- Canary sealing predates ANY probe run. Retroactive seals are rejected – seal.sealedAt MUST be earlier than every probe result's evaluatedAt. The bureau verifier refuses bundles where the sealed-at boundary is violated.
- F11 Rekor-clock gate. verifyCanaryAgainstRekor rejects a canary whose sealedAt is in the future relative to Rekor's integratedTime, and surfaces daysAheadOfRekor so journalists can also spot gaps that pass the check but still look suspicious.
- The operator signs the raw 32-byte digest of canonical(canary body), for cosign and sigstore-go interop.
- Full 64-hex SPKI fingerprints. Strict ISO 8601 UTC. Strict base64 signatures.
- Bounds. ≤ 32 fingerprint phrases per canary, each 8–256 chars; metadata canonical-stringify ≤ 8 KiB.
- Memorization scoring is deterministic – n-gram + edit distance only, no LLM-graded fuzz. Two Mole runs on the same (canary, response) pair MUST produce byte-identical verdicts.
- AbortSignal is threaded through CLI actions.
- index.ts is type-pure; register.ts is the only side-effect entry point.
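Two of the rules above, the sealed-at boundary and strict ISO 8601 UTC timestamps, fit in a few lines. The sketch below is a hypothetical reimplementation, not the bureau verifier: the regular expression accepts only YYYY-MM-DDTHH:MM:SS with optional millisecond precision and a literal Z, which is this sketch's reading of "strict", and the ProbeResultLike shape is assumed.

```typescript
// Strict UTC form: YYYY-MM-DDTHH:MM:SS, optional .sss milliseconds, mandatory "Z".
const STRICT_UTC = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d{3})?Z$/;

function parseStrictUtc(ts: string): number {
  // The regex rejects offsets such as +02:00; the NaN check guards against strings
  // the regex lets through but Date.parse cannot interpret.
  const ms = Date.parse(ts);
  if (!STRICT_UTC.test(ts) || Number.isNaN(ms)) {
    throw new Error(`not a strict ISO 8601 UTC timestamp: ${ts}`);
  }
  return ms;
}

interface ProbeResultLike {
  id: string;
  evaluatedAt: string;
}

// Throws unless the seal is strictly earlier than every probe result.
function assertSealPredatesProbes(sealedAt: string, results: ProbeResultLike[]): void {
  const sealedMs = parseStrictUtc(sealedAt);
  for (const r of results) {
    if (parseStrictUtc(r.evaluatedAt) <= sealedMs) {
      throw new Error(`probe ${r.id} evaluatedAt ${r.evaluatedAt} does not postdate sealedAt ${sealedAt}`);
    }
  }
}
```

Using a strict comparison means a probe stamped at the exact same instant as the seal is rejected too, since simultaneity cannot show the seal came first.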
Studio routes
- studio.pluck.run/bureau/mole – global memorization leaderboard (publicly verifiable canary hashes only).
- studio.pluck.run/bureau/mole/<rekor-uuid> – single verdict detail with reproducible prompt.
- studio.pluck.run/bureau/mole/citation/<rekor-uuid> – journalist citation kit.
Library surface
import {
  sealCanary,
  scoreMemorization,
  buildCitationBundle,
} from "@sizls/pluck-bureau-mole";
const sealed = sealCanary({
  canaryHash,
  canaryId,
  fingerprintPhrases,
  signingKey,
});
const verdict = scoreMemorization({
  canary: sealed,
  response: vendorResponse,
});
// verdict.score (0..1), verdict.fingerprintPhrasesFound, ...
const citation = buildCitationBundle({
  canary: sealed,
  verdict,
  verdictRekorUuid,
  reproduciblePrompt: "...",
});
See also
- Bounty – composes Mole verdicts into HackerOne / Bugcrowd evidence packets.
- Oath – vendor commitments Mole contradict-checks against.
- Custody – court-grade chain-of-custody for AI conversations.
- Fingerprint – silent model-swap variant; Mole is the canary-memorization variant.
- Concepts: Act – attest/notarize/subpoena/contradict.