Plate 003 · Method

No single signal
decides it.

A verdict is the agreement between independent checks. One classifier reads the thing as a whole. Smaller forensic passes look for the specific fingerprints each medium tends to leave behind. If they line up, we say so. If they don’t, we say that too.
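Below is a minimal sketch of what "agreement between independent checks" could look like in code. The check names, weights, and thresholds are illustrative assumptions, not the production values.

```python
# Sketch: combine independent per-check suspicion scores into one verdict.
# Names, weights, and thresholds are illustrative, not the real pipeline's.

def verdict(checks: dict[str, float], weights: dict[str, float]) -> dict:
    """checks maps check name -> suspicion in [0, 1]; weights sum to 1."""
    combined = sum(weights[name] * score for name, score in checks.items())
    scores = list(checks.values())
    spread = max(scores) - min(scores)           # how much the checks disagree
    return {
        "suspicion": combined,
        "agreement": 1.0 - spread,               # high when the checks line up
        "flag": combined > 0.5 and spread < 0.3  # only call it when they agree
    }

print(verdict(
    {"whole_image": 0.82, "face_crops": 0.77, "camera_physics": 0.70},
    {"whole_image": 0.5, "face_crops": 0.3, "camera_physics": 0.2},
))
```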

01

Image

Two pretrained classifiers weigh in on the whole image. MTCNN finds faces, if any, and we run the same check on each crop. Then we look at camera physics and compression traces: focus uniformity, chromatic aberration, sensor noise, error-level analysis. If EXIF matches a real camera, those signals count for more.
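One of those forensic passes, error-level analysis, is simple enough to sketch with Pillow. The recompression quality and the scoring are illustrative choices, and the input filename is hypothetical.

```python
# Sketch of error-level analysis: recompress the image as JPEG once and
# measure the residual. Quality setting and normalisation are illustrative.
import io
from PIL import Image, ImageChops

def ela_score(path: str, quality: int = 90) -> float:
    original = Image.open(path).convert("RGB")
    buf = io.BytesIO()
    original.save(buf, format="JPEG", quality=quality)  # recompress once
    buf.seek(0)
    recompressed = Image.open(buf)
    diff = ImageChops.difference(original, recompressed)
    pixels = list(diff.getdata())
    # Mean residual per channel; edited or generated regions often leave an
    # uneven residual compared to a straight-from-camera JPEG.
    return sum(sum(px) for px in pixels) / (len(pixels) * 3 * 255)

print(ela_score("photo.jpg"))  # hypothetical input file
```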

02

Video

We sample frames along the timeline and run each one through the image pipeline. A separate check looks for the frame-to-frame wobble face generators tend to leave behind. The timeline chart shows where suspicion spikes. Spikes count more than a flat, evenly high average.
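A rough sketch of the sampling and spike-over-average scoring, assuming OpenCV for frame access. The sample count, the percentile, and the blend weights are illustrative; `score_frame` is a placeholder for the image pipeline.

```python
# Sketch: sample frames evenly along the timeline, score each with an
# image-level detector, and weight spikes above a flat average.
import cv2
import numpy as np

def score_video(path: str, n_samples: int = 16, score_frame=lambda f: 0.0):
    cap = cv2.VideoCapture(path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    scores = []
    for idx in np.linspace(0, max(total - 1, 0), n_samples, dtype=int):
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))
        ok, frame = cap.read()
        if ok:
            scores.append(score_frame(frame))   # per-frame suspicion in [0, 1]
    cap.release()
    if not scores:
        return 0.0
    scores = np.array(scores)
    # Favour sharp spikes over an evenly high average.
    return 0.4 * scores.mean() + 0.6 * np.percentile(scores, 90)
```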

03

Audio

A wav2vec2 classifier handles the main call. Whisper features give a second, learned read. On the classical side we watch pitch variability, silence-floor cleanliness, and energy-envelope rhythm. TTS and voice clones tend to flatten all three at once, which is a tell.
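The three classical signals are straightforward to estimate with librosa; a sketch follows. The pitch range, sample rate, and percentile are assumptions, and the learned wav2vec2 and Whisper reads are out of scope here.

```python
# Sketch of the classical audio signals: pitch variability, silence-floor
# level, and energy-envelope rhythm. Thresholds and ranges are illustrative.
import librosa
import numpy as np

def classical_audio_signals(path: str) -> dict:
    y, sr = librosa.load(path, sr=16000)
    # Pitch variability: fundamental-frequency track over voiced frames.
    f0, _, _ = librosa.pyin(y, fmin=65, fmax=400, sr=sr)
    pitch_var = float(np.nanstd(f0))
    # Silence floor: energy of the quietest frames; synthetic speech is
    # often unnaturally clean between words.
    rms = librosa.feature.rms(y=y)[0]
    silence_floor = float(np.percentile(rms, 5))
    # Energy-envelope rhythm: how much the loudness contour actually moves.
    envelope_var = float(np.std(rms) / (np.mean(rms) + 1e-8))
    return {"pitch_var": pitch_var,
            "silence_floor": silence_floor,
            "envelope_var": envelope_var}
```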

04

Text

RoBERTa says whether the writing reads as machine-made. GPT-2 measures how surprising the language is. Then we look at sentence-length burstiness, lexical rhythm, scaffolding-phrase frequency, and the punctuation patterns that show up more often in AI writing.
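Two of those signals, GPT-2 perplexity and sentence-length burstiness, are sketched below with Hugging Face Transformers. The base `gpt2` checkpoint and the plain period-split sentence tokenizer are simplifications, not the production setup.

```python
# Sketch: GPT-2 perplexity ("how surprising the language is") and
# sentence-length burstiness. Model choice and sentence splitting are
# simplified for illustration.
import math
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def perplexity(text: str) -> float:
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss   # mean cross-entropy per token
    return math.exp(loss.item())

def burstiness(text: str) -> float:
    lengths = [len(s.split()) for s in text.split(".") if s.strip()]
    if len(lengths) < 2:
        return 0.0
    mean = sum(lengths) / len(lengths)
    var = sum((n - mean) ** 2 for n in lengths) / len(lengths)
    return (var ** 0.5) / mean   # low values suggest a machine-even rhythm

sample = "The quick brown fox jumps. It clears the lazy dog. Then it rests."
print(perplexity(sample), burstiness(sample))
```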