Independent AI Evaluation

Independent evaluation of how frontier models handle manipulation and influence operations.

Built from operator-level experience in live adversarial environments rather than academic theory. The work is the point: rigorous, reproducible-where-responsible, and reported plainly — including when the finding is that nothing was found.

01 The vantage

This work comes from the operator side of manipulation — detecting and countering coordinated inauthentic behavior in real, adversarial conditions, where the incentives are live and the actors adapt. That perspective is now turned inward: toward independently evaluating how frontier models behave under the same kinds of pressure.

Seven years inside crypto marketing — running campaigns, building an agency — watching up close how trust gets built, what strengthens it, and what destroys it.

Saw firsthand the full manipulation toolkit: paid KOL campaigns, advertising, bought reviews, manufactured opinions, and coordinated amplification — in the most adversarial social environment there is.

Saw firsthand how AI models became the engine behind enhanced manipulation, and now apply that operator's eye to evaluating where frontier models break under the same pressure.

Published findings

02 / 03

Finding 02 FTC · Sunday Riley · 2020

EVAL · FABRICATED REVIEWS · MODEL Claude Fable 5 · PUBLISHED 11 JUN 2026

Manufactured Credibility

A settled fake-review case, retold at the level of mechanism — and five probes that locate where an AI's manipulation safeguard actually draws its line.

An AI's manipulation safeguard draws its line in the wrong place. Tested against the exact moves in a settled FTC fake-review case, it held firmly against openly stated bad intent — and gave way each time the identical request arrived wrapped in a benign self-report: a claim of personal experience, a “personas and profiles” frame, a privacy question. Wherever a benign frame is available, the safeguard reads the story a user tells about themselves rather than the coordination signature underneath — and the property that separates legitimate persuasion from manipulation lives in that structure, where a model cannot look.

Read the full finding→

Finding 01 FTC · ADT · 2014

EVAL · DISGUISED ENDORSEMENT · MODEL Claude Opus 4.8 · PUBLISHED 08 JUN 2026

Manufactured Independence

Claude Opus 4.8 refuses to disguise a paid endorsement — until you tell it there's no money. A structural probe of where the line actually sits, built on the FTC's ADT case.

The safeguard Claude Opus 4.8 applies to disguised paid endorsement is keyed on one thing: whether the requester admits a financial stake. The template is the FTC's 2014 ADT matter, in which paid spokespeople were put in front of viewers as independent child-safety, home-security, and technology experts. Told that the spokespeople were paid, the model refused, and bolting a disclosure on top didn't move it. Told there was no paid tie, it produced ready-to-publish promotion built to read as independent expert opinion. The fact it gates on is the one fact the operator running the real play controls, and can deny for free.

Read the full finding→

03 · Forthcoming

02 Disclosure & approach

This work follows responsible-disclosure practice. The aim is to inform, not to arm.

Honest reporting

Findings are reported as they are — including negative results and cases where a model held up. No result is dramatized to land harder.

No usable attacks

Reproducible attack methods and working prompts are withheld. Published artifacts describe behavior and risk, not a recipe.

Coordinated where needed

Material findings are shared with the relevant lab on a reasonable timeline before anything goes public.