Read original ↗
EnrichedResearchReddit r/MachineLearningCommunityLive · 2h agoPublished 7/3/2026

Contrastive Decoding Diffing (CDD): recovering verbatim finetuning data from logits alone, no weight access needed[R]

We built a model diffing method that recovers verbatim content from narrowly finetuned LLMs using only grey-box logit access (no weights, no activations, no probe corpus). Recent work (Minder, Dumas et al., "Narrow Finetuning Leaves Clearly Readable Traces in Activation Diff

View in news graph →

Why it matters

This story from Reddit r/MachineLearning is relevant to the Research branch of the AI ecosystem and may affect models, products, or research direction.

Technical breakdown

We built a model diffing method that recovers verbatim content from narrowly finetuned LLMs using only grey-box logit access (no weights, no activations, no probe corpus). Recent work (Minder, Dumas et al., "Narrow Finetuning Leaves Clearly Readable Traces in Activation Differences") showed that finetuning leaves detectable traces in activation differences between base and finetuned mode

Business impact

Watch for product launches, funding moves, or policy shifts tied to this headline.