Downmix Vulnerability Demo

R5/R6: Improper Audio Downmix

This demo shows how AI services and browsers can perceive the same audio differently because they use different downmix algorithms, that is, different matrices for mixing multi-channel audio down to fewer channels (for example, to mono). An attacker can craft an audio file that makes a browser play one piece of content (e.g., "I refuse to admit guilt") while an AI service, such as an automatic speech recognition (ASR) backend, transcribes something completely different (e.g., "Your Honor, I plead guilty").

Note: Different players may use different downmix matrices. This vulnerability reproduces reliably in the Chrome browser and in the Gemini environment; it may not reproduce in other environments.
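To make the mismatch concrete, the sketch below compares two mono downmix rules applied to a single 5.1 sample frame. The first follows the "speakers" 5.1-to-mono rule from the Web Audio API specification (LFE discarded); the second, a plain average of all six channels, is a hypothetical stand-in for a backend. Whether Chrome's media pipeline or any particular ASR backend uses exactly these coefficients is an assumption, not something this demo states.

```python
import math
from typing import Sequence

# Assumed channel order: L, R, C, LFE, SL, SR (standard 5.1 layout).


def downmix_webaudio_speakers(frame: Sequence[float]) -> float:
    """5.1 -> mono using the Web Audio API 'speakers' rule; LFE is dropped."""
    left, right, center, _lfe, surround_l, surround_r = frame
    return math.sqrt(0.5) * (left + right) + center + 0.5 * (surround_l + surround_r)


def downmix_average(frame: Sequence[float]) -> float:
    """Hypothetical backend rule: equal-weight average of every channel."""
    return sum(frame) / len(frame)


# A frame with energy only in the center channel.
frame = [0.0, 0.0, 0.6, 0.0, 0.0, 0.0]
print(downmix_webaudio_speakers(frame))
print(downmix_average(frame))
```

With energy only in the center channel, the first rule passes it through at full weight (0.6), while the average scales it to one sixth (about 0.1). The same file therefore yields different mono signals depending on which rule the player applies.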

Generate Attack Audio

Enter the text you want humans (via Chrome) and the AI (via the backend) to perceive, respectively.

1. Human-Perceived Audio (Chrome/Browser)

2. AI-Perceived Audio (ASR/Backend)

Note: Clicking "Generate" calls the hosted attack-audio API. The service builds multi-channel audio from your inputs and returns URLs for three WAV files: audio1_url, audio2_url, and poc_audio_url (the downmixed audio).
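After downloading the returned files, one way to check which of them are multi-channel is to read the WAV header. The following is a minimal sketch using Python's standard `wave` module; the function name `describe_wav` and any local file names are hypothetical. Older Python versions reject WAV files stored in the WAVE_FORMAT_EXTENSIBLE format, which many multi-channel files use; support for that format was added to `wave` in Python 3.12.

```python
import wave


def describe_wav(path: str) -> dict:
    """Return basic header fields of a PCM WAV file (hypothetical helper)."""
    with wave.open(path, "rb") as wf:
        return {
            "channels": wf.getnchannels(),        # e.g. 1 (mono), 2, 6 (5.1)
            "sample_rate": wf.getframerate(),     # samples per second
            "sample_width_bytes": wf.getsampwidth(),
            "frames": wf.getnframes(),            # samples per channel
        }
```

For example, `describe_wav("poc_audio.wav")["channels"]` reports how many channels a downloaded file contains.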

Playback Results

3. Downmixed Audio