Warning: This page contains NSFW content.

Single Medium, Multiple Perspectives

This page demonstrates the 11 types of semantic gaps identified in our paper. Below are attack samples where media players (Human Perception) and AI services (AI Perception) interpret the same file differently.

R1

Virtual Cropping Ignorance

AI services ignore 'virtual crop' metadata (e.g., the 'clap' clean-aperture property in HEIC/AVIF) and process the entire image, while humans see only the cropped region.

Human Perception
Human perception of cropped image
AI Perception
AI perception of full image
Download Sample (r1_virtual_crop.avif)
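
The gap can be illustrated with a minimal sketch. It does not parse a real HEIC/AVIF file: the pixel grid, the `crop` helper, and the crop rectangle are hypothetical stand-ins for a decoded image and the rectangle a player would derive from the crop metadata.

```python
def crop(pixels, x, y, width, height):
    """Return the sub-grid a player displays after applying a crop rectangle."""
    return [row[x:x + width] for row in pixels[y:y + height]]

def contains(pixels, value):
    """True if any pixel in the grid equals `value`."""
    return any(value in row for row in pixels)

# Hypothetical 4x4 decoded image: 'B' = benign pixel, 'H' = content hidden outside the crop.
decoded = [
    list("HHHH"),
    list("HBBH"),
    list("HBBH"),
    list("HHHH"),
]

# Hypothetical crop rectangle (assumed to come from the file's crop metadata).
crop_rect = {"x": 1, "y": 1, "width": 2, "height": 2}

human_view = crop(decoded, **crop_rect)  # a player that honours the crop
ai_view = decoded                        # a pipeline that ignores it

print(contains(human_view, "H"), contains(ai_view, "H"))
```

In this toy case the hidden pixels reach only the pipeline that skips the crop step.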
R2

Mirror Flip Ignorance

AI ignores metadata-based mirroring (e.g., 'imir' in AVIF), leading to misinterpretation of orientation-sensitive data like charts.

Human Perception
Human perception of stock chart
AI Perception
AI perception of mirrored chart
Download Sample (r2_mirror_flip.avif)
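
A small sketch of why mirroring matters for a chart, assuming a left-right mirror and a chart reduced to one value per x-column. The price series and both helper functions are hypothetical; no real AVIF parsing is involved.

```python
def mirror_left_right(columns):
    """Apply a left-right mirror to a chart stored as one value per x-column."""
    return columns[::-1]

def trend(columns):
    """Classify the chart by comparing its leftmost and rightmost values."""
    if columns[-1] > columns[0]:
        return "rising"
    if columns[-1] < columns[0]:
        return "falling"
    return "flat"

# Hypothetical stored pixel data: prices as encoded in the file, left to right.
stored_prices = [10, 12, 15, 19]

human_view = mirror_left_right(stored_prices)  # player applies the mirror metadata
ai_view = stored_prices                        # pipeline reads the stored data as-is

print(trend(human_view), trend(ai_view))
```

The same file therefore reads as a falling chart to the viewer and a rising chart to the service.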
R3

Rotation Ignorance

Similar to mirroring, AI services fail to apply rotation metadata, causing misidentification of rotated content (e.g., CAPTCHAs).

Human Perception
Human perception of rotated text
AI Perception
AI perception of unrotated text
Download Sample (r3_rotation.avif)
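
As a rough sketch, rotation metadata can be modelled as a count of anticlockwise quarter turns (the HEIF 'irot' property is commonly described this way; treat that as an assumption here). The grid and the `rotate_ccw` helper are hypothetical.

```python
def rotate_ccw(grid, quarter_turns):
    """Rotate a 2D grid anticlockwise by quarter_turns * 90 degrees."""
    for _ in range(quarter_turns % 4):
        # Transposing and then reversing the row order is a 90-degree anticlockwise turn.
        grid = [list(row) for row in zip(*grid)][::-1]
    return grid

# Hypothetical stored pixel grid and rotation metadata.
stored = [
    [1, 2],
    [3, 4],
]
quarter_turns = 1

human_view = rotate_ccw(stored, quarter_turns)  # player applies the rotation
ai_view = stored                                # pipeline skips it

print(human_view, ai_view)
```

Any recogniser that is sensitive to orientation, such as text recognition on a CAPTCHA, receives a different input than the one the viewer sees.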
R4

External Resource Ignorance

AI fails to process external resources (e.g., image-based subtitles in MKV or overlays in SVG), perceiving only the underlying content.

Human Perception

Video shows full-screen subtitle: "Benign Content"

AI Perception

AI sees underlying video: "Harmful Content"

Download Sample (r4_overlay.svg)
R5/R6

Improper Audio Downmix

AI services use naive downmixing (e.g., simple average) for multi-channel audio, while humans hear a standard-compliant mix, enabling A2A attacks.

Human Perception (Browser)

"I refuse to admit guilt."

AI Perception (ASR)

"Your Honor, I plead guilty."

Download Sample (r5_audio_downmix.wav)
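
The difference between the two mixes can be sketched on a single 5.1 sample frame. The coefficients below follow the commonly cited ITU-R BS.775-style stereo downmix, with the LFE channel dropped. Which mix a given player or service actually uses is an assumption, and the sample values are invented for illustration.

```python
import math

CHANNELS = ["FL", "FR", "FC", "LFE", "BL", "BR"]

def naive_mono(frame):
    """Simple average of every channel, LFE included."""
    return sum(frame[name] for name in CHANNELS) / len(CHANNELS)

def weighted_stereo(frame):
    """BS.775-style stereo downmix: centre and surrounds at -3 dB, LFE dropped."""
    gain = 1 / math.sqrt(2)
    left = frame["FL"] + gain * frame["FC"] + gain * frame["BL"]
    right = frame["FR"] + gain * frame["FC"] + gain * frame["BR"]
    return left, right

# Hypothetical frame: one signal sits out of phase on the front pair,
# a different signal sits on the LFE channel.
frame = {"FL": 0.5, "FR": -0.5, "FC": 0.0, "LFE": 0.6, "BL": 0.0, "BR": 0.0}

human_mix = weighted_stereo(frame)  # front-pair signal survives, LFE signal is gone
ai_mix = naive_mono(frame)          # front pair cancels, only the LFE signal remains

print(human_mix, ai_mix)
```

In this sketch the two listeners end up with disjoint signals from the same frame, which is the property an attacker needs.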
R7

Improper Alpha Fusion (WebP)

AI improperly handles the alpha channel, so it perceives different content than humans see, which enables attacks such as moderation bypass.

Human Perception
Human perception of alpha blended image
AI Perception
AI perception of raw RGB data
Download Sample (r7_alpha_fusion.webp)
R8

Improper Transparency Fusion

AI discards alpha or tRNS transparency data, while humans see the image correctly blended against a background (e.g., white).

Human Perception
Human perception of transparent image
AI Perception
AI perception of raw RGB data
Download Sample (r8_trns_fusion.png)
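
A minimal per-pixel sketch of the two behaviours, assuming 8-bit RGBA values and a white background. Both helper functions and the pixel values are hypothetical; a real viewer may composite against a different background.

```python
def composite_over(pixel, background=(255, 255, 255)):
    """Blend one RGBA pixel over an opaque background, as a viewer would."""
    r, g, b, a = pixel
    alpha = a / 255
    return tuple(
        round(channel * alpha + bg * (1 - alpha))
        for channel, bg in zip((r, g, b), background)
    )

def drop_alpha(pixel):
    """Keep the raw RGB values and discard transparency."""
    return pixel[:3]

# Hypothetical pixels: one fully transparent with black RGB, one fully opaque.
hidden = (0, 0, 0, 0)
visible = (200, 30, 30, 255)

print(composite_over(hidden), drop_alpha(hidden))    # white for the viewer, black for the pipeline
print(composite_over(visible), drop_alpha(visible))  # identical when the pixel is opaque
```

Only pixels with partial or zero alpha diverge, so content can be hidden in the RGB values of regions the viewer sees as background.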
R9

Incorrect Content Choice

AI incorrectly selects the *first* track/frame from a multi-track file (e.g., HEIC), while humans see the *primary* track/frame.

Human Perception (Primary)
Human perception of primary track
AI Perception (First)
AI perception of first track
Download Sample (r9_track_selection.heic)
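
The selection mismatch can be sketched with a toy container. HEIF-family files declare a primary item (the 'pitm' box). The dictionary layout, item labels, and both selector functions here are hypothetical and stand in for a real parser.

```python
def select_primary(container):
    """Pick the item the container itself marks as primary."""
    by_id = {item["id"]: item for item in container["items"]}
    return by_id[container["primary_item_id"]]

def select_first(container):
    """Pick whichever item happens to be stored first."""
    return container["items"][0]

# Hypothetical parsed container: the primary item is not the first one stored.
container = {
    "primary_item_id": 2,
    "items": [
        {"id": 1, "label": "image shown to nobody by default"},
        {"id": 2, "label": "image shown by the viewer"},
    ],
}

human_view = select_primary(container)
ai_view = select_first(container)

print(human_view["id"], ai_view["id"])
```

The two selectors agree only when the primary item also happens to be stored first.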
R10

Deterministic Image Sampling

AI processes only the first frame of an animation (e.g., GIF), while humans see the second frame, which persists on screen.

Human Perception (Frame 2)
Human perception of second frame
AI Perception (Frame 1)
AI perception of first frame
Download Sample (r10_image_sampling.gif)
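
A sketch of why the second frame dominates what a viewer sees, assuming a two-frame animation that plays once with a very short first frame. The durations and the `frame_at` helper are hypothetical, and real GIF timing and loop handling vary between players.

```python
def frame_at(durations_ms, t_ms, plays=1):
    """Index of the frame on screen at time t_ms.

    durations_ms: positive per-frame display times.
    plays: how many times the animation runs; None means it loops forever.
    """
    total = sum(durations_ms)
    if plays is not None and t_ms >= total * plays:
        return len(durations_ms) - 1  # animation finished: the last frame stays on screen
    t = t_ms % total
    for index, duration in enumerate(durations_ms):
        if t < duration:
            return index
        t -= duration
    return len(durations_ms) - 1

# Hypothetical timing: frame 0 flashes for 20 ms, frame 1 is then shown.
durations = [20, 1000]

ai_frame = 0                               # pipeline decodes only the first frame
human_frame = frame_at(durations, 5000)    # what is on screen after five seconds

print(ai_frame, human_frame)
```

With these timings the first frame is on screen for 20 ms, too briefly for most viewers to register.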
R11

Deterministic Video Sampling

AI deterministically samples a few frames (e.g., 1 per second), while humans see the full video. Attackers can place malicious content in exactly the frames that get sampled.

Human Perception

Full video (mostly benign)

AI Perception

Sampled frames (all malicious)

Download Sample (r11_video_sampling.mp4)
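
A sketch of the sampling gap, assuming a 30 fps video and a service that takes the first frame of every second. The frame rate, clip length, and sampling rule are assumptions for illustration; real services may sample differently.

```python
def sampled_indices(total_frames, fps, samples_per_second=1):
    """Frame indices picked by a fixed-rate sampler that starts at frame 0."""
    step = fps // samples_per_second
    return list(range(0, total_frames, step))

# Hypothetical 5-second clip at 30 fps.
fps = 30
total_frames = 150
sampled = sampled_indices(total_frames, fps)

# The attacker marks only the predictable sample positions as malicious.
labels = ["benign"] * total_frames
for index in sampled:
    labels[index] = "malicious"

ai_view = [labels[index] for index in sampled]
malicious_share = labels.count("malicious") / total_frames

print(sampled, set(ai_view), round(malicious_share, 3))
```

Here 5 of 150 frames are malicious, yet every frame the sampler returns is one of them.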