Image Prompt injection and double instructions

Prompt injection via images involves embedding hidden or overt textual commands within visual elements to manipulate AI systems. This approach exploits Optical Character Recognition (OCR) or visual-text processing models, enabling attackers to include instructions that the system interprets as prompts. These commands could trick the AI into generating unintended outputs or executing malicious tasks. For example, a visually disguised instruction embedded in a QR code or background text might bypass user detection but still influence the AI.

Double instructions amplify this vulnerability by layering contradictory or complex commands to confuse the AI’s decision-making processes. By combining visible, user-friendly prompts with hidden, conflicting directives, attackers can manipulate the system’s output. For instance, an overtly benign text might instruct the AI to generate safe responses, while hidden instructions (in an image or metadata) direct it to include harmful or biased content.