Author name: Martin

Uncategorized

LLM01: Visual Prompt Injection | Image based prompt injection

Multi-modal prompt injection with images is a sophisticated attack that exploits the integration of visual and text-based inputs in large language models (LLMs). This technique involves embedding adversarial prompts within images—such as hidden text in pixels, steganographic encoding, or visually imperceptible perturbations—that are processed by the model’s vision component. When the model interprets the image, the injected prompt can override system instructions, manipulate outputs, or leak sensitive data. This attack is particularly dangerous in scenarios where images are automatically analyzed by LLMs alongside textual inputs, enabling attackers to bypass traditional text-based prompt defenses and influence the model’s behavior in ways that may not be immediately apparent to users or system administrators.

Uncategorized

LLM01: Indirect Prompt Injection | Exfiltration to attacker

Data exfiltration from a large language model (LLM) can be performed using markdown formatting and link printing by embedding sensitive information within URLs, with chat history appended as query parameters. For instance, an attacker could craft a markdown link that appears harmless but actually encodes extracted data within the URL, directing it to an external server under their control. When the user clicks the link, the browser sends the query parameters—containing sensitive chat history or model outputs—to the attacker’s server, effectively leaking data without raising suspicion. This method leverages the fact that markdown allows hyperlinking without alerting the user to the true nature of the destination, making it a stealthy exfiltration vector.

Scroll to Top