Author name: Martin

Uncategorized

CSRF potential in LLMs

Cross-Site Request Forgery (CSRF) via prompt injection through a GET request is a potential attack vector where an attacker embeds a malicious prompt in a URL and tricks a user or system into triggering unintended actions. If an AI or web application processes input directly from GET parameters without proper validation or authentication, the attacker can exploit this to inject commands or alter behavior. For instance, an AI system generating responses based on URL inputs could be coerced into executing harmful or unauthorized actions, such as modifying user data, exposing sensitive information, or interacting with third-party APIs. Mitigating this risk requires robust input validation, the use of CSRF tokens, and avoiding implicit trust in data derived from GET requests.

Uncategorized

Prompt Injection via clipboard

Prompt injection via clipboard copy/paste is a security concern where malicious text, copied into a clipboard, is inadvertently pasted into a system or application that processes it as a command or input. This exploit can trick AI systems, software applications, or even command-line interfaces into executing unintended instructions, potentially compromising data integrity, user privacy, or system security. For example, an AI model designed to assist with text-based tasks might interpret injected prompts as legitimate instructions, altering its behavior or providing sensitive outputs. This risk highlights the importance of validating and sanitizing inputs, especially from external or untrusted sources, to prevent accidental execution of harmful commands.

Uncategorized

Hero AI Bot

This project is a proof of concept for a Hackbot, an AI-driven system that autonomously finds vulnerabilities in web applications. It takes a raw HTTP request as input and attempts to identify and exploit potential security vulnerabilities. It’s probably not the best way to build a hackbot, but you can view it as inspiration.

Uncategorized

KONTRA OWASP LLM Top 10 Playground

ONTRA offers an interactive training module titled “OWASP Top 10 for Large Language Model (LLM) Applications,” designed to educate developers on the most critical security vulnerabilities associated with LLMs. This module is inspired by real-world vulnerabilities and case studies, providing hands-on experience to help developers understand, identify, and mitigate security issues in their applications. 

Uncategorized

Certified AI/ML Penetration Tester

The Certified AI/ML Pentester (C-AI/MLPen) is an intermediate-level certification offered by The SecOps Group, designed to assess and validate a candidate’s expertise in AI and machine learning security. This certification is particularly suited for professionals such as penetration testers, application security architects, SOC analysts, red and blue team members, AI/ML engineers, and enthusiasts aiming to enhance their knowledge in identifying and exploiting security vulnerabilities within AI/ML systems.

Uncategorized

Image Prompt injection and double instructions

Prompt injection via images involves embedding hidden or overt textual commands within visual elements to manipulate AI systems. This approach exploits Optical Character Recognition (OCR) or visual-text processing models, enabling attackers to include instructions that the system interprets as prompts. These commands could trick the AI into generating unintended outputs or executing malicious tasks. For example, a visually disguised instruction embedded in a QR code or background text might bypass user detection but still influence the AI. Double instructions amplify this vulnerability by layering contradictory or complex commands to confuse the AI’s decision-making processes. By combining visible, user-friendly prompts with hidden, conflicting directives, attackers can manipulate the system’s output. For instance, an overtly benign text might instruct the AI to generate safe responses, while hidden instructions (in an image or metadata) direct it to include harmful or biased content.

Uncategorized

OpenAI Playground

The OpenAI Playground is an interactive web-based platform that allows users to experiment with OpenAI’s language models, such as GPT-3 and GPT-4, in a user-friendly environment. It enables users to input prompts and receive generated text responses, facilitating exploration of the models’ capabilities without requiring programming skills.

Uncategorized

Prompt injection and exfiltration in Chats apps

Data exfiltration in messaging apps through unfurling exploits the feature where apps automatically generate previews for shared links. This process, called unfurling, involves fetching metadata (like titles, descriptions, or images) from the linked resource. Attackers can abuse this mechanism by crafting malicious links that, when shared, cause the app to fetch sensitive data from internal servers or leak tokens, cookies, or other confidential information. For example, when a user sends a link, the app’s server might access the linked resource to generate a preview. If the server is on an internal network, attackers can include URLs pointing to internal endpoints, tricking the app into exposing sensitive data during the unfurling process. This vulnerability is particularly concerning in enterprise messaging platforms, where such attacks might expose internal APIs, configuration details, or sensitive documents. Mitigating the risk involves limiting what metadata can be fetched, enforcing strict URL validation, and sandboxing the unfurling process to prevent access to sensitive or internal resources. Users and administrators should also be cautious about sharing unknown or untrusted links in messaging apps.

Uncategorized

Gandalf – AI bot to practice prompt injections

Gandalf AI, developed by Lakera, is an interactive online game designed to educate users about AI security vulnerabilities, particularly prompt injection attacks. In this game, players engage with an AI chatbot named Gandalf, whose objective is to safeguard a secret password. The player’s challenge is to craft prompts that trick Gandalf into revealing this password, thereby learning about the potential weaknesses in large language models (LLMs).

Scroll to Top