Author name: Martin

Uncategorized

Using LLM models to jailbreak LLM models (Jailbreak to Jailbreak)

The J2 Playground by Scale AI is an interactive platform designed to test the resilience of large language models (LLMs) against jailbreak attempts. To use it, select an attacker model (e.g., Claude-Sonnet-3.5 or Gemini-1.5-Pro) and a target model (e.g., GPT-4o or Gemini-1.5-Pro). Define the behavior you want to elicit from the target model, such as generating specific instructions. Choose an attack strategy, then click “Start Conversation” to initiate the simulated interaction. This setup allows users to observe how effectively the attacker model can bypass the target model’s safeguards, providing valuable insights into the vulnerabilities and safety measures of various LLMs.

Uncategorized

Tensortrust AI – Prompt Injection and Prompt Hardening Game

Tensor Trust is an online game developed by researchers at UC Berkeley to study prompt injection vulnerabilities in AI systems. In this game, players defend their virtual bank accounts by crafting prompts that instruct the AI to grant access only when the correct password is provided. Conversely, players also attempt to attack other accounts by devising prompts that trick the AI into granting unauthorized access. This interactive platform serves as a research tool, collecting data to better understand and mitigate prompt injection attacks in large language models.

Uncategorized

Prompt Map – free tool to test for Prompt Leakage, AI Security Expert

PromptMap is a specialized LLM security scanner designed to detect and analyze prompt leaks—instances where a model inadvertently exposes hidden system instructions, internal guidelines, or sensitive operational details. By systematically probing AI responses with crafted input variations, PromptMap identifies vulnerabilities that could lead to unauthorized disclosure of proprietary information, security policies, or hidden prompt engineering techniques. Its structured mapping of leak points helps researchers and developers strengthen AI defenses, ensuring models remain resilient against prompt extraction attacks and unintended information exposure.

Uncategorized

Quick overview of Garak – a free LLM vulnerability scanner

The Garak LLM vulnerability scanner is an open-source tool developed by NVIDIA to assess security risks in large language models (LLMs). It automates probing for vulnerabilities such as prompt injection, data leakage, jailbreaking, and other adversarial exploits by running targeted tests against AI models. Garak supports multiple model types, including local and cloud-based LLMs, and generates structured reports highlighting security weaknesses. By leveraging predefined and customizable probes, security researchers and AI developers can use Garak to systematically evaluate model robustness, mitigate risks, and improve AI system resilience against exploitation.

Uncategorized

Prompt Injection into terminals / IDEs via ANSI escape code characters

Prompt injection threats in terminals and IDEs via ANSI escape characters exploit the ability of these sequences to manipulate text display, execute hidden commands, or deceive users. Attackers can craft malicious ANSI sequences embedded in logs, error messages, or even code comments that, when viewed in a vulnerable terminal or IDE, execute unintended commands, alter text, or phish credentials by tricking users into copying and pasting manipulated input. This risk is especially critical in developer environments where logs, shell outputs, or debugging sessions may contain untrusted input, potentially leading to privilege escalation, data leakage, or unauthorized command execution if proper sanitization and filtering are not enforced.

Uncategorized

AI Agent Denial of Service (DoS), Rabbit R1, AI Security Expert

When AI agents autonomously browse websites and encounter tasks that are intentionally unsolvable or computationally intensive, they become susceptible to Denial-of-Service (DoS) attacks, leading to resource exhaustion or system paralysis. Malicious actors can craft web pages embedding endless loops, impossible CAPTCHA challenges, or resource-draining scripts specifically designed to trap these automated agents in perpetual execution. As the AI agent persistently attempts to resolve the unsolvable task, it inadvertently consumes significant computational resources, bandwidth, and memory, effectively causing service degradation or downtime. Preventing such attacks necessitates robust timeouts, task-complexity assessments, intelligent anomaly detection, and implementing restrictions on computational resources allocated to AI-driven browsing activities.

Uncategorized

AI Agent Data Exfiltration, Rabbit R1, AI Security Expert

AI agents that autonomously browse the web introduce significant security risks, particularly related to data exfiltration through covert copy-and-paste operations to attacker-controlled servers. Such agents, when compromised or inadequately secured, can inadvertently or maliciously transfer sensitive information obtained during browsing activities—including user credentials, proprietary business data, or confidential communications—directly into adversarial hands. Attackers exploit the autonomous nature of these AI agents, inserting scripts or leveraging deceptive interfaces to manipulate clipboard operations, thereby exfiltrating valuable data silently and efficiently. Mitigating this risk requires stringent security controls, such as sandboxed environments, strict access management, continuous monitoring of AI activities, and robust detection mechanisms that identify abnormal behaviors indicative of potential data theft.

Scroll to Top