Martin, Author at AI Security Expert

Using LLM models to jailbreak LLM models (Jailbreak to Jailbreak)

Martin / March 30, 2025

The J2 Playground by Scale AI is an interactive platform designed to test the resilience of large language models (LLMs) against jailbreak attempts. To use it, select an attacker model (e.g., Claude-Sonnet-3.5 or Gemini-1.5-Pro) and a target model (e.g., GPT-4o or Gemini-1.5-Pro). Define the behavior you want to elicit from the target model, such as generating specific instructions. Choose an attack strategy, then click “Start Conversation” to initiate the simulated interaction. This setup allows users to observe how effectively the attacker model can bypass the target model’s safeguards, providing valuable insights into the vulnerabilities and safety measures of various LLMs.

Uncategorized

Tensortrust AI – Prompt Injection and Prompt Hardening Game

Martin / March 30, 2025

Tensor Trust is an online game developed by researchers at UC Berkeley to study prompt injection vulnerabilities in AI systems. In this game, players defend their virtual bank accounts by crafting prompts that instruct the AI to grant access only when the correct password is provided. Conversely, players also attempt to attack other accounts by devising prompts that trick the AI into granting unauthorized access. This interactive platform serves as a research tool, collecting data to better understand and mitigate prompt injection attacks in large language models.

Uncategorized

Prompt Map – free tool to test for Prompt Leakage, AI Security Expert

Martin / March 30, 2025

PromptMap is a specialized LLM security scanner designed to detect and analyze prompt leaks—instances where a model inadvertently exposes hidden system instructions, internal guidelines, or sensitive operational details. By systematically probing AI responses with crafted input variations, PromptMap identifies vulnerabilities that could lead to unauthorized disclosure of proprietary information, security policies, or hidden prompt engineering techniques. Its structured mapping of leak points helps researchers and developers strengthen AI defenses, ensuring models remain resilient against prompt extraction attacks and unintended information exposure.

Uncategorized

Quick overview of Garak – a free LLM vulnerability scanner

Martin / March 28, 2025

The Garak LLM vulnerability scanner is an open-source tool developed by NVIDIA to assess security risks in large language models (LLMs). It automates probing for vulnerabilities such as prompt injection, data leakage, jailbreaking, and other adversarial exploits by running targeted tests against AI models. Garak supports multiple model types, including local and cloud-based LLMs, and generates structured reports highlighting security weaknesses. By leveraging predefined and customizable probes, security researchers and AI developers can use Garak to systematically evaluate model robustness, mitigate risks, and improve AI system resilience against exploitation.

Uncategorized

Prompt Injection into terminals / IDEs via ANSI escape code characters

Martin / March 26, 2025

Prompt injection threats in terminals and IDEs via ANSI escape characters exploit the ability of these sequences to manipulate text display, execute hidden commands, or deceive users. Attackers can craft malicious ANSI sequences embedded in logs, error messages, or even code comments that, when viewed in a vulnerable terminal or IDE, execute unintended commands, alter text, or phish credentials by tricking users into copying and pasting manipulated input. This risk is especially critical in developer environments where logs, shell outputs, or debugging sessions may contain untrusted input, potentially leading to privilege escalation, data leakage, or unauthorized command execution if proper sanitization and filtering are not enforced.

Uncategorized

AI Agent Denial of Service (DoS), Rabbit R1, AI Security Expert

Martin / March 25, 2025

When AI agents autonomously browse websites and encounter tasks that are intentionally unsolvable or computationally intensive, they become susceptible to Denial-of-Service (DoS) attacks, leading to resource exhaustion or system paralysis. Malicious actors can craft web pages embedding endless loops, impossible CAPTCHA challenges, or resource-draining scripts specifically designed to trap these automated agents in perpetual execution. As the AI agent persistently attempts to resolve the unsolvable task, it inadvertently consumes significant computational resources, bandwidth, and memory, effectively causing service degradation or downtime. Preventing such attacks necessitates robust timeouts, task-complexity assessments, intelligent anomaly detection, and implementing restrictions on computational resources allocated to AI-driven browsing activities.

Uncategorized

AI Agent Data Exfiltration, Rabbit R1, AI Security Expert

Martin / March 25, 2025

AI agents that autonomously browse the web introduce significant security risks, particularly related to data exfiltration through covert copy-and-paste operations to attacker-controlled servers. Such agents, when compromised or inadequately secured, can inadvertently or maliciously transfer sensitive information obtained during browsing activities—including user credentials, proprietary business data, or confidential communications—directly into adversarial hands. Attackers exploit the autonomous nature of these AI agents, inserting scripts or leveraging deceptive interfaces to manipulate clipboard operations, thereby exfiltrating valuable data silently and efficiently. Mitigating this risk requires stringent security controls, such as sandboxed environments, strict access management, continuous monitoring of AI activities, and robust detection mechanisms that identify abnormal behaviors indicative of potential data theft.

Uncategorized

OWASP Top 10 LLM10:2025 Unbounded Consumption

Martin / March 22, 2025

Unbounded Consumption refers to scenarios where Large Language Models (LLMs) are subjected to excessive and uncontrolled usage, leading to resource exhaustion, service degradation, financial losses, or intellectual property theft.

Uncategorized

OWASP Top 10 LLM09:2025 Misinformation

Martin / March 22, 2025

Misinformation refers to the generation of false or misleading information by Large Language Models (LLMs), which, despite appearing credible, can lead to security breaches, reputational harm, and legal liabilities.

Uncategorized

OWASP Top 10 LLM08:2025 Vector and Embedding Weaknesses

Martin / March 22, 2025

Vector and Embedding Weaknesses refers to security vulnerabilities in Large Language Models (LLMs) that arise from improper handling of vector embeddings—numerical representations of data—which can lead to unauthorized data access, data poisoning, or unintended model behaviors.

Author name: Martin