Home » Archives for Martin

Author name: Martin

Uncategorized

Using LLM models to jailbreak LLM models (Jailbreak to Jailbreak)

The J2 Playground by Scale AI is an interactive platform designed to test the resilience of large language models (LLMs) against jailbreak attempts. To use it, select an attacker model (e.g., Claude-Sonnet-3.5 or Gemini-1.5-Pro) and a target model (e.g., GPT-4o or Gemini-1.5-Pro). Define the behavior you want to elicit from the target model, such as generating specific instructions. Choose an attack strategy, then click “Start Conversation” to initiate the simulated interaction. This setup allows users to observe how effectively the attacker model can bypass the target model’s safeguards, providing valuable insights into the vulnerabilities and safety measures of various LLMs.

Uncategorized

Tensortrust AI – Prompt Injection and Prompt Hardening Game

Tensor Trust is an online game developed by researchers at UC Berkeley to study prompt injection vulnerabilities in AI systems. In this game, players defend their virtual bank accounts by crafting prompts that instruct the AI to grant access only when the correct password is provided. Conversely, players also attempt to attack other accounts by devising prompts that trick the AI into granting unauthorized access. This interactive platform serves as a research tool, collecting data to better understand and mitigate prompt injection attacks in large language models.

Uncategorized

Prompt Map – free tool to test for Prompt Leakage, AI Security Expert

PromptMap is a specialized LLM security scanner designed to detect and analyze prompt leaks—instances where a model inadvertently exposes hidden system instructions, internal guidelines, or sensitive operational details. By systematically probing AI responses with crafted input variations, PromptMap identifies vulnerabilities that could lead to unauthorized disclosure of proprietary information, security policies, or hidden prompt engineering techniques. Its structured mapping of leak points helps researchers and developers strengthen AI defenses, ensuring models remain resilient against prompt extraction attacks and unintended information exposure.

Scroll to Top