Martin, Author at AI Security Expert

Malicious Code Execution in an MCP server – Demo

Martin / April 22, 2025

This challenge demonstrates a malicious code execution vulnerability in an MCP server. The MCP server executes code without proper validation or sandboxing, allowing attackers to run arbitrary code on the system.

Uncategorized

Token Theft vulnerability in an MCP server – Demo

Martin / April 22, 2025

This challenge demonstrates a token theft vulnerability in an MCP server. The MCP server stores authentication tokens insecurely, allowing attackers to extract them and gain unauthorized access to services.

Uncategorized

Excessive permission scope in an MCP server – Demo

Martin / April 22, 2025

This challenge demonstrates the dangers of excessive permission scope in an MCP server. The MCP server grants tools more permissions than necessary, allowing attackers to access unauthorized resources.

Uncategorized

Agentic AI Guardrails Playground (Invariant Labs)

Martin / April 22, 2025

Invariant Explorer, accessible at explorer.invariantlabs.ai, is an open-source observability tool designed to help developers visualize, debug, and analyze AI agent behavior through trace data. It provides an intuitive interface for inspecting agent traces, allowing users to identify anomalies, annotate critical decision points, and collaborate effectively. Explorer supports both managed cloud and self-hosted deployments, offering flexibility for various development environments. By integrating with the Invariant SDK or Gateway, developers can upload traces for analysis, facilitating a deeper understanding of agent performance and aiding in the development of robust AI systems.

Uncategorized

Claude executing script via MCP server leading to exfiltration of bash shell (RCE – Remote Code Execution)

Martin / April 16, 2025

Claude executing a script via the MCP (Model Context Protocol) server demonstrates a critical Remote Code Execution (RCE) pathway, where the AI agent—intended to automate system-level tasks—can be manipulated to trigger unauthorized commands. In this scenario, Claude interfaces with the MCP server and is instructed to run a seemingly benign script, which covertly exfiltrates a bash shell. This effectively grants remote access to the underlying system, bypassing traditional security controls and enabling the attacker to execute arbitrary commands, extract sensitive data, or maintain persistent access. The vulnerability highlights the risks of giving AI agents unchecked command execution privileges on local machines, especially without strict sandboxing, auditing, or output validation mechanisms in place.

Uncategorized

MCP Tool poisoning demo. Are you sure your MCP servers are not malicious?

Martin / April 15, 2025

Model Context Protocol poisoning is an emerging AI attack vector where adversaries manipulate the structured context that large language models (LLMs) rely on to reason about available tools, memory, or system state. This protocol—often JSON-based—encodes tool schemas, agent metadata, or prior interactions, which the model parses during inference. By injecting misleading or adversarial data into these context fields (e.g., altering function signatures, hiding malicious payloads in descriptions, or spoofing tool responses), attackers can subvert agent behavior, bypass filters, or exfiltrate data. Unlike prompt injection, which targets natural language prompts, Model Context Protocol poisoning exploits the model’s structured “belief space,” making it stealthier and potentially more persistent across multi-turn interactions or autonomous workflows.

Uncategorized

Promptfoo a very powerful and free LLM security scanner

Martin / April 14, 2025

Promptfoo is an open-source platform designed to help developers test, evaluate, and secure large language model (LLM) applications. It offers tools for automated red teaming, vulnerability scanning, and continuous monitoring, enabling users to identify issues such as prompt injections, data leaks, and harmful content. With a command-line interface and support for declarative configurations, Promptfoo integrates seamlessly into development workflows, allowing for efficient testing across various LLM providers. Trusted by over 75,000 developers, including teams at Shopify, Discord, and Microsoft, Promptfoo emphasizes local operation to ensure privacy and is supported by an active open-source community.

Uncategorized

Claude Desktop with Desktop Commander MCP to control your machine via AI

Martin / April 14, 2025

Claude Desktop, when integrated with Desktop Commander MCP, enables seamless AI-driven control of your local machine through natural language commands. This setup turns Claude into an intelligent operating interface capable of executing tasks such as opening applications, managing files, adjusting system settings, or launching scripts—all through conversational input. Powered by the Model Context Protocol (MCP), Desktop Commander acts as the secure execution layer, bridging the AI model with system-level functions while maintaining control and observability. This pairing allows developers, power users, and automation enthusiasts to interact with their computers like intelligent assistants, streamlining workflows and enhancing productivity.

Uncategorized

Scan your MCP servers for vulnerabilities specific to agentic AI

Martin / April 11, 2025

The mcp-scan project by Invariant Labs is a security auditing tool designed to analyze Model Context Protocol (MCP) server configurations for potential vulnerabilities. It targets issues such as prompt injections, tool poisoning, and cross-origin escalations by scanning configurations from clients like Claude, Cursor, and Windsurf. Utilizing Invariant Guardrails, it enhances detection capabilities and supports tool pinning to prevent unauthorized tool modifications. The tool can be executed using the command uvx mcp-scan@latest. Licensed under Apache 2.0, mcp-scan serves as a valuable resource for developers aiming to secure their MCP environments.

Uncategorized

Image prompt injection to invoke MCP tools

Martin / April 11, 2025

Visual prompt injection targeting the Model Context Protocol (MCP) is particularly dangerous because it allows attackers to embed hidden commands in images—such as steganographic text, low-contrast instructions, or adversarial patterns—that vision-capable models interpret as legitimate input. When processed, these visual payloads can manipulate the model’s behavior or trigger unintended tool use via MCP, such as accessing APIs, databases, or external systems. This bypasses traditional input sanitization and can result in unauthorized actions, data leakage, or compromise of downstream autonomous agents, posing a serious threat in agentic and multimodal AI systems.

Author name: Martin