Author name: Martin

Uncategorized

Agentic AI Guardrails Playground (Invariant Labs)

Invariant Explorer, accessible at explorer.invariantlabs.ai, is an open-source observability tool designed to help developers visualize, debug, and analyze AI agent behavior through trace data. It provides an intuitive interface for inspecting agent traces, allowing users to identify anomalies, annotate critical decision points, and collaborate effectively. Explorer supports both managed cloud and self-hosted deployments, offering flexibility for various development environments. By integrating with the Invariant SDK or Gateway, developers can upload traces for analysis, facilitating a deeper understanding of agent performance and aiding in the development of robust AI systems.

Uncategorized

Claude executing script via MCP server leading to exfiltration of bash shell (RCE – Remote Code Execution)

Claude executing a script via the MCP (Model Context Protocol) server demonstrates a critical Remote Code Execution (RCE) pathway, where the AI agent—intended to automate system-level tasks—can be manipulated to trigger unauthorized commands. In this scenario, Claude interfaces with the MCP server and is instructed to run a seemingly benign script, which covertly exfiltrates a bash shell. This effectively grants remote access to the underlying system, bypassing traditional security controls and enabling the attacker to execute arbitrary commands, extract sensitive data, or maintain persistent access. The vulnerability highlights the risks of giving AI agents unchecked command execution privileges on local machines, especially without strict sandboxing, auditing, or output validation mechanisms in place.

Uncategorized

MCP Tool poisoning demo. Are you sure your MCP servers are not malicious?

Model Context Protocol poisoning is an emerging AI attack vector where adversaries manipulate the structured context that large language models (LLMs) rely on to reason about available tools, memory, or system state. This protocol—often JSON-based—encodes tool schemas, agent metadata, or prior interactions, which the model parses during inference. By injecting misleading or adversarial data into these context fields (e.g., altering function signatures, hiding malicious payloads in descriptions, or spoofing tool responses), attackers can subvert agent behavior, bypass filters, or exfiltrate data. Unlike prompt injection, which targets natural language prompts, Model Context Protocol poisoning exploits the model’s structured “belief space,” making it stealthier and potentially more persistent across multi-turn interactions or autonomous workflows.

Uncategorized

Promptfoo a very powerful and free LLM security scanner

Promptfoo is an open-source platform designed to help developers test, evaluate, and secure large language model (LLM) applications. It offers tools for automated red teaming, vulnerability scanning, and continuous monitoring, enabling users to identify issues such as prompt injections, data leaks, and harmful content. With a command-line interface and support for declarative configurations, Promptfoo integrates seamlessly into development workflows, allowing for efficient testing across various LLM providers. Trusted by over 75,000 developers, including teams at Shopify, Discord, and Microsoft, Promptfoo emphasizes local operation to ensure privacy and is supported by an active open-source community.

Uncategorized

Claude Desktop with Desktop Commander MCP to control your machine via AI

Claude Desktop, when integrated with Desktop Commander MCP, enables seamless AI-driven control of your local machine through natural language commands. This setup turns Claude into an intelligent operating interface capable of executing tasks such as opening applications, managing files, adjusting system settings, or launching scripts—all through conversational input. Powered by the Model Context Protocol (MCP), Desktop Commander acts as the secure execution layer, bridging the AI model with system-level functions while maintaining control and observability. This pairing allows developers, power users, and automation enthusiasts to interact with their computers like intelligent assistants, streamlining workflows and enhancing productivity.

Uncategorized

Scan your MCP servers for vulnerabilities specific to agentic AI

The mcp-scan project by Invariant Labs is a security auditing tool designed to analyze Model Context Protocol (MCP) server configurations for potential vulnerabilities. It targets issues such as prompt injections, tool poisoning, and cross-origin escalations by scanning configurations from clients like Claude, Cursor, and Windsurf. Utilizing Invariant Guardrails, it enhances detection capabilities and supports tool pinning to prevent unauthorized tool modifications. The tool can be executed using the command uvx mcp-scan@latest. Licensed under Apache 2.0, mcp-scan serves as a valuable resource for developers aiming to secure their MCP environments.

Uncategorized

Image prompt injection to invoke MCP tools

Visual prompt injection targeting the Model Context Protocol (MCP) is particularly dangerous because it allows attackers to embed hidden commands in images—such as steganographic text, low-contrast instructions, or adversarial patterns—that vision-capable models interpret as legitimate input. When processed, these visual payloads can manipulate the model’s behavior or trigger unintended tool use via MCP, such as accessing APIs, databases, or external systems. This bypasses traditional input sanitization and can result in unauthorized actions, data leakage, or compromise of downstream autonomous agents, posing a serious threat in agentic and multimodal AI systems.

Scroll to Top