Home » Archives for Martin

Author name: Martin

Uncategorized

Agentic AI Guardrails Playground (Invariant Labs)

Invariant Explorer, accessible at explorer.invariantlabs.ai, is an open-source observability tool designed to help developers visualize, debug, and analyze AI agent behavior through trace data. It provides an intuitive interface for inspecting agent traces, allowing users to identify anomalies, annotate critical decision points, and collaborate effectively. Explorer supports both managed cloud and self-hosted deployments, offering flexibility for various development environments. By integrating with the Invariant SDK or Gateway, developers can upload traces for analysis, facilitating a deeper understanding of agent performance and aiding in the development of robust AI systems.

Uncategorized

Claude executing script via MCP server leading to exfiltration of bash shell (RCE – Remote Code Execution)

Claude executing a script via the MCP (Model Context Protocol) server demonstrates a critical Remote Code Execution (RCE) pathway, where the AI agent—intended to automate system-level tasks—can be manipulated to trigger unauthorized commands. In this scenario, Claude interfaces with the MCP server and is instructed to run a seemingly benign script, which covertly exfiltrates a bash shell. This effectively grants remote access to the underlying system, bypassing traditional security controls and enabling the attacker to execute arbitrary commands, extract sensitive data, or maintain persistent access. The vulnerability highlights the risks of giving AI agents unchecked command execution privileges on local machines, especially without strict sandboxing, auditing, or output validation mechanisms in place.

Uncategorized

MCP Tool poisoning demo. Are you sure your MCP servers are not malicious?

Model Context Protocol poisoning is an emerging AI attack vector where adversaries manipulate the structured context that large language models (LLMs) rely on to reason about available tools, memory, or system state. This protocol—often JSON-based—encodes tool schemas, agent metadata, or prior interactions, which the model parses during inference. By injecting misleading or adversarial data into these context fields (e.g., altering function signatures, hiding malicious payloads in descriptions, or spoofing tool responses), attackers can subvert agent behavior, bypass filters, or exfiltrate data. Unlike prompt injection, which targets natural language prompts, Model Context Protocol poisoning exploits the model’s structured “belief space,” making it stealthier and potentially more persistent across multi-turn interactions or autonomous workflows.

Uncategorized

Promptfoo a very powerful and free LLM security scanner

Promptfoo is an open-source platform designed to help developers test, evaluate, and secure large language model (LLM) applications. It offers tools for automated red teaming, vulnerability scanning, and continuous monitoring, enabling users to identify issues such as prompt injections, data leaks, and harmful content. With a command-line interface and support for declarative configurations, Promptfoo integrates seamlessly into development workflows, allowing for efficient testing across various LLM providers. Trusted by over 75,000 developers, including teams at Shopify, Discord, and Microsoft, Promptfoo emphasizes local operation to ensure privacy and is supported by an active open-source community.

Scroll to Top