Prompt Injection with encoded prompts

Prompt injection with encoded prompts involves using various encoding methods (such as Base64, hexadecimal, or URL encoding) to obfuscate malicious input within the prompt of an AI system. This technique is designed to trick the filtering mechanisms that typically rely on keyword or pattern detection. Encoded prompts hide the true nature of the input until it’s decoded internally by the system, allowing attackers to bypass simple input validation checks. For example, if a filter is scanning for specific phrases like “delete” or “drop,” encoding these commands into Base64 may allow them to pass through unnoticed, as the filter may not recognize the encoded versions as harmful.

Once inside the system, these encoded prompts can be decoded at various stages of the processing pipeline, potentially triggering malicious behavior or manipulating the model’s output in unintended ways. This can occur if the system inadvertently decodes the input without proper validation, allowing the attacker to execute actions that would otherwise be blocked. Because prompt injection attacks leverage the natural language processing capabilities of AI models, encoded prompts present a more sophisticated method of evading typical defenses, underscoring the need for more robust input filtering mechanisms that account for multiple forms of encoding and transformation.