Prompt Injection Detection
Prompt injection happens when malicious instructions are hidden inside executable content. In the world of AI, prompt injection can be used to nudge AI agents (like goose) to run unsafe commands that compromise your environment or data.
You can help protect your goose workflows by enabling prompt injection detection. This feature uses pattern matching to detect common attack techniques, including:
- Attempts to delete system files or directories
- Commands that download and execute remote scripts
- Attempts to access or exfiltrate sensitive data like SSH keys
- System modifications that could compromise security
These checks provide a safeguard, not a guarantee. They detect known patterns but cannot catch all possible threats, especially novel or sophisticated attacks.
How Detection Works
When enabled, goose scans tool calls for risky patterns before they run:
- Tool call is intercepted and analyzed - When goose prepares to execute a tool, the security system extracts the tool parameter text and checks it against threat patterns
- Risk is assessed - Detected threats are assigned confidence scores
- Execution pauses - Threats that exceed your configured threshold need your decision
- Security alert appears - The alert displays the confidence level, a description of the finding, and a unique finding ID. For example:
🔒 Security Alert: This tool call has been flagged as potentially dangerous.
Confidence: 95%
Explanation: Detected 1 security threat: Recursive file deletion with rm -rf
Finding ID: SEC-abc123...
[Allow Once] [Deny] - You choose whether to proceed or cancel after reviewing the alert details. Note that:
- Each decision is logged with its finding ID in the goose system logs
- Allowed commands still run with your full permissions
Responding to Alerts:
- Read the explanation to understand what triggered the detection
- Consider your context—does this match what you're trying to do?
- Try rephrasing your request more specifically
- Check the source and be extra cautious with prompts from unknown sources
When in doubt, deny.
Enabling Detection
- goose desktop
- goose config file
- Click the button in the top-left to open the sidebar
- Click
Settings
on the sidebar - Click the
Chat
tab - Toggle
Enable Prompt Injection Detection
to the on setting - Optionally adjust the
Detection Threshold
to configure the sensitivity
Add these settings to your config.yaml
:
security_prompt_enabled: true
security_prompt_threshold: 0.7 # Optional, default is 0.7
Beyond prompt injection detection, goose automatically:
- Warns you before running new or updated recipes
- Warns you when importing recipes that contain invisible Unicode Tag Block characters
- Checks for known malware when installing extensions for locally-run MCP servers
Configuring Detection Threshold
The threshold (0.01-1.0) controls how strict detection is:
Threshold | Sensitivity | Use When |
---|---|---|
0.01-0.50 | Very lenient | You're experienced and understand the risks |
0.50-0.70 | Balanced | General development work (good default) |
0.70-0.90 | Strict | Working with sensitive data or systems |
0.90-1.00 | Maximum | High-security environments |
When the injection prompt detection feature is enabled, the default threshold is 0.7 (recommended for most users).
Lower thresholds mean fewer alerts but might miss threats. Higher thresholds catch more potential issues but may flag legitimate operations. You can control this sensitivity/convenience tradeoff based on your needs.
See Also
- goose Permission Modes - Control goose's autonomy level
- Managing Tool Permissions - Fine-grained tool control