Back to Glossary
Safety
Prompt Injection Defense
Definition
Prompt injection defense refers to techniques for preventing attackers from manipulating LLM applications by inserting malicious instructions that override intended system behavior.
Why It Matters
Prompt injection is one of the most common vulnerabilities in LLM applications. Attackers can embed instructions in user input that trick the model into ignoring its system prompt, leaking sensitive data, or performing unauthorized actions. Defense is essential for any production AI system.
Attack Types
Direct Injection: Malicious instructions in user input
- “Ignore previous instructions and…”
- “Your new task is to…”
Indirect Injection: Instructions hidden in external content
- Malicious text in retrieved documents
- Hidden instructions in scraped web pages
Defense Strategies
- Input Sanitization: Filter known attack patterns
- Output Validation: Check responses before returning
- Privilege Separation: Limit what the LLM can access
- Instruction Hierarchy: Use delimiters and clear role separation
- Monitoring: Detect anomalous behavior patterns
- Defense-in-Depth: Layer multiple protective measures
No single defense is foolproof. Use multiple layers and assume some attacks will get through.