AI Error Handling Patterns: Build Resilient Systems
While everyone focuses on the happy path, few engineers plan for AI failures systematically. Through building production AI systems, I’ve discovered that error handling determines user experience more than model selection, and that AI systems fail in ways traditional applications don’t.
The traditional approach of try-catch and error messages doesn’t work for AI. A model timeout isn’t like a database timeout. A content filter trigger isn’t like an authentication failure. Rate limiting requires different handling than server errors. This guide covers patterns that actually work for AI-specific failure modes.
Why AI Error Handling Is Different
Before applying traditional patterns, understand what makes AI failures unique:
Failures are often partial. A response might be mostly correct but hallucinate one fact. A generation might succeed but violate content policies. Traditional success/failure binaries don’t capture AI behavior.
Retries have different economics. Retrying a database query costs nothing. Retrying an LLM call costs money and time. Naive retry logic can be expensive.
Fallback quality varies. Your fallback isn’t always worse. Sometimes a smaller, faster model produces better results for specific queries. Fallback logic needs nuance.
User perception matters more. Users tolerate database hiccups they never see. AI failures often occur in user-facing interactions where poor handling destroys trust.
For context on building robust AI architectures, see my guide to AI system design patterns.
Error Classification Framework
Classify errors to handle them appropriately:
Transient Errors
Characteristics: Will likely succeed on retry. Network glitches, temporary overload, momentary service issues.
Handling: Retry with exponential backoff. These errors are frustrating but recoverable.
Examples: Connection reset, 503 Service Unavailable, timeout under load
Rate Limit Errors
Characteristics: Request was valid but quota exceeded. Will succeed after waiting.
Handling: Respect rate limit headers. Queue for later processing. Consider alternative providers.
Examples: 429 Too Many Requests, tokens per minute exceeded
Input Errors
Characteristics: Request was malformed or invalid. Retrying won’t help.
Handling: Validate inputs aggressively. Return clear error messages. Don’t retry.
Examples: Exceeds context length, invalid parameters, unsupported format
Content Policy Errors
Characteristics: Content triggered safety filters. May or may not be legitimate.
Handling: Log for review. Consider rephrasing. Provide appropriate user feedback.
Examples: Content filtered, safety violation, policy rejection
Service Errors
Characteristics: Provider is experiencing issues. Scope and duration unknown.
Handling: Fallback to alternative providers. Implement circuit breakers. Queue for later.
Examples: 500 Internal Server Error, 502 Bad Gateway, extended outage
Model Errors
Characteristics: Model produced unexpected output. JSON parsing failed, format violated, nonsensical content.
Handling: Implement output validation. Retry with temperature adjustments. Fall back to simpler approaches.
Examples: Invalid JSON, missing required fields, obvious hallucination
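To make the taxonomy concrete, here is a minimal sketch of it as an enum plus a classifier. The status codes follow the examples above; the `classify_error` helper and its message checks are assumptions you would adapt to your provider's actual error objects.

```python
from enum import Enum, auto


class AIErrorType(Enum):
    """The error categories described above."""
    TRANSIENT = auto()       # network glitches, 503s, timeouts under load
    RATE_LIMIT = auto()      # 429s, tokens-per-minute exceeded
    INPUT = auto()           # context length, invalid parameters
    CONTENT_POLICY = auto()  # safety filters, policy rejections
    SERVICE = auto()         # 500/502, provider outages
    MODEL = auto()           # invalid JSON, missing fields, nonsense output


def classify_error(status_code: int | None, message: str) -> AIErrorType:
    """Map a raw provider error to a category. Hypothetical mapping --
    adapt the status codes and message checks to your provider."""
    text = message.lower()
    if status_code == 429:
        return AIErrorType.RATE_LIMIT
    if status_code in (500, 502, 504):
        return AIErrorType.SERVICE
    if status_code == 503 or "timeout" in text or "connection reset" in text:
        return AIErrorType.TRANSIENT
    if "filter" in text or "policy" in text:
        return AIErrorType.CONTENT_POLICY
    if status_code == 400 or "context length" in text or "invalid" in text:
        return AIErrorType.INPUT
    return AIErrorType.MODEL  # fall through: treat as unexpected model output
```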
Retry Strategies
Not all retries are equal:
Exponential Backoff
Start with short delays, increase exponentially:
- First retry: 1 second
- Second retry: 2 seconds
- Third retry: 4 seconds
- Fourth retry: 8 seconds
Add jitter (random variation) to prevent thundering herd problems when many clients retry simultaneously.
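A minimal backoff helper with full jitter might look like the sketch below. The retryable exception types and the `client.generate` call in the usage comment are placeholders, not any particular SDK's API.

```python
import random
import time


def retry_with_backoff(call, max_retries=4, base_delay=1.0, max_delay=30.0,
                       retryable=(TimeoutError, ConnectionError)):
    """Retry `call` with exponential backoff plus full jitter."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except retryable:
            if attempt == max_retries:
                raise  # retry budget exhausted, surface the error
            # 1s, 2s, 4s, 8s... capped, then randomized to avoid thundering herd
            delay = min(base_delay * (2 ** attempt), max_delay)
            time.sleep(random.uniform(0, delay))


# Usage (hypothetical client): retry_with_backoff(lambda: client.generate(prompt))
```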
Context-Aware Retries
Adjust retry behavior based on context:
- Interactive requests: Few retries with short delays. User is waiting.
- Background processing: More retries with longer delays. User isn’t blocked.
- Critical operations: Retry aggressively, then alert humans.
Conditional Retries
Some errors shouldn’t be retried:
- Don’t retry: Input validation failures, authentication errors, permanent rate limits
- Retry cautiously: Content policy triggers (might succeed with rephrasing)
- Retry aggressively: Network errors, transient overload
Implement retry policies per error type. Generic retry-everything logic wastes resources.
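One way to encode this is a small policy table keyed by error category, as in the sketch below; the specific retry counts and delays are illustrative defaults, not recommendations.

```python
from dataclasses import dataclass


@dataclass
class RetryPolicy:
    max_retries: int
    base_delay: float  # seconds before the first retry


# Illustrative policies per error category (names match the taxonomy above).
RETRY_POLICIES = {
    "transient": RetryPolicy(max_retries=4, base_delay=1.0),       # retry aggressively
    "rate_limit": RetryPolicy(max_retries=2, base_delay=10.0),     # wait, respect headers
    "content_policy": RetryPolicy(max_retries=1, base_delay=0.0),  # retry cautiously
    "input": RetryPolicy(max_retries=0, base_delay=0.0),           # never retry
    "service": RetryPolicy(max_retries=3, base_delay=2.0),
}


def should_retry(error_type: str, attempt: int) -> bool:
    """Default to no retries for unknown error types."""
    policy = RETRY_POLICIES.get(error_type, RetryPolicy(0, 0.0))
    return attempt < policy.max_retries
```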
Cost-Aware Retries
AI retries have direct costs:
- Track retry costs. How much are retries costing you?
- Cap retry budgets. Don’t spend more on retries than the original request was worth.
- Consider cheaper alternatives. Instead of retrying an expensive model, try a cheaper one.
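A rough sketch of a retry budget check is below; the dollar figures in the example are made up, and in practice you would feed in cost estimates from your own pricing data.

```python
def within_retry_budget(original_cost: float, retry_costs: list[float],
                        budget_multiplier: float = 1.0) -> bool:
    """Allow further retries only while total retry spend stays below a
    multiple of what the original request was worth."""
    return sum(retry_costs) < original_cost * budget_multiplier


# Example: a $0.02 request has already burned $0.015 in retries.
print(within_retry_budget(0.02, [0.015]))          # True: one more retry allowed
print(within_retry_budget(0.02, [0.015, 0.015]))   # False: budget exhausted
```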
My guide on cost-effective AI strategies covers cost optimization in depth.
Fallback Patterns
When primary approaches fail, what’s the alternative?
Model Fallback
When your primary model is unavailable:
- Tier 1 failure → Tier 2: Claude unavailable → Try GPT-5
- Tier 2 failure → Tier 3: GPT-5 unavailable → Try smaller model
- All models fail → Cached response: Serve relevant cached content
- No cache available → Graceful error: Communicate clearly
Implement fallback chains that progressively degrade capability rather than failing completely.
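A fallback chain can be as simple as walking an ordered list of callables until one succeeds. In the sketch below, the provider callables and the optional cache lookup are placeholders for your own client wrappers.

```python
def generate_with_fallback(prompt, providers, get_cached=None):
    """Try each provider in order; fall back to cached content, then a clear error.
    `providers` is an ordered list of (name, callable) pairs -- placeholders
    for your real client wrappers."""
    errors = {}
    for name, call in providers:
        try:
            return call(prompt)
        except Exception as exc:  # in practice, catch provider-specific errors
            errors[name] = exc
    if get_cached is not None:
        cached = get_cached(prompt)
        if cached is not None:
            return cached  # degraded but still useful
    raise RuntimeError(f"All providers failed: {list(errors)}")
```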
Quality Fallback
Sometimes worse is better than nothing:
- Full RAG fails → Keyword search: Vector search unavailable → Fall back to BM25
- Rich response fails → Simple response: Complex analysis unavailable → Provide basic answer
- Real-time fails → Cached: Generation unavailable → Serve cached similar response
Quality fallback maintains service availability at reduced capability.
Feature Fallback
Disable features gracefully:
- Advanced features fail → Basic features: Streaming unavailable → Return complete response
- Enhancement fails → Core function: Formatting fails → Return plain text
- Optional fails → Skip: Analytics fails → Continue without tracking
Feature flags enable rapid fallback without code deployment.
My guide on combining multiple AI models covers orchestration patterns for multi-model fallback.
Circuit Breaker Pattern
Don’t keep calling failing services:
How Circuit Breakers Work
- Closed (normal): Requests flow through. Track failure rate.
- Open (tripped): Requests fail immediately. Don’t call the service.
- Half-open (testing): Allow some requests through. Check if service recovered.
When failures exceed threshold, trip the circuit. This prevents cascade failures and gives services time to recover.
Implementation Details
Track rolling failure windows. Last 100 requests, or last 5 minutes. Recent failures matter more than historical.
Set appropriate thresholds. 50% failure rate might trip the circuit. 10% might just warn.
Include timeout handling. Slow responses count as failures for circuit breaker purposes.
Recovery testing. After the circuit opens, periodically test with single requests. Close the circuit when the service stabilizes.
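Here is a compact sketch of the closed/open/half-open cycle using a rolling window of recent outcomes. The window size, failure threshold, and cooldown are illustrative, and a production version would also need thread safety and per-service instances.

```python
import time
from collections import deque


class CircuitBreaker:
    """Minimal circuit breaker: closed -> open on high failure rate,
    half-open after a cooldown, closed again on a successful probe."""

    def __init__(self, window=100, failure_threshold=0.5, cooldown=30.0):
        self.outcomes = deque(maxlen=window)  # rolling window of recent results
        self.failure_threshold = failure_threshold
        self.cooldown = cooldown
        self.opened_at = None  # None means the circuit is closed

    def allow_request(self) -> bool:
        if self.opened_at is None:
            return True  # closed: requests flow through
        if time.time() - self.opened_at >= self.cooldown:
            return True  # half-open: let a probe request through
        return False     # open: fail fast without calling the service

    def record(self, success: bool) -> None:
        self.outcomes.append(success)
        if self.opened_at is not None and success:
            # Probe succeeded: service looks healthy, close the circuit.
            self.opened_at = None
            self.outcomes.clear()
            return
        failures = self.outcomes.count(False)
        if len(self.outcomes) >= 10 and failures / len(self.outcomes) >= self.failure_threshold:
            self.opened_at = time.time()  # trip (or re-trip) the circuit
```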
Per-Service Circuits
Implement separate circuit breakers for:
- Different AI providers (OpenAI, Anthropic, local models)
- Different model endpoints (completion, embedding, moderation)
- Different operations (real-time, batch, critical)
One service failing shouldn’t trip circuits for healthy services.
Output Validation
AI outputs can fail even when API calls succeed:
Format Validation
- JSON structure: Does the output parse? Are required fields present?
- Type checking: Are fields the expected types?
- Enum validation: Are categorical outputs valid values?
Implement strict parsing with clear error handling for malformed outputs.
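A bare-bones format validator might look like the following; the required fields and the sentiment enum are hypothetical stand-ins for whatever schema your prompt asks the model to produce.

```python
import json

ALLOWED_SENTIMENTS = {"positive", "neutral", "negative"}  # hypothetical enum


def parse_model_output(raw: str) -> dict:
    """Strictly parse and validate model output, raising ValueError with a
    clear reason instead of letting malformed data flow downstream."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Output is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("Expected a JSON object")
    for field, expected_type in (("summary", str), ("sentiment", str)):
        if field not in data:
            raise ValueError(f"Missing required field: {field}")
        if not isinstance(data[field], expected_type):
            raise ValueError(f"Field {field!r} has wrong type")
    if data["sentiment"] not in ALLOWED_SENTIMENTS:
        raise ValueError(f"Invalid sentiment value: {data['sentiment']!r}")
    return data
```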
Content Validation
- Length checks: Is the output a reasonable length?
- Language detection: Is the output in the expected language?
- Coherence checks: Does the output make basic sense?
Catch obvious model failures before they reach users.
Semantic Validation
- Relevance scoring: Does the response address the question?
- Consistency checking: Does the response contradict itself?
- Factual grounding: Can claims be traced to source documents?
Semantic validation catches subtle failures that format validation misses.
Validation Failure Handling
When validation fails:
- Retry with adjusted parameters: Lower temperature, clearer instructions
- Retry with different model: Some models handle certain tasks better
- Fall back to simpler approach: Structured output mode, constrained generation
- Return partial results: Give users what succeeded
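Tying validation back into the retry ladder might look like this sketch, where `generate` and `validate` are placeholders for your own model call and validator, and stepping down temperature is just one possible adjustment.

```python
def generate_validated(prompt, generate, validate, temperatures=(0.7, 0.3, 0.0)):
    """Retry generation with progressively lower temperature until the output
    validates; return (result, None) on success or (None, last_error).
    `generate(prompt, temperature)` and `validate(output)` are placeholders."""
    last_error = None
    for temp in temperatures:
        output = generate(prompt, temperature=temp)
        try:
            return validate(output), None
        except ValueError as exc:
            last_error = exc  # keep the reason for logging and user messaging
    return None, last_error  # caller can fall back or return partial results
```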
User Communication
How you communicate failures affects user trust:
Error Messages
- Be specific: “Our AI service is temporarily unavailable” not “Something went wrong”
- Be actionable: “Please try again in a few moments” not “Error occurred”
- Be honest: “We couldn’t generate a good response” not endless spinning
Progress Indicators
For long operations:
- Show progress stages: “Searching documents… Generating response…”
- Indicate uncertainty: “This usually takes 10-30 seconds”
- Enable cancellation: Let users abandon long-running requests
Partial Results
When operations partially succeed:
- Show what worked: “Found 5 relevant documents (3 more couldn’t be processed)”
- Explain limitations: “Response generated without access to recent data”
- Offer alternatives: “Would you like to try a simpler query?”
Failure Recovery
Help users move forward:
- Suggest alternatives: “Try rephrasing your question”
- Offer cached content: “Here’s a related answer from earlier”
- Provide contact options: “Our team can help with complex questions”
Monitoring and Alerting
You can’t fix what you don’t see:
Error Tracking
- Log error details: Type, message, request context, response (if any)
- Categorize automatically: Group by error type for trend analysis
- Track error rates: By endpoint, by model, by time period
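A structured log line per failure goes a long way. The sketch below uses the standard library logger with a JSON payload; the field names are one reasonable convention, not a fixed schema.

```python
import json
import logging
import uuid

logger = logging.getLogger("ai_errors")


def log_ai_error(error_type: str, message: str, *, model: str, endpoint: str,
                 correlation_id: str | None = None) -> str:
    """Emit one structured record per failure so errors can be grouped,
    counted, and traced across services. Field names are illustrative."""
    correlation_id = correlation_id or str(uuid.uuid4())
    logger.error(json.dumps({
        "correlation_id": correlation_id,
        "error_type": error_type,     # e.g. "rate_limit", "model_output"
        "message": message,
        "model": model,
        "endpoint": endpoint,
    }))
    return correlation_id
```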
Alerting Strategy
- Alert on rate changes: 5% errors is normal, 15% needs attention
- Alert on new error types: Previously unseen errors need investigation
- Alert on circuit breaker trips: Open circuits indicate service issues
Debugging Support
- Correlation IDs: Track requests across services
- Request reproduction: Log enough to replay failed requests
- Error patterns: Identify common causes from error aggregations
My guide to AI system monitoring covers observability patterns comprehensively.
Testing Error Handling
Error handling needs testing like any other code:
Chaos Engineering
- Inject failures deliberately: Simulate API errors, timeouts, malformed responses
- Test fallback paths: Verify fallbacks activate correctly
- Verify recovery: Confirm systems return to normal after failures resolve
Load Testing
- Test under stress: Error handling changes under load
- Verify graceful degradation: Systems should degrade gracefully, not fail catastrophically
- Test recovery: How quickly does the system recover when load decreases?
Failure Simulation
- Provider outages: What happens when OpenAI is down?
- Partial failures: What if embeddings work but completions don’t?
- Slow responses: What if latency increases 10x?
Test these scenarios before they happen in production.
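Failure simulation does not require special tooling; plain unittest.mock can stand in for a failing provider. The `answer` function below is a toy stand-in for your real request path, and the test runs under pytest or any runner that calls it.

```python
from unittest import mock


def answer(client, question):
    """Toy function under test: call the provider, fall back to a canned reply.
    Stands in for your real request path."""
    try:
        return client.complete(question)
    except TimeoutError:
        return "Sorry, that took too long. Please try again."


def test_falls_back_when_provider_times_out():
    # Simulate a provider outage (or 10x latency) by raising on every call.
    flaky_client = mock.Mock()
    flaky_client.complete.side_effect = TimeoutError("simulated outage")

    result = answer(flaky_client, "hello")

    assert result == "Sorry, that took too long. Please try again."
    flaky_client.complete.assert_called_once()
```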
Implementation Priorities
If you’re starting from scratch:
- Implement basic retry with backoff. Handles transient failures automatically.
- Add circuit breakers for external services. Prevents cascade failures.
- Validate AI outputs. Catch model failures before users see them.
- Set up error monitoring. Visibility is essential.
- Implement user-friendly error messages. Users need clear communication.
- Add fallback providers. Redundancy prevents single points of failure.
Build error handling incrementally. Start with basics, add sophistication based on actual failure patterns.
The Resilient Mindset
Building resilient AI systems requires accepting that failures are normal. AI services are inherently less reliable than traditional infrastructure. Models produce unexpected outputs. APIs have outages. Content filters trigger unexpectedly.
The goal isn’t preventing all failures; it’s handling them gracefully when they occur. Users who experience smooth recovery from failures often trust systems more than users who never see issues. How you fail matters as much as how you succeed.
Build error handling early, test it thoroughly, and monitor continuously. Your users will thank you.
Ready to build more resilient AI systems? Watch implementation tutorials on my YouTube channel for hands-on guidance. And join the AI Engineering community to discuss error handling patterns with other engineers building production AI systems.