AI Error Handling Patterns: Build Resilient Systems


While everyone focuses on the happy path, few engineers plan for AI failures systematically. Through building production AI systems, I’ve discovered that error handling determines user experience more than model selection, and that AI systems fail in ways traditional applications don’t.

The traditional approach of try-catch and error messages doesn’t work for AI. A model timeout isn’t like a database timeout. A content filter trigger isn’t like an authentication failure. Rate limiting requires different handling than server errors. This guide covers patterns that actually work for AI-specific failure modes.

Why AI Error Handling Is Different

Before applying traditional patterns, understand what makes AI failures unique:

Failures are often partial. A response might be mostly correct but hallucinate one fact. A generation might succeed but violate content policies. Traditional success/failure binaries don’t capture AI behavior.

Retries have different economics. Retrying a database query costs nothing. Retrying an LLM call costs money and time. Naive retry logic can be expensive.

Fallback quality varies. Your fallback isn’t always worse. Sometimes a smaller, faster model produces better results for specific queries. Fallback logic needs nuance.

User perception matters more. Users tolerate database hiccups they never see. AI failures often occur in user-facing interactions where poor handling destroys trust.

For context on building robust AI architectures, see my guide to AI system design patterns.

Error Classification Framework

Classify errors to handle them appropriately:

Transient Errors

Characteristics: Will likely succeed on retry. Network glitches, temporary overload, momentary service issues.

Handling: Retry with exponential backoff. These errors are frustrating but recoverable.

Examples: Connection reset, 503 Service Unavailable, timeout under load

Rate Limit Errors

Characteristics: Request was valid but quota exceeded. Will succeed after waiting.

Handling: Respect rate limit headers. Queue for later processing. Consider alternative providers.

Examples: 429 Too Many Requests, tokens per minute exceeded

Input Errors

Characteristics: Request was malformed or invalid. Retrying won’t help.

Handling: Validate inputs aggressively. Return clear error messages. Don’t retry.

Examples: Exceeds context length, invalid parameters, unsupported format

Content Policy Errors

Characteristics: Content triggered safety filters. May or may not be legitimate.

Handling: Log for review. Consider rephrasing. Provide appropriate user feedback.

Examples: Content filtered, safety violation, policy rejection

Service Errors

Characteristics: Provider is experiencing issues. Scope and duration unknown.

Handling: Fallback to alternative providers. Implement circuit breakers. Queue for later.

Examples: 500 Internal Server Error, 502 Bad Gateway, extended outage

Model Errors

Characteristics: Model produced unexpected output. JSON parsing failed, format violated, nonsensical content.

Handling: Implement output validation. Retry with temperature adjustments. Fall back to simpler approaches.

Examples: Invalid JSON, missing required fields, obvious hallucination
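
As a rough sketch, the categories above can map onto a small classifier that retry and fallback logic keys off. The enum names and status-code mappings below are illustrative assumptions, not any particular provider’s error contract:

```python
from enum import Enum, auto

class ErrorCategory(Enum):
    TRANSIENT = auto()
    RATE_LIMIT = auto()
    INPUT = auto()
    CONTENT_POLICY = auto()
    SERVICE = auto()

def classify_error(status_code: int | None, message: str = "") -> ErrorCategory:
    """Map a provider response to one of the categories above.

    The mappings are illustrative; real providers differ in how they
    signal rate limits, content-policy rejections, and validation errors.
    """
    text = message.lower()
    if status_code is None:              # connection reset, client-side timeout
        return ErrorCategory.TRANSIENT
    if status_code == 429:
        return ErrorCategory.RATE_LIMIT
    if status_code in (500, 502, 503, 504):
        return ErrorCategory.SERVICE
    if "filter" in text or "policy" in text or "safety" in text:
        return ErrorCategory.CONTENT_POLICY
    if status_code == 400:               # context length, invalid parameters, bad format
        return ErrorCategory.INPUT
    return ErrorCategory.SERVICE         # unknown: treat conservatively as a service issue
```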

Retry Strategies

Not all retries are equal:

Exponential Backoff

Start with short delays, increase exponentially:

First retry: 1 second
Second retry: 2 seconds
Third retry: 4 seconds
Fourth retry: 8 seconds

Add jitter (random variation) to prevent thundering herd problems when many clients retry simultaneously.
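
A minimal sketch of that schedule with full jitter applied; the base delay and cap are placeholder values you would tune:

```python
import random

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential backoff with full jitter: 1s, 2s, 4s, 8s... capped,
    then randomized so simultaneous clients don't retry in lockstep."""
    exponential = min(cap, base * (2 ** attempt))
    return random.uniform(0, exponential)

# attempt 0 -> up to 1s, attempt 1 -> up to 2s, attempt 2 -> up to 4s, ...
delays = [backoff_delay(n) for n in range(4)]
```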

Context-Aware Retries

Adjust retry behavior based on context:

Interactive requests: Few retries with short delays. User is waiting.
Background processing: More retries with longer delays. User isn’t blocked.
Critical operations: Retry aggressively, then alert humans.

Conditional Retries

Some errors shouldn’t be retried:

Don’t retry: Input validation failures, authentication errors, permanent rate limits
Retry cautiously: Content policy triggers (might succeed with rephrasing)
Retry aggressively: Network errors, transient overload

Implement retry policies per error type. Generic retry-everything logic wastes resources.
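
One way to encode per-error-type policies is a small lookup table keyed by the categories described earlier. The limits below are illustrative defaults, not recommendations:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RetryPolicy:
    max_attempts: int      # retries allowed after the first try
    base_delay: float      # seconds, before exponential backoff and jitter

# Illustrative defaults keyed by error category; tune to your own traffic.
RETRY_POLICIES = {
    "transient":      RetryPolicy(max_attempts=3, base_delay=1.0),
    "rate_limit":     RetryPolicy(max_attempts=2, base_delay=10.0),
    "service":        RetryPolicy(max_attempts=2, base_delay=5.0),
    "content_policy": RetryPolicy(max_attempts=1, base_delay=0.0),  # only with a rephrased prompt
    "input":          RetryPolicy(max_attempts=0, base_delay=0.0),  # never retry
}

def should_retry(category: str, attempt: int) -> bool:
    policy = RETRY_POLICIES.get(category, RetryPolicy(0, 0.0))
    return attempt < policy.max_attempts
```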

Cost-Aware Retries

AI retries have direct costs:

Track retry costs. How much are retries costing you?
Cap retry budgets. Don’t spend more on retries than the original request was worth.
Consider cheaper alternatives. Instead of retrying an expensive model, try a cheaper one.
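
A sketch of a retry budget check; the costs and multiplier are stand-ins for whatever your own accounting produces:

```python
def within_retry_budget(original_cost: float,
                        retry_costs: list[float],
                        budget_multiplier: float = 1.0) -> bool:
    """Stop retrying once cumulative retry spend exceeds a multiple of the
    original request's cost (at most 1x the original by default)."""
    return sum(retry_costs) < original_cost * budget_multiplier

# Example: a $0.02 request with $0.04 already spent on retries -> stop.
within_retry_budget(0.02, [0.02, 0.02])  # False
```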

My guide on cost-effective AI strategies covers cost optimization in depth.

Fallback Patterns

When primary approaches fail, what’s the alternative?

Model Fallback

When your primary model is unavailable:

Tier 1 failure → Tier 2: Claude unavailable → Try GPT-5
Tier 2 failure → Tier 3: GPT-5 unavailable → Try smaller model
All models fail → Cached response: Serve relevant cached content
No cache available → Graceful error: Communicate clearly

Implement fallback chains that progressively degrade capability rather than failing completely.
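
A sketch of such a chain, with providers passed in as callables so the function stays provider-agnostic; the names and error handling here are assumptions, not a specific SDK’s API:

```python
from typing import Callable, Optional

def generate_with_fallback(
    prompt: str,
    providers: list[Callable[[str], str]],          # ordered best-first
    cache_lookup: Callable[[str], Optional[str]],   # returns a cached answer or None
) -> str:
    """Walk a fallback chain: each provider in turn, then the cache,
    then a clear error instead of a silent failure."""
    for call_provider in providers:
        try:
            return call_provider(prompt)
        except Exception:
            continue  # in production: log the failure and its category here
    cached = cache_lookup(prompt)
    if cached is not None:
        return cached
    raise RuntimeError("All providers failed and no cached response was available")
```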

Quality Fallback

Sometimes worse is better than nothing:

Full RAG fails → Keyword search: Vector search unavailable → Fall back to BM25
Rich response fails → Simple response: Complex analysis unavailable → Provide basic answer
Real-time fails → Cached: Generation unavailable → Serve cached similar response

Quality fallback maintains service availability at reduced capability.
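
Sketched the same way, with both search backends passed in as callables (the function names are assumptions, not a particular library’s API), and the degraded mode reported so the caller can tell the user:

```python
from typing import Callable

def search_with_quality_fallback(
    query: str,
    vector_search: Callable[[str], list[str]],   # preferred: semantic retrieval
    keyword_search: Callable[[str], list[str]],  # degraded: BM25 / keyword match
) -> tuple[list[str], str]:
    """Return (results, mode) so callers can surface when quality is reduced."""
    try:
        return vector_search(query), "semantic"
    except Exception:
        return keyword_search(query), "keyword-only"
```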

Feature Fallback

Disable features gracefully:

Advanced features fail → Basic features: Streaming unavailable → Return complete response
Enhancement fails → Core function: Formatting fails → Return plain text
Optional fails → Skip: Analytics fails → Continue without tracking

Feature flags enable rapid fallback without code deployment.

My guide on combining multiple AI models covers orchestration patterns for multi-model fallback.

Circuit Breaker Pattern

Don’t keep calling failing services:

How Circuit Breakers Work

Closed (normal): Requests flow through. Track failure rate.
Open (tripped): Requests fail immediately. Don’t call the service.
Half-open (testing): Allow some requests through. Check if service recovered.

When failures exceed threshold, trip the circuit. This prevents cascade failures and gives services time to recover.

Implementation Details

Track rolling failure windows. Last 100 requests, or last 5 minutes. Recent failures matter more than historical.

Set appropriate thresholds. 50% failure rate might trip the circuit. 10% might just warn.

Include timeout handling. Slow responses count as failures for circuit breaker purposes.

Recovery testing. After circuit opens, periodically test with single requests. Close circuit when service stabilizes.
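
Pulling those details together, a minimal in-memory breaker might look like the sketch below; the window size, threshold, and cooldown are illustrative defaults, not recommendations:

```python
import time
from collections import deque

class CircuitBreaker:
    """Closed -> open when the rolling failure rate crosses a threshold;
    half-open after a cooldown, allowing probe requests through to test recovery."""

    def __init__(self, window: int = 100, failure_threshold: float = 0.5,
                 min_requests: int = 20, cooldown_seconds: float = 30.0):
        self.results = deque(maxlen=window)   # rolling window of True/False outcomes
        self.failure_threshold = failure_threshold
        self.min_requests = min_requests
        self.cooldown_seconds = cooldown_seconds
        self.opened_at: float | None = None   # None means the circuit is closed

    def allow_request(self) -> bool:
        if self.opened_at is None:
            return True                        # closed: normal operation
        if time.monotonic() - self.opened_at >= self.cooldown_seconds:
            return True                        # half-open: let probes through
        return False                           # open: fail fast, don't call the service

    def record(self, success: bool) -> None:
        self.results.append(success)
        if self.opened_at is not None and success:
            self.opened_at = None              # probe succeeded: close the circuit
            self.results.clear()
            return
        failures = self.results.count(False)
        if (len(self.results) >= self.min_requests
                and failures / len(self.results) >= self.failure_threshold):
            self.opened_at = time.monotonic()  # trip (or re-trip) the circuit
```

Consistent with the timeout note above, the caller would record a slow response as a failure once it exceeds your latency budget.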

Per-Service Circuits

Implement separate circuit breakers for:

Different AI providers (OpenAI, Anthropic, local models)
Different model endpoints (completion, embedding, moderation)
Different operations (real-time, batch, critical)

One service failing shouldn’t trip circuits for healthy services.
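
Keeping breakers independent can be as simple as a keyed registry. This sketch assumes a CircuitBreaker class like the one above; the key scheme is an assumption to adapt:

```python
# One breaker per (provider, operation) pair, created on first use.
# CircuitBreaker is the class sketched in the previous example.
breakers: dict[tuple[str, str], "CircuitBreaker"] = {}

def breaker_for(provider: str, operation: str) -> "CircuitBreaker":
    key = (provider, operation)
    if key not in breakers:
        breakers[key] = CircuitBreaker()
    return breakers[key]

# breaker_for("openai", "embedding") trips independently of
# breaker_for("anthropic", "completion").
```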

Output Validation

AI outputs can fail even when API calls succeed:

Format Validation

JSON structure: Does the output parse? Are required fields present?
Type checking: Are fields the expected types?
Enum validation: Are categorical outputs valid values?

Implement strict parsing with clear error handling for malformed outputs.
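
A strict-parsing sketch for a JSON output with a couple of required fields; the schema is invented purely for illustration:

```python
import json

REQUIRED_FIELDS = {"answer": str, "sources": list}  # illustrative schema

def parse_model_output(raw: str) -> dict:
    """Parse and validate a model's JSON output, raising a clear error
    instead of passing malformed output downstream."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Model output is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("Expected a JSON object at the top level")
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in data:
            raise ValueError(f"Missing required field: {field}")
        if not isinstance(data[field], expected_type):
            raise ValueError(f"Field {field!r} should be {expected_type.__name__}")
    return data
```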

Content Validation

Length checks: Is the output a reasonable length?
Language detection: Is the output in the expected language?
Coherence checks: Does the output make basic sense?

Catch obvious model failures before they reach users.

Semantic Validation

Relevance scoring: Does the response address the question?
Consistency checking: Does the response contradict itself?
Factual grounding: Can claims be traced to source documents?

Semantic validation catches subtle failures that format validation misses.

Validation Failure Handling

When validation fails:

Retry with adjusted parameters: Lower temperature, clearer instructions
Retry with different model: Some models handle certain tasks better
Fall back to simpler approach: Structured output mode, constrained generation
Return partial results: Give users what succeeded
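
That escalation ladder might look roughly like this, with the generation and validation calls passed in as placeholders rather than a specific client; the parameter values are assumptions:

```python
from typing import Callable, Optional

def generate_validated(
    prompt: str,
    generate: Callable[..., str],              # your generation call (placeholder)
    validate: Callable[[str], bool],           # e.g. the strict JSON parser above
    fallback_model: Optional[str] = None,
) -> Optional[str]:
    """Escalate through cheaper fixes before giving up: retry colder,
    then switch models, then return None so the caller can degrade gracefully."""
    attempts = [
        {"temperature": 0.7},                  # original settings
        {"temperature": 0.0},                  # retry deterministically
    ]
    if fallback_model:
        attempts.append({"temperature": 0.0, "model": fallback_model})
    for params in attempts:
        output = generate(prompt, **params)
        if validate(output):
            return output
    return None                                # caller shows partial results or a clear error
```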

User Communication

How you communicate failures affects user trust:

Error Messages

Be specific: “Our AI service is temporarily unavailable” not “Something went wrong”
Be actionable: “Please try again in a few moments” not “Error occurred”
Be honest: “We couldn’t generate a good response” not endless spinning

Progress Indicators

For long operations:

Show progress stages: “Searching documents… Generating response…”
Indicate uncertainty: “This usually takes 10-30 seconds”
Enable cancellation: Let users abandon long-running requests

Partial Results

When operations partially succeed:

Show what worked: “Found 5 relevant documents (3 more couldn’t be processed)”
Explain limitations: “Response generated without access to recent data”
Offer alternatives: “Would you like to try a simpler query?”

Failure Recovery

Help users move forward:

Suggest alternatives: “Try rephrasing your question”
Offer cached content: “Here’s a related answer from earlier”
Provide contact options: “Our team can help with complex questions”

Monitoring and Alerting

You can’t fix what you don’t see:

Error Tracking

Log error details: Type, message, request context, response (if any)
Categorize automatically: Group by error type for trend analysis
Track error rates: By endpoint, by model, by time period

Alerting Strategy

Alert on rate changes: 5% errors is normal, 15% needs attention
Alert on new error types: Previously unseen errors need investigation
Alert on circuit breaker trips: Open circuits indicate service issues
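
A sketch of a rolling error-rate check that follows the rough thresholds above; the window and minimum sample size are assumptions to tune per endpoint:

```python
from collections import deque

class ErrorRateMonitor:
    """Track recent outcomes and flag when the error rate crosses a threshold."""

    def __init__(self, window: int = 200, alert_threshold: float = 0.15):
        self.outcomes = deque(maxlen=window)   # True = success, False = error
        self.alert_threshold = alert_threshold

    def record(self, success: bool) -> None:
        self.outcomes.append(success)

    def should_alert(self) -> bool:
        if len(self.outcomes) < 50:            # not enough data to judge yet
            return False
        error_rate = self.outcomes.count(False) / len(self.outcomes)
        return error_rate >= self.alert_threshold
```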

Debugging Support

Correlation IDs: Track requests across services
Request reproduction: Log enough to replay failed requests
Error patterns: Identify common causes from error aggregations

My guide to AI system monitoring covers observability patterns comprehensively.

Testing Error Handling

Error handling needs testing like any other code:

Chaos Engineering

Inject failures deliberately: Simulate API errors, timeouts, malformed responses
Test fallback paths: Verify fallbacks activate correctly
Verify recovery: Confirm systems return to normal after failures resolve
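
As a small example of deliberate failure injection, this test (using the standard library’s unittest) fails the primary provider on purpose and asserts that a fallback chain shaped like the earlier sketch still answers:

```python
import unittest

class FallbackPathTest(unittest.TestCase):
    """Inject a primary-provider failure and verify the fallback path activates.

    `generate_with_fallback` is the callable-based chain sketched earlier;
    any function with that shape works here.
    """

    def test_falls_back_when_primary_raises(self):
        def failing_primary(prompt: str) -> str:
            raise TimeoutError("injected failure")        # simulated outage

        def healthy_secondary(prompt: str) -> str:
            return "fallback answer"

        result = generate_with_fallback(
            "test prompt",
            providers=[failing_primary, healthy_secondary],
            cache_lookup=lambda prompt: None,
        )
        self.assertEqual(result, "fallback answer")

if __name__ == "__main__":
    unittest.main()
```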

Load Testing

Test under stress: Error handling changes under load
Verify graceful degradation: Systems should degrade gracefully, not fail catastrophically
Test recovery: How quickly does the system recover when load decreases?

Failure Simulation

Provider outages: What happens when OpenAI is down?
Partial failures: What if embeddings work but completions don’t?
Slow responses: What if latency increases 10x?

Test these scenarios before they happen in production.

Implementation Priorities

If you’re starting from scratch:

  1. Implement basic retry with backoff. Handles transient failures automatically.

  2. Add circuit breakers for external services. Prevents cascade failures.

  3. Validate AI outputs. Catch model failures before users see them.

  4. Set up error monitoring. Visibility is essential.

  5. Implement user-friendly error messages. Users need clear communication.

  6. Add fallback providers. Redundancy prevents single points of failure.

Build error handling incrementally. Start with basics, add sophistication based on actual failure patterns.

The Resilient Mindset

Building resilient AI systems requires accepting that failures are normal. AI services are inherently less reliable than traditional infrastructure. Models produce unexpected outputs. APIs have outages. Content filters trigger unexpectedly.

The goal isn’t preventing all failures; it’s handling them gracefully when they occur. Users who experience smooth recovery from failures often trust systems more than users who never see issues. How you fail matters as much as how you succeed.

Build error handling early, test it thoroughly, and monitor continuously. Your users will thank you.

Ready to build more resilient AI systems? Watch implementation tutorials on my YouTube channel for hands-on guidance. And join the AI Engineering community to discuss error handling patterns with other engineers building production AI systems.

Zen van Riel

Senior AI Engineer at GitHub | Ex-Microsoft

I grew from intern to Senior Engineer at GitHub, previously working at Microsoft. Now I teach 22,000+ engineers on YouTube, reaching hundreds of thousands of developers with practical AI engineering tutorials. My blog posts are generated from my own video content, focusing on real-world implementation over theory.
