Lesson 10 of 12

Error Handling Strategies

Production agents encounter failures constantly -- API rate limits, network timeouts, malformed LLM output, and tool execution errors. Robust error handling is what separates a demo from a production-ready agent.

Error Classification

Not all errors are equal. CoFounder classifies errors into categories so you can handle each appropriately:

  • Retryable -- Rate limits (429), temporary network failures, server errors (500/503). These should be retried with backoff.
  • Recoverable -- Tool validation errors, malformed LLM output. The agent can self-correct on the next step.
  • Fatal -- Authentication failures (401), missing permissions, invalid configuration. These require human intervention.
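
import { createAgent, ErrorCategory } from '@waymakerai/aicofounder-core';

// searchTool and databaseTool are assumed to be defined elsewhere in your project.
const agent = createAgent({
  name: 'resilient-agent',
  model: 'gpt-4o',
  errorHandling: {
    classify: (error) => {
      // Rate limits and transient server errors are worth retrying
      if ([429, 500, 503].includes(error.status)) return ErrorCategory.RETRYABLE;
      // Authentication failures require human intervention
      if (error.status === 401) return ErrorCategory.FATAL;
      // Malformed LLM output can be self-corrected on the next step
      if (error.message?.includes('invalid JSON')) return ErrorCategory.RECOVERABLE;
      // Anything unrecognized (e.g. a network failure with no status) defaults to retryable
      return ErrorCategory.RETRYABLE;
    },
  },
  tools: [searchTool, databaseTool],
});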

Retry Logic with Backoff

CoFounder provides built-in retry logic with exponential backoff. Configure it per-agent or per-tool:

import { createAgent } from '@waymakerai/aicofounder-core';

const agent = createAgent({
  name: 'retrying-agent',
  model: 'gpt-4o',
  errorHandling: {
    retry: {
      maxRetries: 3,
      initialDelayMs: 1000,
      maxDelayMs: 30000,
      backoffMultiplier: 2,
      retryableStatuses: [429, 500, 502, 503],
    },
    onRetry: (error, attempt) => {
      console.warn(`Retry attempt ${attempt}: ${error.message}`);
    },
  },
  tools: [searchTool],
});
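
To make the retry settings concrete, here is a minimal sketch of how an exponential backoff schedule is commonly derived from these four values. The helper names `backoffDelayMs` and `backoffSchedule` are hypothetical and not part of the CoFounder API, and the formula (initial delay times the multiplier raised to the attempt index, capped at the maximum) is the conventional one rather than a confirmed description of CoFounder's internals. Many implementations also add random jitter, which this sketch omits.

```typescript
// Hypothetical helpers for illustration; not part of the CoFounder API.
interface RetryConfig {
  maxRetries: number;
  initialDelayMs: number;
  maxDelayMs: number;
  backoffMultiplier: number;
}

// Delay before the given retry attempt (1-based), capped at maxDelayMs.
function backoffDelayMs(attempt: number, config: RetryConfig): number {
  const raw = config.initialDelayMs * Math.pow(config.backoffMultiplier, attempt - 1);
  return Math.min(raw, config.maxDelayMs);
}

// Full list of delays, one per allowed retry.
function backoffSchedule(config: RetryConfig): number[] {
  return Array.from({ length: config.maxRetries }, (_, i) => backoffDelayMs(i + 1, config));
}

const retrySchedule = backoffSchedule({
  maxRetries: 3,
  initialDelayMs: 1000,
  maxDelayMs: 30000,
  backoffMultiplier: 2,
});
console.log(retrySchedule); // [ 1000, 2000, 4000 ]
```

Under these assumptions, the three retries wait 1, 2, and 4 seconds. The 30-second cap only comes into play with a higher `maxRetries`, from the sixth retry onward.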

Fallback Models

When a primary model is unavailable or returns errors, CoFounder can automatically switch to a fallback model:

import { createAgent } from '@waymakerai/aicofounder-core';

const agent = createAgent({
  name: 'fallback-agent',
  model: 'gpt-4o',
  fallbackModels: [
    { model: 'claude-sonnet-4-20250514', provider: 'anthropic' },
    { model: 'gpt-4o-mini', provider: 'openai' },
  ],
  errorHandling: {
    useFallbackOn: [429, 500, 503],
    onFallback: (fromModel, toModel, error) => {
      console.warn(`Falling back from ${fromModel} to ${toModel}: ${error.message}`);
    },
  },
});

Fallback models are tried in the order listed. If every model fails, the agent raises the last error it received. Prefer cheaper or more widely available models as fallbacks.
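
The "tried in order" behavior can be sketched as a simple loop. This is a sketch under stated assumptions, not CoFounder's implementation: `callWithFallback` and the stubbed model call are hypothetical. It also falls back on any error, whereas the configuration above restricts fallback to the statuses in `useFallbackOn`.

```typescript
// Hypothetical sketch; not CoFounder internals.
type ModelCall<T> = (model: string) => Promise<T>;

async function callWithFallback<T>(
  models: string[],
  call: ModelCall<T>,
  onFallback?: (fromModel: string, toModel: string, error: Error) => void,
): Promise<T> {
  let lastError: Error = new Error('No models configured');
  for (let i = 0; i < models.length; i++) {
    try {
      return await call(models[i]); // First success wins
    } catch (err) {
      lastError = err instanceof Error ? err : new Error(String(err));
      const next = models[i + 1];
      // Only report a fallback when there is another model left to try
      if (next !== undefined && onFallback) onFallback(models[i], next, lastError);
    }
  }
  throw lastError; // Every model failed: surface the last error
}

// Usage with a stubbed call in which the primary model always fails.
const attemptedModels: string[] = [];
const fallbackResult = callWithFallback(
  ['gpt-4o', 'claude-sonnet-4-20250514', 'gpt-4o-mini'],
  async (model) => {
    attemptedModels.push(model);
    if (model === 'gpt-4o') throw new Error('503 Service Unavailable');
    return `answer from ${model}`;
  },
);
```

In this stubbed run the promise resolves with the second model's answer, and the third model is never called.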

Graceful Degradation

Sometimes the best response to an error is a partial result rather than a complete failure. CoFounder's hooks let you implement graceful degradation:

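import { createAgent } from '@waymakerai/aicofounder-core';

// searchTool and databaseTool are assumed to be defined elsewhere in your project.
const agent = createAgent({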
  name: 'graceful-agent',
  model: 'gpt-4o',
  hooks: {
    onToolError: async (toolName, error, context) => {
      // If search fails, continue with what we have
      if (toolName === 'web_search') {
        return {
          handled: true,
          result: JSON.stringify({
            partial: true,
            message: 'Web search is temporarily unavailable. Answering with available knowledge.',
          }),
        };
      }
      return { handled: false }; // Let other errors propagate
    },
    onStepError: async (error, step, context) => {
      if (step >= context.maxSteps - 1) {
        // On last step, return whatever we have
        return {
          handled: true,
          output: 'I was unable to complete the full analysis, but here is what I found so far: ' +
            context.partialResults.join('\n'),
        };
      }
      return { handled: false };
    },
  },
  tools: [searchTool, databaseTool],
});

User-Friendly Error Messages

End users should never see raw stack traces or technical error codes. Map internal errors to helpful messages:

  • Rate limit errors: "I'm processing many requests right now. Please try again in a moment."
  • Tool failures: "I wasn't able to access that information, but I can try a different approach."
  • Context overflow: "Our conversation has gotten long. Let me summarize what we've discussed and we can continue."
  • Model errors: "I encountered an issue generating a response. Let me try again."
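
One way to apply the mapping above is a lookup table keyed by error kind, with a generic fallback for anything unrecognized. This is a minimal sketch: the `ErrorKind` names, the `toUserMessage` helper, and the wording of the generic message are assumptions for illustration, not CoFounder types.

```typescript
// Hypothetical error kinds; CoFounder's actual error types may differ.
type ErrorKind = 'rate_limit' | 'tool_failure' | 'context_overflow' | 'model_error';

const USER_MESSAGES: Record<ErrorKind, string> = {
  rate_limit: "I'm processing many requests right now. Please try again in a moment.",
  tool_failure: "I wasn't able to access that information, but I can try a different approach.",
  context_overflow:
    "Our conversation has gotten long. Let me summarize what we've discussed and we can continue.",
  model_error: 'I encountered an issue generating a response. Let me try again.',
};

// Assumed wording for errors that match no known kind.
const GENERIC_MESSAGE = 'Something went wrong on my end. Please try again.';

// Returns a user-facing message and never exposes the raw error text.
function toUserMessage(kind: string): string {
  // Own-property check so inherited keys such as 'toString' do not match
  if (Object.prototype.hasOwnProperty.call(USER_MESSAGES, kind)) {
    return USER_MESSAGES[kind as ErrorKind];
  }
  return GENERIC_MESSAGE;
}

console.log(toUserMessage('tool_failure'));
console.log(toUserMessage('something_unexpected'));
```

Keeping the messages in one table makes them easy to review and localize. Log the original error separately so the technical detail remains available for debugging.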