Lesson 5 of 12

Memory and Context Management

Agents need memory to maintain coherent conversations and build on previous interactions. But context windows are finite and expensive. This lesson covers strategies for managing what your agent remembers, from simple sliding windows to persistent storage with Supabase.

Understanding Context Windows

Every LLM has a context window -- the maximum number of tokens it can process in a single request. This includes the system prompt, conversation history, tool definitions, tool results, and the model's response. When your context exceeds the window, the request either fails outright or older content must be dropped, so information is lost either way.

  • GPT-4o -- 128K tokens context window
  • Claude Sonnet -- 200K tokens context window
  • Gemini 1.5 -- Up to 1M tokens context window

Even with large windows, more context means higher cost and slower responses. Effective memory management keeps context lean and relevant.
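To make the budget concrete, here is a small standalone sketch (not part of CoFounder) that checks whether a request fits in a model's window while reserving headroom for the response. The token counts would come from a real tokenizer; the field names are illustrative:

```typescript
// Check whether a request fits in the model's context window, leaving
// headroom for the response. All token counts are inputs here; in
// practice they come from a tokenizer for the target model.
function fitsInWindow(opts: {
  windowSize: number;       // e.g. 128_000 for GPT-4o
  systemTokens: number;     // system prompt
  historyTokens: number;    // conversation history
  toolTokens: number;       // tool definitions and tool results
  maxOutputTokens: number;  // space reserved for the model's response
}): boolean {
  const used = opts.systemTokens + opts.historyTokens + opts.toolTokens;
  return used + opts.maxOutputTokens <= opts.windowSize;
}
```

If this check fails, you either trim history (the strategies below) or reduce the space reserved for output.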

Sliding Window Strategy

The simplest approach is a sliding window that keeps the most recent N messages. CoFounder makes this easy to configure:

import { createAgent } from '@waymakerai/aicofounder-core';

const agent = createAgent({
  name: 'chat-agent',
  model: 'gpt-4o',
  memory: {
    type: 'sliding-window',
    maxMessages: 40,
    // Always keep the system prompt and first user message
    preserveFirst: true,
  },
});

The sliding window works well for conversational agents where recent context matters most. The downside is that information from early in the conversation is lost permanently.
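To make the mechanics concrete, here is a simplified standalone sketch of a sliding window with pinned messages -- not CoFounder's actual implementation, just an illustration of the trimming logic behind options like maxMessages and preserveFirst:

```typescript
type Message = { role: 'system' | 'user' | 'assistant'; content: string };

// Trim history to at most maxMessages. With preserveFirst, the system
// prompt and the first user message are pinned so trimming never drops them.
function applySlidingWindow(
  history: Message[],
  maxMessages: number,
  preserveFirst: boolean,
): Message[] {
  if (history.length <= maxMessages) return history;
  if (!preserveFirst) return history.slice(-maxMessages);

  const systemIdx = history.findIndex((m) => m.role === 'system');
  const firstUserIdx = history.findIndex((m) => m.role === 'user');
  const pinnedIdx = new Set([systemIdx, firstUserIdx].filter((i) => i >= 0));

  const pinned = history.filter((_, i) => pinnedIdx.has(i));
  const unpinned = history.filter((_, i) => !pinnedIdx.has(i));

  // Fill the remaining slots with the most recent unpinned messages.
  const slots = maxMessages - pinned.length;
  const recent = slots > 0 ? unpinned.slice(-slots) : [];

  return [...pinned, ...recent];
}
```

Note that the pinned messages sit at the front of the returned array, which matches their usual position at the start of a conversation.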

Summarization Strategy

For longer conversations, CoFounder can automatically summarize older messages into a compact summary that preserves key facts:

const agent = createAgent({
  name: 'long-conversation-agent',
  model: 'gpt-4o',
  memory: {
    type: 'summarize',
    maxMessages: 30,
    summarizeAfter: 20, // Summarize when history exceeds 20 messages
    summaryModel: 'gpt-4o-mini', // Use a cheaper model for summaries
    summaryPrompt: 'Summarize the key facts, decisions, and open questions from this conversation.',
  },
});

When the message count exceeds summarizeAfter, CoFounder takes the oldest messages, generates a summary, and replaces them with a single summary message. The most recent messages are kept intact.
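The replacement step can be sketched in isolation. In this standalone version (again, an illustration rather than CoFounder's code), `summarize` stands in for the call to the cheaper summary model and is passed in as a plain function:

```typescript
type Message = { role: 'system' | 'user' | 'assistant'; content: string };

// Sketch of the summarize-and-replace step: once history exceeds the
// threshold, collapse the oldest messages into one summary message and
// keep the most recent ones intact.
function compactHistory(
  history: Message[],
  summarizeAfter: number,
  keepRecent: number,
  summarize: (msgs: Message[]) => string, // in practice, an LLM call
): Message[] {
  if (history.length <= summarizeAfter) return history;

  const older = history.slice(0, history.length - keepRecent);
  const recent = history.slice(-keepRecent);

  // Replace the older messages with a single summary message.
  const summary = summarize(older);
  return [
    { role: 'system', content: `Summary of earlier conversation: ${summary}` },
    ...recent,
  ];
}
```

Carrying the summary as a system message is one design choice; some setups inject it into the system prompt instead.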

Token Counting

CoFounder provides built-in token counting so you can monitor context usage and make informed decisions about when to trim:

import { countTokens, estimateCost } from '@waymakerai/aicofounder-core';

// Count tokens in a message array
const tokenCount = countTokens(agent.getHistory(), 'gpt-4o');
console.log(`Current context: ${tokenCount} tokens`);

// Estimate cost before running
const cost = estimateCost({
  model: 'gpt-4o',
  inputTokens: tokenCount,
  estimatedOutputTokens: 500,
});
console.log(`Estimated cost: $${cost.toFixed(4)}`);
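CoFounder's helpers use the model's real tokenizer and pricing tables. When you need a rough standalone estimate -- say, in a quick script without the library -- the common ~4 characters-per-token heuristic for English text plus simple per-token arithmetic gets you in the ballpark. The prices below are illustrative placeholders, not current rates:

```typescript
// Rough token estimate using the ~4 chars/token heuristic for English.
// This is an approximation; use a real tokenizer for exact counts.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Cost in USD given per-million-token prices. The caller supplies the
// prices, since they vary by model and change over time.
function estimateCostUSD(opts: {
  inputTokens: number;
  estimatedOutputTokens: number;
  inputPricePerMTok: number;   // USD per million input tokens
  outputPricePerMTok: number;  // USD per million output tokens
}): number {
  return (
    (opts.inputTokens * opts.inputPricePerMTok +
      opts.estimatedOutputTokens * opts.outputPricePerMTok) /
    1_000_000
  );
}
```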

Persistent Memory with Supabase

For agents that need to remember information across sessions, CoFounder integrates with Supabase to store and retrieve conversation history and key facts:

import { createAgent, SupabaseMemory } from '@waymakerai/aicofounder-core';

const memory = new SupabaseMemory({
  supabaseUrl: process.env.SUPABASE_URL!,
  supabaseKey: process.env.SUPABASE_KEY!,
  table: 'agent_memory',
  userId: 'user-123',
});

const agent = createAgent({
  name: 'persistent-agent',
  model: 'gpt-4o',
  memory: {
    type: 'persistent',
    store: memory,
    loadOnStart: true, // Load previous conversation on initialization
    saveInterval: 'after-each-step', // Save after every agent step
  },
});

// The agent remembers previous conversations with this user
const result = await agent.run('What did we discuss last time?');
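Whatever the backend, a persistent store boils down to loading and appending a user's history. The interface below is a hypothetical sketch of that contract -- CoFounder defines the real one -- with an in-memory implementation that is handy for tests and local development before wiring up Supabase:

```typescript
type StoredMessage = { role: string; content: string; createdAt: number };

// Hypothetical store contract: any backend (Supabase, Redis, in-memory)
// needs at least these two operations, keyed by user.
interface MemoryStore {
  load(userId: string): Promise<StoredMessage[]>;
  append(userId: string, msg: StoredMessage): Promise<void>;
}

// In-memory implementation for tests and local development.
class InMemoryStore implements MemoryStore {
  private data = new Map<string, StoredMessage[]>();

  async load(userId: string): Promise<StoredMessage[]> {
    return this.data.get(userId) ?? [];
  }

  async append(userId: string, msg: StoredMessage): Promise<void> {
    const history = this.data.get(userId) ?? [];
    history.push(msg);
    this.data.set(userId, history);
  }
}
```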

Persistent memory is essential for customer support agents, personal assistants, and any agent that interacts with users over multiple sessions.