Lesson 13 of 15

RAG Implementation Patterns

Retrieval-Augmented Generation grounds your agent's responses in real data. Instead of relying solely on the model's training data, RAG retrieves relevant documents and includes them in the prompt context. This lesson covers the full RAG pipeline using Supabase with pgvector, from document ingestion to context-aware generation.

pgvector Setup with Supabase

Supabase provides PostgreSQL with the pgvector extension, giving you a production-ready vector database without additional infrastructure. pgvector stores embeddings as native PostgreSQL columns and supports efficient similarity search with HNSW and IVFFlat indexes.

-- Enable pgvector extension
create extension if not exists vector;

-- Create documents table with embedding column
create table documents (
  id uuid default gen_random_uuid() primary key,
  content text not null,
  metadata jsonb default '{}',
  embedding vector(1536), -- text-embedding-3-small dimensions
  created_at timestamptz default now()
);

-- Create HNSW index for fast similarity search
create index on documents
  using hnsw (embedding vector_cosine_ops)
  with (m = 16, ef_construction = 64);

-- Similarity search function
create or replace function match_documents(
  query_embedding vector(1536),
  match_threshold float default 0.7,
  match_count int default 5
) returns table (id uuid, content text, metadata jsonb, similarity float)
language sql stable as $$
  select documents.id, documents.content, documents.metadata,
    1 - (documents.embedding <=> query_embedding) as similarity
  from documents
  -- Qualify columns: bare names are ambiguous with the output parameters
  where 1 - (documents.embedding <=> query_embedding) > match_threshold
  order by documents.embedding <=> query_embedding
  limit match_count;
$$;
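
The `<=>` operator returns cosine distance, so `1 - (embedding <=> query_embedding)` gives cosine similarity. A plain TypeScript sketch of the same math (illustrative helper functions, not part of pgvector or Supabase):

function cosineDistance(a: number[], b: number[]): number {
  // Cosine distance as pgvector's <=> operator computes it:
  // 1 - (a · b) / (|a| * |b|)
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function cosineSimilarity(a: number[], b: number[]): number {
  // Similarity as match_documents reports it: 1 minus the distance.
  // Identical vectors score 1; orthogonal vectors score 0.
  return 1 - cosineDistance(a, b);
}

This is why `match_threshold` of 0.7 filters for chunks whose embeddings point in roughly the same direction as the query, regardless of vector magnitude.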

Embedding Generation and Document Ingestion

Before documents can be searched, they must be converted to embeddings. CoFounder provides an ingestion pipeline that handles chunking, embedding generation, and storage in a single operation.

import { DocumentIngester } from '@waymakerai/aicofounder-core';
import { createClient } from '@supabase/supabase-js';

const supabase = createClient(process.env.SUPABASE_URL!, process.env.SUPABASE_KEY!);

const ingester = new DocumentIngester({
  embeddingModel: 'text-embedding-3-small',
  store: supabase,
  table: 'documents',
  chunking: {
    strategy: 'recursive',  // Split by paragraphs, then sentences
    maxChunkSize: 500,       // Tokens per chunk
    overlap: 50,             // Token overlap between chunks
  },
});

// Ingest a document
await ingester.ingest({
  content: longDocumentText,
  metadata: { source: 'docs', category: 'api-reference' },
});

// Ingest from URL
await ingester.ingestUrl('https://docs.example.com/api', {
  metadata: { source: 'web' },
  selector: 'article',  // CSS selector for content extraction
});

Hybrid Search

Pure vector search sometimes misses results that contain exact keyword matches, while pure keyword search misses semantically similar content. Hybrid search combines both approaches: use pgvector for semantic similarity and PostgreSQL's full-text search for keyword matching, then merge the results with reciprocal rank fusion.

CoFounder's RAG module implements hybrid search out of the box. It runs both searches in parallel and combines the results using a configurable weighting. For most applications, a 70/30 split favoring semantic search produces the best results, but technical documentation often benefits from higher keyword weight.
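
Reciprocal rank fusion itself is simple: each document earns a score of weight / (k + rank) from every result list it appears in, and the scores are summed. A minimal sketch (the `k = 60` constant and the 70/30 weights are common illustrative defaults, not CoFounder's actual configuration):

// Merge ranked result lists with weighted reciprocal rank fusion.
// k dampens the gap between top ranks; 60 is a common default.
function reciprocalRankFusion(
  lists: { ids: string[]; weight: number }[],
  k = 60,
): string[] {
  const scores = new Map<string, number>();
  for (const { ids, weight } of lists) {
    ids.forEach((id, rank) => {
      scores.set(id, (scores.get(id) ?? 0) + weight / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

// 70/30 split favoring semantic search
const merged = reciprocalRankFusion([
  { ids: ['doc-a', 'doc-b', 'doc-c'], weight: 0.7 }, // vector results
  { ids: ['doc-b', 'doc-d'], weight: 0.3 },          // keyword results
]);

Note how a document that appears in both lists (`doc-b` above) outranks one that tops only a single list: agreement between the two retrievers is strong evidence of relevance.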

Chunking Strategies

How you split documents into chunks dramatically affects retrieval quality. Too small and you lose context. Too large and you waste context window space with irrelevant content. CoFounder supports multiple chunking strategies: fixed-size (simple but breaks mid-sentence), recursive (splits by paragraph, then sentence), semantic (groups related sentences using embeddings), and document-aware (respects headers, code blocks, and list structures).
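
A stripped-down version of the recursive strategy: keep paragraphs whole when they fit, and fall back to sentence-level accumulation when one exceeds the limit. This sketch measures characters for simplicity, whereas the real pipeline counts tokens, and it omits the overlap handling:

// Recursive chunking: split on paragraph boundaries first,
// then on sentence boundaries for oversized paragraphs.
function recursiveChunk(text: string, maxChunkSize = 500): string[] {
  const chunks: string[] = [];
  for (const paragraph of text.split(/\n\s*\n/)) {
    const trimmed = paragraph.trim();
    if (!trimmed) continue;
    if (trimmed.length <= maxChunkSize) {
      chunks.push(trimmed);
      continue;
    }
    // Paragraph too long: accumulate sentences up to the limit
    let current = '';
    for (const sentence of trimmed.split(/(?<=[.!?])\s+/)) {
      if (current && current.length + sentence.length + 1 > maxChunkSize) {
        chunks.push(current);
        current = sentence;
      } else {
        current = current ? `${current} ${sentence}` : sentence;
      }
    }
    if (current) chunks.push(current);
  }
  return chunks;
}

The recursion order matters: paragraph breaks are the author's own topic boundaries, so splitting there first preserves more coherent units than cutting at a fixed byte offset.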

Context Window Management

After retrieval, you need to fit the relevant chunks into the LLM's context window alongside the system prompt, conversation history, and user query. CoFounder's context manager prioritizes chunks by relevance score, deduplicates overlapping content, and truncates intelligently to maximize the information density within your token budget. It also tracks which chunks were included so you can cite sources in the agent's response.
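
The core of that budgeting logic can be sketched as a greedy pass over the retrieved chunks (a simplified, character-based stand-in for CoFounder's token-aware manager; the `Chunk` shape and budget here are illustrative):

interface Chunk {
  id: string;
  content: string;
  similarity: number;
}

// Greedy context packing: take chunks in relevance order, skip
// exact-duplicate content, stop adding once the budget is spent.
// Returns the included ids so the agent can cite its sources.
function packContext(
  chunks: Chunk[],
  budget: number, // character budget (a token budget in practice)
): { context: string; sources: string[] } {
  const seen = new Set<string>();
  const included: Chunk[] = [];
  let used = 0;
  for (const chunk of [...chunks].sort((a, b) => b.similarity - a.similarity)) {
    if (seen.has(chunk.content)) continue;           // dedupe overlaps
    if (used + chunk.content.length > budget) continue; // over budget
    seen.add(chunk.content);
    included.push(chunk);
    used += chunk.content.length;
  }
  return {
    context: included.map(c => c.content).join('\n\n'),
    sources: included.map(c => c.id),
  };
}

Sorting by relevance before packing means a low-scoring chunk never displaces a high-scoring one, and the returned `sources` list is exactly what you need to attach citations to the generated answer.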