
pgvector Best Practices

The project uses pgvector to store and query dense vector embeddings in PostgreSQL. The extension is enabled via a Prisma migration, and embeddings are queried from the AI service via asyncpg.

Extension

The migration 20260506091200_init_vectors enables the extension:

CREATE EXTENSION IF NOT EXISTS vector;

This must run before any table uses a vector column. IF NOT EXISTS makes it safe to re-apply.
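
A startup check in the AI service can catch a missing extension early. This is a minimal sketch; assert_vector_extension is a hypothetical helper, and get_pool is assumed to return the service's asyncpg pool as in the querying example further down.

from app.database import get_pool

async def assert_vector_extension() -> None:
    pool = get_pool()
    async with pool.acquire() as conn:
        # pg_extension is the standard PostgreSQL catalog of installed extensions
        version = await conn.fetchval(
            "SELECT extversion FROM pg_extension WHERE extname = 'vector'"
        )
    if version is None:
        raise RuntimeError("pgvector is not installed; run the init migration first")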

Adding a vector column via Prisma

Prisma does not natively understand the vector type — use Unsupported to pass it through:

model Document {
  id        String                  @id @default(uuid())
  content   String
  embedding Unsupported("vector(1536)")?
}

Then add the raw column in a migration SQL file instead of relying on prisma migrate dev to generate it:

ALTER TABLE "Document" ADD COLUMN embedding vector(1536);

Set the dimension to match your embedding model (e.g. 1536 for text-embedding-ada-002, 3072 for text-embedding-3-large).
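
A raw insert from the AI service might look like the sketch below. store_document is a hypothetical helper; note that Prisma's @default(uuid()) is applied by the Prisma client, not the database, so a raw SQL insert has to supply its own id (gen_random_uuid() here).

from app.database import get_pool

async def store_document(content: str, embedding: list[float]) -> None:
    pool = get_pool()
    async with pool.acquire() as conn:
        await conn.execute(
            """
            INSERT INTO "Document" (id, content, embedding)
            VALUES (gen_random_uuid(), $1, $2::vector)
            """,
            content,
            str(embedding),  # length must match the declared dimension (1536 here)
        )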

Indexing

Add an index after bulk-loading data, not before: IVFFlat needs representative data for its clustering step, and HNSW builds much faster over a loaded table than it grows through row-by-row inserts.

HNSW (recommended for most workloads)

Faster queries than IVFFlat, no training step, and it handles incremental inserts well:

CREATE INDEX ON "Document" USING hnsw (embedding vector_cosine_ops);

IVFFlat (better for very large static datasets)

Requires choosing lists up front; a common heuristic is lists ≈ sqrt(row_count):

CREATE INDEX ON "Document" USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);

Querying from the AI service

Use asyncpg with the pgvector Python package, or pass embeddings as strings directly:

from app.database import get_pool

async def find_similar(embedding: list[float], limit: int = 5) -> list[dict]:
    pool = get_pool()
    async with pool.acquire() as conn:
        async with conn.transaction():  # SET LOCAL only takes effect inside a transaction
            await conn.execute("SET LOCAL ivfflat.probes = 10")  # tune recall vs speed
            rows = await conn.fetch(
                """
                SELECT id, content, embedding <=> $1::vector AS distance
                FROM "Document"
                ORDER BY distance
                LIMIT $2
                """,
                str(embedding),  # asyncpg accepts the vector as a string representation
                limit,
            )
    return [dict(r) for r in rows]
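
If the pgvector Python package is available, the string round-trip can be avoided by registering its asyncpg codec on each pooled connection; embeddings are then passed and returned as numpy arrays. A sketch, with the DSN handling left to the caller:

import asyncpg
from pgvector.asyncpg import register_vector

async def create_vector_pool(dsn: str) -> asyncpg.Pool:
    # register_vector installs an asyncpg codec for the vector type on each new connection
    return await asyncpg.create_pool(dsn, init=register_vector)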

Distance operators

| Operator | Distance function | Use when |
|----------|-------------------|----------|
| <=> | Cosine | Embeddings may not be unit-normalised; a safe default |
| <-> | L2 (Euclidean) | Embeddings are unit-normalised (ranking then matches cosine) |
| <#> | Negative inner product | Embeddings are unit-normalised; smaller values mean higher similarity |

The operator should match the index's operator class: vector_cosine_ops serves <=>, vector_l2_ops serves <->, and vector_ip_ops serves <#>.

Best practices

Normalise for cosine similarity. Most embedding APIs return unit-normalised vectors. When that is guaranteed, L2 and inner-product distance give identical ranking to cosine but are faster with some index types.
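
Where a model does not guarantee unit length, vectors can be normalised before storage. A minimal sketch; numpy as a dependency is an assumption:

import numpy as np

def normalise(embedding: list[float]) -> list[float]:
    vec = np.asarray(embedding, dtype=np.float32)
    norm = float(np.linalg.norm(vec))
    # Leave all-zero vectors untouched to avoid division by zero
    return (vec / norm).tolist() if norm > 0 else vec.tolist()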

Match dimension to your model. Mismatched dimensions cause a hard error at insert time. Store the model name alongside embeddings so you can detect stale data after a model change.

Chunk thoughtfully. Overlap chunks by ~10–20% of the chunk size to avoid cutting context at boundaries. 512–1024 tokens per chunk is a good starting range.
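
A minimal chunking sketch with overlap. Splitting on whitespace is a stand-in for real tokenisation; production code would count tokens with the embedding model's tokenizer, and the sizes shown are only illustrative defaults:

def chunk_text(text: str, chunk_size: int = 800, overlap: int = 120) -> list[str]:
    # Slide a window over the words, stepping by chunk_size - overlap so
    # consecutive chunks share roughly 15% of their content.
    words = text.split()
    step = chunk_size - overlap
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), step)]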

Store embeddings only in the AI service layer. The apps/server (Express) should never embed text directly — delegate to the AI service, which owns the LLM client and embedding logic.

Re-embed on model change. Vectors from different models are not comparable. When switching embedding models, regenerate all embeddings before querying.