# pgvector Best Practices
The project uses pgvector to store and query dense vector embeddings in PostgreSQL. The extension is enabled via a Prisma migration, and the AI service queries the vectors with asyncpg.
## Extension
The migration `20260506091200_init_vectors` enables the extension:

```sql
CREATE EXTENSION IF NOT EXISTS vector;
```

This must run before any table uses a `vector` column. `IF NOT EXISTS` makes it safe to re-apply.
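If the service might connect to a database where this migration has not been applied yet, a start-up check can fail fast with a clear error. A minimal sketch using asyncpg and the `get_pool` helper shown later on this page; the function name is illustrative:

```python
from app.database import get_pool


async def assert_pgvector_installed() -> None:
    pool = get_pool()
    async with pool.acquire() as conn:
        # pg_extension lists installed extensions; 'vector' must be present
        # before any vector(...) column or <=> query will work.
        version = await conn.fetchval(
            "SELECT extversion FROM pg_extension WHERE extname = 'vector'"
        )
        if version is None:
            raise RuntimeError("pgvector extension is not installed in this database")
```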
## Adding a vector column via Prisma
Prisma does not natively understand the `vector` type, so use `Unsupported` to pass it through:

```prisma
model Document {
  id        String @id @default(uuid())
  content   String
  embedding Unsupported("vector(1536)")?
}
```
Then add the column in a raw migration SQL file rather than relying on `prisma migrate dev` to generate it:

```sql
ALTER TABLE "Document" ADD COLUMN embedding vector(1536);
```

Set the dimension to match your embedding model (e.g. 1536 for `text-embedding-ada-002`, 3072 for `text-embedding-3-large`).
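Because the dimension is fixed in the column type, it can help to validate vectors against the active model before inserting. A small sketch; the `EMBEDDING_DIMENSIONS` mapping and `validate_embedding` helper are illustrative, not existing project code:

```python
# Dimensions of common OpenAI embedding models; must match the vector(N) column.
EMBEDDING_DIMENSIONS = {
    "text-embedding-ada-002": 1536,
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}


def validate_embedding(model: str, embedding: list[float]) -> None:
    expected = EMBEDDING_DIMENSIONS[model]
    if len(embedding) != expected:
        raise ValueError(
            f"{model} should produce {expected}-dim vectors, got {len(embedding)}"
        )
```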
## Indexing
Add an index after bulk-loading data, not before: IVFFlat learns its cluster centres from the rows that already exist, and building HNSW in one pass over loaded data is faster than inserting into an existing index.
### HNSW (recommended for most cases)
Faster queries, no training step, works well for incremental inserts:

```sql
CREATE INDEX ON "Document" USING hnsw (embedding vector_cosine_ops);
```
### IVFFlat (better for very large static datasets)
Requires choosing `lists` up front (≈ sqrt(row_count)):

```sql
CREATE INDEX ON "Document" USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
```
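Since `lists` depends on the row count, it can be convenient to build the index from the service once the bulk load finishes. A sketch with asyncpg; the index name and `create_ivfflat_index` helper are illustrative:

```python
import math

from app.database import get_pool


async def create_ivfflat_index() -> None:
    pool = get_pool()
    async with pool.acquire() as conn:
        row_count = await conn.fetchval('SELECT count(*) FROM "Document"')
        lists = max(1, int(math.sqrt(row_count)))  # lists ≈ sqrt(row_count)
        await conn.execute(
            f'CREATE INDEX IF NOT EXISTS document_embedding_ivfflat '
            f'ON "Document" USING ivfflat (embedding vector_cosine_ops) '
            f'WITH (lists = {lists})'
        )
```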
## Querying from the AI service
Use asyncpg with the pgvector Python package, or pass embeddings as strings directly:
```python
from app.database import get_pool


async def find_similar(embedding: list[float], limit: int = 5) -> list[dict]:
    pool = get_pool()
    async with pool.acquire() as conn:
        # SET LOCAL only takes effect inside a transaction.
        async with conn.transaction():
            # Tune recall vs speed; with an HNSW index use hnsw.ef_search instead.
            await conn.execute("SET LOCAL ivfflat.probes = 10")
            rows = await conn.fetch(
                """
                SELECT id, content, embedding <=> $1::vector AS distance
                FROM "Document"
                ORDER BY distance
                LIMIT $2
                """,
                str(embedding),  # asyncpg accepts the vector as its string representation
                limit,
            )
    return [dict(r) for r in rows]
```
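If you would rather avoid the string cast, the `pgvector` Python package ships an asyncpg integration that registers a codec for the `vector` type. A sketch assuming the package is installed; the function name is illustrative, numpy arrays are the usual parameter type, and the registration could also be done once in the pool's `init` callback instead of per call:

```python
import numpy as np
from pgvector.asyncpg import register_vector

from app.database import get_pool


async def find_similar_np(embedding: np.ndarray, limit: int = 5) -> list[dict]:
    pool = get_pool()
    async with pool.acquire() as conn:
        # Register the vector codec on this connection so parameters can be
        # passed as arrays instead of string literals with a ::vector cast.
        await register_vector(conn)
        rows = await conn.fetch(
            """
            SELECT id, content, embedding <=> $1 AS distance
            FROM "Document"
            ORDER BY distance
            LIMIT $2
            """,
            embedding,
            limit,
        )
    return [dict(r) for r in rows]
```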
## Distance operators
| Operator | Distance function | Use when |
|---|---|---|
| `<=>` | Cosine | Normalisation is not guaranteed (safe default) |
| `<->` | L2 (Euclidean) | Embeddings are unit-normalised |
| `<#>` | Negative inner product | Embeddings are unit-normalised and you want maximum similarity |
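All three operators return a distance, so smaller means closer and an ascending `ORDER BY` is correct for each of them. If callers expect a similarity score instead, convert it in the query; a minimal sketch for cosine (negate `<#>` analogously to recover the raw inner product), with the function name being illustrative:

```python
from app.database import get_pool


async def top_matches_with_similarity(embedding: list[float], limit: int = 5) -> list[dict]:
    pool = get_pool()
    async with pool.acquire() as conn:
        # Cosine distance lies in [0, 2]; 1 - distance recovers cosine similarity.
        rows = await conn.fetch(
            """
            SELECT id, content, 1 - (embedding <=> $1::vector) AS similarity
            FROM "Document"
            ORDER BY embedding <=> $1::vector
            LIMIT $2
            """,
            str(embedding),
            limit,
        )
    return [dict(r) for r in rows]
```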
## Best practices
**Normalise for cosine similarity.** Most embedding APIs return unit-normalised vectors. When that is guaranteed, L2 and inner-product distance give identical ranking to cosine but are faster with some index types.
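If a model's output is not guaranteed to be unit length, normalising before storage keeps the cheaper operators usable. A pure-Python sketch; the helper name is illustrative:

```python
import math


def l2_normalise(embedding: list[float]) -> list[float]:
    # Divide by the L2 norm so the vector has unit length; after this,
    # L2 and inner-product ranking match cosine ranking.
    norm = math.sqrt(sum(x * x for x in embedding))
    if norm == 0.0:
        return embedding
    return [x / norm for x in embedding]
```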
**Match dimension to your model.** Mismatched dimensions cause a hard error at insert time. Store the model name alongside embeddings so you can detect stale data after a model change.
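A sketch of the stale-data check, assuming a hypothetical `embeddingModel` text column stored next to `embedding`; the constant and function name are illustrative:

```python
from app.database import get_pool

CURRENT_EMBEDDING_MODEL = "text-embedding-3-large"  # illustrative


async def count_stale_embeddings() -> int:
    pool = get_pool()
    async with pool.acquire() as conn:
        # Rows embedded with a different (or unknown) model are stale and must
        # be re-embedded before they can be compared with new query vectors.
        return await conn.fetchval(
            'SELECT count(*) FROM "Document" '
            'WHERE "embeddingModel" IS DISTINCT FROM $1',
            CURRENT_EMBEDDING_MODEL,
        )
```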
**Chunk thoughtfully.** Overlap chunks by ~10–20% of the chunk size to avoid cutting context at boundaries; 512–1024 tokens per chunk is a good starting range.
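A character-based sketch of overlapping chunking (a token-based version follows the same sliding-window pattern using a tokenizer's counts); the defaults roughly correspond to ~500 tokens with ~15% overlap and are illustrative:

```python
def chunk_text(text: str, chunk_size: int = 2000, overlap: int = 300) -> list[str]:
    # Slide a window of chunk_size characters, stepping by chunk_size - overlap,
    # so the end of each chunk is repeated at the start of the next one.
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]
```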
**Store embeddings only in the AI service layer.** The Express server (`apps/server`) should never embed text directly; delegate to the AI service, which owns the LLM client and embedding logic.
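One way to enforce that boundary is for the AI service to expose an embedding endpoint the Express server calls over HTTP. A FastAPI-style sketch; the route, request/response models, and the direct use of the OpenAI client are illustrative assumptions, not existing project code:

```python
from fastapi import APIRouter
from openai import AsyncOpenAI
from pydantic import BaseModel

router = APIRouter()
client = AsyncOpenAI()  # assumes OPENAI_API_KEY is set; provider choice is illustrative

EMBEDDING_MODEL = "text-embedding-3-large"


class EmbedRequest(BaseModel):
    texts: list[str]


class EmbedResponse(BaseModel):
    model: str
    embeddings: list[list[float]]


@router.post("/embeddings", response_model=EmbedResponse)
async def create_embeddings(request: EmbedRequest) -> EmbedResponse:
    # The AI service owns the embedding client; apps/server only sends raw text
    # over HTTP and stores the vectors it gets back.
    result = await client.embeddings.create(model=EMBEDDING_MODEL, input=request.texts)
    return EmbedResponse(
        model=EMBEDDING_MODEL,
        embeddings=[item.embedding for item in result.data],
    )
```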
**Re-embed on model change.** Vectors from different models are not comparable. When switching embedding models, regenerate all embeddings before querying.
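A sketch of a batched backfill; the `embed` callable stands in for whatever the AI service uses to generate vectors, and the batch size and function name are illustrative:

```python
from collections.abc import Awaitable, Callable

from app.database import get_pool


async def re_embed_all(
    embed: Callable[[list[str]], Awaitable[list[list[float]]]],
    batch_size: int = 100,
) -> None:
    # Walk the table in id order, regenerate every vector with the new model,
    # and overwrite the stored embeddings in place.
    pool = get_pool()
    async with pool.acquire() as conn:
        last_id = ""
        while True:
            rows = await conn.fetch(
                'SELECT id, content FROM "Document" WHERE id > $1 ORDER BY id LIMIT $2',
                last_id,
                batch_size,
            )
            if not rows:
                break
            vectors = await embed([row["content"] for row in rows])
            for row, vector in zip(rows, vectors):
                await conn.execute(
                    'UPDATE "Document" SET embedding = $1::vector WHERE id = $2',
                    str(vector),
                    row["id"],
                )
            last_id = rows[-1]["id"]
```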