When pgvector is enough (and when it isn't)
If you're shipping a retrieval-augmented feature in 2026 and your first instinct is to provision a managed vector database, stop. For most workloads under roughly ten million vectors, Postgres with pgvector is the right answer, and the cost of being wrong about that is a stateful service you didn't need to operate.
I've now built or inherited enough RAG systems across AU and NZ organisations to be opinionated about this. The pattern is depressingly consistent: a team picks Pinecone or Weaviate on day one because that's what the blog posts said, then six months in they're paying for a service that holds a few hundred thousand vectors, doesn't sit inside their existing backup and observability story, and forces them to keep two stores in sync for every write. The vector DB was the easy decision that became the expensive one.
What pgvector actually gives you
pgvector is a Postgres extension that adds a vector column type, distance operators (cosine, L2, inner product), and two index types, IVFFlat and HNSW. Since HNSW support landed in 0.5.0, the gap to dedicated vector engines has narrowed considerably for the workloads most teams actually have. You get all of this inside the same Postgres you're already running, with the same backups, the same point-in-time recovery, the same connection pool, the same RLS, the same schema migrations. That is not a small thing.
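To make that concrete, here's a minimal schema sketch. The 1536-dimension column and the generated tsvector are my assumptions, not gospel (swap in whatever your embedding model and tsvector maintenance strategy require), but it's the shape the queries below run against:

-- Assumes an embedding model that emits 1536-dimension vectors
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE documents (
    id         bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    title      text NOT NULL,
    body       text NOT NULL,
    -- generated column keeps the lexical index in sync on every write;
    -- a trigger works too if you need per-field weighting
    search_tsv tsvector GENERATED ALWAYS AS
        (to_tsvector('english', title || ' ' || body)) STORED,
    embedding  vector(1536)
);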
The thing that keeps surprising teams is that hybrid search, the combination of lexical and semantic that you actually want for most internal-knowledge problems, is genuinely pleasant in Postgres. tsvector for keyword, vector for semantic, a CTE to normalise the scores, done. You can read the SQL. You can debug the SQL. There is no separate query language to learn.
-- Hybrid search over a documents table
WITH semantic AS (
    SELECT id, 1 - (embedding <=> $1) AS score  -- cosine similarity, roughly [0, 1]
    FROM documents
    ORDER BY embedding <=> $1
    LIMIT 50
),
lexical AS (
    -- normalisation flag 32 rescales ts_rank_cd to [0, 1) via rank / (rank + 1),
    -- so the weighted sum below combines comparable scores
    SELECT id, ts_rank_cd(search_tsv, plainto_tsquery('english', $2), 32) AS score
    FROM documents
    WHERE search_tsv @@ plainto_tsquery('english', $2)
    ORDER BY score DESC
    LIMIT 50
)
SELECT d.id, d.title,
       COALESCE(s.score, 0) * 0.6 + COALESCE(l.score, 0) * 0.4 AS score
FROM documents d
LEFT JOIN semantic s ON s.id = d.id
LEFT JOIN lexical  l ON l.id = d.id
WHERE s.id IS NOT NULL OR l.id IS NOT NULL
ORDER BY score DESC
LIMIT 10;

That query, against a properly indexed corpus of a few hundred thousand chunks, will reliably return in well under 100 ms on a modest db.r6g.large in ap-southeast-2. For a service hosted in the same region, that is roughly the same envelope you'd get from a managed vector DB once you account for the cross-region hop most teams accidentally introduce by reaching for us-east-1.
HNSW vs IVFFlat, in practice
The choice between HNSW and IVFFlat gets discussed as a benchmark question. In practice it's an operational one. IVFFlat needs you to choose a list count up front and rebuild the index when your corpus grows materially; HNSW builds incrementally and is forgiving as data drifts. IVFFlat is faster to build and uses less memory; HNSW gives you better recall at the same latency once you tune ef_search.
The rule I've landed on: if you're below about a million vectors and your corpus is reasonably static, IVFFlat with lists ≈ sqrt(N) and probes tuned by query is fine. Above that, or if your corpus grows daily, default to HNSW and tune m and ef_construction once at build time, then move ef_search per-query depending on whether the caller wants speed or recall.
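For the static case, a sketch of the IVFFlat side, assuming a corpus of about a million rows (so lists = 1000 by the sqrt rule):

-- IVFFlat for a mostly static million-row corpus; rebuild if it grows materially
CREATE INDEX ON documents
    USING ivfflat (embedding vector_cosine_ops)
    WITH (lists = 1000);

-- Per-query: probe more lists for better recall at higher latency
SET LOCAL ivfflat.probes = 20;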
-- HNSW for a million-row corpus that grows daily
CREATE INDEX ON documents
    USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);

-- Per-query: trade latency for recall
SET LOCAL hnsw.ef_search = 80;

When to actually leave Postgres
There are real signals that mean you've outgrown pgvector, and you should respect them rather than pretend you haven't:
- You're past 10–20 million vectors and recall is starting to suffer at acceptable latency, even with HNSW tuned aggressively.
- You need metadata filtering at scale where the filter is highly selective and the planner keeps choosing the wrong path (the failure mode is sketched after this list). Pinecone and Qdrant handle pre-filtered ANN search more gracefully than pgvector currently does.
- You need multi-tenant vector isolation with thousands of small indexes. Postgres can technically do this with partitioning, but it gets fiddly fast.
- Your vectors are the dominant write workload and you're starting to hurt the rest of your transactional traffic.
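That second signal deserves a concrete illustration, since it's the one teams hit without warning. A sketch, assuming a hypothetical tenant_id column on the documents table:

-- When tenant_id matches only a handful of rows, the planner either walks
-- the HNSW index and discards nearly every candidate post-filter, or
-- filters first and can't use the vector index at all
SELECT id, title
FROM documents
WHERE tenant_id = $1
ORDER BY embedding <=> $2
LIMIT 10;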
If any of those is true, move. The migration is not dramatic: you double-write for a fortnight, shadow-read against the new store, compare top-k overlap on a held-out query set, then cut over. The thing you don't want to do is start with a dedicated vector DB and discover six months in that you needed Postgres-level transactional guarantees the whole time, because that migration is much more painful in the other direction.
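The overlap check is the step people hand-wave, and it's one query once you've captured the shadow reads. A sketch, assuming hypothetical capture tables results_pg and results_new, each holding the top ten ids per query from its store:

-- Mean top-10 overlap across the held-out query set; cut over when this
-- is high enough that the remaining differences are noise, not regressions
SELECT AVG(matches / 10.0) AS mean_overlap
FROM (
    SELECT p.query_id, COUNT(n.doc_id) AS matches
    FROM results_pg p
    LEFT JOIN results_new n USING (query_id, doc_id)
    GROUP BY p.query_id
) per_query;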
Sovereignty is a quiet reason to stay on Postgres in this region. RDS and Aurora both run in ap-southeast-2 and ap-southeast-4; not all dedicated vector DBs offer Australian regions, and the ones that do sometimes keep the data plane local while routing control-plane traffic elsewhere. If your data classification will not survive that conversation, the question answers itself.
Bottom line
Default to Postgres + pgvector. Use HNSW past a million vectors. Build hybrid search; pure vector retrieval almost always under-performs on the messy queries real users write. Plan the migration path off pgvector in advance so you can spot the moment you actually need it, rather than guessing on day one. Where I'm uncertain: I haven't yet pushed pgvector past about 25 million rows in production, and the people I trust who have say the operational story above that mark gets noticeably harder. If you're near that boundary, treat my advice as a prior, not a conclusion.