pgvector

Open-source Postgres extension for vector storage and similarity search. Use your existing database for RAG.

pgvector is an open-source PostgreSQL extension that adds vector storage and similarity search capabilities directly to Postgres. Rather than introducing a separate specialized database into a stack, pgvector lets developers store embeddings alongside their existing relational data and query them using standard SQL.

At its core, pgvector adds a new vector data type to PostgreSQL, along with indexing support and distance functions for performing approximate and exact nearest-neighbor searches. This means teams can build Retrieval-Augmented Generation (RAG) pipelines, semantic search, and recommendation systems without leaving the database they already run in production.

The extension supports L2 (Euclidean), inner product, and cosine distance, which cover what most embedding models require; recent releases also add L1 distance, plus Hamming and Jaccard distance for binary vectors. Exact k-nearest-neighbor search needs no index, while approximate search uses IVFFlat or HNSW indexes. HNSW generally offers the better speed/recall tradeoff of the two, at the cost of slower index builds and higher memory use.
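To make the metrics concrete, here is a minimal pure-Python sketch of what the three main distance operators compute, followed by a brute-force exact nearest-neighbor search. The function names and the sample `items` are illustrative assumptions, not part of pgvector; inside Postgres the same values come from the `<->`, `<#>`, and `<=>` operators.

```python
import math

def l2_distance(a, b):
    """Euclidean distance, the quantity pgvector's <-> operator returns."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def negative_inner_product(a, b):
    """pgvector's <#> operator returns the inner product multiplied by -1,
    so that smaller values still mean "closer" when sorting ascending."""
    return -sum(x * y for x, y in zip(a, b))

def cosine_distance(a, b):
    """1 - cosine similarity, the quantity pgvector's <=> operator returns."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def exact_knn(query, rows, k, distance=l2_distance):
    """Brute-force k-nearest neighbors: score every row, sort, keep the first k.
    This mirrors what an un-indexed ORDER BY ... LIMIT k query does."""
    return sorted(rows, key=lambda row: distance(query, row[1]))[:k]

# Hypothetical (id, embedding) rows for illustration only.
items = [(1, [1.0, 0.0]), (2, [0.0, 1.0]), (3, [3.0, 4.0])]
nearest = exact_knn([1.0, 0.1], items, k=2)
print([row_id for row_id, _ in nearest])
```

Exact search scans every row, so its cost grows linearly with table size; that is the cost the IVFFlat and HNSW indexes trade some recall to avoid.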

In the AI ecosystem, pgvector sits in an interesting position compared to dedicated vector databases like Pinecone, Weaviate, Qdrant, or Chroma. Purpose-built vector stores often offer more advanced filtering, higher query throughput at extreme scale, or managed cloud infrastructure out of the box. pgvector trades some of that specialization for a significant operational advantage: no additional infrastructure. If a project already runs PostgreSQL, as many production web applications do, enabling vector search takes a single CREATE EXTENSION vector; command once the extension is installed on the server.
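As a rough sketch of that setup step, the snippet below holds the SQL a client might send and a helper that formats an embedding in pgvector's bracketed text form. The `documents` table, its three-dimensional column, and the `%s` placeholder style (as used by psycopg) are assumptions for illustration; executing the statements requires a live Postgres connection, which is deliberately left out.

```python
def to_vector_literal(values):
    """Format numbers in pgvector's text representation, e.g. '[1.0,2.0,3.0]'."""
    return "[" + ",".join(str(float(v)) for v in values) + "]"

# One-time setup. The table name and dimension are hypothetical.
SETUP_SQL = [
    "CREATE EXTENSION IF NOT EXISTS vector;",
    "CREATE TABLE documents ("
    "id bigserial PRIMARY KEY, body text, embedding vector(3));",
]

# Parameterized insert; the ::vector cast turns the text literal into a vector.
INSERT_SQL = "INSERT INTO documents (body, embedding) VALUES (%s, %s::vector);"

def insert_params(body, embedding):
    """Build the parameter tuple that accompanies INSERT_SQL."""
    return (body, to_vector_literal(embedding))

print(insert_params("hello world", [0.5, 0, 1]))
```

With a driver such as psycopg, each statement in `SETUP_SQL` would be passed to `cursor.execute`, followed by `cursor.execute(INSERT_SQL, insert_params(...))` for every row.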

This makes pgvector particularly compelling for early-stage AI features, internal tools, and applications where co-locating vector search with relational data reduces complexity. Foreign key constraints, joins between embeddings and user records, transactional consistency across vector and relational writes — these are genuinely hard problems in a multi-database setup that pgvector eliminates by design.

The project is actively maintained with 20,000+ GitHub stars as of early 2026, and has been adopted broadly across the Postgres ecosystem. Major managed Postgres providers including Supabase, Neon, and Amazon RDS now support pgvector natively, making it accessible without requiring self-hosted Postgres. Client library support spans Python, JavaScript, Ruby, Go, Java, and most languages with a Postgres driver.

For teams building RAG applications specifically, pgvector pairs well with frameworks like LangChain and LlamaIndex, as well as the project's own client libraries. Embeddings from OpenAI, Cohere, or open-source models can be stored in vector columns and queried by semantic distance in the same transaction that reads user preferences or filters by account type, a pattern that is cumbersome to replicate across separate systems.
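A minimal sketch of that pattern, assuming hypothetical `documents` and `accounts` tables and psycopg-style named placeholders: one query joins relational data, filters by account plan, and ranks by cosine distance to the query embedding.

```python
def to_vector_literal(values):
    """Format numbers in pgvector's text representation, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(str(float(v)) for v in values) + "]"

# Table and column names are assumptions made for this example.
# <=> is pgvector's cosine distance operator; smaller means more similar.
SEARCH_SQL = """
SELECT d.id, d.body
FROM documents AS d
JOIN accounts AS a ON a.id = d.account_id
WHERE a.plan = %(plan)s
ORDER BY d.embedding <=> %(query)s::vector
LIMIT %(limit)s;
"""

def build_search_params(query_embedding, plan, limit=5):
    """Bundle the named parameters that SEARCH_SQL expects."""
    return {
        "query": to_vector_literal(query_embedding),
        "plan": plan,
        "limit": limit,
    }

params = build_search_params([0.1, 0.2], "pro", limit=3)
print(params)
```

Because the filter, the join, and the similarity ranking run in one statement, they see the same transactional snapshot; with a separate vector store the application would have to reconcile two result sets itself.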

The main limitation is that pgvector scales within the constraints of a single Postgres instance. At very large vector counts (tens of millions or more) or very high query rates, dedicated vector databases may offer better horizontal scalability. For most applications, however, pgvector's performance envelope is more than sufficient, and the operational simplicity is a genuine advantage.

Key Features

  • Adds a native vector data type to PostgreSQL for storing embedding vectors of up to 16,000 dimensions (indexes on the vector type support up to 2,000 dimensions)
  • Supports exact and approximate nearest-neighbor search with L2, inner product, and cosine distance metrics
  • HNSW and IVFFlat indexing for fast approximate similarity queries at scale
  • Full SQL compatibility — join vectors with relational data, filter, paginate, and apply transactions as normal
  • Works with any embedding model output (OpenAI, Cohere, open-source models, etc.)
  • Supported natively on managed Postgres platforms including Supabase, Neon, and Amazon RDS
  • Client integrations for Python, JavaScript, Ruby, Go, Java, and other languages via standard Postgres drivers
  • Compatible with LangChain, LlamaIndex, and other RAG frameworks

Pros & Cons

Pros

  • No additional infrastructure required — runs inside an existing Postgres database
  • Transactional consistency between vector and relational data comes built in
  • Open-source and self-hostable with no vendor lock-in
  • Widely supported on managed Postgres cloud providers
  • Active community with strong ecosystem integrations

Cons

  • Scales vertically within a single Postgres instance; horizontal sharding is not built-in
  • At very large vector counts or high query throughput, purpose-built vector databases may outperform
  • Approximate search quality depends on index tuning (IVFFlat list count, HNSW parameters)
  • No dedicated metadata-filtering API; filtering uses SQL WHERE clauses, and with approximate indexes the filter is applied after the index scan, so a filtered query can return fewer rows than requested
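The index tuning mentioned in the list above can be sketched as follows. The helper names are hypothetical; the `lists` heuristic (rows / 1000 up to one million rows, the square root of the row count beyond that) and the HNSW defaults (`m = 16`, `ef_construction = 64`) follow the pgvector README and are starting points to measure against, not guarantees.

```python
import math

def suggested_ivfflat_lists(row_count):
    """Starting value for IVFFlat's `lists` option, per the pgvector README:
    rows / 1000 for up to 1M rows, sqrt(rows) for larger tables."""
    if row_count <= 1_000_000:
        return max(1, row_count // 1000)
    return max(1, math.isqrt(row_count))

def ivfflat_index_sql(table, column, row_count, opclass="vector_cosine_ops"):
    """Build a CREATE INDEX statement for IVFFlat.
    Identifiers are interpolated directly, so pass only trusted names."""
    lists = suggested_ivfflat_lists(row_count)
    return (
        f"CREATE INDEX ON {table} USING ivfflat ({column} {opclass}) "
        f"WITH (lists = {lists});"
    )

def hnsw_index_sql(table, column, m=16, ef_construction=64,
                   opclass="vector_cosine_ops"):
    """Build a CREATE INDEX statement for HNSW with explicit build parameters."""
    return (
        f"CREATE INDEX ON {table} USING hnsw ({column} {opclass}) "
        f"WITH (m = {m}, ef_construction = {ef_construction});"
    )

# Query-time knobs: raising either trades speed for recall.
QUERY_TUNING_SQL = [
    "SET ivfflat.probes = 10;",
    "SET hnsw.ef_search = 100;",
]

print(ivfflat_index_sql("documents", "embedding", row_count=500_000))
print(hnsw_index_sql("documents", "embedding"))
```

The operator class has to match the distance operator used in queries (for example `vector_cosine_ops` with `<=>`), otherwise the planner will not use the index for that ordering.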

Pricing

pgvector is fully open-source and free to use under the PostgreSQL License. There is no commercial offering or paid tier for the extension itself. Costs, if any, come from the underlying Postgres infrastructure (self-hosted or managed cloud provider).

Who Is This For?

pgvector is best suited for development teams already running PostgreSQL who want to add semantic search, RAG pipelines, or embedding-based recommendations without introducing a separate vector database. It excels at applications where vector queries need to be combined with relational data — such as filtering results by user, date, or category — and for teams prioritizing operational simplicity over maximum vector query throughput.
