Vector Databases for AI Certifications 2026: pgvector, OpenSearch, Pinecone
Embeddings and vector stores quietly took over the 2026 cloud AI exam blueprints. Here is what AWS, Azure, GCP and OCI now test — and the fastest way to drill it.

Why Vector DBs Are on the Exam Now
Two years ago, vector search was a niche topic. In 2026 it sits behind every production RAG, agent memory, and semantic-search workload. The exam writers caught up: AWS MLA-C01, Azure AI-102, GCP PMLE, and OCI Generative AI Professional all now have scenario questions on embeddings, retrieval, hybrid search, and reranking. AIF-C01 and AI-900 cover the concepts at a lighter level.
The Concepts Every Exam Tests
Embeddings: dense vector representations of text, images, or audio. Know cosine vs dot vs Euclidean similarity, dimension-size trade-offs (384 vs 1536 vs 3072), and batch-embedding cost optimization.
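To make the cosine vs dot vs Euclidean distinction concrete, here is a minimal standard-library sketch. The three toy vectors are made up for illustration and are not taken from any exam or embedding model.

```python
import math

def dot(a, b):
    """Dot product: sensitive to both direction and vector length."""
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine: dot product of the two vectors after length normalization."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def euclidean_distance(a, b):
    """Euclidean: straight-line distance (smaller means more similar)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Toy 2-D vectors (illustrative only).
query = [1.0, 0.0]
same_direction_longer = [3.0, 0.0]   # same direction as the query, 3x the length
close_but_rotated = [0.8, 0.6]       # unit length, slightly rotated

for name, vec in [("same_direction_longer", same_direction_longer),
                  ("close_but_rotated", close_but_rotated)]:
    print(name,
          round(cosine_similarity(query, vec), 3),
          round(dot(query, vec), 3),
          round(euclidean_distance(query, vec), 3))
```

In this toy case cosine and dot product prefer the longer same-direction vector, while Euclidean distance prefers the rotated unit vector. When all vectors are normalized to unit length, the three metrics produce the same ranking.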
ANN indexes: HNSW (graph-based, the usual default), IVF / IVF-PQ (cluster-based, memory-efficient), and DiskANN (disk-resident, for large indexes). Trade-off questions weigh latency against recall and memory.
Hybrid search: BM25 keyword scoring combined with dense vector scoring through score fusion (reciprocal rank fusion, RRF, or a weighted sum). Most exams now have at least one hybrid-search scenario.
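Reciprocal rank fusion is short enough to sketch in full. The version below is a minimal illustration, not any vendor's implementation: each document scores the sum of 1 / (k + rank) over the rankings it appears in. The constant k = 60 is a commonly used default and is an assumption here, as are the document ids.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked id lists; ranks start at 1, higher score is better."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by score descending; break ties by id for deterministic output.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical result lists from a keyword query and a vector query.
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
vector_ranking = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
print(fused)  # ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

RRF uses only rank positions, so it avoids having to put BM25 scores and vector similarities on a common scale. A weighted sum, by contrast, needs the two score ranges normalized first.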
Reranking: cross-encoder rerankers (Cohere Rerank, Vertex Reranker, Bedrock Rerank). Know when to add one and how it changes latency and cost.
Chunking: fixed-size, semantic, recursive, and parent-child / contextual retrieval. Exams test which chunking strategy fits which document type.
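Of the strategies listed, fixed-size chunking is the simplest to sketch. The helper below is a hypothetical illustration that splits a token list into windows of `size` tokens, with `overlap` tokens shared between neighbouring chunks. The parameter names and the whitespace tokenization are assumptions for the example.

```python
def chunk_fixed(tokens, size, overlap):
    """Split tokens into fixed-size windows that share `overlap` tokens."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the last window already reaches the end of the input
    return chunks

text = "w0 w1 w2 w3 w4 w5 w6 w7 w8 w9"
chunks = chunk_fixed(text.split(), size=4, overlap=1)
for chunk in chunks:
    print(" ".join(chunk))
```

The overlap keeps a sentence that straddles a boundary retrievable from either side, at the cost of embedding some tokens twice. Semantic and recursive strategies replace the fixed step with boundaries derived from the document itself, such as headings or paragraphs.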
Filtering and security: pre-filter vs post-filter, tenant isolation in shared indexes, and PII redaction in embedded text.
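The pre-filter vs post-filter difference shows up clearly on a tiny shared index. The sketch below uses brute-force dot-product scoring and invented documents for two tenants. It illustrates the behaviour only and does not reflect any particular service's API.

```python
def top_k(query, docs, k):
    """Return the k docs with the highest dot-product score against the query."""
    def score(doc):
        return sum(q * v for q, v in zip(query, doc["vec"]))
    return sorted(docs, key=score, reverse=True)[:k]

def post_filter(query, docs, tenant, k):
    """Search everything first, then drop results from other tenants."""
    return [d["id"] for d in top_k(query, docs, k) if d["tenant"] == tenant]

def pre_filter(query, docs, tenant, k):
    """Restrict to the tenant's docs first, then search only those."""
    allowed = [d for d in docs if d["tenant"] == tenant]
    return [d["id"] for d in top_k(query, allowed, k)]

# Invented shared index: tenant A's docs happen to score highest.
docs = [
    {"id": "a1", "tenant": "A", "vec": [0.9, 0.1]},
    {"id": "a2", "tenant": "A", "vec": [0.8, 0.2]},
    {"id": "a3", "tenant": "A", "vec": [0.7, 0.3]},
    {"id": "b1", "tenant": "B", "vec": [0.6, 0.4]},
    {"id": "b2", "tenant": "B", "vec": [0.5, 0.5]},
]
query = [1.0, 0.0]

print(post_filter(query, docs, tenant="B", k=3))  # []
print(pre_filter(query, docs, tenant="B", k=3))   # ['b1', 'b2']
```

Post-filtering can return fewer than k results, or none, when other tenants dominate the top of the ranking. Pre-filtering always searches within the allowed set, which is why it is the usual fit for tenant-isolation scenarios.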
Memorize HNSW. If a question asks "which ANN algorithm" and lists HNSW, IVF, and brute-force, HNSW is the answer 80% of the time on 2026 exams.
AWS Vector Stack
OpenSearch k-NN engine with FAISS, Lucene, and NMSLIB backends. HNSW or IVF. Hybrid search via OpenSearch Search Pipelines. The default vector store on Bedrock Knowledge Bases.
Aurora pgvector: a PostgreSQL extension with HNSW indexing. Best when relational data and vectors live together. Aurora Limitless for sharding.
Higher-level managed retrieval service. Less flexibility, more out-of-the-box. Appears in MLA-C01 scenarios when "fully managed" is in the requirements.
Bedrock Knowledge Bases: wraps OpenSearch / Aurora / MongoDB / Pinecone. Exams test when to use Knowledge Bases vs roll-your-own retrieval.
Azure Vector Stack
Azure AI Search: HNSW vector index, hybrid search with semantic ranking, and integrated vectorization (built-in chunking + embedding pipeline). Default retriever in Azure AI Foundry agents.
HNSW + IVFFlat indexes. DiskANN preview for large datasets.
HNSW + IVF. Multi-tenant SaaS pattern is a frequent AI-102 scenario.
GCP Vector Stack
Vertex AI Vector Search: formerly Matching Engine. Tree-AH / ScaNN under the hood. Highest scale of any managed vector service. PMLE scenarios reward knowing when ScaNN beats HNSW (very large indexes).
HNSW + IVFFlat. AlloyDB AI adds google_ml_integration extension for in-database embeddings.
VECTOR_SEARCH SQL function. Great for data already living in BigQuery. Tested in PMLE data-platform scenarios.
OCI & Third-Party
Oracle 23ai vector search: VECTOR data type, HNSW and IVF indexes, native SQL syntax. OCI Generative AI Professional now has 4-6 questions on it.
Pinecone: mentioned by name in Bedrock Knowledge Bases options and on Azure AI Foundry connectors. The serverless vs pod-based architecture trade-off appears.
Mentioned in passing on the open-source models post. Less likely to be a correct answer on cloud-vendor exams but useful for portfolio projects.
Study Plan
- Week 1: Build a tiny RAG pipeline with pgvector locally. Embed 200 docs, run a similarity query, then switch between HNSW and IVFFlat indexes and compare latency and recall.
- Week 2: Migrate the same data to your primary cloud's managed service (OpenSearch, Azure AI Search, or Vertex Vector Search).
- Week 3: Add hybrid search and a reranker. Measure recall and latency.
- Week 4: Drill vector / retrieval scenarios with ExamCertAI. Pattern recognition on the trade-off questions is what wins exam time.
Common trap: "Use brute-force similarity for X million vectors" is almost always wrong. Brute-force is correct only for very small (sub-10K) corpora.
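Brute-force search still has a place in a study project: it provides the exact answer against which an approximate index's recall is measured. The sketch below is illustrative only. The corpus is random, and the "approximate" search is a crude stand-in that scans a random half of the corpus rather than a real HNSW or IVF index.

```python
import math
import random

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def exact_top_k(query, vectors, k, candidate_ids=None):
    """Brute force: score every candidate id and keep the k best."""
    ids = range(len(vectors)) if candidate_ids is None else candidate_ids
    return sorted(ids, key=lambda i: cosine(query, vectors[i]), reverse=True)[:k]

def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true top-k that the approximate search also returned."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)

rng = random.Random(0)  # fixed seed so runs are repeatable
corpus = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(500)]
query = [rng.gauss(0, 1) for _ in range(16)]

truth = exact_top_k(query, corpus, k=10)

# Stand-in for an ANN index (assumption): only half the corpus is scanned.
scanned = rng.sample(range(len(corpus)), 250)
approx = exact_top_k(query, corpus, k=10, candidate_ids=scanned)

print("recall@10:", recall_at_k(approx, truth))
```

Brute force costs one similarity computation per stored vector per query, which is why it stops being a sensible answer as the corpus grows. The same `recall_at_k` helper can score a real index in the Week 3 exercise by passing in the ids the index returns.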
Frequently Asked Questions
Are vector databases on cloud AI certification exams?
Yes. AWS MLA-C01 covers OpenSearch, Aurora pgvector, and Bedrock KB. Azure AI-102 covers Azure AI Search and Cosmos DB. GCP PMLE covers Vertex Vector Search, AlloyDB, and BigQuery vector search. OCI GenAI Pro covers 23ai vector search.
Which vector database should I learn first?
For breadth, start with pgvector — it appears on every cloud. Then learn the managed retrieval service on your primary cloud.
What ANN algorithm should I memorize?
HNSW is the default on most major managed services. Know the IVF and IVF-PQ trade-offs and when DiskANN or ScaNN replaces HNSW.
How do I drill vector exam questions?
Drill scenario questions with ExamCertAI. Free, browser-based, scenario-heavy.
Master Vector / RAG Scenarios
ExamCertAI gives per-answer AI explanations on every question for every major AI cert.
