A hands-on Python script that shows how to:
- Create a vector collection in Chroma DB.
- Embed rich employee profiles with the Sentence-Transformers model
all-MiniLM-L6-v2. - Query the data three ways:
- Semantic similarity (natural-language search).
- Metadata filtering (SQL-like filters on fields such as department, experience, or location).
- Combined searches (similarity + metadata for pinpoint accuracy).
It’s a quick reference for anyone learning modern vector databases, HNSW indexes, and hybrid filtering.
| Tool | Version (tested) |
|---|---|
| Python | 3.9 or newer |
| chromadb | ^0.5 |
| sentence-transformers | ^2.7 |
Tip: GPU is not required; CPU is fine for the small demo set.
git clone https://github.com/<your-username>/<your-repo>.git
cd <your-repo>python -m venv venv
# macOS/Linux
source venv/bin/activate
# Windows
venv\Scripts\activatepip install -r requirements.txtpython similarity_employeedata.py- You will see console output similar to:
Collection created: employee_collection
Collection contents:
Number of documents: 15
=== Similarity Search Examples ===
1. Searching for Python developers:
Query: 'Python developer with web development experience'
1. Alex Rodriguez (employee_10) - Distance: 0.0483
2. John Doe (employee_1) - Distance: 0.1037
3. Matthew Garcia (employee_14) - Distance: 0.2126| Step | Code Section | What Happens |
|---|---|---|
| 1 | SentenceTransformerEmbeddingFunction |
Loads all-MiniLM-L6-v2 for 384-dimensional embeddings. |
| 2 | chromadb.Client() |
Starts an in-process Chroma server. |
| 3 | create_collection(...) |
Creates an HNSW index (space="cosine") with the custom embedder. |
| 4 | Loop over employees |
Generates natural-language blurbs per employee. |
| 5 | collection.add(...) |
Adds IDs, documents, and rich metadata rows to the collection. |
| 6 | collection.query(...) & collection.get(...) |
Performs six demo queries (semantic, filtered, hybrid). |
employeeslist → Replace with your own data source (e.g., from a CSV, JSON, or database).query_textexamples → Try custom search prompts like"DevOps manager with AWS"or"HR specialist in Boston".- Metadata schema → Extend with fields like
salary,skills(as lists),certifications,projects, etc.
- Chroma DB team for the blazing-fast vector store
- Sentence-Transformers by UKP Lab
- IBM Agentic AI Certification
- Example employees are fictitious; any resemblance to real persons is coincidental.