A modern semantic search application built with Bun.js, TypeScript, and OpenAI embeddings. This project provides both a RESTful API and CLI tools for generating embeddings and performing semantic similarity searches.
- Semantic Search: Search for content based on meaning, not just keywords
- RESTful API: HTTP endpoints for embedding generation and similarity search
- CLI Tools: Command-line interface for quick searches and embedding generation
- Environment-based Configuration: Different settings for development and production
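
Searching by meaning usually comes down to comparing embedding vectors. The sketch below shows one common way to do that, cosine similarity followed by a top-k ranking. It assumes cosine similarity is the metric, which this README does not state; the `Doc` type and the `cosineSimilarity` and `rankBySimilarity` functions are hypothetical names used only for illustration.

```typescript
// Hypothetical shape of a stored item; the real schema may differ.
type Doc = { text: string; embedding: number[] };

// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 if either vector has zero norm.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Score every document against the query, sort descending, keep the top `limit`.
function rankBySimilarity(query: number[], docs: Doc[], limit = 5) {
  return docs
    .map((d) => ({ text: d.text, similarity: cosineSimilarity(query, d.embedding) }))
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, limit);
}
```

In practice the ranking would more likely run inside the database than in application code; the in-memory version here only illustrates the idea.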
src/
├── api/
│ └── controllers/
│ ├── embedding.controller.ts
│ ├── search.controller.ts
│ └── __tests__/
├── commands/
│ ├── embed.ts # CLI tool for generating embeddings
│ └── search.ts # CLI tool for semantic search
├── core/
│ ├── interfaces/
│ │ ├── ai.interface.ts
│ │ └── database.interface.ts
│ └── services/
│ └── embedding.service.ts
├── infrastructure/
│ ├── ai/
│ │ └── openai.client.ts
│ └── database/
│ └── supabase.client.ts
├── middleware/
│ └── security.middleware.ts
├── utils/
│ ├── colors.ts
│ └── response.utils.ts
├── app.ts
└── container.ts
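
The layout suggests that the service in `core/` depends on interfaces, with the OpenAI and Supabase clients in `infrastructure/` supplied through `container.ts`. The following is a rough sketch of that pattern only: every name in it (`AIClient`, `VectorDatabase`, `SearchResult`, `EmbeddingService`, `findSimilar`) is an assumption and may not match the actual source files.

```typescript
// Hypothetical interfaces; the real ones in core/interfaces may differ.
interface AIClient {
  embed(text: string): Promise<number[]>;
}

interface SearchResult {
  text: string;
  similarity: number;
}

interface VectorDatabase {
  findSimilar(embedding: number[], limit: number): Promise<SearchResult[]>;
}

// The service depends only on the interfaces, not on a specific provider.
class EmbeddingService {
  private readonly ai: AIClient;
  private readonly db: VectorDatabase;

  constructor(ai: AIClient, db: VectorDatabase) {
    this.ai = ai;
    this.db = db;
  }

  // Embed the query text, then ask the database for the nearest stored texts.
  async search(text: string, limit = 5): Promise<SearchResult[]> {
    const embedding = await this.ai.embed(text);
    return this.db.findSimilar(embedding, limit);
  }
}

// In-memory stand-ins, e.g. for tests; a container would wire real clients instead.
const fakeAI: AIClient = { embed: async (text) => [text.length, 1] };
const fakeDB: VectorDatabase = {
  findSimilar: async (_embedding, limit) =>
    [
      { text: "hello", similarity: 0.9 },
      { text: "world", similarity: 0.4 },
    ].slice(0, limit),
};
const service = new EmbeddingService(fakeAI, fakeDB);
```

Keeping provider-specific code behind interfaces like these is what would let the controllers be tested without network access.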
- Clone and Install:

    git clone https://github.com/dantesCode/embedding-search.git
    cd embedding-search
    bun install

- Configure Environment: Copy .env.example to .env and fill in your credentials:

    NODE_ENV=development
    PORT=3000
    OPENAI_API_KEY=your_openai_api_key
    SUPABASE_URL=your_supabase_url
    SUPABASE_KEY=your_supabase_key
    ALLOWED_ORIGINS=http://localhost:3000,http://localhost:3001

- Start the Server:

    bun run start
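
For illustration, the environment variables above could be read and validated at startup roughly as follows. This is a sketch, not the project's actual loader: `AppConfig` and `loadConfig` are hypothetical names, and the fallback values for `PORT` and `NODE_ENV` are assumptions.

```typescript
// Hypothetical config shape mirroring the variables in .env.example.
type AppConfig = {
  nodeEnv: string;
  port: number;
  openaiApiKey: string;
  supabaseUrl: string;
  supabaseKey: string;
  allowedOrigins: string[];
};

// Takes the environment as a plain object so it can be tested without touching process.env.
function loadConfig(env: Record<string, string | undefined>): AppConfig {
  const required = (name: string): string => {
    const value = env[name];
    if (!value) throw new Error(`Missing required environment variable: ${name}`);
    return value;
  };

  // Assumed default: fall back to 3000 when PORT is unset.
  const port = Number(env.PORT ?? "3000");
  if (!Number.isInteger(port) || port <= 0) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }

  return {
    nodeEnv: env.NODE_ENV ?? "development", // assumed default
    port,
    openaiApiKey: required("OPENAI_API_KEY"),
    supabaseUrl: required("SUPABASE_URL"),
    supabaseKey: required("SUPABASE_KEY"),
    // ALLOWED_ORIGINS is a comma-separated list; empty entries are dropped.
    allowedOrigins: (env.ALLOWED_ORIGINS ?? "")
      .split(",")
      .map((origin) => origin.trim())
      .filter((origin) => origin.length > 0),
  };
}
```

Under Bun or Node this would typically be called once as `loadConfig(process.env)`, so that a missing key fails fast at startup rather than on the first request.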
    bun run embed "your text here"

Outputs the embedding vector for the provided text.

    bun run search "your search query" [limit]

Example:

    bun run search "web development" 5

Displays up to 5 most similar texts with color-coded similarity scores:
- 🟢 Green: High similarity (>80%)
- 🟡 Yellow: Medium similarity (50-80%)
- 🔴 Red: Low similarity (<50%)
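
The thresholds above can be sketched as a small helper. This assumes scores are fractions between 0 and 1 (as in the API response example) and that exactly 80% falls in the medium band; `similarityLevel` and `formatScore` are hypothetical names, not necessarily what `utils/colors.ts` exports.

```typescript
type SimilarityLevel = "green" | "yellow" | "red";

// Map a similarity score in [0, 1] to the color bands listed above.
function similarityLevel(score: number): SimilarityLevel {
  if (score > 0.8) return "green"; // high: >80%
  if (score >= 0.5) return "yellow"; // medium: 50-80%
  return "red"; // low: <50%
}

// Render a score as an icon plus a percentage with one decimal place.
function formatScore(score: number): string {
  const icons: Record<SimilarityLevel, string> = {
    green: "🟢",
    yellow: "🟡",
    red: "🔴",
  };
  return `${icons[similarityLevel(score)]} ${(score * 100).toFixed(1)}%`;
}
```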
    POST /embed
    Content-Type: application/json

    {
      "text": "Your text to embed"
    }

Returns:

    {
      "embedding": [/* vector of numbers */]
    }

    POST /search
    Content-Type: application/json

    {
      "text": "Your search query",
      "limit": 5
    }

Returns:

    {
      "results": [
        {
          "text": "Similar text 1",
          "similarity": 0.89
        },
        {
          "text": "Similar text 2",
          "similarity": 0.76
        }
      ]
    }

    GET /health

Returns:

    {
      "status": "healthy",
      "timestamp": "2025-08-20T10:00:00.000Z"
    }

The application includes several security measures:
- Rate limiting
- Security headers (CORS, XSS protection, etc.)
- JSON content validation
- Environment-based security configurations
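
As an illustration of the first item, a fixed-window rate limiter can be sketched in a few lines. The README does not say which algorithm or limits the middleware uses, so the `FixedWindowRateLimiter` class, its parameters, and the choice of a fixed window are all assumptions.

```typescript
// Minimal fixed-window limiter: at most `maxRequests` per `windowMs` for each key (e.g. client IP).
class FixedWindowRateLimiter {
  private readonly hits = new Map<string, { count: number; windowStart: number }>();
  private readonly maxRequests: number;
  private readonly windowMs: number;

  constructor(maxRequests: number, windowMs: number) {
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
  }

  // Returns true if the request is allowed. `now` is injectable to keep tests deterministic.
  allow(key: string, now: number = Date.now()): boolean {
    const entry = this.hits.get(key);
    // First request for this key, or the previous window has expired: start a new window.
    if (!entry || now - entry.windowStart >= this.windowMs) {
      this.hits.set(key, { count: 1, windowStart: now });
      return true;
    }
    if (entry.count >= this.maxRequests) return false;
    entry.count += 1;
    return true;
  }
}
```

A middleware built on this would call `allow` with the client identifier on each request and respond with HTTP 429 when it returns false. Note that this sketch never evicts old keys, so a long-running server would need periodic cleanup.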
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License; see the LICENSE file for details.