32 commits
efbaa27
feat: Add PRD for backend productionisation and technical considerati…
cryptus-neoxys Dec 4, 2025
f3df426
feat: add tasks documentation for backend productionisation phases
cryptus-neoxys Dec 4, 2025
35a620b
feat: scaffold initial FastAPI project structure with core modules, c…
cryptus-neoxys Dec 5, 2025
71541e5
feat: implement structured logging and error handling, setup docker a…
cryptus-neoxys Dec 5, 2025
5b54923
docs: update copilot instructions for fastapi migration
cryptus-neoxys Dec 5, 2025
230c5cf
feat(database): implement async database foundation with alembic support
cryptus-neoxys Dec 5, 2025
9ef8b0c
chore: update agentic rules for python, fastapi development
cryptus-neoxys Dec 5, 2025
9b19422
feat(build): add development environment docker setup
cryptus-neoxys Dec 5, 2025
257bc93
feat(structure): move python app to backend/ directory
cryptus-neoxys Dec 5, 2025
e5e3ee8
build: add Makefile for development and production workflows
cryptus-neoxys Dec 6, 2025
f0025a8
fix(build): update context paths for docker-compose files
cryptus-neoxys Dec 6, 2025
dc674d4
docs(audio-books-library): add design doc for enhanced book search flow
cryptus-neoxys Dec 6, 2025
6b20b39
refactor(docs): move audio books library to dedicated module
cryptus-neoxys Dec 6, 2025
77e7063
feat(models): implement audio books library models and UUIDv7 support
cryptus-neoxys Dec 6, 2025
efcc885
refactor(cache): remove APICache model and simplify to Redis-only cac…
cryptus-neoxys Dec 6, 2025
bdca4a6
feat(makefile): add infra-only start commands for dev and prod
cryptus-neoxys Dec 6, 2025
6b144fa
feat(repositories): implement audio books library repositories
cryptus-neoxys Dec 6, 2025
42d218e
feat(openlibrary): implement openlibrary service with caching and can…
cryptus-neoxys Dec 7, 2025
dacc9d7
docs: add multi-provider integration patterns research document
cryptus-neoxys Dec 7, 2025
4824f61
feat(tests): add unit and integration tests for cache and openlibrary…
cryptus-neoxys Dec 7, 2025
22440b8
feat(api): implement book search endpoints with OpenLibrary integration
cryptus-neoxys Dec 7, 2025
b2a136c
feat(library): implement LibraryService with work and edition operations
cryptus-neoxys Dec 7, 2025
8baa5ef
docs: update backend readme with new architecture and setup details
cryptus-neoxys Dec 7, 2025
84977b1
docs: Update cache service description and refactor FastAPI/Python ru…
cryptus-neoxys Dec 11, 2025
374be4c
docs: Add design document for audio books pipeline.
cryptus-neoxys Dec 11, 2025
090cce6
refactor(audio-books): move pipeline tasks to dedicated module and im…
cryptus-neoxys Dec 11, 2025
6041373
docs(audio-books-library): update task documentation to reflect moved…
cryptus-neoxys Dec 11, 2025
2ecddb1
feat(processing): add processing api endpoints with schemas and tests
cryptus-neoxys Dec 11, 2025
69e90fe
feat(models): add job and audio book job models for background proces…
cryptus-neoxys Dec 11, 2025
dd726fd
feat(database): add jobs and audio_book_jobs tables migration
cryptus-neoxys Dec 11, 2025
e162290
feat(repositories): add job and audiobook job repositories
cryptus-neoxys Dec 11, 2025
c463422
chore(deps): add instructor dependency for structured LLM output
cryptus-neoxys Dec 11, 2025
58 changes: 58 additions & 0 deletions .agent/rules/pythonic-fastapi-async.md
@@ -0,0 +1,58 @@
---
trigger: glob
description: Rules for working with Python and FastAPI backend/API development.
globs: *.py
---

You are an expert in Python, FastAPI, and scalable API development.

Tech Stack

- FastAPI
- Pydantic v2
- Async database libraries
- SQLAlchemy 2.0 (if using ORM features)

Refer to FastAPI, Pydantic, SQLAlachemy, other library documentation for Data Models, Shemas, Path Operations, Middleware and for best practices.

⚠️ Potential issue | 🟡 Minor

Fix SQLAlchemy typo.

Line 36 contains a spelling error: "SQLAlachemy" should be "SQLAlchemy".

Apply this correction:

- Refer to FastAPI, Pydantic, SQLAlachemy, other library documentation for Data Models, Shemas, Path Operations, Middleware and for best practices.
+ Refer to FastAPI, Pydantic, SQLAlchemy, other library documentation for Data Models, Schemas, Path Operations, Middleware and for best practices.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
Refer to FastAPI, Pydantic, SQLAlachemy, other library documentation for Data Models, Shemas, Path Operations, Middleware and for best practices.
Refer to FastAPI, Pydantic, SQLAlchemy, other library documentation for Data Models, Schemas, Path Operations, Middleware and for best practices.
🧰 Tools
🪛 LanguageTool

[grammar] ~36-~36: Ensure spelling is correct
Context: ... library documentation for Data Models, Shemas, Path Operations, Middleware and for be...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

🤖 Prompt for AI Agents
In .agent/rules/pythonic-fastapi-async.md around line 36, fix the typo
"SQLAlachemy" to the correct spelling "SQLAlchemy" so the sentence reads
referencing FastAPI, Pydantic, SQLAlchemy, other library documentation for Data
Models, Schemas, Path Operations, Middleware and best practices.


## General Coding Principles

- Write concise, technical responses with accurate Python examples.
- Prefer iteration and modularization over code duplication.
- Favour composition over inheritance. Smart core, thin interfaces.

## Naming and File Structure

- Prefer meaningful, descriptive variable names with auxiliary verbs (e.g., is_active, has_permission) over decorative comments; follow PEP 8 conventions for documenting comments.
- Use lowercase with underscores for directories and files (e.g., routers/user_routes.py).
- Favor named exports for routes and utility functions.

## Asynchronous Programming

- Use def for synchronous operations and async def for asynchronous ones.
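
A minimal sketch of the rule above, using only the standard library. The function names and the `asyncio.sleep` call standing in for real I/O are illustrative assumptions, not code from this repository.

```python
import asyncio


def normalize_isbn(raw: str) -> str:
    """CPU-only work: a plain def is enough, there is nothing to await."""
    return raw.replace("-", "").strip()


async def fetch_title(isbn: str) -> str:
    """I/O-bound work: async def lets the event loop run other tasks while waiting."""
    await asyncio.sleep(0.01)  # stand-in for a network or database call
    return f"title-for-{isbn}"


async def fetch_titles(raw_isbns: list[str]) -> list[str]:
    isbns = [normalize_isbn(raw) for raw in raw_isbns]
    # gather runs the coroutines concurrently and preserves input order
    return await asyncio.gather(*(fetch_title(isbn) for isbn in isbns))


titles = asyncio.run(fetch_titles(["978-0-13-235088-4", " 0131103628 "]))
```

The synchronous helper is called directly inside the coroutine; only the function that actually waits on I/O needs to be `async def`.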

## Type Safety and Validation

- Use type hints for all function signatures. Prefer Pydantic models over raw dictionaries for input validation.
- Write strictly typed Python code and avoid `Any`.
- Use functional components (plain functions) and Pydantic models (`BaseModel`) for consistent input/output validation and response schemas.
- Ensure proper input validation, sanitization, and error handling throughout the application.

## Error Handling

- Prioritize error handling and edge cases.
- Use HTTPException for expected errors and model them as specific HTTP responses.

## FastAPI Best Practices

- Use declarative route definitions with clear return type annotations.
- Minimize @app.on_event("startup") and @app.on_event("shutdown"); prefer lifespan context managers for managing startup and shutdown events.
- Use middleware for logging, error monitoring, performance optimization, and handling unexpected errors.
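
The lifespan bullet can be illustrated without FastAPI installed. The sketch below is an assumption-laden stand-in: `FakeApp` and `serve_once` are hypothetical, and the "database" is just a string in a dict. Only the shape of the `lifespan` function mirrors what FastAPI expects.

```python
import asyncio
from collections.abc import AsyncIterator
from contextlib import asynccontextmanager


class FakeApp:
    """Hypothetical stand-in for the FastAPI app object; it only holds state."""

    def __init__(self) -> None:
        self.state: dict[str, str] = {}


@asynccontextmanager
async def lifespan(app: FakeApp) -> AsyncIterator[None]:
    # Startup: runs once before the app begins serving requests.
    app.state["db"] = "connected"
    try:
        yield
    finally:
        # Shutdown: runs once after serving stops, even if serving raised.
        app.state["db"] = "closed"


async def serve_once(app: FakeApp) -> str:
    async with lifespan(app):
        return app.state["db"]  # what a request handler would see


app = FakeApp()
state_while_serving = asyncio.run(serve_once(app))
```

With FastAPI, a function of this shape is passed as `FastAPI(lifespan=lifespan)`, which keeps startup and shutdown logic next to each other instead of in two separate event handlers.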

## Performance Optimization

- Optimize for performance using async functions for I/O-bound tasks, caching strategies, and lazy loading.
- Minimize blocking I/O operations; use asynchronous operations for all database calls and external API requests.
- Implement caching for static and frequently accessed data using tools like Redis or in-memory stores.
- Optimize data serialization and deserialization with Pydantic.
- Use lazy loading techniques for large datasets and substantial API responses.
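
As a rough illustration of the caching bullet, here is an in-memory cache with expiry in front of a slow async call. `TTLCache`, `slow_lookup`, and the 60-second TTL are hypothetical; a Redis-backed cache would follow the same get-or-load flow with a network round trip instead of a dict.

```python
import asyncio
import time
from collections.abc import Awaitable, Callable


class TTLCache:
    """Tiny in-memory cache; entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float) -> None:
        self._ttl = ttl_seconds
        self._entries: dict[str, tuple[float, str]] = {}

    async def get_or_load(
        self, key: str, loader: Callable[[str], Awaitable[str]]
    ) -> str:
        entry = self._entries.get(key)
        now = time.monotonic()
        if entry is not None and entry[0] > now:
            return entry[1]  # fresh hit: skip the slow call
        value = await loader(key)
        self._entries[key] = (now + self._ttl, value)
        return value


call_count = 0


async def slow_lookup(key: str) -> str:
    """Stand-in for an external API request."""
    global call_count
    call_count += 1
    await asyncio.sleep(0.01)
    return key.upper()


async def demo() -> tuple[str, str]:
    cache = TTLCache(ttl_seconds=60.0)
    first = await cache.get_or_load("dune", slow_lookup)
    second = await cache.get_or_load("dune", slow_lookup)  # served from cache
    return first, second


results = asyncio.run(demo())
```

This sketch does not deduplicate concurrent misses for the same key; two simultaneous callers would both invoke the loader.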
1 change: 1 addition & 0 deletions .dockerignore
@@ -56,5 +56,6 @@ test_*.py
docs/
*.md
!README.md
!Readme.md
knowledge/
samples/
20 changes: 0 additions & 20 deletions .env.example

This file was deleted.

122 changes: 77 additions & 45 deletions .github/copilot-instructions.md
@@ -1,47 +1,79 @@
# BookBytes AI Coding Instructions

## 🧠 Project Overview

BookBytes converts physical books (via ISBN) into chapter-wise audio summaries.

- **Core Logic**: `BookBytesApp` class in `app.py` orchestrates the entire pipeline.
- **Stack**: Python 3.8+, Flask, SQLite, OpenAI GPT-3.5, gTTS.
- **Data Flow**: ISBN -> Open Library API (Metadata) -> OpenAI (Chapter extraction & Summaries) -> gTTS (Audio) -> SQLite (Persistence).

## 🏗 Architecture & Patterns

- **Service Layer**: `BookBytesApp` encapsulates all business logic. Do not put logic in Flask routes or CLI commands; they should only call `BookBytesApp` methods.
- **Data Models**: Use `@dataclass` for entities (`Book`, `Chapter`) defined in `app.py`.
- **Database**: SQLite with raw SQL queries in `BookBytesApp`. Tables: `books`, `chapters`.
- **Logging**: MUST use `logger.py`. Import with `from logger import get_logger`.
```python
logger = get_logger(__name__)
logger.info("Message", extra={"context": "value"})
```
- **Path Handling**: Always use `pathlib.Path` instead of `os.path`.

## 🛠 Workflows & Commands

- **Run API**: `python app.py` (Starts Flask server on port 5000).
- **Run CLI**: `python cli.py [command]` (e.g., `process --isbn <isbn>`).
- **Docker**: `docker-compose up -d` (Runs app + persists data in `bookbytes-data` volume).
- **Testing**:
- `test_app.py` is a standalone integration test script, NOT a pytest suite.
- Run against a running server: `python test_app.py`.
- Ensure `OPENAI_API_KEY` is set in `.env` before running.

## 📦 Dependencies & Integrations

- **External APIs**:
- Open Library (Book metadata).
- OpenAI API (Summarization, Chapter detection).
- **Audio**: `gTTS` (Google Text-to-Speech) saves files to `audio/` directory.
- **Environment**: Load vars using `python-dotenv` (handled in `cli.py` and `app.py`).

## 🚨 Critical Conventions

- **Error Handling**: Catch exceptions in `BookBytesApp` methods and return a result dict (`{'success': False, 'message': ...}`) rather than raising exceptions to the caller.
- **File Structure**:
- `app.py`: Monolithic core (API + Logic + Models).
- `cli.py`: CLI wrapper around `BookBytesApp`.
- `knowledge/`: Documentation storage.
## Project Overview

BookBytes converts physical books (via ISBN) into chapter-wise audio summaries. It is currently being refactored from a Flask monolith to a production FastAPI service.

- **Stack**: Python 3.13+, FastAPI, PostgreSQL (async), Redis, ARQ workers, OpenAI, gTTS
- **Data Flow**: ISBN → Open Library API → OpenAI (chapters + summaries) → gTTS → Storage (local/S3)

## Architecture (`src/bookbytes/`)

```
main.py              # FastAPI app factory with lifespan, middleware, exception handlers
config.py            # Pydantic Settings with env validation (Settings class, enums)
dependencies.py      # FastAPI Depends() injection container
core/
  exceptions.py      # Exception hierarchy (BookBytesError base, domain-specific subclasses)
  logging.py         # Structlog config with correlation ID support
api/v1/              # Versioned API routers
services/            # Business logic (call from routes, not vice versa)
repositories/        # Database access layer (SQLAlchemy async)
schemas/             # Pydantic request/response models (BaseSchema in common.py)
models/              # SQLAlchemy ORM models
storage/             # Pluggable storage (local dev, S3 prod)
workers/             # ARQ background job handlers
```

## Logging (MUST use structlog)

```python
from bookbytes.core.logging import get_logger
logger = get_logger(__name__)
logger.info("Processing book", isbn="123", user_id="abc") # Key-value pairs, not f-strings
```

Correlation IDs are auto-injected via middleware. Use `set_correlation_id()` for background jobs.

## Exceptions Pattern

Raise domain exceptions from `core/exceptions.py`, never raw `Exception`. Global handlers convert to JSON:

```python
from bookbytes.core.exceptions import BookNotFoundError
raise BookNotFoundError(isbn="123") # Returns {"error": {"code": "BOOK_NOT_FOUND", ...}}
```
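
The pattern above might look roughly like the following. Only the class names and the `{"error": {"code": ...}}` envelope come from the instructions; the `code`, `status_code`, `message`, and `details` fields and the `handle` function are hypothetical.

```python
class BookBytesError(Exception):
    """Base class; the code and status_code attributes are assumptions."""

    code = "INTERNAL_ERROR"
    status_code = 500

    def __init__(self, message: str, **details: str) -> None:
        super().__init__(message)
        self.message = message
        self.details = details

    def to_response(self) -> dict[str, dict[str, object]]:
        return {
            "error": {
                "code": self.code,
                "message": self.message,
                "details": self.details,
            }
        }


class BookNotFoundError(BookBytesError):
    code = "BOOK_NOT_FOUND"
    status_code = 404

    def __init__(self, isbn: str) -> None:
        super().__init__(f"No book found for ISBN {isbn}", isbn=isbn)


def handle(error: BookBytesError) -> tuple[int, dict[str, dict[str, object]]]:
    """What a global handler might do: map a domain error to status and JSON body."""
    return error.status_code, error.to_response()


try:
    raise BookNotFoundError(isbn="123")
except BookBytesError as exc:
    status, body = handle(exc)
```

Catching the base class in one place is what lets services raise specific errors without knowing anything about HTTP.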

## Configuration

All config via `Settings` class in `config.py`. Access with `get_settings()` (cached) or `SettingsDep` in routes:

```python
from bookbytes.config import get_settings
settings = get_settings()
if settings.is_development: ...
```
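
A cached accessor such as `get_settings()` can be sketched with `functools.lru_cache`. This version uses a frozen dataclass as a stand-in for the Pydantic `Settings` class, and the `APP_ENV` variable name and single `environment` field are assumptions.

```python
import os
from dataclasses import dataclass
from enum import Enum
from functools import lru_cache


class Environment(str, Enum):
    DEVELOPMENT = "development"
    PRODUCTION = "production"


@dataclass(frozen=True)
class Settings:
    """Stand-in for the Pydantic Settings class; fields here are hypothetical."""

    environment: Environment

    @property
    def is_development(self) -> bool:
        return self.environment is Environment.DEVELOPMENT


@lru_cache(maxsize=1)
def get_settings() -> Settings:
    # Read and validate the environment once; later calls reuse the same object.
    raw = os.environ.get("APP_ENV", "development")  # APP_ENV is an assumed name
    return Settings(environment=Environment(raw))  # ValueError on unknown values


os.environ["APP_ENV"] = "development"
settings = get_settings()
```

In tests, `get_settings.cache_clear()` resets the cache so a different environment can be loaded.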

## Commands

- **Run API**: `uv run python -m bookbytes.main` or `uv run uvicorn bookbytes.main:app --reload`
- **Tests**: `uv run pytest tests/` (async fixtures in `tests/conftest.py`)
- **Lint**: `uv run ruff check src/ tests/` | **Format**: `uv run ruff format src/ tests/`
- **Type check**: `uv run mypy src/`

## Testing Conventions

- Use fixtures from `tests/conftest.py`: `async_client`, `authenticated_client`, `test_settings`
- Mock external services (OpenAI, Open Library) using `tests/mocks/`

## Key Conventions

- **Async everywhere**: All DB/HTTP ops must use `async/await`
- **Pydantic schemas**: Inherit from `BaseSchema` in `schemas/common.py` (auto ORM conversion)
- **Enums for options**: Use `str, Enum` pattern (e.g., `Environment`, `StorageBackend`) for type-safe configs
- **Path handling**: Use `pathlib.Path`, never `os.path`
- **Auth modes**: `API_KEY` for dev (header `X-API-Key`), `JWT` for prod
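
Two of the conventions above, the `str, Enum` pattern and `pathlib.Path`, can be shown together in a short sketch. The enum members, the `audio_path` helper, and the directory layout are illustrative assumptions rather than the project's actual definitions.

```python
from enum import Enum
from pathlib import Path


class StorageBackend(str, Enum):
    LOCAL = "local"
    S3 = "s3"


def audio_path(base_dir: Path, isbn: str, chapter: int) -> Path:
    """Build a chapter file path with pathlib operators instead of os.path.join."""
    return base_dir / isbn / f"chapter_{chapter:02d}.mp3"


backend = StorageBackend("local")  # raises ValueError for an unknown value
path = audio_path(Path("audio"), "9780132350884", 3)
```

Mixing in `str` means a member compares equal to its raw value, so config read from environment variables or JSON converts cleanly while typos fail fast at construction.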

## Legacy Code (root-level)

`app.py`, `cli.py`, `test_app.py` are the original Flask implementation—reference for business logic only.
144 changes: 144 additions & 0 deletions Makefile
@@ -0,0 +1,144 @@
.PHONY: help dev prod build-dev build-prod down-dev down-prod \
start-dev-infra start-prod-infra \
logs-dev logs-prod api-logs-dev api-logs-prod \
db-logs-dev db-logs-prod redis-logs-dev redis-logs-prod \
migrate shell db-shell test test-cov status \
clean-dev clean-dev-containers clean-dev-images clean-dev-volumes

# Variables
BACKEND_DIR := backend
COMPOSE_DEV := $(BACKEND_DIR)/docker/docker-compose.dev.yml
COMPOSE_PROD := $(BACKEND_DIR)/docker/docker-compose.yml
DC_DEV := docker compose -f $(COMPOSE_DEV)
DC_PROD := docker compose -f $(COMPOSE_PROD)

help:
@echo "BookBytes - Development Commands"
@echo "================================="
@echo ""
@echo "🚀 Environment:"
@echo " dev - Start dev (hot reload)"
@echo " prod - Start production"
@echo " build-dev - Build dev containers"
@echo " build-prod - Build prod containers"
@echo " down-dev - Stop dev services"
@echo " down-prod - Stop prod services"
@echo " start-dev-infra - Start infra only (postgres, redis)"
@echo " start-prod-infra - Start prod infra only"
@echo ""
@echo "Logs (composable env-component):"
@echo " logs-dev - All dev logs"
@echo " logs-prod - All prod logs"
@echo " api-logs-dev - Dev API logs"
@echo " api-logs-prod - Prod API logs"
@echo " db-logs-dev - Dev DB logs"
@echo " db-logs-prod - Prod DB logs"
@echo " redis-logs-dev - Dev Redis logs"
@echo " redis-logs-prod - Prod Redis logs"
@echo ""
@echo "Database (dev-container only):"
@echo " migrate - Run migrations (dev)"
@echo " shell - Python shell (dev)"
@echo " db-shell - PostgreSQL shell (dev)"
@echo ""
@echo "🧪 Testing (dev-container only):"
@echo " test - Run tests"
@echo " test-cov - Tests with coverage"
@echo ""
@echo "🧹 Cleanup (dev-container only):"
@echo " clean-dev - Remove dev containers/images/volumes"
@echo " status - Show container status"

# Start
dev:
$(DC_DEV) up -d

prod:
$(DC_PROD) up -d

# Build
build-dev:
$(DC_DEV) build

build-prod:
$(DC_PROD) build

# Stop
down-dev:
$(DC_DEV) down

down-prod:
$(DC_PROD) down

# Infrastructure only (no API - for migrations, local dev)
start-dev-infra:
$(DC_DEV) up -d postgres redis

start-prod-infra:
$(DC_PROD) up -d postgres redis

# Logs (composable)
logs-dev:
$(DC_DEV) logs -f

logs-prod:
$(DC_PROD) logs -f

api-logs-dev:
$(DC_DEV) logs -f api

api-logs-prod:
$(DC_PROD) logs -f api

db-logs-dev:
$(DC_DEV) logs -f postgres

db-logs-prod:
$(DC_PROD) logs -f postgres

redis-logs-dev:
$(DC_DEV) logs -f redis

redis-logs-prod:
$(DC_PROD) logs -f redis

# Database
migrate:
$(DC_DEV) exec api uv run alembic upgrade head

shell:
$(DC_DEV) exec api uv run python

db-shell:
$(DC_DEV) exec postgres psql -U bookbytes -d bookbytes

# Test
test:
cd $(BACKEND_DIR) && uv sync --all-extras && uv run pytest

test-cov:
cd $(BACKEND_DIR) && uv sync --all-extras && uv run pytest --cov=src/bookbytes --cov-report=html

# Status
status:
@echo "Dev containers:"
@$(DC_DEV) ps
@echo ""
@echo "Prod containers:"
@$(DC_PROD) ps

# Cleanup (DEV ONLY - composable)
clean-dev-containers:
@echo "🧹 Removing dev containers..."
$(DC_DEV) rm -f

clean-dev-images:
@echo "🧹 Removing dev images..."
@docker images | grep bookbytes-.*-dev | awk '{print $$3}' | xargs -r docker rmi 2>/dev/null || echo "No dev images to remove"

clean-dev-volumes:
@echo "⚠️ Removing dev volumes..."
@read -p "Remove dev volumes? [y/N] " confirm && [ "$$confirm" = "y" ] && \
docker volume ls | grep bookbytes_dev | awk '{print $$2}' | xargs -r docker volume rm 2>/dev/null || echo "Skipped volume removal"

clean-dev: down-dev clean-dev-containers clean-dev-images clean-dev-volumes
@echo "✅ Dev cleanup complete"