Conversation

@cryptus-neoxys cryptus-neoxys commented Dec 2, 2025

Summary by CodeRabbit

  • New Features

    • FastAPI app with search, work/ISBN lookup, health (liveness/readiness) endpoints and processing API surface (processing endpoints return 501 Not Implemented).
    • App exposes service info and echoes X-Request-ID for requests.
  • Performance

    • Redis caching with stale‑while‑revalidate, TTL jitter and deterministic keys to speed searches and work lookups.
  • Data & Background

    • Persistent domain for works, editions, providers, audiobooks, chapters and job tracking to enable future processing pipelines.
  • Infrastructure & DevOps

    • Docker, compose presets, Makefile and DB migration tooling for local/dev/prod workflows.
  • Testing

    • Expanded unit and integration test suites and fixtures.
  • Documentation

    • New architecture, productionisation and FastAPI/Python style guides.


@cryptus-neoxys cryptus-neoxys force-pushed the feat/productionise-and-saasify branch from 70ae31a to a5ac636 on December 4, 2025 at 14:56
@cryptus-neoxys cryptus-neoxys reopened this Dec 4, 2025
…ons, including architecture, goals, user stories, and database schema
@cryptus-neoxys cryptus-neoxys force-pushed the feat/productionise-and-saasify branch from 8942da9 to efbaa27 on December 4, 2025 at 15:12
@cryptus-neoxys cryptus-neoxys changed the title feat(app): poductionise and initiate saasification feat(app): productionise and initiate saasification Dec 4, 2025
…nd testing

- Add structured logging with correlation ID support
- Implement custom exception hierarchy for consistent error responses
- Configure request ID middleware and logging middleware (see the sketch after this list)
- Update health check endpoints with request ID support
- Add Docker and docker-compose configuration for production deployment
- Set up test fixtures and mocks for integration testing
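For illustration, a minimal sketch of request-ID middleware along these lines in FastAPI; the handler name and the `request.state` field are assumptions, not the PR's actual code:

```python
# Hedged sketch: echo or generate an X-Request-ID per request, as the
# commits describe; middleware name and state field are assumptions.
import uuid

from fastapi import FastAPI, Request

app = FastAPI()


@app.middleware("http")
async def request_id_middleware(request: Request, call_next):
    # Honour a caller-supplied ID, otherwise mint one.
    request_id = request.headers.get("X-Request-ID") or str(uuid.uuid4())
    request.state.request_id = request_id
    response = await call_next(request)
    # Echo the ID back so clients and logs can be correlated.
    response.headers["X-Request-ID"] = request_id
    return response
```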
Update documentation to reflect architectural changes from Flask to FastAPI.
Include new project structure, logging conventions, and testing guidelines.
add base models, repositories and async database setup
initialize alembic for async migrations
update health check to verify database connectivity
add integration tests for database operations
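A sketch of what such an async session dependency typically looks like in SQLAlchemy 2.0 style; the DSN and the exact commit/rollback policy here are assumptions:

```python
# Illustrative async engine + session dependency; not the PR's exact code.
from collections.abc import AsyncIterator

from sqlalchemy.ext.asyncio import AsyncSession, async_sessionmaker, create_async_engine

engine = create_async_engine("postgresql+asyncpg://user:pass@localhost/bookbytes")
session_factory = async_sessionmaker(engine, expire_on_commit=False)


async def get_async_session() -> AsyncIterator[AsyncSession]:
    # Commit on success, roll back on any error, always release the session.
    async with session_factory() as session:
        try:
            yield session
            await session.commit()
        except Exception:
            await session.rollback()
            raise
```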
- Add Dockerfile.dev for development with hot reload support
- Create docker-compose.dev.yml with postgres and redis services
- Update main Dockerfile to include Readme.md and use shell form for CMD
- Add Readme.md to .dockerignore exceptions
update docker compose contexts, sources
delete old .env.example
The Makefile provides convenient commands for building, running, and managing the BookBytes application in both development and production environments. It includes targets for container management, logging, database operations, testing, and cleanup.
Update build context paths from '.' to '..' to correctly reference parent directory. Also adjust source volume mounts to use '../src' instead of './src' to maintain consistency with the new context paths.
Add comprehensive design document outlining architecture decisions, data models, caching strategy, and implementation phases for the enhanced book search flow in the audio books library. The document covers provider-agnostic storage, two-tier caching, and future multi-API support.
@cryptus-neoxys cryptus-neoxys force-pushed the feat/productionise-and-saasify branch from 37e3ec2 to dc674d4 on December 6, 2025 at 11:55
Restructure the backend productionisation tasks by extracting the audio books library functionality into a separate module. This includes:
- New dedicated task file with detailed implementation steps
- Updated models, repositories, services, and API endpoints
- Comprehensive testing plan
- Improved caching strategy with two-tier approach
add uuid6 dependency and pg_idkit extension for UUIDv7 support
implement all audio books library models (Work, Edition, AudioBook, Chapter, BookProvider, APICache)
update base model to use UUIDv7 primary keys
Add pg_idkit extension for PostgreSQL UUIDv7 generation.
export new models in __init__.py
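A sketch of a UUIDv7 primary-key mixin using the uuid6 package named in these commits; the column wiring is an assumption (and a later commit generates UUIDv7 in Python rather than via pg_idkit):

```python
# Hedged sketch of a UUIDv7 primary-key mixin; wiring details assumed.
from uuid import UUID

from sqlalchemy.orm import Mapped, mapped_column
from uuid6 import uuid7


class UUIDPrimaryKeyMixin:
    """Time-sortable UUIDv7 keys keep B-tree inserts append-mostly."""

    id: Mapped[UUID] = mapped_column(primary_key=True, default=uuid7)
```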
…hing

Update documentation and models to reflect removal of PostgreSQL cache layer.
Simplify language code length in Edition model from 7 to 3 characters.
Add start-dev-infra and start-prod-infra targets to run only database and redis services
Add repositories for audio books library entities including Work, Edition, Chapter, AudioBook and BookProvider with CRUD operations and specialized queries. Remove pg_idkit extension migration as UUIDv7 is now generated in Python.

feat(cache): implement Redis-based CacheService

Add CacheService with TTL-based caching, stale-while-revalidate support and pattern invalidation. Uses Redis with AOF persistence for durability.
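For illustration, a minimal read/write path with stale-while-revalidate, TTL jitter and deterministic keys as described; the key layout, method names and TTL defaults are assumptions, not the PR's actual code:

```python
# Hedged sketch of the CacheService semantics described above.
import hashlib
import json
import random
import time

from redis.asyncio import Redis


class CacheService:
    def __init__(self, redis: Redis, ttl: int = 3600, stale_ttl: int = 86400) -> None:
        self._redis = redis
        self._ttl = ttl              # freshness window in seconds
        self._stale_ttl = stale_ttl  # how long stale data remains servable

    @staticmethod
    def make_key(prefix: str, **params: str) -> str:
        # Deterministic keys: sorted params so identical queries share an entry.
        canon = json.dumps(dict(sorted(params.items())), separators=(",", ":"))
        return f"{prefix}:{hashlib.sha256(canon.encode()).hexdigest()[:16]}"

    async def get(self, key: str) -> tuple[dict | None, bool]:
        """Return (data, needs_revalidation) for stale-while-revalidate."""
        raw = await self._redis.get(key)
        if raw is None:
            return None, False
        entry = json.loads(raw)
        return entry["data"], time.time() > entry["fresh_until"]

    async def set(self, key: str, data: dict) -> None:
        # Jitter the freshness window so hot keys don't expire in lockstep.
        fresh_until = time.time() + self._ttl * random.uniform(0.9, 1.1)
        entry = {"data": data, "fresh_until": fresh_until}
        await self._redis.set(key, json.dumps(entry), ex=self._stale_ttl)
```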

docs: update task checklist with completed items

Mark all audio books library tasks as completed including model creation, repository implementation and cache service setup.
@cryptus-neoxys

@coderabbitai review


coderabbitai bot commented Dec 6, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.


coderabbitai bot commented Dec 6, 2025

Walkthrough

Adds a production-ready FastAPI backend: async SQLAlchemy models, repositories and Alembic migrations; Redis-backed CacheService and OpenLibrary integration; structured logging and Pydantic Settings; Docker/dev tooling and Makefile; API v1 search and processing endpoints with schemas; plus unit and integration tests and fixtures.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Docs & Planning**<br>`/.agent/rules/pythonic-fastapi-async.md`, `/.github/copilot-instructions.md`, `tasks/knowledge/multi-provider-integration-patterns.md`, `tasks/backend-productionisation/...` | New architecture guides, PRDs, design docs, task lists and multi‑provider knowledge for backend productionisation. |
| **Env & Build**<br>`/.dockerignore`, `/.env.example`, `/env.example`, `Makefile`, `backend/pyproject.toml` | Update dockerignore, clear .env.example, add top-level env template, add Makefile targets and backend pyproject metadata. |
| **Docker / Compose**<br>`backend/docker/Dockerfile`, `backend/docker/Dockerfile.dev`, `backend/docker/docker-compose.dev.yml`, `backend/docker/docker-compose.yml` | Add production & dev Dockerfiles and compose configs (api, postgres, redis, optional worker). |
| **Alembic / Migrations & Templates**<br>`backend/alembic/*`, `backend/alembic/versions/*` | Add Alembic config, async env runner, script template/README and migrations creating works/editions/audio_books/book_providers/chapters and jobs/audio_book_jobs. |
| **Project README**<br>`backend/Readme.md` | Add comprehensive backend README with setup, architecture, commands and dev notes. |
| **App & Lifespan**<br>`backend/src/bookbytes/main.py`, `backend/src/bookbytes/__init__.py` | FastAPI app factory with lifespan for logging/DB init/cleanup, middleware (CORS, request/correlation IDs), exception handlers, health/root routes and CLI (a sketch of this pattern follows the table). |
| **Configuration**<br>`backend/src/bookbytes/config.py` | Pydantic Settings with enums, validators, derived properties and cached get_settings(). |
| **Core infra**<br>`backend/src/bookbytes/core/database.py`, `backend/src/bookbytes/core/logging.py`, `backend/src/bookbytes/core/exceptions.py` | Async DB engine/session lifecycle and health check; structlog configuration with correlation IDs; domain exception hierarchy with structured to_dict(). |
| **Dependencies / DI**<br>`backend/src/bookbytes/dependencies.py` | FastAPI dependency helpers: settings, DB session with commit/rollback semantics, and stubs for Redis, storage, services and auth. |
| **Models & Mixins**<br>`backend/src/bookbytes/models/*` | Declarative Base and mixins (UUID/Timestamp/SoftDelete) and domain models: Work, Edition, BookProvider, AudioBook, Chapter, Job, AudioBookJob; package exports updated. |
| **Repositories**<br>`backend/src/bookbytes/repositories/*` | Generic async BaseRepository and SoftDeleteRepository plus domain repositories (Work, Edition, BookProvider, AudioBook, Chapter, Job, AudioBookJob) and consolidated exports. |
| **Services: Cache, OpenLibrary, Library**<br>`backend/src/bookbytes/services/cache.py`, `backend/src/bookbytes/services/openlibrary.py`, `backend/src/bookbytes/services/library.py` | Redis-backed CacheService (TTL, jitter, stale‑while‑revalidate), OpenLibraryService (HTTP, DTOs, caching, DI helpers), and LibraryService for persistence orchestration. |
| **API v1 & Schemas**<br>`backend/src/bookbytes/api/v1/router.py`, `backend/src/bookbytes/api/v1/search.py`, `backend/src/bookbytes/api/v1/processing.py`, `backend/src/bookbytes/schemas/*` | v1 router, search and processing endpoints (`/search`, `/works/{work_key}`, `/isbn/{isbn}`, processing placeholders) with Pydantic schemas for search, works, editions, audiobooks, processing and errors. |
| **Storage & Workers namespaces**<br>`backend/src/bookbytes/storage/__init__.py`, `backend/src/bookbytes/workers/__init__.py` | Package initializers for storage and workers (placeholders). |
| **Tests & Mocks**<br>`backend/tests/*`, `backend/tests/mocks/*` | Pytest fixtures (settings, app, async client), unit and integration tests for health, DB lifecycle, cache, OpenLibraryService, LibraryService, repositories and API endpoints; mock OpenAI/OpenLibrary responses and integration suites. |
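To make the lifespan pattern in the App & Lifespan row concrete, here is a minimal, runnable sketch assuming FastAPI's lifespan API; `init_db`/`close_db` are stand-ins for the PR's actual hooks:

```python
# Minimal sketch of a lifespan-based app factory; hooks are stand-ins.
from collections.abc import AsyncIterator
from contextlib import asynccontextmanager

from fastapi import FastAPI


async def init_db() -> None:
    """Stand-in for async engine/session setup."""


async def close_db() -> None:
    """Stand-in for engine disposal."""


@asynccontextmanager
async def lifespan(app: FastAPI) -> AsyncIterator[None]:
    await init_db()   # startup: logging/DB init in the real app
    yield
    await close_db()  # shutdown: cleanup


def create_app() -> FastAPI:
    return FastAPI(title="bookbytes", lifespan=lifespan)


app = create_app()
```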

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant API as FastAPI
    participant Cache as CacheService (Redis)
    participant OL as OpenLibraryService
    participant DB as Postgres / Repos

    Client->>API: POST /api/v1/books/search
    API->>Cache: get(search_key)
    alt cache hit & fresh
        Cache-->>API: data, needs_revalidation=false
        API-->>Client: 200 OK (cached data)
    else cache hit & stale
        Cache-->>API: data, needs_revalidation=true
        API-->>Client: 200 OK (stale data)
        API->>OL: async revalidate -> search_books(...)
        OL->>Cache: set(search_key, new_data)
    else cache miss
        API->>OL: search_books(...)
        OL->>OL: fetch & parse OpenLibrary
        OL->>Cache: set(search_key, parsed_data)
        OL-->>API: parsed_data
        API-->>Client: 200 OK (fresh data)
    end
    Note over OL,DB: persistence path
    OL->>DB: create/get Work & Edition via repositories
    DB-->>OL: persisted entities
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

  • Pay extra attention to:
    • Async lifecycle and Alembic async env: backend/src/bookbytes/core/database.py, backend/alembic/env.py, backend/src/bookbytes/main.py
    • Structured logging and ContextVar propagation: backend/src/bookbytes/core/logging.py
    • Cache semantics, key determinism, TTL jitter and error handling: backend/src/bookbytes/services/cache.py
    • OpenLibrary HTTP/error/caching flow and DTO mapping: backend/src/bookbytes/services/openlibrary.py
    • Repositories, SQLAlchemy relationships, cascades and soft-delete filtering vs migrations: backend/src/bookbytes/repositories/*, backend/src/bookbytes/models/*, backend/alembic/versions/*

Poem

🐰 I hopped into code with a tiny clap,
FastAPI tunnels and a Redis nap,
Migrations and models, schemas that sing,
Tests and Docker—how the project springs!
Off I bound with a carrot and a ping.

Pre-merge checks

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'feat(app): productionise and initiate saasification' directly summarizes the main objective of the changeset: transitioning the application to a production-ready, SaaS-oriented architecture. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 96.59%, which meets the required threshold of 80.00%. |


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 14

🧹 Nitpick comments (32)
backend/src/bookbytes/__init__.py (1)

1-3: Clean package metadata; optionally single‑source the version.

Docstring and __version__ are clear and conventional; this is good as a minimal package initializer.

If you expect the version to also live in pyproject.toml/setup.cfg, consider single‑sourcing it (e.g., reading from packaging metadata or centralizing the value in one place) to avoid future drift. Not urgent for now.
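One possible shape, assuming the distribution is installed under the name "bookbytes":

```python
# Single-source the version from installed metadata; the distribution name
# and the dev fallback are assumptions.
from importlib.metadata import PackageNotFoundError, version

try:
    __version__ = version("bookbytes")
except PackageNotFoundError:  # e.g. running from an uninstalled checkout
    __version__ = "0.0.0.dev0"
```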

.agent/rules/pythonic-fastapi-async.md (1)

8-36: Consider organizing guidance into sections for better scannability.

The guidance content is comprehensive and substantively sound, but it's presented as a flat list of bullet points. Adding section headers would make it easier to navigate and reference related guidance.

For example:

 You are an expert in Python, FastAPI, and scalable API development.

+## Recommended Stack
+
 FastAPI
 Pydantic v2
 Async database libraries
 SQLAlchemy 2.0 (if using ORM features)

+## Coding Conventions
+
 Write concise, technical responses with accurate Python examples.
 Prefer iteration and modularization over code duplication.
 Favour composition over inheritance. Smart core, Thin interfaces.
 Use descriptive variable names with auxiliary verbs (e.g., is_active, has_permission).
 Use lowercase with underscores for directories and files (e.g., routers/user_routes.py).
 Favor named exports for routes and utility functions.
 Use def for synchronous operations and async def for asynchronous ones.

+## Type Hints and Validation
+
 Use type hints for all function signatures. Prefer Pydantic models over raw dictionaries for input validation.
 Write typed python code strictly, and avoid the use of `Any`

+## Error Handling
+
 Prioritize error handling and edge cases
 [... rest organized similarly]
backend/docker/Dockerfile (1)

72-73: Shell form CMD with env var interpolation may fail in slim base image.

The CMD uses shell form to interpolate ${PORT}, which requires /bin/sh to be present. Since the base image is python:3.13-slim, this typically works, but it's not guaranteed across all variants. For robustness, interpolate the PORT variable at image build time instead:

-ENV PATH="/app/.venv/bin:$PATH" \
+ENV PATH="/app/.venv/bin:$PATH" \
     PYTHONUNBUFFERED=1 \
     PYTHONDONTWRITEBYTECODE=1 \
     APP_ENV=production \
     HOST=0.0.0.0 \
-    PORT=8000
+    PORT=8000 \
+    UVICORN_CMD="uvicorn bookbytes.main:app --host 0.0.0.0 --port 8000"

-CMD uvicorn bookbytes.main:app --host 0.0.0.0 --port ${PORT}
+CMD ${UVICORN_CMD}

Alternatively, use exec form with hardcoded port if the PORT environment variable is not meant to be overridden at runtime.

Verify that the bookbytes.main:app FastAPI application provides a /health/live endpoint as expected by the HEALTHCHECK directive (line 69).

tasks/backend-productionisation/audio-books-library/design-doc.md (2)

26-26: Add language specifiers to fenced code blocks per markdownlint (MD040).

Fenced code blocks should specify a language for proper syntax highlighting and linting. Update:

-```
+```json
 {
   "numFound": 629,
   ...
 }
-```
+```

Similarly for:

  • Line 76: use a `mermaid` fence (if a diagram) or a plain fence for ASCII art
  • Line 360: use a plain fence (text/example)
  • Line 397: use a plain fence (text/diagram)

Also applies to: 76-76, 360-360, 397-397


332-332: Convert emphasis-based headings to proper markdown headings (MD036).

Use heading syntax instead of bold emphasis for section titles:

-**Example Usage:**
+### Example Usage

And:

-**Key Points:**
+### Key Points

Also applies to: 584-584

.github/copilot-instructions.md (1)

12-26: Add language identifier to fenced code block.

Per markdownlint MD040, fenced code blocks should have a language specified. For directory structures, use text or plaintext:

-```
+```text
 main.py          # FastAPI app factory with lifespan, middleware, exception handlers
backend/src/bookbytes/models/base.py (1)

103-107: Remove redundant import inside method.

datetime is already imported at the module level (line 17). The local import is unnecessary:

 def mark_deleted(self) -> None:
     """Mark this entity as soft-deleted."""
-    from datetime import UTC, datetime
-
-    self.deleted_at = datetime.now(UTC)
+    from datetime import UTC
+    self.deleted_at = datetime.now(UTC)

Or better, import UTC at the module level alongside datetime:

-from datetime import datetime
+from datetime import UTC, datetime

Then simplify the method:

 def mark_deleted(self) -> None:
     """Mark this entity as soft-deleted."""
-    from datetime import UTC, datetime
-
     self.deleted_at = datetime.now(UTC)
backend/tests/mocks/openlibrary_responses.py (1)

102-104: Consider adding documentation for empty response behavior.

The empty dicts for NOT_FOUND_RESPONSE and INVALID_ISBN_RESPONSE appear intentional (Open Library API returns {} for unknown ISBNs), but a brief comment would clarify this for future maintainers:

+# Open Library returns empty dict for ISBNs not in their database
 NOT_FOUND_RESPONSE: dict[str, Any] = {}

+# Invalid ISBN format also returns empty dict from the API
 INVALID_ISBN_RESPONSE: dict[str, Any] = {}
backend/src/bookbytes/repositories/base.py (2)

39-40: Remove unused TypeVar.

S is defined but never used in this module:

 # Type variable for model classes
 T = TypeVar("T", bound=Base)
-
-# Type variable for soft-deletable models
-S = TypeVar("S", bound=SoftDeleteMixin)

209-218: Type parameter doesn't enforce SoftDeleteMixin requirement.

The docstring correctly notes models "must include SoftDeleteMixin," but the type system doesn't enforce this. The type: ignore comments throughout acknowledge this gap.

For stricter typing (optional, Python typing limitations make this verbose):

from typing import Protocol

class SoftDeletable(Protocol):
    deleted_at: datetime | None

# Then use intersection or runtime checks

The current approach with documentation is pragmatic and common in Python. The type: ignore comments are appropriate.

env.example (1)

11-120: Clarify that these defaults are for local dev only (consider separate .env.dev).

Values like DEBUG=true, AUTH_MODE=api_key, JWT_SECRET_KEY=dev-secret-key-change-in-production, and the sample API_KEY are great for local setup but risky if copied verbatim into production. Consider either:

  • Explicitly marking this file as development-only at the top, or
  • Providing a second template (e.g. .env.prod.example) with safer defaults (DEBUG=false, AUTH_MODE=jwt, blank secrets).
backend/docker/docker-compose.dev.yml (1)

12-25: Avoid hardcoding credentials in compose; prefer env files or variables even for dev.

DATABASE_URL (with user/password) and defaults for API_KEY, JWT_SECRET_KEY, and OPENAI_API_KEY are currently embedded in the compose file. For a cleaner security posture and to keep static analysis tools happy, consider:

  • Moving these into a .env/backend.env file referenced via env_file, or
  • Using placeholder vars here and letting developers define real values in their local .env.

This keeps the compose file non-sensitive and closer to how production will be configured.

backend/src/bookbytes/models/edition.py (1)

23-77: Edition model wiring looks solid; consider tightening a couple of invariants.

The core mapping (PK mixins, work_id FK, isbn uniqueness, relationships to Work, AudioBook, and BookProvider) is clean and consistent with the other models.

Two optional refinements you might consider:

  • Constrain isbn_type values: given the docstring promises "isbn10" or "isbn13", a CheckConstraint or SQLAlchemy Enum would prevent accidental invalid values.
  • One-to-one audio_book semantics: you already use uselist=False and delete-orphan; if you want to strictly enforce single-parent semantics at the ORM level, you can also set single_parent=True on the relationship (worth verifying against your SQLAlchemy version and desired behaviour).
backend/docker/Dockerfile.dev (1)

8-55: Dev image is sound; consider pinning uv and reevaluating --no-dev for DX.

This Dockerfile is well-structured (non-root user, isolated venv, curl for healthcheck). Two optional tweaks:

  • Pin the uv image tag instead of :latest to avoid surprise breakages when uv updates.
  • Double‑check --no-dev: for a development image you may actually want dev/test tooling (linters, pytest, etc.) inside the container. If your workflow runs those on the host only, this is fine; otherwise, dropping --no-dev can simplify containerized dev.
backend/src/bookbytes/models/book_provider.py (1)

89-94: Index on entity_id may be less useful without FK enforcement.

The index ix_book_providers_entity_lookup on (entity_type, entity_id) is meant for efficient lookups, but since entity_id is not a foreign key, queries will likely use work_id or edition_id directly via relationships. Consider indexing work_id and edition_id instead if entity_id is removed.

backend/tests/integration/test_database.py (1)

19-52: Test doesn't verify session cleanup on exceptions.

The test covers the happy path but doesn't verify that the session properly rolls back on exceptions, which is part of the get_async_session contract (as shown in the relevant snippet from database.py lines 65-80).

Consider adding a test case for exception handling:

@pytest.mark.asyncio
async def test_database_session_rollback_on_error() -> None:
    """Test that session rolls back on exception."""
    # ... setup similar to test_database_session_lifecycle ...
    
    with pytest.raises(ValueError):
        async for session in get_async_session():
            await session.execute(text("SELECT 1"))
            raise ValueError("Simulated error")
    
    # Session should have rolled back automatically
    await close_db()
tasks/backend-productionisation/audio-books-library/tasks-audio-books-library.md (1)

62-68: Specify language for the fenced code block.

The folder structure code block at line 62 should have a language specifier for proper rendering and syntax highlighting.

-```
+```text
 bookbytes/
 ├── alembic/
tasks/backend-productionisation/technical-considerations-backend-productionisation.md (1)

62-62: Specify language for the fenced code block.

The folder structure block should have a language identifier for proper rendering.

-```
+```text
 bookbytes/
backend/tests/conftest.py (1)

74-76: Unused import.

asynccontextmanager is imported but never used in this fixture.

-    from contextlib import asynccontextmanager
-
     from bookbytes.core.database import close_db, init_db
backend/src/bookbytes/core/logging.py (1)

94-114: Redundant processor in JSON formatter configuration.

The processors list includes JSONRenderer() at line 99, and the ProcessorFormatter at line 102 creates another JSONRenderer() instance. The processors list is passed to structlog.configure() at line 118, which is correct, but creating two separate renderer instances is slightly wasteful.

Consider reusing the same renderer instance:

     if settings.use_json_logs:
         # Production: JSON format for log aggregation
+        json_renderer = structlog.processors.JSONRenderer()
         processors: list[Processor] = [
             *shared_processors,
             structlog.processors.format_exc_info,
-            structlog.processors.JSONRenderer(),
+            json_renderer,
         ]
         formatter = structlog.stdlib.ProcessorFormatter(
-            processor=structlog.processors.JSONRenderer(),
+            processor=json_renderer,
             foreign_pre_chain=shared_processors,
         )
backend/src/bookbytes/main.py (2)

29-31: Logger initialized before configure_logging() is called.

The module-level logger = get_logger(__name__) executes at import time, before configure_logging() runs during app lifespan. This logger instance will use structlog's default configuration, potentially causing inconsistent log formatting for early errors.

Consider deferring logger acquisition or accepting this as a known limitation for module-level logging.
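A sketch of the deferred-acquisition option; the helper name is hypothetical:

```python
# Resolve the logger at call time, after configure_logging() has run in
# the lifespan, instead of freezing it at import time.
import structlog


def _logger():
    return structlog.get_logger(__name__)


def log_startup(settings_summary: dict) -> None:
    _logger().info("application starting", **settings_summary)
```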


115-117: Redundant conditional expressions.

The ternary expressions evaluate to the same value in both branches, making the conditionals pointless. Either remove them or implement the intended production behavior (e.g., disabling docs in production).

-        docs_url="/docs" if settings.is_development else "/docs",
-        redoc_url="/redoc" if settings.is_development else "/redoc",
-        openapi_url="/openapi.json" if settings.is_development else "/openapi.json",
+        docs_url="/docs" if settings.is_development else None,
+        redoc_url="/redoc" if settings.is_development else None,
+        openapi_url="/openapi.json" if settings.is_development else None,
backend/src/bookbytes/core/database.py (2)

149-162: Consider more resilient behavior and richer logging in health checks

check_db_connection is fine for a happy-path health check, but two small tweaks could improve operability:

  • If the DB is not initialized, get_engine() will raise RuntimeError. For health endpoints, it can be nicer to map that to False with a clear log message rather than surfacing an unexpected 500 upstream.
  • When logging failures, you may want to capture the stack trace for easier debugging, e.g. by adding exc_info=True or logging the exception object directly for your structlog configuration.

Example:

except Exception as e:
    logger.error("Database health check failed", error=str(e))
    return False

is fine, but consider including structured fields like exc=str(e) and any relevant context (e.g. DSN alias) if your logging pipeline supports it.

Please confirm how your health endpoint wraps this function and whether surfacing an uninitialized engine as an exception vs. a simple False aligns with your desired health semantics.


165-183: Strengthen URL password masking using proper URL parsing

The current _mask_password implementation is simple and works for typical user:password@host URLs, but it can be tripped up by edge cases (e.g., @ inside the password, different schemes, or additional :// segments).

Consider using urllib.parse.urlsplit / urlunsplit (or SQLAlchemy’s URL utilities) to parse and reconstruct the URL while replacing only the password component. That will make masking more robust and less sensitive to unexpected URL shapes.

Please review how many different DSN shapes you expect to support (e.g., with query params, special characters in passwords) and whether it’s worth moving to a parser-based approach now.
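A parser-based sketch along the lines suggested above; the helper name is hypothetical, not the PR's _mask_password:

```python
# Replace only the password component using urlsplit/urlunsplit.
from urllib.parse import urlsplit, urlunsplit


def mask_dsn_password(url: str) -> str:
    parts = urlsplit(url)
    if parts.password is None:
        return url
    host = parts.hostname or ""
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    netloc = f"{parts.username}:***@{host}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


# mask_dsn_password("postgresql+asyncpg://app:s3cret@db:5432/bookbytes")
# -> "postgresql+asyncpg://app:***@db:5432/bookbytes"
```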

Makefile (3)

1-50: Fix stale .PHONY entry and consider adding standard targets

Two small Makefile polish points:

  • .PHONY lists clean-docker-dev, but there is no such target defined (you have clean-dev, clean-dev-containers, etc.). Either:

    • Rename the entry to clean-dev, or
    • Add a clean-docker-dev alias target that delegates to clean-dev.
  • Tools like checkmake are also flagging missing standard phony targets all and clean. If you care about those conventions, you could add simple wrappers:

all: help

clean: clean-dev

These changes will make make help output and static-analysis happier, but they’re not functionally required.


115-120: Tests re-running uv sync on every invocation may be slow

Both test and test-cov run uv sync --all-extras every time. For frequent local test runs this can add noticeable latency.

If your environment allows it, consider:

  • A separate bootstrap / deps target that runs uv sync, and
  • Making test/test-cov depend only on uv run pytest....

Example:

deps:
	cd $(BACKEND_DIR) && uv sync --all-extras

test: deps
	cd $(BACKEND_DIR) && uv run pytest

This keeps tests snappy once dependencies are up to date.


129-143: Improve shell portability in cleanup helpers

The cleanup targets work, but they rely on features that are not universally available in /bin/sh or on non-GNU systems:

  • clean-dev-images uses xargs -r, which is GNU-specific and will fail on macOS/BSD xargs.
  • clean-dev-volumes uses read -p, which is a Bash extension and not supported by all /bin/sh implementations (e.g., dash).

To make these targets more portable, you could:

  • Either declare SHELL := /bin/bash at the top of the Makefile and require Bash, and
  • Replace xargs -r with a more portable pattern (e.g., guarding with test -n or using xargs without -r but with || true), or use a small shell script.

These tweaks reduce surprising failures when contributors run cleanup commands on different platforms.

Please confirm which shells/OSes you intend to support for development so you can decide whether to enforce Bash explicitly or aim for POSIX sh + GNU utils.

backend/src/bookbytes/schemas/common.py (5)

24-31: Consider actually using BaseSchema as the base for shared response models

BaseSchema provides useful common config (from_attributes, populate_by_name, str_strip_whitespace), but all schemas in this file currently inherit directly from BaseModel.

If the intent is to have consistent behavior across API schemas, consider having at least the response-style models (ErrorDetail, ErrorResponse, PaginatedResponse, UUIDMixin/TimestampMixin/UUIDTimestampMixin, HealthStatus, HealthCheckResponse, MessageResponse) inherit from BaseSchema instead. That avoids repeating config and ensures ORM objects can be passed directly where appropriate.

Please verify how these schemas are instantiated elsewhere (e.g., from SQLAlchemy models) to confirm whether from_attributes=True is needed universally or only for a subset.


39-92: Error schema design is consistent and expressive

The ErrorDetail / ErrorResponse shapes, examples, and field descriptions are clear and aligned with problem-details-style error payloads. This should give clients a predictable error contract. No changes needed here; optional future enhancement would be introducing an Enum for code values once you have a stable set of well-known error codes.

If you later introduce an Enum or Literal type for code, please double-check how that interacts with your OpenAPI generator and client code generation.


100-158: Double-check pagination pages semantics for empty result sets

PaginationParams.offset/limit and the PaginatedResponse.create factory look correct, but note the behavior for total == 0:

pages = (total + size - 1) // size if total > 0 else 0

This yields pages = 0 when there are no items. Some APIs report 0 pages in that case, while others prefer 1 to indicate “one empty page”. Either is defensible, but it’s worth confirming that:

  • Your UI and client code expect pages=0 in the empty case, and
  • Any pagination controls handle that case cleanly.

If you want “always at least one page”, you could adjust to:

pages = max(1, (total + size - 1) // size)

Otherwise the current behavior is fine.

Please check how existing or planned frontends interpret pages=0 for empty collections before locking in this convention.


165-181: Clarify mixin role and consider aligning with BaseSchema

The Pydantic UUIDMixin / TimestampMixin / UUIDTimestampMixin nicely mirror your SQLAlchemy mixins conceptually, but a couple of small refinements may help:

  • Since these classes already inherit from BaseModel, they are full models, not “pure mixins”. That’s fine, but it means consuming schemas will typically use multiple inheritance, e.g. class BookOut(UUIDTimestampMixin, BaseSchema): .... Given that, you may want these mixins to extend BaseSchema instead of BaseModel to inherit the shared config automatically.
  • Their names are identical to the SQLAlchemy mixins in backend/src/bookbytes/models/base.py. To avoid confusion at call sites, consider using explicit import aliases (from ...models.base import UUIDMixin as SAUUIDMixin) or renaming these to something like UUIDSchemaMixin / TimestampSchemaMixin.

Functionally this is fine as-is; these are mainly clarity and reuse improvements.

Please verify how these mixins are composed in your actual response models to ensure MRO and model_config resolution behave as you expect under Pydantic v2.


189-224: Health schemas are flexible; you may want stronger typing for statuses

The health check schemas are practical and the examples match the field definitions. Two optional tweaks you might consider:

  • Replace the raw str + regex patterns with Literal[...] or small Enums for status values ("ok", "degraded", "error"). That improves type checking and auto-complete in clients.
  • For checks: dict[str, HealthStatus | str], decide whether you truly need both rich HealthStatus and bare strings in the same map. If not, standardizing on HealthStatus will simplify consumer logic.

None of this is required for correctness; the current design is workable.

If you introduce Literal or Enum types, please confirm compatibility with your OpenAPI tooling and any generated clients.
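A sketch of the Literal-based variant; the model and field names mirror the HealthStatus/HealthCheckResponse discussion above but are assumptions:

```python
# Literal statuses give type checkers and clients a closed set of values.
from typing import Literal

from pydantic import BaseModel

HealthState = Literal["ok", "degraded", "error"]


class HealthStatus(BaseModel):
    status: HealthState
    detail: str | None = None


class HealthCheckResponse(BaseModel):
    status: HealthState
    checks: dict[str, HealthStatus]
```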

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a5ac636 and bdca4a6.

⛔ Files ignored due to path filters (1)
  • backend/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (53)
  • .agent/rules/pythonic-fastapi-async.md (1 hunks)
  • .dockerignore (1 hunks)
  • .env.example (0 hunks)
  • .github/copilot-instructions.md (1 hunks)
  • Makefile (1 hunks)
  • backend/alembic.ini (1 hunks)
  • backend/alembic/README (1 hunks)
  • backend/alembic/env.py (1 hunks)
  • backend/alembic/script.py.mako (1 hunks)
  • backend/alembic/versions/001_pg_idkit_extension.py (1 hunks)
  • backend/docker/Dockerfile (1 hunks)
  • backend/docker/Dockerfile.dev (1 hunks)
  • backend/docker/docker-compose.dev.yml (1 hunks)
  • backend/docker/docker-compose.yml (1 hunks)
  • backend/pyproject.toml (1 hunks)
  • backend/src/bookbytes/__init__.py (1 hunks)
  • backend/src/bookbytes/api/__init__.py (1 hunks)
  • backend/src/bookbytes/api/v1/__init__.py (1 hunks)
  • backend/src/bookbytes/config.py (1 hunks)
  • backend/src/bookbytes/core/__init__.py (1 hunks)
  • backend/src/bookbytes/core/database.py (1 hunks)
  • backend/src/bookbytes/core/exceptions.py (1 hunks)
  • backend/src/bookbytes/core/logging.py (1 hunks)
  • backend/src/bookbytes/dependencies.py (1 hunks)
  • backend/src/bookbytes/main.py (1 hunks)
  • backend/src/bookbytes/models/__init__.py (1 hunks)
  • backend/src/bookbytes/models/audio_book.py (1 hunks)
  • backend/src/bookbytes/models/base.py (1 hunks)
  • backend/src/bookbytes/models/book_provider.py (1 hunks)
  • backend/src/bookbytes/models/chapter.py (1 hunks)
  • backend/src/bookbytes/models/edition.py (1 hunks)
  • backend/src/bookbytes/models/work.py (1 hunks)
  • backend/src/bookbytes/repositories/__init__.py (1 hunks)
  • backend/src/bookbytes/repositories/base.py (1 hunks)
  • backend/src/bookbytes/schemas/__init__.py (1 hunks)
  • backend/src/bookbytes/schemas/common.py (1 hunks)
  • backend/src/bookbytes/services/__init__.py (1 hunks)
  • backend/src/bookbytes/storage/__init__.py (1 hunks)
  • backend/src/bookbytes/workers/__init__.py (1 hunks)
  • backend/tests/__init__.py (1 hunks)
  • backend/tests/conftest.py (1 hunks)
  • backend/tests/integration/__init__.py (1 hunks)
  • backend/tests/integration/test_database.py (1 hunks)
  • backend/tests/integration/test_health.py (1 hunks)
  • backend/tests/mocks/__init__.py (1 hunks)
  • backend/tests/mocks/openai_responses.py (1 hunks)
  • backend/tests/mocks/openlibrary_responses.py (1 hunks)
  • env.example (1 hunks)
  • tasks/backend-productionisation/audio-books-library/design-doc.md (1 hunks)
  • tasks/backend-productionisation/audio-books-library/tasks-audio-books-library.md (1 hunks)
  • tasks/backend-productionisation/prd-backend-productionisation.md (1 hunks)
  • tasks/backend-productionisation/tasks-backend-productionisation.md (1 hunks)
  • tasks/backend-productionisation/technical-considerations-backend-productionisation.md (1 hunks)
💤 Files with no reviewable changes (1)
  • .env.example
🧰 Additional context used
🧬 Code graph analysis (14)
backend/src/bookbytes/models/edition.py (4)
backend/src/bookbytes/models/base.py (4)
  • Base (24-30)
  • TimestampMixin (55-77)
  • UUIDPrimaryKeyMixin (33-52)
  • id (47-52)
backend/src/bookbytes/models/audio_book.py (1)
  • AudioBook (37-79)
backend/src/bookbytes/models/book_provider.py (1)
  • BookProvider (36-100)
backend/src/bookbytes/models/work.py (1)
  • Work (24-61)
backend/src/bookbytes/models/book_provider.py (3)
backend/src/bookbytes/models/base.py (3)
  • Base (24-30)
  • TimestampMixin (55-77)
  • UUIDPrimaryKeyMixin (33-52)
backend/src/bookbytes/models/edition.py (1)
  • Edition (23-80)
backend/src/bookbytes/models/work.py (1)
  • Work (24-61)
backend/src/bookbytes/models/audio_book.py (3)
backend/src/bookbytes/models/base.py (5)
  • Base (24-30)
  • SoftDeleteMixin (80-111)
  • TimestampMixin (55-77)
  • UUIDPrimaryKeyMixin (33-52)
  • id (47-52)
backend/src/bookbytes/models/chapter.py (1)
  • Chapter (21-66)
backend/src/bookbytes/models/edition.py (1)
  • Edition (23-80)
backend/src/bookbytes/dependencies.py (2)
backend/src/bookbytes/config.py (2)
  • Settings (53-254)
  • get_settings (258-267)
backend/src/bookbytes/core/database.py (1)
  • get_async_session (66-81)
backend/src/bookbytes/models/__init__.py (6)
backend/src/bookbytes/models/audio_book.py (2)
  • AudioBook (37-79)
  • AudioBookStatus (28-34)
backend/src/bookbytes/models/base.py (4)
  • Base (24-30)
  • SoftDeleteMixin (80-111)
  • TimestampMixin (55-77)
  • UUIDPrimaryKeyMixin (33-52)
backend/src/bookbytes/models/book_provider.py (2)
  • BookProvider (36-100)
  • BookProviderType (28-33)
backend/src/bookbytes/models/chapter.py (1)
  • Chapter (21-66)
backend/src/bookbytes/models/edition.py (1)
  • Edition (23-80)
backend/src/bookbytes/models/work.py (1)
  • Work (24-61)
backend/src/bookbytes/models/base.py (2)
backend/src/bookbytes/schemas/common.py (1)
  • TimestampMixin (171-175)
backend/src/bookbytes/repositories/base.py (1)
  • restore (309-321)
backend/src/bookbytes/core/logging.py (1)
backend/src/bookbytes/config.py (3)
  • Settings (53-254)
  • get_settings (258-267)
  • use_json_logs (244-246)
backend/tests/integration/test_database.py (5)
backend/tests/conftest.py (1)
  • async_client (62-91)
backend/src/bookbytes/config.py (1)
  • Settings (53-254)
backend/src/bookbytes/core/database.py (3)
  • close_db (134-146)
  • get_async_session (66-81)
  • init_db (84-131)
backend/src/bookbytes/models/base.py (2)
  • Base (24-30)
  • TimestampMixin (55-77)
backend/src/bookbytes/schemas/common.py (1)
  • TimestampMixin (171-175)
backend/src/bookbytes/repositories/base.py (2)
backend/src/bookbytes/models/base.py (4)
  • Base (24-30)
  • id (47-52)
  • deleted_at (90-96)
  • restore (109-111)
backend/src/bookbytes/schemas/common.py (3)
  • offset (112-114)
  • limit (117-119)
  • create (142-157)
backend/tests/integration/test_health.py (1)
backend/tests/conftest.py (1)
  • async_client (62-91)
backend/alembic/env.py (2)
backend/src/bookbytes/config.py (1)
  • get_settings (258-267)
backend/src/bookbytes/models/base.py (1)
  • Base (24-30)
backend/src/bookbytes/schemas/common.py (2)
backend/src/bookbytes/repositories/base.py (1)
  • create (122-134)
backend/src/bookbytes/models/base.py (4)
  • id (47-52)
  • TimestampMixin (55-77)
  • created_at (63-68)
  • updated_at (71-77)
backend/src/bookbytes/core/database.py (2)
backend/src/bookbytes/config.py (1)
  • Settings (53-254)
backend/src/bookbytes/core/logging.py (1)
  • get_logger (145-159)
backend/tests/conftest.py (5)
backend/src/bookbytes/config.py (1)
  • Settings (53-254)
backend/src/bookbytes/core/database.py (2)
  • close_db (134-146)
  • init_db (84-131)
backend/src/bookbytes/core/logging.py (1)
  • configure_logging (67-142)
backend/alembic/env.py (1)
  • get_url (35-38)
backend/src/bookbytes/repositories/base.py (3)
  • delete (361-370)
  • exists (192-206)
  • exists (338-358)
🪛 checkmake (0.2.2)
Makefile

[warning] 14-14: Target body for "help" exceeds allowed length of 5 (35).

(maxbodylength)


[warning] 1-1: Missing required phony target "all"

(minphony)


[warning] 1-1: Missing required phony target "clean"

(minphony)


[warning] 1-1: Missing required phony target "test"

(minphony)

🪛 Checkov (3.2.334)
backend/docker/docker-compose.yml

[medium] 29-30: Basic Auth Credentials

(CKV_SECRET_4)

backend/docker/docker-compose.dev.yml

[medium] 17-18: Basic Auth Credentials

(CKV_SECRET_4)

🪛 LanguageTool
.agent/rules/pythonic-fastapi-async.md

[grammar] ~15-~15: Ensure spelling is correct
Context: ...de duplication. Favour composition over inheritence. Smart core, Thin interfaces. Use descr...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~36-~36: Ensure spelling is correct
Context: ... library documentation for Data Models, Shemas, Path Operations, Middleware and for be...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

tasks/backend-productionisation/prd-backend-productionisation.md

[grammar] ~1-~1: Ensure spelling is correct
Context: # PRD: Backend Productionisation Document Version: 1.1 Created: 4...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[style] ~59-~59: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...diately | | US2 | As an API consumer, I want to poll job status, so I know when process...

(REP_WANT_TO_VB)


[style] ~68-~68: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...gres, Redis | | DS2 | As a developer, I want to run tests with mocked external services...

(REP_WANT_TO_VB)


[style] ~69-~69: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ... | | DS3 | As a developer, I want to add new features without touching unrel...

(REP_WANT_TO_VB)


[uncategorized] ~188-~188: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...e | | NG8 | Rate limiting implementation | Design only; im...

(EN_COMPOUND_ADJECTIVE_INTERNAL)

tasks/backend-productionisation/tasks-backend-productionisation.md

[grammar] ~1-~1: Ensure spelling is correct
Context: # Tasks: Backend Productionisation PRD Reference: [prd-backend-production...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[uncategorized] ~106-~106: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...ker/docker-compose.yml up` to start the full stack - Environment variables are validated a...

(EN_COMPOUND_ADJECTIVE_INTERNAL)

🪛 markdownlint-cli2 (0.18.1)
tasks/backend-productionisation/audio-books-library/design-doc.md

26-26: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


76-76: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


332-332: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


360-360: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


397-397: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


584-584: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

tasks/backend-productionisation/technical-considerations-backend-productionisation.md

62-62: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

.github/copilot-instructions.md

12-12: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

tasks/backend-productionisation/tasks-backend-productionisation.md

227-227: Spaces inside emphasis markers

(MD037, no-space-in-emphasis)


234-234: Spaces inside emphasis markers

(MD037, no-space-in-emphasis)

🔇 Additional comments (66)
backend/src/bookbytes/workers/__init__.py (1)

1-1: Looks good—simple and appropriate.

This is a minimal package initializer with a clear docstring. Nothing to change here.

tasks/backend-productionisation/tasks-backend-productionisation.md (2)

1-120: Excellent task tracker structure and comprehensive roadmap.

This document is well-organized and serves as a clear blueprint for backend productionization. The 7-phase structure is logical, task descriptions are actionable, and the practice of modularizing work (e.g., moving Phase 3 to a dedicated file) demonstrates good planning. Cross-references to related PRD and technical documentation are appropriate, and checkbox tracking enables progress visibility.


106-106: Use hyphen for compound adjective.

Line 106 uses "full stack" as a compound adjective modifying "up"; this should be hyphenated for clarity and grammar correctness.

Apply this diff:

- Use `docker-compose -f docker/docker-compose.yml up` to start the full stack
+ Use `docker-compose -f docker/docker-compose.yml up` to start the full-stack

Likely an incorrect or invalid review comment.

.dockerignore (1)

57-59: Defensive dual-case README inclusion is acceptable.

Both README.md and Readme.md are included to handle potential case-sensitivity across different systems. Since pyproject.toml specifies readme = "Readme.md", the lowercase variant is necessary. No changes needed.

backend/pyproject.toml (3)

115-120: Ruff configuration is pragmatic and well-reasoned.

The selection of rules and ignores is sensible for a FastAPI project:

  • Comprehensive error/warning coverage (E, W, F, I)
  • Code quality checks (C4, B, UP, SIM)
  • Intentional ignores: E501 for line length (handled by formatter), B008 for FastAPI's Depends(), ARG001 for common dependency patterns

125-134: Pytest configuration is solid for async-first testing.

asyncio_mode = "auto" is appropriate for pytest-asyncio with FastAPI, and test discovery patterns are standard. No concerns here.


144-150: mypy overrides for untyped packages are reasonable.

Excluding gtts, arq, and aioboto3 is a pragmatic choice given these packages lack complete type hints. The strict mode on the main codebase maintains type safety where it matters.

backend/docker/Dockerfile (1)

36-62: Excellent security and multi-stage practices.

The Dockerfile follows container security best practices:

  • Non-root user (bookbytes, UID/GID 1000) prevents privilege escalation
  • Lean runtime image via multi-stage build
  • Proper ownership of application directories
  • HEALTHCHECK provides Docker daemon visibility into app readiness
  • Environment variables well-structured
tasks/backend-productionisation/audio-books-library/design-doc.md (1)

1-890: Design document is comprehensive and technically sound.

The document provides thorough guidance on:

  • OpenLibrary API integration with rate limit awareness
  • Provider-agnostic architecture via BookProvider table (excellent pattern for multi-provider support)
  • Pragmatic cache strategy (Redis-only with AOF persistence, stale-while-revalidate)
  • Clear data model relationships and migrations strategy
  • Concrete API endpoint definitions
  • Phased implementation roadmap with success metrics

The architecture decisions are well-reasoned, particularly:

  1. Provider-agnostic models — decouples core data from external APIs
  2. BookProvider mapping table — enables multi-provider support without schema migration
  3. Redis-only caching — simplifies architecture while maintaining sufficient durability via AOF
  4. Stale-while-revalidate pattern — balances freshness and performance

Only minor markdown linting issues to address; content is production-ready.

backend/docker/docker-compose.yml (2)

29-40: Development credentials are acceptable, but ensure production secrets are managed externally.

The hardcoded database credentials (bookbytes:bookbytes) and dev-mode secrets (dev-api-key-12345, dev-secret-key-change-in-production) are fine for local development. The naming convention clearly signals these must be replaced in production.

For production deployments, consider:

  • Using Docker secrets or an external secrets manager
  • Documenting the required environment variables in a deployment guide

104-137: PostgreSQL and Redis configurations are well-structured.

Health checks, persistence via named volumes, and restart policies are properly configured. The pg_isready and redis-cli ping health checks are appropriate for dependency orchestration.

tasks/backend-productionisation/prd-backend-productionisation.md (1)

1-285: Comprehensive PRD with clear structure and requirements.

The document effectively outlines the backend productionization strategy with well-defined goals, user stories, and phased implementation. The technology choices (FastAPI, SQLAlchemy 2.0 async, ARQ, structlog) align with modern Python async best practices.

.github/copilot-instructions.md (1)

28-79: Well-documented coding conventions and patterns.

The instructions clearly document structlog usage, exception patterns, configuration access, and testing conventions. This provides excellent guidance for maintaining consistency across the codebase.

backend/src/bookbytes/models/base.py (1)

33-78: Well-designed mixins using SQLAlchemy 2.0 patterns.

The UUIDv7 choice provides excellent B-tree index performance with time-sortable, distributed-safe identifiers. The TimestampMixin correctly uses timezone-aware DateTime with server-side defaults.

backend/tests/mocks/openlibrary_responses.py (1)

127-160: Helper functions are well-designed for test flexibility.

Supporting both ISBN-10 and ISBN-13 lookups for the same books enables realistic test scenarios. The return type distinction (dict vs dict | None) appropriately differentiates between API responses and parsed metadata.

backend/src/bookbytes/repositories/base.py (2)

136-149: Sequential refresh in create_many may impact bulk performance.

The loop at lines 147-148 refreshes entities one-by-one after bulk insert. For large batches, this adds N round-trips to the database.

If bulk inserts with generated fields become a performance concern, consider:

  • Using RETURNING clause in bulk operations (PostgreSQL-specific)
  • Deferring refresh until entities are actually accessed

For now, this is acceptable as correctness is prioritized.
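A PostgreSQL-specific sketch of the RETURNING approach, which hydrates inserted entities in a single round trip; Work here is a minimal stand-in, not the real model:

```python
# ORM bulk insert with RETURNING (SQLAlchemy 2.0 insertmanyvalues).
from uuid import UUID

from sqlalchemy import insert
from sqlalchemy.ext.asyncio import AsyncSession
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column
from uuid6 import uuid7


class Base(DeclarativeBase):
    pass


class Work(Base):  # minimal stand-in for the real model
    __tablename__ = "works"
    id: Mapped[UUID] = mapped_column(primary_key=True, default=uuid7)
    title: Mapped[str]


async def create_many(session: AsyncSession, rows: list[dict]) -> list[Work]:
    # RETURNING repopulates generated fields without per-entity refresh.
    result = await session.scalars(insert(Work).returning(Work), rows)
    return list(result.all())
```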


280-336: Solid soft-delete implementation with proper state checks.

The restore_by_id correctly fetches with include_deleted=True and validates the entity was actually deleted before restoring (line 333). This prevents spurious operations on non-deleted entities.

backend/src/bookbytes/core/exceptions.py (7)

1-15: LGTM! Clear documentation and minimal imports.

The module docstring provides excellent context for the exception hierarchy's purpose, and the import is appropriate for the type hints used throughout.


17-72: Excellent base exception design.

The BookBytesError base class provides a clean foundation with:

  • Class-level defaults for code, message, and status_code
  • Flexible constructor for overrides
  • Structured to_dict() serialization for API responses

The wrapping structure {"error": {...}} in to_dict() appears intentional for consistent API error responses.
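A condensed sketch of this shape; the class-level defaults and the {"error": {...}} envelope follow the description above, everything else is an assumption:

```python
# Hedged sketch of the base-exception design; bodies are illustrative.
from typing import Any


class BookBytesError(Exception):
    code: str = "internal_error"
    message: str = "An unexpected error occurred"
    status_code: int = 500

    def __init__(self, message: str | None = None, details: dict[str, Any] | None = None) -> None:
        super().__init__(message or self.message)
        self.message = message or self.message
        self.details = details or {}

    def to_dict(self) -> dict[str, Any]:
        # Consistent envelope for API error responses.
        return {"error": {"code": self.code, "message": self.message, "details": self.details}}
```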


79-201: Well-structured NotFound error hierarchy.

The five NotFoundError subclasses follow a consistent and clean pattern:

  • Accept optional identifiers for context
  • Build structured details payloads
  • Enhance messages with identifier information when available

The pattern ensures error responses are both human-readable and machine-parseable.


208-236: Correct HTTP semantics for auth errors.

The authentication/authorization error hierarchy properly distinguishes:

  • 401 (AuthenticationError) for authentication failures
  • 403 (AuthorizationError) for permission/authorization failures

Simple subclasses appropriately rely on class-level defaults without unnecessary constructor overrides.


243-283: Good validation error design with field context.

The ValidationError base class adds a field parameter to help identify which field failed validation. InvalidISBNError correctly leverages this by passing field="isbn" to the parent, ensuring structured error responses include field context.


290-324: Appropriate external service error handling.

Using HTTP 502 (Bad Gateway) for external service failures is correct. The four specific subclasses provide clear error codes for different service types (OpenAI, TTS, Metadata, Storage), enabling targeted error handling and monitoring.


331-391: Solid job error handling with useful context.

The job error hierarchy uses appropriate HTTP status codes:

  • 409 (Conflict) for duplicate job creation attempts
  • 500 (Internal Server Error) for processing failures

Both exceptions capture relevant context (job IDs, failed steps, errors) that will be valuable for debugging and monitoring job pipelines.

backend/alembic.ini (1)

1-146: Alembic configuration looks consistent with the project layout.

script_location and logging sections match the backend/alembic/ structure and async env.py approach; nothing blocking here.

backend/alembic/script.py.mako (1)

1-28: Template matches modern Alembic patterns; just be mindful of injected code indentation.

The revision metadata and upgrade/downgrade scaffolding are standard and type-annotated nicely. When generating migrations, ensure ${upgrades}/${downgrades} are rendered with appropriate leading indentation so multi-line operations align under the function body, but otherwise this looks good.

backend/alembic/versions/001_pg_idkit_extension.py (1)

1-34: Migration is straightforward; ensure operational readiness for pg_idkit.

The CREATE EXTENSION IF NOT EXISTS pg_idkit / DROP EXTENSION IF EXISTS pg_idkit pair is clean and idempotent. The main remaining concern is operational:

  • Confirm that your Postgres images/instances (including managed services) have pg_idkit available and that the migration user has permission to install/drop extensions, or document an alternative path for those environments.
backend/tests/mocks/openai_responses.py (1)

12-129: Mocks are comprehensive and well-structured for testing.

The module nicely covers:

  • Raw chat completion responses + parsed structures for chapter extraction.
  • A realistic summary response and per-chapter SAMPLE_SUMMARIES.
  • Common error shapes (rate limit, invalid key, context length) for negative-path tests.

This should make your OpenAI integration tests deterministic and expressive.

backend/src/bookbytes/models/book_provider.py (2)

28-34: LGTM!

The BookProviderType enum follows good practices: inherits from str, Enum for JSON serialization compatibility, uses lowercase values, and includes a forward-looking comment for future providers.


78-87: LGTM!

The relationships are correctly defined with explicit foreign_keys to disambiguate the multiple FKs to the same tables. The back_populates references align with the relationship definitions in Work and Edition models shown in the relevant snippets.

backend/tests/integration/test_database.py (2)

7-16: LGTM!

Good integration test verifying the /health/ready endpoint includes database connectivity checks. The assertions cover the essential response structure.


55-66: LGTM!

Simple but effective smoke test validating that the base classes and mixins expose the expected attributes. This catches import errors and basic structural issues early.

backend/tests/integration/test_health.py (3)

7-25: LGTM!

Good coverage of both liveness and readiness probes with appropriate assertions. Note: test_readiness_probe has some overlap with test_health_ready_checks_database in test_database.py, but they test different aspects (general structure vs. database-specific checks), so both are valuable.


28-38: LGTM!

Comprehensive test for the root endpoint, validating service metadata and navigation links. The assertions ensure API discoverability.


41-59: LGTM!

Good observability tests. Testing both auto-generated and custom request ID propagation is important for distributed tracing and debugging in production.

backend/src/bookbytes/models/chapter.py (2)

21-60: LGTM!

Well-designed Chapter model with:

  • Appropriate FK with cascade delete and index for parent lookups
  • UniqueConstraint ensuring chapter numbers are unique within an audiobook
  • Text type for summary (unbounded length)
  • Optional audio-related fields allowing chapters to exist before audio generation

The 1-indexed chapter_number convention is documented and the constraint naming is clear.


62-66: LGTM!

The __repr__ provides useful debugging information including the parent reference and identifying fields.

backend/src/bookbytes/models/__init__.py (2)

1-37: LGTM!

Clean package initialization with:

  • Explicit __all__ for public API surface
  • Organized imports with logical grouping
  • Helpful comments indicating phase/roadmap context

16-16: The Edition.book_providers relationship configuration is correct.

The current code properly sets back_populates="edition" in the Edition.book_providers relationship (line 74), which correctly references the edition attribute on the BookProvider model (lines 83-87). The bidirectional relationship is properly configured.

backend/src/bookbytes/models/audio_book.py (3)

28-35: LGTM! Clean enum definition.

The AudioBookStatus enum inherits from both str and Enum, ensuring string serialization compatibility. The status values cover the complete audiobook generation lifecycle.


37-73: Consider adding a unique constraint on edition_id for one-to-one enforcement.

The docstring and relevant snippets indicate this is a one-to-one relationship with Edition (Edition has audio_book singular, not audio_books). However, the current schema allows multiple AudioBook rows per edition_id. If truly one-to-one, add a unique constraint:

 edition_id: Mapped[UUID] = mapped_column(
     ForeignKey("editions.id", ondelete="CASCADE"),
     nullable=False,
     index=True,
+    unique=True,
 )

If multiple audiobooks per edition is intentional (e.g., different versions), then the relationship name in Edition should be pluralized and the current setup is fine—please clarify the intent.


75-79: LGTM! Useful __repr__ for debugging.

The repr includes essential fields (id, edition_id, status, version) for effective debugging.

tasks/backend-productionisation/audio-books-library/tasks-audio-books-library.md (2)

1-364: Well-structured task document with clear phase organization.

The phased approach with clear task decomposition, relevant file references, and notes section provides good project guidance. The decision to use Redis-only caching (removing api_cache.py) is clearly documented.


86-88: Task 3.0.4 is administrative cleanup—UUIDv7 is already implemented and operational.

UUIDv7 is already integrated into the codebase (backend/src/bookbytes/models/base.py) with the pg_idkit PostgreSQL extension installed via Alembic. The implementation is documented as time-sortable and is actively used by models. While explicit PostgreSQL sorting tests don't exist in test_database.py, the feature is already deployed. Mark this task complete or add a dedicated PostgreSQL sorting test if stricter verification is needed, but this is not a blocker for dependent features.

tasks/backend-productionisation/technical-considerations-backend-productionisation.md (2)

156-251: Database schema is well-designed with appropriate indexes and constraints.

Good use of:

  • UUID primary keys with gen_random_uuid()
  • Proper foreign key relationships with ON DELETE CASCADE/SET NULL
  • Indexes on frequently queried columns (email, isbn, book_id, status)
  • Check constraints (progress 0-100)
  • Unique constraints where appropriate
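As one concrete instance, the progress 0-100 check could carry over to the SQLAlchemy layer like this (a hedged sketch; the Job model and column names are assumed from the schema doc, not taken from the code):

```python
from sqlalchemy import CheckConstraint, Integer
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column


class Base(DeclarativeBase):
    pass


class Job(Base):
    __tablename__ = "jobs"
    __table_args__ = (
        CheckConstraint("progress >= 0 AND progress <= 100", name="ck_jobs_progress_range"),
    )

    id: Mapped[int] = mapped_column(primary_key=True)
    progress: Mapped[int] = mapped_column(Integer, nullable=False, default=0)
```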

380-387: Breaking changes are clearly documented.

The migration notes properly call out breaking changes including API prefix changes, UUID-based book IDs, async processing, and mandatory authentication. This helps API consumers prepare for the migration.

backend/src/bookbytes/dependencies.py (3)

24-36: LGTM! Useful for accessing potentially modified settings.

This pattern correctly retrieves settings from request state, which may have been modified during app startup lifespan.


67-82: Stub implementations are acceptable for phased development.

The # type: ignore comments and None returns for unimplemented dependencies are appropriate placeholders. The TODO comments with phase references provide clear traceability.

Also applies to: 88-102, 109-147, 153-178


42-61: Double commit/rollback: get_async_session already handles transactions.

Per the relevant snippet from database.py, get_async_session() already commits on success and rolls back on exception. Wrapping it again here causes:

  1. Double commit on success
  2. Potential issues if the inner generator has already committed before the outer wrapper attempts its own commit

Simplify to just yield the session:

 async def get_db_session() -> AsyncGenerator[AsyncSession, None]:
-    """Get an async database session.
-
-    Yields a database session that automatically handles
-    commit on success and rollback on exception.
-
-    Yields:
-        AsyncSession: Database session
-    """
-
+    """Get an async database session."""
     from bookbytes.core.database import get_async_session
 
     async for session in get_async_session():
-        try:
-            yield session
-            await session.commit()
-        except Exception:
-            await session.rollback()
-            raise
+        yield session

Or, if you want transaction control at the dependency level, remove it from get_async_session() and keep it only here.

Likely an incorrect or invalid review comment.

backend/src/bookbytes/models/work.py (3)

41-41: Verify default=list behavior with SQLAlchemy.

Using default=list (a callable) is correct and safe here—SQLAlchemy will call list() for each new instance, avoiding the mutable default argument pitfall. Good practice.
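For illustration, the callable-default pattern being approved looks like this (a class-body excerpt; column name and JSON type assumed):

```python
from sqlalchemy import JSON
from sqlalchemy.orm import Mapped, mapped_column

# default=list is invoked per instance, so rows never share one mutable [] object.
authors: Mapped[list[str]] = mapped_column(JSON, default=list)
```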


47-58: Relationship configurations are well-designed.

  • lazy="selectin" on editions prevents N+1 queries by batch-loading related editions
  • cascade="all, delete-orphan" ensures proper cleanup when a Work is deleted
  • Explicit foreign_keys on book_providers resolves potential ambiguity in polymorphic relationships
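Put together, the configuration described above has roughly this shape (a class-body excerpt from a hypothetical Work model, assuming the names in this PR):

```python
from sqlalchemy.orm import Mapped, relationship

editions: Mapped[list["Edition"]] = relationship(
    back_populates="work",
    lazy="selectin",               # batch-loads editions in one extra query (no N+1)
    cascade="all, delete-orphan",  # deleting a Work cleans up its editions
)
```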

24-61: LGTM! Clean, well-documented model.

The Work model follows established patterns from the base mixins, has appropriate field definitions, and clear docstrings explaining the domain concept. The provider-agnostic design via separate BookProvider table is a good architectural choice.

backend/tests/conftest.py (2)

28-47: LGTM! Well-structured test settings fixture.

The test settings appropriately override production defaults with test-specific values including in-memory SQLite, test API key, and console logging format.


147-218: LGTM! Well-designed mock service fixtures.

The mock fixtures provide sensible default return values and use AsyncMock appropriately for async methods. This enables isolated unit testing without external dependencies.

backend/alembic/env.py (2)

35-38: LGTM! Clean database URL resolution.

Properly delegates to the centralized settings configuration.


72-90: LGTM! Standard async Alembic pattern.

The async migration runner correctly creates an engine with NullPool (appropriate for one-shot migration scripts), runs migrations synchronously within the async connection context, and properly disposes the engine afterward.

backend/src/bookbytes/core/logging.py (2)

31-46: LGTM! Clean correlation ID management.

The ContextVar-based approach ensures proper isolation of correlation IDs across async request contexts.
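The ContextVar pattern referenced here is roughly the following (function names are illustrative, not necessarily the module's exact API):

```python
from contextvars import ContextVar

# Each async task sees its own value, so concurrent requests never leak IDs.
correlation_id_var: ContextVar[str | None] = ContextVar("correlation_id", default=None)


def bind_correlation_id(request_id: str) -> None:
    correlation_id_var.set(request_id)


def current_correlation_id() -> str | None:
    return correlation_id_var.get()
```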


145-159: LGTM! Simple logger factory.

Properly wraps structlog's get_logger with clear documentation.

backend/src/bookbytes/main.py (2)

156-210: LGTM! Well-implemented request logging middleware.

The middleware properly generates/propagates correlation IDs, logs request lifecycle with timing, handles exceptions, and cleans up the context variable in finally.


301-319: LGTM! Readiness probe with dependency checks.

The readiness endpoint correctly checks database connectivity and returns structured health status. The TODO for Redis is appropriately noted.

backend/src/bookbytes/config.py (2)

14-51: LGTM! Well-defined configuration enums.

The enums provide type safety and clear documentation for configuration options.


257-267: LGTM! Properly cached settings factory.

The @lru_cache decorator ensures settings are only read once from environment variables, improving performance for repeated access.
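The factory pattern being praised is the usual one-liner (an excerpt-style sketch; Settings is the project's pydantic-settings class from config.py):

```python
from functools import lru_cache


@lru_cache
def get_settings() -> Settings:
    # Environment variables are read once; later calls return the cached instance.
    return Settings()
```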

backend/src/bookbytes/core/database.py (3)

39-63: Engine and session factory accessors are clear and safe

The global accessors correctly guard against use before initialization with explicit RuntimeErrors, which is helpful for catching miswired startup. No changes needed here.


66-82: Session dependency pattern looks good

The get_async_session implementation matches the recommended FastAPI/SQLAlchemy async pattern: one session per request, automatic close via the context manager, and rollback on exceptions. This is solid as-is.
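One common shape of such a dependency, shown here as a hedged sketch (the factory accessor name is assumed; per the dependencies.py comment above, this variant also commits on success):

```python
from collections.abc import AsyncGenerator

from sqlalchemy.ext.asyncio import AsyncSession


async def get_async_session() -> AsyncGenerator[AsyncSession, None]:
    factory = get_session_factory()  # accessor assumed from core/database.py
    async with factory() as session:  # context manager closes the session
        try:
            yield session
            await session.commit()
        except Exception:
            await session.rollback()
            raise
```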


134-147: Shutdown path is idempotent and correct

close_db safely disposes the engine, clears globals, and logs both before and after, while being a no-op if the engine was never initialized. This is straightforward and robust.

backend/src/bookbytes/schemas/common.py (1)

232-242: MessageResponse is simple and sufficient

The MessageResponse wrapper with example metadata is straightforward and aligns with common “ack”/confirmation response patterns. Looks good as-is.

Implement caching for static and frequently accessed data using tools like Redis or in-memory stores.
Optimize data serialization and deserialization with Pydantic.
Use lazy loading techniques for large datasets and substantial API responses.
Refer to FastAPI, Pydantic, SQLAlachemy, other library documentation for Data Models, Shemas, Path Operations, Middleware and for best practices.

⚠️ Potential issue | 🟡 Minor

Fix spelling errors.

Line 36 contains two spelling errors: "SQLAlachemy" should be "SQLAlchemy" and "Shemas" should be "Schemas".

Apply this correction:

- Refer to FastAPI, Pydantic, SQLAlachemy, other library documentation for Data Models, Shemas, Path Operations, Middleware and for best practices.
+ Refer to FastAPI, Pydantic, SQLAlchemy, other library documentation for Data Models, Schemas, Path Operations, Middleware and for best practices.
📝 Committable suggestion


Suggested change

- Refer to FastAPI, Pydantic, SQLAlachemy, other library documentation for Data Models, Shemas, Path Operations, Middleware and for best practices.
+ Refer to FastAPI, Pydantic, SQLAlchemy, other library documentation for Data Models, Schemas, Path Operations, Middleware and for best practices.
🧰 Tools
🪛 LanguageTool

[grammar] ~36-~36: Ensure spelling is correct
Context: ... library documentation for Data Models, Shemas, Path Operations, Middleware and for be...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

🤖 Prompt for AI Agents
In .agent/rules/pythonic-fastapi-async.md around line 36, fix the typo
"SQLAlachemy" to the correct spelling "SQLAlchemy" so the sentence reads
referencing FastAPI, Pydantic, SQLAlchemy, other library documentation for Data
Models, Schemas, Path Operations, Middleware and best practices.

version = "0.1.0"
description = "AI-powered book summarization and audio generation service"
readme = "Readme.md"
requires-python = ">=3.13"

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cd backend && head -150 pyproject.toml | cat -n

Repository: cryptus-neoxys/bookbytes

Length of output: 4280


Fix Python version mismatch between project requirement and tooling config.

The project explicitly requires Python >=3.13 (line 10) and lists Python 3.13 in classifiers (line 22), but Ruff (line 99) and mypy (line 137) are configured for Python 3.11. This creates a discrepancy where the linters may not properly validate code written for Python 3.13, including syntax or feature availability differences between these versions.

Update the tooling targets to match:

-target-version = "py311"
+target-version = "py313"

-python_version = "3.11"
+python_version = "3.13"

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In backend/pyproject.toml around lines 10, 99, and 137 update tooling to match
the declared requires-python >=3.13: change any ruff target-version value to the
Python 3.13 token (e.g., "py313"), set mypy's python_version to "3.13" (or
equivalent toml value), and ensure classifiers (around line 22) list Python ::
3.13 so all project metadata and tooling targets are consistent.

Comment on lines +168 to +172
s3_url_expiry_seconds: int = Field(
default=0,
ge=0,
description="S3 pre-signed URL expiry in seconds (0 = no expiry/public)",
)

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

AWS S3 presigned URL expiry 0 seconds behavior boto3

💡 Result:

Short answer: ExpiresIn=0 makes the presigned URL effectively expired immediately (or set to the current time) and thus unusable. In practice very small values (including 0) will fail due to immediate expiration, clock skew, or because the URL is also bounded by the lifetime of the credentials used to sign it (temporary STS/role credentials can expire much sooner). Use at least a 1+ second expiry and ensure the signing credentials have a longer lifetime and system clocks are synced. [1][2][3][4]

Sources:

  • Boto3 generate_presigned_url docs (ExpiresIn is seconds). [1]
  • AWS S3 presigned-URL doc (URL valid for specified period; also limited by credential lifetime). [2]
  • Community answers showing premature expiry caused by temporary credentials and recommending non-temporary IAM credentials for longer windows. [3]
  • Reports of very small expirations failing (clock skew / immediate expiry). [4]

The S3 pre-signed URL expiry field description is semantically incorrect.

AWS S3 pre-signed URLs with ExpiresIn=0 become immediately expired and unusable, not "public" or without expiry. The field description "0 = no expiry/public" is misleading and does not reflect AWS behavior. Either enforce a minimum positive value (e.g., gt=0) or update the description to accurately document that 0 results in an immediately expired URL.

🤖 Prompt for AI Agents
In backend/src/bookbytes/config.py around lines 168 to 172 the Field for
s3_url_expiry_seconds has an incorrect description ("0 = no expiry/public") that
contradicts AWS behavior; update the code to either enforce a positive expiry or
document 0 correctly: either change the Field constraint from ge=0 to gt=0 and
update the description to something like "S3 pre-signed URL expiry in seconds
(must be > 0)", or keep ge=0 but change the description to "S3 pre-signed URL
expiry in seconds (0 = immediately expired/unusable); use a positive value for a
valid expiry". Pick one approach and make the corresponding change to the Field
constraints and description so the behavior is accurately represented.

Comment on lines +248 to +254
@field_validator("jwt_secret_key")
@classmethod
def validate_jwt_secret(cls, v: SecretStr) -> SecretStr:
"""Warn if using default JWT secret in non-dev environment."""
# This is a soft validation - we log a warning but don't fail
# The actual check happens at runtime in production
return v

⚠️ Potential issue | 🟡 Minor

No-op validator with misleading docstring.

The validate_jwt_secret validator claims to "warn if using default JWT secret" but performs no validation or logging—it just returns the value unchanged.

Either implement the warning or remove the misleading docstring:

     @field_validator("jwt_secret_key")
     @classmethod
-    def validate_jwt_secret(cls, v: SecretStr) -> SecretStr:
-        """Warn if using default JWT secret in non-dev environment."""
-        # This is a soft validation - we log a warning but don't fail
-        # The actual check happens at runtime in production
+    def validate_jwt_secret(cls, v: SecretStr, info: ValidationInfo) -> SecretStr:
+        """Warn if using default JWT secret."""
+        import warnings
+        default_secret = "dev-secret-key-change-in-production-!!!"
+        if v.get_secret_value() == default_secret:
+            warnings.warn(
+                "Using default JWT secret key. Set JWT_SECRET_KEY env var in production.",
+                UserWarning,
+                stacklevel=2,
+            )
         return v

Note: You'll need to import ValidationInfo from pydantic.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In backend/src/bookbytes/config.py around lines 248 to 254, the validator
validate_jwt_secret is currently a no-op despite its docstring; update it to
accept a second parameter info: ValidationInfo (import ValidationInfo from
pydantic) and implement logic to detect the default secret (compare
v.get_secret_value() to your DEFAULT_JWT_SECRET or hardcoded default), and if
running in a non-development environment (check an env var like ENV or
settings), call logger.warning with a clear message including context (don't
raise); otherwise simply return v as SecretStr. If you prefer not to warn,
remove/adjust the docstring and the validator decorator accordingly so it isn’t
misleading.

Comment on lines +61 to +91
@pytest.fixture
async def async_client(
app: FastAPI, test_settings: Settings
) -> AsyncGenerator[AsyncClient, None]:
"""Create an async HTTP client for testing.

This client makes requests to the test app without starting a server.
The app lifespan is triggered to initialize database and other resources.

Usage:
async def test_endpoint(async_client: AsyncClient):
response = await async_client.get("/health/live")
assert response.status_code == 200
"""
from contextlib import asynccontextmanager

from bookbytes.core.database import close_db, init_db
from bookbytes.core.logging import configure_logging

# Initialize resources that lifespan would normally handle
configure_logging(test_settings)
await init_db(test_settings)

async with AsyncClient(
transport=ASGITransport(app=app),
base_url="http://test",
) as client:
yield client

# Cleanup
await close_db()

⚠️ Potential issue | 🟠 Major

Missing table creation for in-memory SQLite.

The async_client fixture initializes the database engine but doesn't create the schema tables. In-memory SQLite databases start empty, so tests requiring database access will fail with "table not found" errors.

     from bookbytes.core.database import close_db, init_db
     from bookbytes.core.logging import configure_logging
+    from bookbytes.models import Base
+    from bookbytes.core.database import get_engine
 
     # Initialize resources that lifespan would normally handle
     configure_logging(test_settings)
     await init_db(test_settings)
 
+    # Create tables for in-memory SQLite
+    engine = get_engine()
+    async with engine.begin() as conn:
+        await conn.run_sync(Base.metadata.create_all)
+
     async with AsyncClient(

Comment on lines +94 to +112
@pytest.fixture
async def authenticated_client(
app: FastAPI, test_settings: Settings
) -> AsyncGenerator[AsyncClient, None]:
"""Create an authenticated async client using API key.

This client automatically includes the X-API-Key header.

Usage:
async def test_protected_endpoint(authenticated_client: AsyncClient):
response = await authenticated_client.get("/api/v1/books")
assert response.status_code == 200
"""
async with AsyncClient(
transport=ASGITransport(app=app),
base_url="http://test",
headers={"X-API-Key": test_settings.api_key.get_secret_value()},
) as client:
yield client

⚠️ Potential issue | 🟠 Major

authenticated_client skips database initialization.

Unlike async_client, this fixture doesn't call init_db() or configure_logging(). Tests using authenticated_client will fail if they hit endpoints that require database access.

Consider either making authenticated_client depend on async_client or duplicating the initialization logic.

 @pytest.fixture
 async def authenticated_client(
-    app: FastAPI, test_settings: Settings
+    async_client: AsyncClient, test_settings: Settings, app: FastAPI
 ) -> AsyncGenerator[AsyncClient, None]:
-    """Create an authenticated async client using API key.
-
-    This client automatically includes the X-API-Key header.
-
-    Usage:
-        async def test_protected_endpoint(authenticated_client: AsyncClient):
-            response = await authenticated_client.get("/api/v1/books")
-            assert response.status_code == 200
-    """
+    """Create an authenticated async client using API key."""
+    # Reuse async_client's initialized resources
     async with AsyncClient(
         transport=ASGITransport(app=app),
         base_url="http://test",
         headers={"X-API-Key": test_settings.api_key.get_secret_value()},
     ) as client:
         yield client

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In backend/tests/conftest.py around lines 94 to 112, the authenticated_client
fixture creates an AsyncClient with an API key header but does not perform
database initialization or logging configuration like async_client does, causing
DB-dependent tests to fail; modify authenticated_client to reuse the
async_client fixture (add async_client: AsyncClient to its signature) and set
the X-API-Key header on that provided client before yielding, or alternatively
call the same init_db() and configure_logging() steps before creating the
AsyncClient so the database and logging are initialized the same way as
async_client.

Comment on lines +137 to +138
| ----- | ---------------------------------------------------------------------------- | --- |
| FR5.1 | Create abstract `StorageBackend` interface in `storage/base.py` |

⚠️ Potential issue | 🟡 Minor

Minor table formatting issue.

There's a stray pipe character causing malformed table syntax:

-| FR5.4 | Add storage backend selection via config: `STORAGE_BACKEND=local             | s3` |
+| FR5.4 | Add storage backend selection via config: `STORAGE_BACKEND=local \| s3`       |

The pipe inside the cell content needs escaping or the cell structure should be adjusted.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In tasks/backend-productionisation/prd-backend-productionisation.md, the FR5.4
table row shown in the diff above contains an unescaped pipe character inside
its second cell (`STORAGE_BACKEND=local | s3`), which breaks the Markdown table;
escape the pipe (e.g., replace it with \|) or restructure the cell content (wrap
the text differently or split it into additional columns) so the table syntax is
valid and the row renders correctly.

- [ ] 7.0 **Phase 7: Observability & Deployment**
- [x] 7.1 Create `src/bookbytes/core/logging.py` with structlog configuration: Configure `structlog.configure()` with processors for JSON output (prod) or console (dev), add timestamp, log level, logger name _(DONE - moved to auxiliary foundation)_
- [x] 7.2 Add correlation ID processor in logging: Extract `request_id` from context, add to all log entries _(DONE - moved to auxiliary foundation)_
- [x] 7.3 Create logging middleware in `main.py`: Log request start (method, path, request*id), log request end (status_code, duration_ms), bind request_id to structlog context *(DONE - moved to auxiliary foundation)\_

⚠️ Potential issue | 🟡 Minor

Fix markdown formatting: remove spaces inside emphasis markers.

Lines 227 and 234 have spaces around the emphasis markers that should be removed per markdown lint standards.

Apply this diff:

-  - [x] 7.3 Create logging middleware in `main.py`: Log request start (method, path, request*id), log request end (status_code, duration_ms), bind request_id to structlog context *(DONE - moved to auxiliary foundation)\_
+  - [x] 7.3 Create logging middleware in `main.py`: Log request start (method, path, request*id), log request end (status_code, duration_ms), bind request_id to structlog context *(DONE - moved to auxiliary foundation)*
-  - [x] 7.10 Create `docker/docker-compose.yml` with services: `api` (uvicorn), `worker` (arq), `postgres` (postgres:16-alpine), `redis` (redis:7-alpine) with health checks, volumes, and proper depends*on conditions *(DONE - moved to auxiliary foundation)\_
+  - [x] 7.10 Create `docker/docker-compose.yml` with services: `api` (uvicorn), `worker` (arq), `postgres` (postgres:16-alpine), `redis` (redis:7-alpine) with health checks, volumes, and proper depends*on conditions *(DONE - moved to auxiliary foundation)*

Also applies to: 234-234

🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

227-227: Spaces inside emphasis markers

(MD037, no-space-in-emphasis)

🤖 Prompt for AI Agents
In tasks/backend-productionisation/tasks-backend-productionisation.md around
lines 227 and 234, the markdown uses mismatched emphasis delimiters with stray
spaces (an opening `*` closed by `\_`); make the opening and closing markers
match with no internal spaces (e.g. change `*(...)\_` to `*(...)*`, or use
`_text_`/`*text*` consistently), ensuring no extra spaces between `*` or `_` and
the emphasized text on both lines.

Comment on lines +410 to +412
- AUTH_MODE=api_key
- API_KEY=dev-api-key-12345
- JWT_SECRET_KEY=${JWT_SECRET_KEY:-dev-secret-key-change-in-prod}

⚠️ Potential issue | 🟠 Major

Avoid default fallback for JWT_SECRET_KEY in Docker Compose.

The fallback dev-secret-key-change-in-prod creates a security risk—if the environment variable is unset, the application silently uses a weak, predictable secret. Consider failing fast instead:

-      - JWT_SECRET_KEY=${JWT_SECRET_KEY:-dev-secret-key-change-in-prod}
+      - JWT_SECRET_KEY=${JWT_SECRET_KEY:?JWT_SECRET_KEY must be set}

Alternatively, document clearly that this default is only for local development and must never be used in production.

📝 Committable suggestion


Suggested change

Before:

      - AUTH_MODE=api_key
      - API_KEY=dev-api-key-12345
      - JWT_SECRET_KEY=${JWT_SECRET_KEY:-dev-secret-key-change-in-prod}

After:

      - AUTH_MODE=api_key
      - API_KEY=dev-api-key-12345
      - JWT_SECRET_KEY=${JWT_SECRET_KEY:?JWT_SECRET_KEY must be set}
🤖 Prompt for AI Agents
In
tasks/backend-productionisation/technical-considerations-backend-productionisation.md
around lines 410 to 412, the Docker Compose snippet provides a default fallback
JWT_SECRET_KEY ("dev-secret-key-change-in-prod") which is insecure; remove the
default fallback so the service fails fast when JWT_SECRET_KEY is not set (or
replace it with an explicit placeholder and a clear comment requiring the env be
provided in production), and update the text to state that a non-empty
JWT_SECRET_KEY is mandatory in production and must not rely on any development
default.

…onical models

- Add OpenLibrary API configuration to settings
- Implement OpenLibraryService with search and work details methods
- Define canonical DTOs for provider-agnostic data representation
- Integrate with CacheService for response caching
- Add FastAPI dependency injection support
- Update task documentation to reflect completion
Add research document outlining canonical data model and anti-corruption layer patterns for future multi-provider integration phase in BookBytes Audio Books Library. Includes cache strategy and trade-offs analysis.
… services

test(cache): add unit tests for cache key generation, TTL handling, and operations
test(openlibrary): add unit tests for search, work details, and error handling
test(integration): add integration tests for openlibrary API with real network calls
docs: update test markers in pyproject.toml for better test organization
Add API v1 router with search endpoints for books, works and ISBN lookup
Include schemas for search requests/responses and unit tests
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 6

♻️ Duplicate comments (4)
backend/pyproject.toml (1)

99-99: Fix Python version mismatch between project requirement and tooling config.

The project explicitly requires Python >=3.13 (line 10) and lists Python 3.13 in classifiers (line 22), but Ruff (line 99) and mypy (line 142) are configured for Python 3.11. This creates a discrepancy where the linters may not properly validate code written for Python 3.13, including syntax or feature availability differences between these versions.

Update the tooling targets to match:

-target-version = "py311"
+target-version = "py313"

-python_version = "3.11"
+python_version = "3.13"

Also applies to: 142-142

backend/src/bookbytes/config.py (2)

151-172: Correct S3 pre-signed URL expiry semantics.

description="S3 pre-signed URL expiry in seconds (0 = no expiry/public)" is incorrect for AWS S3: an expiry of 0 makes the URL effectively immediately expired, not public or non-expiring. Either:

  • Enforce positive expiry and update description, e.g.:
-    s3_url_expiry_seconds: int = Field(
-        default=0,
-        ge=0,
-        description="S3 pre-signed URL expiry in seconds (0 = no expiry/public)",
-    )
+    s3_url_expiry_seconds: int = Field(
+        default=3600,
+        ge=1,
+        description="S3 pre-signed URL expiry in seconds (must be > 0)",
+    )

or keep ge=0 but document that 0 means “immediately expired/unusable” and ensure calling code treats 0 as a special case (e.g., don’t generate a pre-signed URL at all).
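For context, this is how the expiry is typically consumed downstream; a hedged boto3 sketch with placeholder bucket/key names:

```python
import boto3

s3 = boto3.client("s3")
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "my-bucket", "Key": "audio/book.mp3"},
    ExpiresIn=3600,  # seconds; ExpiresIn=0 would yield an already-expired URL
)
```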


265-271: validate_jwt_secret is still a no-op despite the docstring.

The validator claims to “Warn if using default JWT secret in non-dev environment” but currently just returns v. Either implement the warning (e.g., compare to the default and log/emit a warning when app_env != DEVELOPMENT) or remove/relax the docstring and validator to avoid implying checks that don’t exist.

.agent/rules/pythonic-fastapi-async.md (1)

14-37: Fix typos in FastAPI/Python guidance.

A few spelling issues to clean up:

  • Line 16: inheritence → inheritance
  • Line 37: SQLAlachemy → SQLAlchemy
  • Line 37: Shemas → Schemas
  • Line 26: basemodel → BaseModel

For example:

- Favour composition over inheritence. Smart core, Thin interfaces.
+ Favour composition over inheritance. Smart core, Thin interfaces.
- Use functional components (plain functions) and Pydantic models/basemodel for consistent input/output validation and response schemas.
+ Use functional components (plain functions) and Pydantic models/BaseModel for consistent input/output validation and response schemas.
- Refer to FastAPI, Pydantic, SQLAlachemy, other library documentation for Data Models, Shemas, Path Operations, Middleware and for best practices.
+ Refer to FastAPI, Pydantic, SQLAlchemy, other library documentation for Data Models, Schemas, Path Operations, Middleware and for best practices.
🧹 Nitpick comments (10)
backend/pyproject.toml (1)

26-72: Consider pinning dependency versions for reproducibility.

All dependencies are unversioned (using latest available), which can lead to non-reproducible builds if breaking changes are introduced in transitive dependencies. For a production service, consider pinning at least major versions or using a lock file strategy (e.g., pip-compile or Poetry's lock file).

Example:

-    "fastapi",
-    "uvicorn[standard]",
+    "fastapi>=0.104.0",
+    "uvicorn[standard]>=0.24.0",

This is especially important for security-critical packages like passlib, python-jose, and database drivers.

backend/src/bookbytes/repositories/work.py (1)

30-36: Clarify author matching semantics in get_by_title_author.

The comment says "author overlap", but Work.authors.contains(authors) requires the stored JSON array to contain the full authors list (subset match), not just any overlap. If you intend “any matching author”, consider either:

  • Changing the query (e.g., or_ over individual author contains checks), or
  • Updating the docstring/comment to reflect the stricter subset requirement.
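If "any matching author" is the intent, one possible rewrite is sketched below (a method-body excerpt, not the repo's code; assumes authors is a JSON/JSONB list column where .contains() expresses containment):

```python
from sqlalchemy import or_, select

stmt = select(Work).where(
    Work.title == title,
    # Overlap semantics: match if ANY requested author appears in the stored list.
    or_(*(Work.authors.contains([author]) for author in authors)),
)
result = await self.session.execute(stmt)
work = result.scalar_one_or_none()
```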
backend/alembic/versions/80de24a1ee9e_add_audio_books_library_models.py (1)

14-18: Verify down_revision and modernize type hints.

  • down_revision = None makes this the base migration. If you already have earlier Alembic revisions, this will create a second head; update down_revision to the previous revision ID instead.
  • Minor style nit: you can follow current Python/ruff guidance by importing Sequence from collections.abc and using | unions:
-from typing import Sequence, Union
+from collections.abc import Sequence
+
+from typing import Union  # or drop entirely if you switch to `|` unions

-down_revision: Union[str, Sequence[str], None] = None
-branch_labels: Union[str, Sequence[str], None] = None
-depends_on: Union[str, Sequence[str], None] = None
+down_revision: str | Sequence[str] | None = None
+branch_labels: str | Sequence[str] | None = None
+depends_on: str | Sequence[str] | None = None
backend/tests/unit/test_cache_service.py (1)

19-28: Verify scan_iter mock type matches async usage in invalidate_pattern.

The fixture mocks scan_iter as a synchronous iterator (iter([])), but in test_invalidate_pattern (lines 263-267), it's replaced with an async generator. This default mock would cause issues if any test relies on it directly without overriding.

Consider making the default mock an async iterator for consistency:

 @pytest.fixture
 def mock_redis() -> MagicMock:
     """Create a mock Redis client."""
     redis = MagicMock()
     redis.get = AsyncMock(return_value=None)
     redis.setex = AsyncMock(return_value=True)
     redis.delete = AsyncMock(return_value=1)
     redis.ttl = AsyncMock(return_value=3600)
-    redis.scan_iter = MagicMock(return_value=iter([]))
+    async def empty_scan_iter(match: str):
+        return
+        yield  # Makes this an async generator
+    redis.scan_iter = empty_scan_iter
     return redis
backend/tests/unit/test_openlibrary_service.py (1)

170-180: Consider initializing results directly in the constructor.

The mutation of response.results after construction (line 179) works but is less idiomatic. Consider initializing directly:

     def test_has_more_true(self) -> None:
         """Test has_more when more results exist."""
         response = SearchResponse(
-            results=[],
+            results=[MagicMock()] * 10,
             total_found=100,
             offset=0,
             limit=10,
         )
-        # At offset 0 with 10 results, 100 total -> has more
-        response.results = [MagicMock()] * 10
         assert response.has_more is True
backend/src/bookbytes/repositories/book_provider.py (1)

71-98: Callers should handle IntegrityError for duplicate provider keys.

The UniqueConstraint on (provider, external_key) will raise IntegrityError if the mapping already exists. Consider either:

  1. Documenting this in the docstring
  2. Using provider_key_exists() check before calling, or
  3. Handling the exception at the service layer

This is a design choice, so just flagging for awareness.
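A sketch of option 3, catching the duplicate at the service layer (method and argument names are assumed from this PR, a hypothetical excerpt):

```python
from sqlalchemy.exc import IntegrityError

try:
    mapping = await repo.create_work_mapping(
        provider=provider, external_key=external_key, work_id=work_id
    )
except IntegrityError:
    # Another request created the same (provider, external_key) mapping first.
    await session.rollback()
    mapping = await repo.get_by_provider_key(provider, external_key)
```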

backend/tests/integration/test_openlibrary_api.py (1)

283-295: test_cache_key_is_deterministic doesn't require real API and should be a unit test.

This test only exercises CacheService.search_key() static method and doesn't interact with the OpenLibrary API at all. Consider moving it to unit tests:

-@pytest.mark.integration
-@pytest.mark.external
-class TestOpenLibraryCacheIntegration:
+# In unit tests file
+class TestCacheServiceKeys:
     ...
-    @pytest.mark.asyncio
-    async def test_cache_key_is_deterministic(
-        self, openlibrary_service: OpenLibraryService
-    ) -> None:
+    def test_cache_key_is_deterministic(self) -> None:
         """Test that same search params produce same cache key."""
         key1 = CacheService.search_key(title="Python", author="Guido")
         key2 = CacheService.search_key(title="Python", author="Guido")
         key3 = CacheService.search_key(title="python", author="guido")

         assert key1 == key2
         assert key1 == key3  # Normalized
-
-        await openlibrary_service.close()
backend/src/bookbytes/services/cache.py (2)

134-154: Consider batching deletes in invalidate_pattern for better performance.

Deleting keys one-by-one in a loop can be slow for large key sets. Consider batching with a pipeline:

     async def invalidate_pattern(self, pattern: str) -> int:
         try:
             count = 0
-            async for key in self.redis.scan_iter(match=pattern):
-                await self.redis.delete(key)
-                count += 1
+            batch = []
+            async for key in self.redis.scan_iter(match=pattern):
+                batch.append(key)
+                if len(batch) >= 100:
+                    await self.redis.delete(*batch)
+                    count += len(batch)
+                    batch = []
+            if batch:
+                await self.redis.delete(*batch)
+                count += len(batch)
             logger.debug("cache_pattern_invalidated", pattern=pattern, count=count)
             return count

This is optional if you don't expect large invalidation sets.


231-242: Consider normalizing ISBN input for consistent cache hits.

ISBNs can be formatted with or without hyphens (e.g., 978-0-13-468599-1 vs 9780134685991). Without normalization, the same book could have multiple cache entries:

     @staticmethod
     def isbn_key(isbn: str) -> str:
-        return f"isbn:{isbn}"
+        # Remove hyphens and spaces for consistent keys
+        normalized = isbn.replace("-", "").replace(" ", "").strip()
+        return f"isbn:{normalized}"
backend/src/bookbytes/services/openlibrary.py (1)

430-440: Work details parsing returns empty data for several fields.

The _parse_work_response method returns empty/null values for authors, isbn_list, edition_count, and first_publish_year with comments indicating these need separate API calls. This is fine as a simplification, but consider adding a TODO or tracking issue for completing this functionality.

authors=[],  # Need separate author fetch - simplified for now

Do you want me to open an issue to track implementing the author/edition data fetching?

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bdca4a6 and 4824f61.

📒 Files selected for processing (19)
  • .agent/rules/pythonic-fastapi-async.md (1 hunks)
  • backend/alembic/versions/80de24a1ee9e_add_audio_books_library_models.py (1 hunks)
  • backend/pyproject.toml (1 hunks)
  • backend/src/bookbytes/config.py (1 hunks)
  • backend/src/bookbytes/repositories/__init__.py (1 hunks)
  • backend/src/bookbytes/repositories/audio_book.py (1 hunks)
  • backend/src/bookbytes/repositories/book_provider.py (1 hunks)
  • backend/src/bookbytes/repositories/chapter.py (1 hunks)
  • backend/src/bookbytes/repositories/edition.py (1 hunks)
  • backend/src/bookbytes/repositories/work.py (1 hunks)
  • backend/src/bookbytes/services/__init__.py (1 hunks)
  • backend/src/bookbytes/services/cache.py (1 hunks)
  • backend/src/bookbytes/services/openlibrary.py (1 hunks)
  • backend/tests/integration/test_openlibrary_api.py (1 hunks)
  • backend/tests/unit/__init__.py (1 hunks)
  • backend/tests/unit/test_cache_service.py (1 hunks)
  • backend/tests/unit/test_openlibrary_service.py (1 hunks)
  • tasks/backend-productionisation/audio-books-library/tasks-audio-books-library.md (1 hunks)
  • tasks/knowledge/multi-provider-integration-patterns.md (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • backend/tests/unit/__init__.py
🧰 Additional context used
🧬 Code graph analysis (9)
backend/src/bookbytes/repositories/book_provider.py (2)
backend/src/bookbytes/models/book_provider.py (1)
  • BookProvider (36-100)
backend/src/bookbytes/repositories/base.py (1)
  • BaseRepository (43-206)
backend/src/bookbytes/services/__init__.py (2)
backend/src/bookbytes/services/cache.py (3)
  • CacheService (33-253)
  • get_cache_service (277-289)
  • set_redis_client (260-274)
backend/src/bookbytes/services/openlibrary.py (8)
  • BookSearchResult (28-76)
  • OpenLibraryError (151-154)
  • OpenLibraryRateLimitError (157-160)
  • OpenLibraryService (163-458)
  • SearchResponse (132-143)
  • WorkDetails (80-128)
  • get_openlibrary_service (474-480)
  • set_openlibrary_service (468-471)
backend/tests/unit/test_cache_service.py (2)
backend/src/bookbytes/services/cache.py (9)
  • CacheService (33-253)
  • get (69-96)
  • search_key (191-229)
  • isbn_key (232-241)
  • work_key (244-253)
  • _jitter_ttl (156-166)
  • _get_original_ttl (168-184)
  • invalidate (122-132)
  • invalidate_pattern (134-154)
backend/src/bookbytes/repositories/base.py (1)
  • delete (361-370)
backend/src/bookbytes/repositories/chapter.py (2)
backend/src/bookbytes/repositories/base.py (2)
  • BaseRepository (43-206)
  • delete (361-370)
backend/src/bookbytes/schemas/common.py (2)
  • offset (112-114)
  • limit (117-119)
backend/tests/unit/test_openlibrary_service.py (4)
backend/src/bookbytes/services/openlibrary.py (13)
  • BookSearchResult (28-76)
  • OpenLibraryError (151-154)
  • OpenLibraryRateLimitError (157-160)
  • OpenLibraryService (163-458)
  • SearchResponse (132-143)
  • WorkDetails (80-128)
  • has_more (141-143)
  • search_books (209-259)
  • get_work_details (261-288)
  • get_all_isbns_for_work (290-300)
  • _user_agent (189-191)
  • _parse_search_response (384-412)
  • _parse_work_response (414-440)
backend/tests/integration/test_openlibrary_api.py (2)
  • mock_cache (30-40)
  • openlibrary_service (44-46)
backend/src/bookbytes/services/cache.py (1)
  • get (69-96)
backend/src/bookbytes/schemas/common.py (2)
  • offset (112-114)
  • limit (117-119)
backend/src/bookbytes/repositories/edition.py (4)
backend/src/bookbytes/models/edition.py (1)
  • Edition (23-80)
backend/src/bookbytes/repositories/base.py (1)
  • BaseRepository (43-206)
backend/src/bookbytes/schemas/common.py (2)
  • offset (112-114)
  • limit (117-119)
backend/src/bookbytes/models/base.py (1)
  • id (47-52)
backend/src/bookbytes/repositories/__init__.py (6)
backend/src/bookbytes/repositories/audio_book.py (1)
  • AudioBookRepository (15-151)
backend/src/bookbytes/repositories/base.py (2)
  • BaseRepository (43-206)
  • SoftDeleteRepository (209-381)
backend/src/bookbytes/repositories/book_provider.py (1)
  • BookProviderRepository (15-149)
backend/src/bookbytes/repositories/chapter.py (1)
  • ChapterRepository (14-115)
backend/src/bookbytes/repositories/edition.py (1)
  • EditionRepository (14-90)
backend/src/bookbytes/repositories/work.py (1)
  • WorkRepository (13-75)
backend/src/bookbytes/repositories/work.py (2)
backend/src/bookbytes/models/work.py (1)
  • Work (24-61)
backend/src/bookbytes/repositories/base.py (1)
  • BaseRepository (43-206)
backend/src/bookbytes/services/openlibrary.py (1)
backend/src/bookbytes/config.py (1)
  • get_settings (275-284)
🪛 LanguageTool
.agent/rules/pythonic-fastapi-async.md

[grammar] ~16-~16: Ensure spelling is correct
Context: ...de duplication. Favour composition over inheritence. Smart core, Thin interfaces. Use descr...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~37-~37: Ensure spelling is correct
Context: ... library documentation for Data Models, Shemas, Path Operations, Middleware and for be...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

🪛 markdownlint-cli2 (0.18.1)
tasks/knowledge/multi-provider-integration-patterns.md

31-31: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


68-68: Tables should be surrounded by blank lines

(MD058, blanks-around-tables)


96-96: Table column count
Expected: 3; Actual: 2; Too few cells, row will be missing data

(MD056, table-column-count)


97-97: Table column count
Expected: 3; Actual: 2; Too few cells, row will be missing data

(MD056, table-column-count)

🪛 Ruff (0.14.7)
backend/alembic/versions/80de24a1ee9e_add_audio_books_library_models.py

8-8: Import from collections.abc instead: Sequence

Import from collections.abc

(UP035)


16-16: Use X | Y for type annotations

Convert to X | Y

(UP007)


17-17: Use X | Y for type annotations

Convert to X | Y

(UP007)


18-18: Use X | Y for type annotations

Convert to X | Y

(UP007)

🔇 Additional comments (55)
backend/pyproject.toml (5)

9-9: Verify readme file path.

The readme is referenced as "Readme.md" (line 9). Ensure this file exists next to pyproject.toml in the backend/ directory and that the filename's case matches exactly (paths are case-sensitive on case-sensitive filesystems).


95-96: LGTM — packaging structure is clear.

The src-layout approach (packages = ["src/bookbytes"]) is well-established and prevents import issues during development.


103-120: LGTM — comprehensive and pragmatic lint rules.

The selected lint rules (E, W, F, I, C4, B, UP, ARG, SIM) align well with modern Python practices. The ignore list is sensible, particularly B008 for FastAPI Depends() patterns and ARG001 for unused function arguments (common in dependency injection).


125-139: LGTM — pytest configuration is well-suited for async testing.

The asyncio_mode = "auto" setting, test path definitions, and custom markers (integration, external, slow) provide good test organization and flexibility for the async-heavy codebase.


141-155: LGTM — strict mypy config with appropriate overrides.

The strict mode enforcement (disallow_untyped_defs, warn_return_any, etc.) establishes a strong type-safety baseline. The overrides for untyped third-party packages (gtts, arq, aioboto3) are pragmatic and necessary.

Once the Python version mismatch is fixed, verify that mypy strict mode passes on the codebase to ensure all modules are properly typed.

backend/tests/unit/test_cache_service.py (5)

42-81: LGTM!

The cache key generation tests comprehensively cover normalization, determinism, and different key types. The assertions correctly validate the expected key format and hash length.


89-127: LGTM!

The TTL handling tests appropriately validate jitter bounds and key-type-based TTL resolution. Using slightly wider bounds (0.89–1.11) for the jitter test is a sensible choice to avoid flaky tests from floating-point edge cases.


135-190: LGTM!

The cache get tests thoroughly cover the key scenarios: cache misses, hits with fresh data, stale-while-revalidate triggering, and graceful error handling. The mock setup correctly simulates the Redis responses.


198-235: LGTM!

The cache set tests validate JSON serialization, TTL jitter application, and error resilience. The assertions on call_args correctly verify the data being passed to Redis.


243-282: LGTM!

The cache invalidation tests properly cover single-key deletion, pattern-based invalidation with an async generator mock, and error handling. The pattern test correctly simulates the scan-and-delete workflow.

backend/tests/unit/test_openlibrary_service.py (6)

24-80: LGTM!

The mock response data accurately reflects the OpenLibrary API structure, and the fixtures provide clean, reusable test setup. The default cache miss behavior is appropriate for testing API call paths.


88-165: LGTM!

The DTO serialization tests thoroughly verify field preservation through to_dict and from_dict operations, ensuring cached data can be reliably reconstructed.


198-286: LGTM!

The search tests comprehensively cover cache hit/miss scenarios, API interaction, and parameter passing. The use of patch.object for _get_client effectively isolates HTTP behavior.


293-342: LGTM!

The work details tests properly validate the cache-first pattern and API fallback behavior, with appropriate assertions on caching side effects.


350-414: LGTM!

The error handling tests thoroughly verify the exception mapping: 429 → OpenLibraryRateLimitError, 5xx → OpenLibraryError, and network failures → OpenLibraryError. This ensures callers can handle different failure modes appropriately.


422-496: LGTM!

The helper method and response parsing tests validate delegation patterns, User-Agent formatting, and edge cases in API response handling (cover URL construction, dict-format descriptions).

backend/src/bookbytes/repositories/edition.py (5)

1-15: LGTM!

The repository correctly extends BaseRepository[Edition]. Since Edition appears to use hard deletes (no soft-delete mixin mentioned), BaseRepository is the appropriate base class.


17-27: LGTM!

The get_by_isbn method is correctly implemented for querying by the unique, indexed ISBN field.


29-53: LGTM!

The get_by_work_id method correctly implements paginated retrieval with sensible ordering (newest editions first, nulls last).


55-76: LGTM!

The get_latest_by_work method correctly retrieves the most recent edition for a work in the specified language. The nulls_last() ordering ensures editions with known publish years are preferred.


78-90: LGTM!

The isbn_exists method efficiently checks for ISBN existence by selecting only the ID column with a limit of 1, avoiding unnecessary data transfer.

backend/src/bookbytes/repositories/chapter.py (5)

17-41: LGTM!

The get_by_audio_book method correctly retrieves chapters ordered by chapter number with pagination support.


43-62: LGTM!

The get_by_number method correctly retrieves a specific chapter using the unique constraint on (audio_book_id, chapter_number).


64-80: LGTM!

The count query logic is correct. The inline import issue was addressed in the earlier comment.


82-98: LGTM!

The get_total_duration method correctly aggregates chapter durations with coalesce to handle the empty-set case gracefully.
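The coalesce-over-sum pattern looks roughly like this (a method-body excerpt; column names are assumed from the Chapter model):

```python
from sqlalchemy import func, select

stmt = select(func.coalesce(func.sum(Chapter.duration_seconds), 0)).where(
    Chapter.audio_book_id == audio_book_id
)
result = await self.session.execute(stmt)
total_seconds = result.scalar_one()  # 0 when the audiobook has no chapters yet
```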


100-115: LGTM!

The bulk delete implementation is appropriate for the regeneration use case. Using result.rowcount to return the deletion count is correct.

backend/src/bookbytes/repositories/audio_book.py (6)

1-16: LGTM!

The repository appropriately extends SoftDeleteRepository to preserve audiobook history, with clear documentation of the design rationale.


18-39: LGTM!

The get_by_edition method correctly implements soft-delete-aware querying with an optional flag to include deleted records.


41-66: LGTM!

The get_by_status method correctly filters by status with pagination and soft-delete awareness. Using status.value ensures proper comparison with the stored string value.


68-94: LGTM!

The get_pending and get_processing convenience methods cleanly delegate to get_by_status with appropriate defaults for processing workflows.


96-137: LGTM!

The status mutation methods (mark_processing, mark_completed, mark_failed) correctly update the audiobook state with appropriate side effects (clearing errors on success, recording errors on failure).


139-151: LGTM!

The increment_version method correctly implements the regeneration workflow: incrementing the version, resetting status to pending, and clearing any previous errors.

backend/src/bookbytes/repositories/book_provider.py (5)

1-16: LGTM!

Clean imports and class definition following the established repository pattern.


18-37: LGTM!

Query leverages the unique constraint on (provider, external_key) correctly.


39-69: LGTM!

The double filter on entity_type and work_id/edition_id is defensive and clarifies intent. The existing ix_book_providers_entity_lookup index on (entity_type, entity_id) should cover lookups via entity_id, though these queries filter by work_id/edition_id directly which is fine given the FK columns.


100-127: LGTM!

Consistent with create_work_mapping pattern. Same IntegrityError handling consideration applies.


129-149: LGTM!

Efficient existence check selecting only the id column with limit(1).

backend/tests/integration/test_openlibrary_api.py (4)

1-23: LGTM!

Clear documentation on running and skipping integration tests.


162-204: LGTM!

Test logic for work details retrieval is sound. Same fixture cleanup note applies.


211-228: LGTM!

Simple and effective test for ISBN presence in search results.


235-250: LGTM!

Good test for error handling with invalid work keys.

backend/src/bookbytes/services/cache.py (6)

1-31: LGTM!

Excellent documentation explaining the caching strategy, key types, and TTLs.


33-67: LGTM!

Clear TTL constants and sensible revalidation threshold.


69-96: LGTM!

Correct stale-while-revalidate implementation with fail-open error handling.
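For readers skimming, the stale-while-revalidate contract being approved is roughly the following simplified sketch (helper and constant names assumed):

```python
import json


async def get(self, key: str) -> tuple[dict | None, bool]:
    """Return (data, needs_revalidation); fail open on Redis errors."""
    try:
        raw = await self.redis.get(key)
        if raw is None:
            return None, False
        remaining = await self.redis.ttl(key)
        # Stale once the remaining TTL drops below a fraction of the original.
        stale = remaining < self._get_original_ttl(key) * self.REVALIDATE_THRESHOLD
        return json.loads(raw), stale
    except Exception:
        return None, False  # treat cache failures as a miss, never raise
```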


98-120: LGTM!

Auto-detected TTL with jitter is a good pattern for preventing cache stampede.
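The jitter trick amounts to randomizing each TTL slightly so entries written together don't all expire together; a minimal sketch (the real method may differ):

```python
import random


def _jitter_ttl(ttl: int, jitter: float = 0.10) -> int:
    # A ±10% spread desynchronizes expiries and softens cache-stampede spikes.
    return int(ttl * (1 + random.uniform(-jitter, jitter)))
```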


156-184: LGTM!

Good jitter implementation with sensible defaults for unknown key prefixes.


256-289: LGTM!

Standard FastAPI dependency injection pattern with clear initialization requirements.

backend/src/bookbytes/services/openlibrary.py (8)

1-19: LGTM!

Clean imports and clear module documentation.


27-77: LGTM!

Well-designed DTO with proper serialization methods and sensible defaults.


79-129: LGTM!

Consistent DTO design with BookSearchResult.


131-161: LGTM!

Clean pagination container and appropriate exception hierarchy.


163-207: LGTM!

Proper lazy HTTP client management with cleanup.


261-288: LGTM!

Correct cache-first pattern for work details.


306-352: LGTM!

Comprehensive error handling with specific rate limit detection and appropriate logging.


465-480: LGTM!

Consistent FastAPI dependency injection pattern.

Comment on lines +6 to +11
from uuid import UUID

from sqlalchemy import select

from bookbytes.models.chapter import Chapter
from bookbytes.repositories.base import BaseRepository

🛠️ Refactor suggestion | 🟠 Major

Move inline imports to the top of the file.

The func and delete imports from SQLAlchemy are used in methods below but imported inline. This deviates from Python conventions and makes dependencies less visible.

 from uuid import UUID

-from sqlalchemy import select
+from sqlalchemy import delete, func, select

 from bookbytes.models.chapter import Chapter
 from bookbytes.repositories.base import BaseRepository

Then remove the inline imports at lines 73, 91, and 109.

📝 Committable suggestion


Suggested change

Before:

from uuid import UUID

from sqlalchemy import select

from bookbytes.models.chapter import Chapter
from bookbytes.repositories.base import BaseRepository

After:

from uuid import UUID

from sqlalchemy import delete, func, select

from bookbytes.models.chapter import Chapter
from bookbytes.repositories.base import BaseRepository
🤖 Prompt for AI Agents
In backend/src/bookbytes/repositories/chapter.py around lines 6 to 11, the file
currently only imports select from SQLAlchemy while func and delete are imported
inline later; move those inline imports to the top of the file by adding them to
the existing SQLAlchemy import statement (i.e., import func and delete alongside
select) and then remove the inline import statements at lines 73, 91, and 109 so
all SQLAlchemy dependencies are declared at the module level.

Comment on lines +38 to +50
async def get_with_editions(self, work_id: Work) -> Work | None:
"""Get a work with its editions eagerly loaded.

Args:
work_id: Work UUID

Returns:
Work with editions loaded, or None
"""
result = await self.session.execute(
select(Work).options(selectinload(Work.editions)).where(Work.id == work_id)
)
return result.scalar_one_or_none()

⚠️ Potential issue | 🟠 Major

Fix work_id type annotation in get_with_editions.

The parameter is annotated as work_id: Work but treated as an ID (Work.id == work_id) and documented as “Work UUID”. This can mislead callers into passing a Work instance and cause runtime errors at the DB driver.

Change the signature to something like:

-    async def get_with_editions(self, work_id: Work) -> Work | None:
+    async def get_with_editions(self, work_id: UUID) -> Work | None:

(and import UUID if not already present).

🤖 Prompt for AI Agents
In backend/src/bookbytes/repositories/work.py around lines 38 to 50, the
parameter is incorrectly typed as work_id: Work while the code and docstring
treat it as a Work UUID; change the signature to accept a UUID (e.g., work_id:
UUID), update the docstring to reflect a UUID parameter if needed, and
add/import UUID from the uuid module at the top of the file (or use
typing.Union[UUID, str] if string IDs are accepted) so callers and the DB
comparison use the correct type.

Comment on lines +233 to +259
        # Check cache first
        cache_key = CacheService.search_key(
            title=title,
            author=author,
            publisher=publisher,
            language=language,
        )

        cached_data, needs_revalidation = await self.cache.get(cache_key)
        if cached_data and not needs_revalidation:
            logger.debug("search_cache_hit", cache_key=cache_key)
            return self._parse_cached_search(cached_data)

        # Cache miss or stale - fetch from API
        logger.debug("search_cache_miss", cache_key=cache_key)
        response = await self._fetch_search(
            title=title,
            author=author,
            publisher=publisher,
            language=language,
            offset=offset,
        )

        # Cache the response (fire and forget for now)
        await self.cache.set(cache_key, self._serialize_search(response))

        return response

⚠️ Potential issue | 🟠 Major

Cache key doesn't include offset parameter, breaking pagination.

The cache key is generated without considering offset, so requesting page 2 (offset=20) will return the cached page 1 results. Either:

  1. Include offset in the cache key, or
  2. Only cache the first page (offset=0)
         cache_key = CacheService.search_key(
             title=title,
             author=author,
             publisher=publisher,
             language=language,
         )

-        cached_data, needs_revalidation = await self.cache.get(cache_key)
-        if cached_data and not needs_revalidation:
+        # Only use cache for first page
+        if offset == 0:
+            cached_data, needs_revalidation = await self.cache.get(cache_key)
+            if cached_data and not needs_revalidation:
+                logger.debug("search_cache_hit", cache_key=cache_key)
+                return self._parse_cached_search(cached_data)
+        else:
+            cached_data = None

Or modify CacheService.search_key() to accept an optional offset parameter.
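For that second option, here is a minimal sketch of an offset-aware key builder, assuming `search_key` hashes normalized parameters with SHA256 as described later in this review; the exact signature and normalization steps are assumptions:

```python
import hashlib
import json


def search_key(
    title: str | None = None,
    author: str | None = None,
    publisher: str | None = None,
    language: str | None = None,
    offset: int = 0,
) -> str:
    """Deterministic search cache key that varies with pagination."""
    params = {
        "title": title,
        "author": author,
        "publisher": publisher,
        "language": language,
        "offset": offset,  # distinct cache entry per page batch
    }
    # Normalize: lowercase/trim strings, sort keys, stable JSON
    normalized = {
        k: v.lower().strip() if isinstance(v, str) else v
        for k, v in params.items()
    }
    digest = hashlib.sha256(
        json.dumps(normalized, sort_keys=True).encode()
    ).hexdigest()
    return f"search:{digest}"
```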

🤖 Prompt for AI Agents
In backend/src/bookbytes/services/openlibrary.py around lines 233-259, the cache
key is generated without the offset causing paginated requests to collide; fix
by including the offset when building the key (call CacheService.search_key(...,
offset=offset)) and, if CacheService.search_key does not currently accept
offset, update its signature/implementation to include offset in the key
generation (or alternatively only cache when offset == 0); ensure cache.get/set
use the same key and the rest of the serialization logic remains unchanged.

Comment on lines +61 to +67
@pytest.fixture
async def skip_if_no_network():
    """Skip test if OpenLibrary API is unreachable."""
    is_reachable = await check_openlibrary_reachable()
    if not is_reachable:
        pytest.skip("OpenLibrary API is unreachable (network issue)")
    return True

⚠️ Potential issue | 🟡 Minor

skip_if_no_network fixture is defined but never used.

The tests don't reference this fixture, so they'll fail rather than skip when OpenLibrary is unreachable. Either:

  1. Add skip_if_no_network as a parameter to each test method, or
  2. Make the fixture autouse=True with appropriate scope
 @pytest.fixture
-async def skip_if_no_network():
+async def skip_if_no_network(request):
     """Skip test if OpenLibrary API is unreachable."""
     is_reachable = await check_openlibrary_reachable()
     if not is_reachable:
         pytest.skip("OpenLibrary API is unreachable (network issue)")
     return True

Then add it to test methods:

async def test_search_returns_results(
    self, skip_if_no_network, openlibrary_service: OpenLibraryService
) -> None:
🤖 Prompt for AI Agents
In backend/tests/integration/test_openlibrary_api.py around lines 61 to 67, the
skip_if_no_network fixture is defined but never used so tests won't skip when
OpenLibrary is unreachable; fix by either (A) adding skip_if_no_network as a
parameter to every async test that relies on the network (e.g.
test_search_returns_results(self, skip_if_no_network, openlibrary_service:
OpenLibraryService) -> None) or (B) change the fixture to autouse=True (and
appropriate scope) so it runs automatically for the module, e.g. mark the
fixture as @pytest.fixture(autouse=True, scope="module") to ensure tests are
skipped when the API is unreachable.

Comment on lines +75 to +96
@pytest.mark.integration
@pytest.mark.external
class TestOpenLibrarySearchIntegration:
    """Integration tests for search_books against real API."""

    @pytest.mark.asyncio
    async def test_search_returns_results(
        self, openlibrary_service: OpenLibraryService
    ) -> None:
        """Test that search returns real results."""
        result = await openlibrary_service.search_books(title="Lord of the Rings")

        assert result.total_found > 0
        assert len(result.results) > 0

        # Verify first result has expected fields
        first = result.results[0]
        assert first.title  # Has a title
        assert first.external_work_key  # Has work key
        assert first.source_provider == "openlibrary"

        await openlibrary_service.close()

🛠️ Refactor suggestion | 🟠 Major

Resource cleanup should use fixture teardown, not inline close() calls.

Calling close() at the end of each test is fragile—if the test fails before reaching close(), the HTTP client leaks. Use fixture-based cleanup instead:

 @pytest.fixture
-def openlibrary_service(mock_cache) -> OpenLibraryService:
+async def openlibrary_service(mock_cache) -> OpenLibraryService:
     """Create OpenLibraryService with mock cache for real API testing."""
-    return OpenLibraryService(mock_cache)
+    service = OpenLibraryService(mock_cache)
+    yield service
+    await service.close()

Then remove all await openlibrary_service.close() calls from test methods.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In backend/tests/integration/test_openlibrary_api.py around lines 75 to 96, the
test is calling await openlibrary_service.close() inline which can leak the HTTP
client if the test fails; remove the inline close() calls from tests and instead
implement fixture-based teardown: change the openlibrary_service fixture to use
a yield pattern (create client, yield it, then close it in the fixture’s
finally/teardown block) so the client is always closed regardless of test
outcome, and update other integration tests to no longer call close().

Comment on lines +31 to +97
```
OpenLibrary API ──▶ ┌──────────────────┐ ◀── Google Books
                    │ Canonical Model  │
                    │ (BookSearchResult│
                    │  Edition, Work)  │
                    └────────┬─────────┘
                             │
                             ▼
                    ┌──────────────────┐
                    │   Redis Cache    │
                    │ (Canonical JSON) │
                    └──────────────────┘
```

### Anti-Corruption Layer (ACL)

From Domain-Driven Design (DDD) - translation layer protecting domain from external system quirks.

```python
class OpenLibraryAdapter:
    """Transforms OpenLibrary responses to canonical models."""

    def to_search_result(self, raw: dict) -> BookSearchResult:
        return BookSearchResult(
            title=raw["title"],
            authors=self._extract_authors(raw),
            source_provider="openlibrary",
            ...
        )
```

---

## Cache Key Strategy

### Current (Phase 1 - Single Provider)

Provider-agnostic keys:
| Key | Description |
|-----|-------------|
| `search:{hash}` | Search results |
| `isbn:{isbn}` | Book by ISBN |
| `work:{identifier}` | Work details |

### Future (Multi-Provider Phase)

Same keys, but cached data includes metadata:

```python
{
    "data": { ... },  # Canonical model
    "metadata": {
        "source_provider": "openlibrary",
        "fetched_at": "2024-12-07T...",
    }
}
```

---

## Trade-offs Analysis

| Trade-off | Decision | Rationale |
| ------------------- | ----------------------------------------- | ----------------------------------------- |
| First provider wins | ✅ Accepted | Book metadata is similar across providers |
| Transform overhead | ✅ Accepted | Happens once at fetch time |
| Provider debugging | Include `source_provider` in cached value |
| Schema evolution | Version canonical models carefully |

⚠️ Potential issue | 🟡 Minor

Tighten Markdown formatting (code block + trade-offs table).

A few small tweaks will keep markdownlint and renderers happy:

  1. Add a language to the ASCII diagram code fence:
-```
+```text
 OpenLibrary API ──▶ ┌──────────────────┐ ◀── Google Books
 ...

2. Ensure the trade-offs table has consistent columns (3 columns for all rows) and blank lines around it. For example, add rationales for the last two rows:

```diff
-| Provider debugging  | Include `source_provider` in cached value |
-| Schema evolution    | Version canonical models carefully        |
+| Provider debugging  | Include `source_provider` in cached value | Easier troubleshooting per cache entry   |
+| Schema evolution    | Version canonical models carefully        | Avoid breaking consumers during changes  |
```
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

31-31: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


68-68: Tables should be surrounded by blank lines

(MD058, blanks-around-tables)


96-96: Table column count
Expected: 3; Actual: 2; Too few cells, row will be missing data

(MD056, table-column-count)


97-97: Table column count
Expected: 3; Actual: 2; Too few cells, row will be missing data

(MD056, table-column-count)

🤖 Prompt for AI Agents
In tasks/knowledge/multi-provider-integration-patterns.md around lines 31 to 97,
tighten markdown by adding a language identifier to the ASCII diagram code fence
(use ```text) and ensure the trade-offs table has exactly three columns for
every row and is separated from surrounding text by blank lines; update the last
two rows to include missing Rationale cells (e.g., "Included in cached metadata"
and "Manage via careful versioning") and add an empty line before and after the
table so markdownlint and renderers parse it correctly.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

♻️ Duplicate comments (2)
backend/src/bookbytes/main.py (2)

49-51: Settings injection issue persists.

The lifespan function still calls get_settings() directly, ignoring any custom settings passed to create_app(settings). This breaks test scenarios where custom settings are injected.

See the fix suggested in the previous review: store settings in app.state._init_settings before assigning lifespan, then read from app.state in lifespan.
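A minimal sketch of that fix, reusing this module's `create_app` and `get_settings` names; the `_init_settings` attribute is illustrative:

```python
from contextlib import asynccontextmanager

from fastapi import FastAPI


@asynccontextmanager
async def lifespan(app: FastAPI):
    # Read whatever create_app stored instead of calling get_settings()
    settings = app.state._init_settings
    # ... startup work using `settings` ...
    yield


def create_app(settings=None):
    settings = settings or get_settings()  # module's existing settings factory
    app = FastAPI(lifespan=lifespan)
    app.state._init_settings = settings  # set before lifespan runs at startup
    return app
```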


147-153: CORS configuration issue persists.

When is_development is False, allow_origins=[] blocks all cross-origin requests while allow_credentials=True remains set—this combination is ineffective.

Configure explicit production origins via settings, as suggested in the previous review.
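A sketch of that configuration, under the assumption that a `cors_origins` settings field exists (it does not in the current code):

```python
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware


def configure_cors(app: FastAPI, is_development: bool, cors_origins: list[str]) -> None:
    """Attach CORS with explicit origins in production.

    cors_origins is a hypothetical settings field, e.g. ["https://app.example.com"].
    """
    origins = ["*"] if is_development else cors_origins
    app.add_middleware(
        CORSMiddleware,
        allow_origins=origins,
        # Browsers reject credentialed requests against a wildcard origin,
        # so only enable credentials when explicit origins are configured.
        allow_credentials=origins != ["*"],
        allow_methods=["*"],
        allow_headers=["*"],
    )
```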

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4824f61 and 22440b8.

📒 Files selected for processing (6)
  • backend/src/bookbytes/api/v1/router.py (1 hunks)
  • backend/src/bookbytes/api/v1/search.py (1 hunks)
  • backend/src/bookbytes/main.py (1 hunks)
  • backend/src/bookbytes/schemas/search.py (1 hunks)
  • backend/tests/unit/test_search_endpoints.py (1 hunks)
  • tasks/backend-productionisation/audio-books-library/tasks-audio-books-library.md (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • tasks/backend-productionisation/audio-books-library/tasks-audio-books-library.md
🧰 Additional context used
🧬 Code graph analysis (1)
backend/tests/unit/test_search_endpoints.py (3)
backend/src/bookbytes/api/v1/search.py (3)
  • get_openlibrary_service (36-40)
  • search_books (60-134)
  • get_work_details (149-189)
backend/src/bookbytes/services/cache.py (2)
  • get_cache_service (277-289)
  • get (69-96)
backend/src/bookbytes/services/openlibrary.py (4)
  • BookSearchResult (28-76)
  • SearchResponse (132-143)
  • WorkDetails (80-128)
  • close (203-207)
🔇 Additional comments (13)
backend/src/bookbytes/api/v1/router.py (1)

1-17: LGTM!

Clean router aggregation following standard FastAPI patterns. The structure allows for easy extension with future sub-routers as indicated by the placeholder comments.

backend/tests/unit/test_search_endpoints.py (4)

21-83: Well-structured test fixtures.

Good use of dependency overrides for isolating tests from real services. The cleanup in test_app ensures no state leakage between tests.


91-236: Comprehensive search endpoint test coverage.

Good coverage of success paths, pagination, filters, validation errors, and empty results. The assertions verify both response structure and service invocation parameters.


244-267: Test correctly validates work details retrieval.

The test properly validates the response structure. Note that the URL works/works/OL27448W is due to the :path parameter capturing works/OL27448W, which the endpoint normalizes to /works/OL27448W.


275-327: Good ISBN lookup test coverage.

Tests validate both success and not-found scenarios. Consider adding a test for external service errors (502 response) in future iterations.

backend/src/bookbytes/main.py (4)

156-210: Well-implemented request logging middleware.

Good use of correlation IDs for distributed tracing, proper timing measurements, and cleanup in the finally block. The middleware correctly propagates X-Request-ID to responses.
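For reference, a stripped-down sketch of the pattern (the real module does more, but the shape is the same):

```python
import time
import uuid

import structlog
from fastapi import FastAPI, Request

app = FastAPI()
logger = structlog.get_logger()


@app.middleware("http")
async def request_logging(request: Request, call_next):
    # Echo the caller's X-Request-ID or mint one for correlation.
    request_id = request.headers.get("X-Request-ID") or str(uuid.uuid4())
    structlog.contextvars.bind_contextvars(request_id=request_id)
    logger.info("request_start", method=request.method, path=request.url.path)
    start = time.perf_counter()
    try:
        response = await call_next(request)
        logger.info(
            "request_end",
            status_code=response.status_code,
            duration_ms=round((time.perf_counter() - start) * 1000, 1),
        )
        response.headers["X-Request-ID"] = request_id
        return response
    finally:
        # Cleanup so the next request starts with a fresh context.
        structlog.contextvars.clear_contextvars()
```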


213-274: Solid exception handling implementation.

Appropriate log levels based on status codes, structured error responses with request IDs, and no sensitive data leakage. The exception() call for unhandled errors ensures stack traces are captured.


277-341: Health endpoints follow Kubernetes best practices.

Good separation of liveness (basic "am I running") and readiness (can I serve traffic) probes. The placeholder for Redis health check is appropriately marked with a TODO.
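A minimal sketch of that liveness/readiness split; the paths and dependency checks here are illustrative, not the module's actual routes:

```python
from fastapi import FastAPI, Response, status

app = FastAPI()


@app.get("/health/live")
async def liveness() -> dict:
    # Liveness: the process is up; no dependency checks.
    return {"status": "alive"}


@app.get("/health/ready")
async def readiness(response: Response) -> dict:
    # Readiness: verify dependencies before accepting traffic.
    checks = {"database": True, "redis": True}  # placeholders for real probes
    ready = all(checks.values())
    if not ready:
        response.status_code = status.HTTP_503_SERVICE_UNAVAILABLE
    return {"status": "ready" if ready else "degraded", "checks": checks}
```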


344-363: LGTM!

Standard CLI setup with uvicorn. The module-level app instance is correctly created for the "bookbytes.main:app" import reference.

backend/src/bookbytes/api/v1/search.py (3)

36-40: LGTM!

Clean dependency factory pattern for injecting the cache service into OpenLibraryService.


137-189: LGTM!

Clean implementation with proper path parameter handling using :path converter, work key normalization, structured error handling, and resource cleanup in the finally block.


224-226: The suggested fix is incorrect; the current implementation is the intended design.

The search_books() method doesn't have an isbn parameter—only title, author, publisher, language, and offset. Searching by ISBN via the title parameter is the correct approach given the OpenLibrary API's flexible search capabilities. The suggested change to isbn=clean_isbn would fail at runtime.

Likely an incorrect or invalid review comment.

backend/src/bookbytes/schemas/search.py (1)

228-290: LGTM!

Well-defined schemas for job responses, refresh requests, and audiobook details with appropriate field constraints and documentation.

Comment on lines +79 to +96
try:
# Calculate offset for API call
# We fetch 100 results max, then slice for pagination
api_offset = ((page - 1) // 5) * 100 # Fetch next batch every 5 pages

result = await openlibrary.search_books(
title=request.title,
author=request.author,
publisher=request.publisher,
language=request.language,
offset=api_offset,
)

# Calculate slice for requested page
# Within our 100-result batch, find the right slice
batch_start = (page - 1) % 5 * page_size
batch_end = batch_start + page_size
page_results = result.results[batch_start:batch_end]

⚠️ Potential issue | 🟠 Major

Pagination logic is broken for non-default page sizes.

The batch calculation assumes page_size=20 (5 pages × 20 = 100 per batch). When page_size differs, the slicing can be incorrect:

  • page=3, page_size=50: batch_start = (3-1) % 5 * 50 = 100, which exceeds the 100-result batch, returning empty results.
  • page=2, page_size=100: batch_start = (2-1) % 5 * 100 = 100, same issue.

Consider calculating based on actual item offsets:

-        # Calculate offset for API call
-        # We fetch 100 results max, then slice for pagination
-        api_offset = ((page - 1) // 5) * 100  # Fetch next batch every 5 pages
+        # Calculate offset for API call based on actual item positions
+        start_item = (page - 1) * page_size
+        api_offset = (start_item // 100) * 100  # Fetch batch containing start_item

         result = await openlibrary.search_books(
             title=request.title,
             author=request.author,
             publisher=request.publisher,
             language=request.language,
             offset=api_offset,
         )

-        # Calculate slice for requested page
-        # Within our 100-result batch, find the right slice
-        batch_start = (page - 1) % 5 * page_size
-        batch_end = batch_start + page_size
+        # Calculate slice within the batch
+        batch_start = start_item - api_offset
+        batch_end = batch_start + page_size
         page_results = result.results[batch_start:batch_end]
🤖 Prompt for AI Agents
In backend/src/bookbytes/api/v1/search.py around lines 79 to 96, the pagination
assumes a fixed page_size of 20 which breaks slicing when page_size differs;
replace the current batch math with calculations based on item offsets: compute
global_offset = (page - 1) * page_size, set api_offset = (global_offset // 100)
* 100 to fetch the correct 100-item batch, then compute batch_start =
global_offset % 100 and batch_end = batch_start + page_size and slice
result.results[batch_start:batch_end] (also guard the slice against result
length to avoid index errors).

Comment on lines +115 to +117
        docs_url="/docs" if settings.is_development else "/docs",
        redoc_url="/redoc" if settings.is_development else "/redoc",
        openapi_url="/openapi.json" if settings.is_development else "/openapi.json",

⚠️ Potential issue | 🟡 Minor

No-op conditionals for API documentation URLs.

These conditionals always evaluate to the same value regardless of environment:

docs_url="/docs" if settings.is_development else "/docs"

If the intent was to disable docs in production, use None:

-        docs_url="/docs" if settings.is_development else "/docs",
-        redoc_url="/redoc" if settings.is_development else "/redoc",
-        openapi_url="/openapi.json" if settings.is_development else "/openapi.json",
+        docs_url="/docs" if settings.is_development else None,
+        redoc_url="/redoc" if settings.is_development else None,
+        openapi_url="/openapi.json" if settings.is_development else None,
🤖 Prompt for AI Agents
In backend/src/bookbytes/main.py around lines 115 to 117, the
docs_url/redoc_url/openapi_url conditionals are no-ops (they return the same
string regardless of environment); change each conditional to return the string
when settings.is_development is true and None when false (e.g., docs_url="/docs"
if settings.is_development else None) so documentation endpoints are disabled in
production; update all three entries (docs_url, redoc_url, openapi_url)
accordingly and run the app to verify docs are reachable in development and
hidden in production.

Comment on lines +62 to +88
    model_config = ConfigDict(from_attributes=True)

    title: str = Field(..., description="Book title")
    authors: list[str] = Field(default_factory=list, description="Author names")
    first_publish_year: int | None = Field(None, description="First publication year")
    cover_url: str | None = Field(None, description="Cover image URL")
    isbn_list: list[str] = Field(
        default_factory=list, description="Available ISBNs (limited)"
    )
    edition_count: int = Field(0, description="Number of editions")
    subjects: list[str] = Field(default_factory=list, description="Subject categories")
    external_work_key: str = Field(..., description="OpenLibrary work key")

    model_config = ConfigDict(
        json_schema_extra={
            "example": {
                "title": "The Lord of the Rings",
                "authors": ["J. R. R. Tolkien"],
                "first_publish_year": 1954,
                "cover_url": "https://covers.openlibrary.org/b/id/258027-M.jpg",
                "isbn_list": ["9780618640157", "0618640150"],
                "edition_count": 120,
                "subjects": ["Fantasy", "Epic"],
                "external_work_key": "/works/OL27448W",
            }
        }
    )

⚠️ Potential issue | 🟠 Major

Duplicate model_config overwrites from_attributes=True.

The second model_config assignment at line 75 completely overrides the first one at line 62, causing from_attributes=True to be lost.

Combine both into a single ConfigDict:

-    model_config = ConfigDict(from_attributes=True)
-
     title: str = Field(..., description="Book title")
     authors: list[str] = Field(default_factory=list, description="Author names")
     first_publish_year: int | None = Field(None, description="First publication year")
     cover_url: str | None = Field(None, description="Cover image URL")
     isbn_list: list[str] = Field(
         default_factory=list, description="Available ISBNs (limited)"
     )
     edition_count: int = Field(0, description="Number of editions")
     subjects: list[str] = Field(default_factory=list, description="Subject categories")
     external_work_key: str = Field(..., description="OpenLibrary work key")

     model_config = ConfigDict(
+        from_attributes=True,
         json_schema_extra={
             "example": {
                 ...
             }
         }
     )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

Before:

    model_config = ConfigDict(from_attributes=True)

    title: str = Field(..., description="Book title")
    authors: list[str] = Field(default_factory=list, description="Author names")
    first_publish_year: int | None = Field(None, description="First publication year")
    cover_url: str | None = Field(None, description="Cover image URL")
    isbn_list: list[str] = Field(
        default_factory=list, description="Available ISBNs (limited)"
    )
    edition_count: int = Field(0, description="Number of editions")
    subjects: list[str] = Field(default_factory=list, description="Subject categories")
    external_work_key: str = Field(..., description="OpenLibrary work key")

    model_config = ConfigDict(
        json_schema_extra={
            "example": {
                "title": "The Lord of the Rings",
                "authors": ["J. R. R. Tolkien"],
                "first_publish_year": 1954,
                "cover_url": "https://covers.openlibrary.org/b/id/258027-M.jpg",
                "isbn_list": ["9780618640157", "0618640150"],
                "edition_count": 120,
                "subjects": ["Fantasy", "Epic"],
                "external_work_key": "/works/OL27448W",
            }
        }
    )

After:

    title: str = Field(..., description="Book title")
    authors: list[str] = Field(default_factory=list, description="Author names")
    first_publish_year: int | None = Field(None, description="First publication year")
    cover_url: str | None = Field(None, description="Cover image URL")
    isbn_list: list[str] = Field(
        default_factory=list, description="Available ISBNs (limited)"
    )
    edition_count: int = Field(0, description="Number of editions")
    subjects: list[str] = Field(default_factory=list, description="Subject categories")
    external_work_key: str = Field(..., description="OpenLibrary work key")

    model_config = ConfigDict(
        from_attributes=True,
        json_schema_extra={
            "example": {
                "title": "The Lord of the Rings",
                "authors": ["J. R. R. Tolkien"],
                "first_publish_year": 1954,
                "cover_url": "https://covers.openlibrary.org/b/id/258027-M.jpg",
                "isbn_list": ["9780618640157", "0618640150"],
                "edition_count": 120,
                "subjects": ["Fantasy", "Epic"],
                "external_work_key": "/works/OL27448W",
            }
        }
    )
🤖 Prompt for AI Agents
In backend/src/bookbytes/schemas/search.py around lines 62 to 88, there are two
model_config assignments which cause the second to overwrite the first (losing
from_attributes=True); merge them into a single ConfigDict that includes
from_attributes=True and the existing json_schema_extra example (remove the
duplicate assignment and keep one combined model_config).

Comment on lines +205 to +225
class ProcessBookRequest(BaseSchema):
    """Request to process a book into an audiobook.

    Either edition_id OR isbn must be provided (not both).
    """

    edition_id: UUID | None = Field(
        None, description="Internal edition UUID (if already in library)"
    )
    isbn: str | None = Field(
        None,
        min_length=10,
        max_length=17,
        description="ISBN to process (10 or 13 digits)",
    )

    model_config = ConfigDict(
        json_schema_extra={
            "example": {"isbn": "9780618640157"},
        }
    )

⚠️ Potential issue | 🟠 Major

Missing validation for mutually exclusive fields.

The docstring states "Either edition_id OR isbn must be provided (not both)", but the schema allows both to be None or both to be set. Add a model_validator to enforce this:

+from pydantic import model_validator
+
 class ProcessBookRequest(BaseSchema):
     """Request to process a book into an audiobook.

     Either edition_id OR isbn must be provided (not both).
     """

     edition_id: UUID | None = Field(
         None, description="Internal edition UUID (if already in library)"
     )
     isbn: str | None = Field(
         None,
         min_length=10,
         max_length=17,
         description="ISBN to process (10 or 13 digits)",
     )

+    @model_validator(mode="after")
+    def check_edition_or_isbn(self) -> "ProcessBookRequest":
+        if self.edition_id is None and self.isbn is None:
+            raise ValueError("Either edition_id or isbn must be provided")
+        if self.edition_id is not None and self.isbn is not None:
+            raise ValueError("Provide either edition_id or isbn, not both")
+        return self

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In backend/src/bookbytes/schemas/search.py around lines 205 to 225, the
ProcessBookRequest schema's docstring requires exactly one of edition_id or isbn
but there's no runtime enforcement; add a Pydantic model_validator inside the
class (use @model_validator(mode="after")) that checks the instance: if both
fields are None or both are non-None, raise a ValueError with a clear message
like "Either edition_id or isbn must be provided, but not both." Place the
validator method in the class body and keep the existing Field definitions and
model_config unchanged.

Add LibraryService class to manage book persistence with Work and Edition operations including:
- Finding works by provider
- Creating works from search results
- Storing editions with ISBN normalization
- Creating provider mappings

Include unit and integration tests for all service methods
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 6

🧹 Nitpick comments (6)
tasks/backend-productionisation/audio-books-library/tasks-audio-books-library.md (3)

1-10: Add a scope/status section clarifying which tasks are in-scope for this PR.

The document marks many tasks as incomplete ([ ]): the entire Testing phase (3.G), Background Jobs (3.F), and partial API endpoints (3.E.5-3.E.6). For a PR titled "productionise," the scope should be explicit. Is this PR completing only the [x] tasks, or are incomplete sections deferred to follow-up PRs?

Add a "PR Scope" section near the overview:

## PR Scope

This PR focuses on:
- [x] Phase 3.0: UUIDv7 foundation
- [x] Phase 3.A: Data Models
- [x] Phase 3.A-R: Repositories
- [x] Phase 3.B: Cache Service
- [x] Phase 3.C: OpenLibrary Service
- [x] Phase 3.D: Library Service
- [x] Phase 3.E: API endpoints (partial - 3.E.1 to 3.E.4, 3.E.7-3.E.8)
- [ ] Phase 3.F: Background Jobs (deferred)
- [ ] Phase 3.G: Testing (deferred)

See follow-up PR #X for testing and background job tasks.

194-196: Specify the cache key generation format and validation strategy.

Task 3.B.6 mentions normalizing params and using SHA256 hash, but the exact format is undefined. Will keys be "search:{sha256_hash}" or "{service}:{action}:{sha256_hash}"? This affects cache pattern invalidation (line 187: invalidate_pattern()) and debugging.

Clarify the cache key format:

### Cache Key Format

Keys follow the pattern: `{namespace}:{action}:{param_hash}`

Example: `search:books:a3f2c1e8...` (SHA256 of normalized params)

Normalization steps:
1. Lowercase all strings
2. Trim whitespace
3. Sort dict keys for consistent ordering
4. Serialize to JSON
5. SHA256(json_string)

125-128: Clarify migration rollback strategy and version compatibility.

Tasks 3.A.8 and 3.A.9 reference Alembic migrations but don't specify:

  • Version compatibility (can migrations rollback safely?)
  • Data migration strategy for soft-delete enablement
  • Testing of migration reversibility

Add migration guidance:

- [x] 3.A.8 Generate migration: `alembic revision --autogenerate -m "add_audio_books_library_models"`
  
  - Verify migration is reversible: `alembic downgrade -1 && alembic upgrade +1`
  - Test on staging database before production deployment
  - Document any data cleanup required on rollback
backend/tests/unit/test_library_service.py (1)

295-322: Consider adding ISBN normalization test for isbn_exists.

The tests cover both boolean outcomes, but unlike find_by_isbn, there's no test verifying that isbn_exists normalizes the input ISBN. The service implementation does normalize, so a test like isbn_exists("978-0-618-64015-7") would ensure consistency.
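A sketch of such a test in the style of the existing suite; the fixture names and the repository method `exists_by_isbn` are assumptions mirroring the mocks used elsewhere in this file:

```python
import pytest


@pytest.mark.asyncio
async def test_isbn_exists_normalizes_isbn(library_service, mock_edition_repo):
    """isbn_exists should strip hyphens before querying, like find_by_isbn."""
    mock_edition_repo.exists_by_isbn.return_value = True  # hypothetical repo method

    assert await library_service.isbn_exists("978-0-618-64015-7") is True

    # The repository should only ever see the normalized, digits-only form.
    mock_edition_repo.exists_by_isbn.assert_awaited_once_with("9780618640157")
```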

backend/src/bookbytes/services/library.py (2)

142-186: Consider adding provider metadata for consistency.

get_or_create_work passes provider_metadata with fetched_at and edition_count to create_work_mapping, but get_or_create_work_from_details does not. Since WorkDetails includes edition_count, you could include it for consistency:

         await self.provider_repo.create_work_mapping(
             work_id=created_work.id,
             provider=self.PROVIDER_OPENLIBRARY,
             external_key=work_details.external_work_key,
+            provider_metadata={
+                "edition_count": work_details.edition_count,
+            },
         )

254-256: Consider validating ISBN length.

The current logic defaults to "isbn10" for any ISBN that isn't 13 characters, which could mask invalid ISBNs. Consider adding validation:

         # Determine ISBN type
-        isbn_type = "isbn13" if len(clean_isbn) == 13 else "isbn10"
+        if len(clean_isbn) == 13:
+            isbn_type = "isbn13"
+        elif len(clean_isbn) == 10:
+            isbn_type = "isbn10"
+        else:
+            raise ValueError(f"Invalid ISBN length: {len(clean_isbn)} (expected 10 or 13)")

This would catch malformed ISBNs earlier rather than silently storing them.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 22440b8 and b2a136c.

📒 Files selected for processing (5)
  • backend/src/bookbytes/services/library.py (1 hunks)
  • backend/tests/integration/test_library_service.py (1 hunks)
  • backend/tests/integration/test_search_endpoints.py (1 hunks)
  • backend/tests/unit/test_library_service.py (1 hunks)
  • tasks/backend-productionisation/audio-books-library/tasks-audio-books-library.md (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
backend/tests/unit/test_library_service.py (4)
backend/src/bookbytes/models/edition.py (1)
  • Edition (23-80)
backend/src/bookbytes/models/work.py (1)
  • Work (24-61)
backend/src/bookbytes/services/library.py (5)
  • isbn_exists (287-297)
  • find_work_by_provider (57-82)
  • get_or_create_work (84-140)
  • find_by_isbn (191-201)
  • store_edition (219-285)
backend/src/bookbytes/services/openlibrary.py (1)
  • BookSearchResult (28-76)
backend/tests/integration/test_search_endpoints.py (3)
backend/src/bookbytes/services/cache.py (2)
  • get_cache_service (277-289)
  • get (69-96)
backend/src/bookbytes/services/openlibrary.py (4)
  • BookSearchResult (28-76)
  • SearchResponse (132-143)
  • WorkDetails (80-128)
  • close (203-207)
backend/tests/conftest.py (2)
  • app (56-58)
  • async_client (62-91)
🔇 Additional comments (23)
tasks/backend-productionisation/audio-books-library/tasks-audio-books-library.md (1)

3-4: Provide the referenced design document in the PR or clarify its availability.

Line 3 references ./design-doc.md as a design reference, but this file is not included in the PR review. If it contains critical architectural decisions (data model relationships, caching strategy, API contract), reviewers need access to validate the implementation tasks.

Confirm whether design-doc.md exists in the PR. If it does, provide it for review. If it's external or missing, either add it or remove the reference.

backend/tests/unit/test_library_service.py (6)

1-14: LGTM!

Clean imports and well-organized module structure with clear docstring explaining the test purpose.


21-101: LGTM!

Well-structured fixtures with proper async mock setup. The sample data fixtures use consistent UUIDs that properly link the work and edition entities.


109-148: LGTM!

Good coverage of both the "not found" and "found" paths for find_work_by_provider. The mock setup correctly simulates the provider mapping lookup.


150-191: LGTM!

Solid test coverage for the get-or-create pattern, correctly verifying that existing works are returned without duplication and new works trigger both entity creation and provider mapping.


198-226: LGTM!

Good tests for ISBN normalization and successful lookup. The normalization test is particularly valuable for ensuring consistent ISBN handling.


228-293: LGTM!

Comprehensive test coverage for store_edition including idempotent behavior, new edition creation, and optional provider mapping creation.

backend/tests/integration/test_library_service.py (5)

1-9: LGTM!

Clear docstring that honestly acknowledges these are not true database integration tests. The run command is helpful for developers.


26-123: LGTM!

Good fixture organization. Using sample_work as a dependency for sample_edition improves maintainability. The addition of sample_work_details enables testing the get_or_create_work_from_details code path.


130-191: LGTM!

Good detailed assertions that verify the provider mapping kwargs. The test_create_work_from_details test provides coverage for the WorkDetails code path not tested in the unit tests.


198-285: LGTM!

The test_store_edition_normalizes_isbn test adds valuable coverage by verifying ISBN normalization during edition storage. Good complementary testing alongside the unit tests.


292-332: LGTM!

Focused tests for provider mapping scenarios with good assertion detail on the mapping kwargs.

backend/src/bookbytes/services/library.py (7)

1-52: LGTM!

Clean module structure with clear docstrings. The constructor properly accepts the three repository dependencies, enabling testability through dependency injection.


57-83: LGTM!

The null-safe check if mapping and mapping.work_id properly handles both missing mappings and edition-only mappings. Good use of debug logging.


84-141: LGTM!

Well-implemented get-or-create pattern with appropriate logging. The provider metadata capture (fetched_at, edition_count) is useful for debugging and analytics.


191-202: LGTM!

Clean ISBN normalization before lookup.


203-218: LGTM!

Simple delegation with sensible default language.


287-298: LGTM!

Consistent ISBN normalization pattern with other methods.


300-319: LGTM with a note on testing.

This global singleton pattern is common in FastAPI applications. The clear RuntimeError message helps catch initialization issues early. For testing, ensure you call set_library_service(None) or use a fixture to reset state between tests to avoid cross-test pollution.
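For example, a small autouse fixture, assuming `set_library_service(None)` is a valid reset:

```python
import pytest

from bookbytes.services.library import set_library_service


@pytest.fixture(autouse=True)
def reset_library_service():
    """Clear the module-level LibraryService singleton after each test."""
    yield
    set_library_service(None)  # assumes None resets the global
```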

backend/tests/integration/test_search_endpoints.py (4)

25-72: LGTM! Fixtures are well-structured.

The mock fixtures correctly use AsyncMock for async operations, and the sample data fixtures provide realistic test data that's reusable across multiple tests.


244-269: Good error response validation.

The test correctly validates both the status code and the specific error code (BOOK_NOT_FOUND), ensuring the API returns structured error responses.


305-332: LGTM! Request ID testing is comprehensive.

The tests properly verify both automatic request ID generation and preservation of custom request IDs, which is essential for distributed tracing.


195-195: Verify the URL path structure.

The URL /api/v1/books/works/works/OL27448W contains "works" twice. Given that external_work_key is /works/OL27448W (line 71), verify whether the endpoint expects:

  • The full key including "works/" prefix (current implementation), or
  • Just the ID portion "OL27448W"

Comment on lines +100 to +114
        app.dependency_overrides[get_cache_service] = lambda: mock_cache
        app.dependency_overrides[get_openlibrary_service] = lambda: mock_openlibrary

        response = await async_client.post(
            "/api/v1/books/search",
            json={"title": "Lord of the Rings"},
        )

        assert response.status_code == 200
        data = response.json()
        assert data["total_found"] == 1
        assert len(data["results"]) == 1
        assert data["results"][0]["title"] == "The Lord of the Rings"

        app.dependency_overrides.clear()

🛠️ Refactor suggestion | 🟠 Major

Centralize dependency override cleanup in a fixture.

Manual cleanup with app.dependency_overrides.clear() appears in every test method (lines 114, 136, 169, 203, 242, 269, 297). If a test fails or raises an exception before reaching the cleanup line, overrides will persist and affect subsequent tests, causing flaky test failures.

Create an autouse fixture to ensure cleanup:

@pytest.fixture(autouse=True)
def clear_overrides(app):
    """Automatically clear dependency overrides after each test."""
    yield
    app.dependency_overrides.clear()

Then remove all manual app.dependency_overrides.clear() calls from test methods.

🤖 Prompt for AI Agents
In backend/tests/integration/test_search_endpoints.py around lines 100 to 114,
tests manually call app.dependency_overrides.clear() at the end of each test
which can leak overrides if a test errors; add an autouse fixture in the test
module (or a conftest.py) that yields and clears app.dependency_overrides after
each test so cleanup always runs, then remove all manual
app.dependency_overrides.clear() calls from the individual tests.

Comment on lines +87 to +89
- [ ] 3.0.4 Verify UUIDv7 sorting works in PostgreSQL
- UUIDv7 is time-sortable when stored as native UUID type


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, check the task file to see the full context around lines 87-89
cat -n tasks/backend-productionisation/audio-books-library/tasks-audio-books-library.md | sed -n '80,150p'

Repository: cryptus-neoxys/bookbytes

Length of output: 2904


🏁 Script executed:

# Verify that UUIDv7 values sort correctly by time in PostgreSQL
# Check if uuid6 library is in pyproject.toml
rg -n "uuid6" --type toml

# Verify UUIDPrimaryKeyMixin uses uuid7
rg -A 5 "class UUIDPrimaryKeyMixin" --type python

# Check model files for uuid7 usage
fd -e py -x rg -l "uuid7" {} \;

Repository: cryptus-neoxys/bookbytes

Length of output: 141


🏁 Script executed:

# Find and check pyproject.toml
fd -name "pyproject.toml" -type f

Repository: cryptus-neoxys/bookbytes

Length of output: 174


🏁 Script executed:

# Check for uuid6 in all TOML files
fd -name "*.toml" -type f -exec cat {} \; | grep -i uuid6 -B 2 -A 2

Repository: cryptus-neoxys/bookbytes

Length of output: 174


🏁 Script executed:

# Search for UUIDPrimaryKeyMixin in the codebase
rg "UUIDPrimaryKeyMixin" -A 5 --type python

Repository: cryptus-neoxys/bookbytes

Length of output: 174


🏁 Script executed:

# Search for uuid7 usage in Python files
rg "uuid7" --type python -B 2 -A 2

Repository: cryptus-neoxys/bookbytes

Length of output: 174


Resolve task ordering: complete UUIDv7 sorting verification before marking dependent model tasks complete.

Task 3.0.4 (verify UUIDv7 sorting in PostgreSQL) is marked incomplete [ ], yet all dependent model creation tasks (3.A.1 through 3.A.9+) are marked complete [x]. This creates risk: if UUIDv7 sorting doesn't work as expected in PostgreSQL with the native UUID type, the entire model foundation may require rework.

Complete 3.0.4 verification and mark it complete before closing model creation tasks 3.A.1 onwards.

🤖 Prompt for AI Agents
tasks/backend-productionisation/audio-books-library/tasks-audio-books-library.md
around lines 87-89: Task 3.0.4 (Verify UUIDv7 sorting works in PostgreSQL) is
still unchecked while downstream model tasks 3.A.1+ are marked complete—ensure
3.0.4 is fully verified and passing before marking dependent tasks done. Run the
verification steps against the production PostgreSQL schema (store UUIDv7 as
native uuid, insert time-ordered samples, query ORDER BY the uuid column and
confirm chronological order), document the test results and any DB/config
changes, then update the checklist to mark 3.0.4 as complete only after
successful verification and revert any dependent model tasks to incomplete if
rework is required.

Comment on lines +207 to +209
- [x] 3.C.2 Add User-Agent header

  - Include app name and contact for API compliance

⚠️ Potential issue | 🟡 Minor

Specify OpenLibrary User-Agent header configuration and runtime validation.

Line 208-209 require including app name and contact in the User-Agent header, but there's no task or configuration reference. How is this value injected? Where is it stored? Should it be validated in tests?

Add configuration details:

- [x] 3.C.2 Add User-Agent header

  - Format: `bookbytes/0.1 (+https://github.com/cryptus-neoxys/bookbytes; [email protected])`
  - Stored in `settings.OPENLIBRARY_USER_AGENT` (Pydantic Settings)
  - Verified in unit tests with httpx mock

Comment on lines 307 to 319
- [ ] 3.F.1 Create/update Job model for audiobook processing

  - Add audio_book_id FK

- [ ] 3.F.2 Update `workers/tasks.py` with audiobook processing

  - Fetch work details
  - Collect all ISBNs
  - Generate audio (existing pipeline)

- [ ] 3.F.3 Implement cache invalidation on job completion
  - Invalidate search caches containing this work


⚠️ Potential issue | 🟠 Major

Flag incomplete Background Jobs phase (3.F) as a blocker for production readiness.

All tasks in Phase 3.F (Background Jobs) are marked [ ] (incomplete). This is critical for a "productionise" PR because:

  1. Without task queue integration, audiobook processing is missing
  2. Cache invalidation on job completion (3.F.3) is unimplemented, risking stale data
  3. Job model updates (3.F.1) may conflict with existing Job schema

Either complete Phase 3.F in this PR or explicitly defer it to a follow-up PR with clear tracking.

🤖 Prompt for AI Agents
In
tasks/backend-productionisation/audio-books-library/tasks-audio-books-library.md
around lines 307 to 319, Phase 3.F (Background Jobs) is left incomplete and must
be treated as a blocker: update the Job model to add audio_book_id FK in a
backward-compatible way (migration + nullable FK or separate new model to avoid
schema conflicts), implement audiobook processing in workers/tasks.py to fetch
work details, collect ISBNs and invoke the existing audio generation pipeline
(with error handling and retries), and add cache invalidation on job completion
that removes/refreshes any search caches containing the work; if you cannot
complete all items in this PR, explicitly defer them by creating tracker
tickets, add clear TODOs in the file referencing ticket IDs, and mark the PR as
blocked until those tickets are resolved.

Comment on lines +358 to +365
- All models use `UUIDPrimaryKeyMixin` (now v7) and `TimestampMixin`
- AudioBook uses `SoftDeleteMixin` for soft deletion
- AudioBookRepository extends `SoftDeleteRepository`
- BookProvider is a sparse/polymorphic table (see design doc for query patterns)
- **Redis-only caching:** No PostgreSQL cache table, uses AOF persistence (`appendfsync everysec`)
- **Cache vs Library:** Search cache is transient (Redis), library data is permanent (Work/Edition in PostgreSQL)
- Redis memory policy: `allkeys-lru` with 256mb limit
- OpenLibrary requires User-Agent header to avoid rate limiting

⚠️ Potential issue | 🟠 Major

Add missing infrastructure and deployment configuration tasks.

The notes mention Redis configuration (allkeys-lru with 256mb limit, AOF persistence with appendfsync everysec) but there are no corresponding tasks for Docker, Helm, or deployment infrastructure. This is critical for operational readiness.

Add a Phase 3.H section for infrastructure/deployment:

### Phase 3.H: Infrastructure & Deployment

- [ ] 3.H.1 Create Redis Helm values with allkeys-lru policy and AOF persistence
- [ ] 3.H.2 Update docker-compose.yml for local development (Redis + PostgreSQL)
- [ ] 3.H.3 Add Redis health checks to Kubernetes deployment
- [ ] 3.H.4 Document Redis backup/recovery strategy for AOF
🤖 Prompt for AI Agents
In
tasks/backend-productionisation/audio-books-library/tasks-audio-books-library.md
around lines 358 to 365, the notes list Redis configuration but lack concrete
infrastructure/deployment tasks; add a new "Phase 3.H: Infrastructure &
Deployment" section with checklist items to create Redis Helm values
(allkeys-lru, AOF appendfsync everysec), update docker-compose.yml for local dev
(Redis + PostgreSQL), add Redis health checks to the Kubernetes manifests, and
document Redis backup/recovery strategy for AOF so operational/deployment steps
align with the design notes.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
backend/Readme.md (1)

103-103: Consider relaxing the Python version requirement.

The prerequisites mandate Python 3.13+, which is very recent (released October 2024) and may limit deployment environments and adoption. Most production systems still run Python 3.11 or 3.12. Document the minimum supported version more conservatively unless 3.13-specific features are critical.

Consider updating to "Python 3.11+" or "Python 3.12+" unless 3.13-specific features are essential. Document the rationale if a newer version is required.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b2a136c and 8baa5ef.

📒 Files selected for processing (1)
  • backend/Readme.md (1 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.18.1)
backend/Readme.md

13-13: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


36-36: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


75-75: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🔇 Additional comments (1)
backend/Readme.md (1)

1-229: Overall: Well-structured and comprehensive documentation.

The README is thorough, covering architecture, setup, API endpoints, testing, and configuration clearly. The ASCII diagrams and mermaid flowchart are helpful. Bash examples are practical. Once the licensing and markdown linting issues are resolved, this documentation will serve the backend well.


## Architecture

```

🛠️ Refactor suggestion | 🟠 Major

Add language identifiers to fenced code blocks.

Three fenced code blocks are missing language specifications, causing markdownlint violations at lines 13, 36, and 75. While these are ASCII diagrams, specifying a language improves consistency and tooling compatibility.

Apply this diff:

-```
+```text
 ┌──────────────────────────────────────────────────────────────┐
 │                    BookBytes Library                          │

And similarly for lines 36 and 75:

-```
+```text
 Work (canonical book)
   ├── Edition (specific ISBN/format)
-```
+```text
 backend/
 ├── src/bookbytes/

Also applies to: 36-36, 75-75

🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

13-13: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
In backend/Readme.md around lines 13, 36 and 75, three fenced code blocks use
triple backticks with no language identifier; update each opening fence from ```
to ```text so the ASCII diagrams are marked as plain text (replace the fences at
those exact lines and keep the diagram content unchanged).

| `LOG_LEVEL` | `INFO` | Logging level |
| `LOG_FORMAT` | `json` | `json` or `console` |

## License

⚠️ Potential issue | 🔴 Critical

Critical: Establish a license before production release.

The README states "Not licensed yet", but this PR is for productionization and saasification. The project must have a clear license established before release to address intellectual property, liability, and usage rights.

Define and document a clear license (e.g., MIT, Apache 2.0, proprietary, AGPL, etc.) in the LICENSE file and update this section accordingly.

🤖 Prompt for AI Agents
In backend/Readme.md around line 229, the README currently says "Not licensed
yet" but the project is being productionized; add a LICENSE file with the chosen
license text (e.g., MIT, Apache-2.0, AGPL, or a proprietary license) and update
the README License section to reference and summarize that license. Commit the
LICENSE file at repo root containing the full license header, ensure package
metadata (package.json, setup.cfg, etc.) reflect the chosen license field, and
replace "Not licensed yet" with the license name and a short line like "See
LICENSE for full terms" plus any project-specific copyright.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (2)
.agent/rules/pythonic-fastapi-async.md (1)

16-16: Fix spelling errors in guidance document (unfixed from prior review).

These typos were flagged in a previous review but remain uncorrected:

  • Line 16: "Shemas" → "Schemas" and "SQLAlachemy" → "SQLAlchemy"
  • Line 22: "inheritence" → "inheritance"

Apply these corrections:

- Refer to FastAPI, Pydantic, SQLAlachemy, other library documentation for Data Models, Shemas, Path Operations, Middleware and for best practices.
+ Refer to FastAPI, Pydantic, SQLAlchemy, other library documentation for Data Models, Schemas, Path Operations, Middleware and for best practices.
- Favour composition over inheritence. Smart core, Thin interfaces.
+ Favour composition over inheritance. Smart core, Thin interfaces.

Also applies to: 22-22

tasks/backend-productionisation/tasks-backend-productionisation.md (1)

227-227: Fix markdown formatting in completed task annotations (duplicate of earlier feedback).

Lines 227 and 234 still have incorrect emphasis marker syntax with spaces and stray escaped characters.

Apply this diff:

-  - [x] 7.3 Create logging middleware in `main.py`: Log request start (method, path, request*id), log request end (status_code, duration_ms), bind request_id to structlog context *(DONE - moved to auxiliary foundation)\_
+  - [x] 7.3 Create logging middleware in `main.py`: Log request start (method, path, request*id), log request end (status_code, duration_ms), bind request_id to structlog context *(DONE - moved to auxiliary foundation)*
-  - [x] 7.10 Create `docker/docker-compose.yml` with services: `api` (uvicorn), `worker` (arq), `postgres` (postgres:16-alpine), `redis` (redis:7-alpine) with health checks, volumes, and proper depends*on conditions *(DONE - moved to auxiliary foundation)\_
+  - [x] 7.10 Create `docker/docker-compose.yml` with services: `api` (uvicorn), `worker` (arq), `postgres` (postgres:16-alpine), `redis` (redis:7-alpine) with health checks, volumes, and proper depends*on conditions *(DONE - moved to auxiliary foundation)*

Also applies to: 234-234

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8baa5ef and 84977b1.

📒 Files selected for processing (2)
  • .agent/rules/pythonic-fastapi-async.md (1 hunks)
  • tasks/backend-productionisation/tasks-backend-productionisation.md (1 hunks)
🧰 Additional context used
🪛 LanguageTool
tasks/backend-productionisation/tasks-backend-productionisation.md

[grammar] ~1-~1: Ensure spelling is correct
Context: # Tasks: Backend Productionisation PRD Reference: [prd-backend-production...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[uncategorized] ~106-~106: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...ker/docker-compose.yml up` to start the full stack - Environment variables are validated a...

(EN_COMPOUND_ADJECTIVE_INTERNAL)

.agent/rules/pythonic-fastapi-async.md

[grammar] ~16-~16: Ensure spelling is correct
Context: ... library documentation for Data Models, Shemas, Path Operations, Middleware and for be...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~22-~22: Ensure spelling is correct
Context: ... duplication. - Favour composition over inheritence. Smart core, Thin interfaces. ## Namin...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

🪛 markdownlint-cli2 (0.18.1)
tasks/backend-productionisation/tasks-backend-productionisation.md

227-227: Spaces inside emphasis markers

(MD037, no-space-in-emphasis)


234-234: Spaces inside emphasis markers

(MD037, no-space-in-emphasis)

🔇 Additional comments (1)
tasks/backend-productionisation/tasks-backend-productionisation.md (1)

106-106: Use hyphenation for compound adjective modifying noun.

Line 106 uses "full stack" as a compound adjective; it should be hyphenated as "full-stack" when modifying the following noun.

Apply this diff:

-Use `docker-compose -f docker/docker-compose.yml up` to start the full stack
+Use `docker-compose -f docker/docker-compose.yml up` to start the full-stack

Alternatively, restructure to avoid the compound adjective:

-Use `docker-compose -f docker/docker-compose.yml up` to start the full stack
+Use `docker-compose -f docker/docker-compose.yml up` to run the entire stack

Likely an incorrect or invalid review comment.

Comment on lines +204 to +222
- [ ] 6.0 **Phase 6: JWT Authentication**

  - [ ] 6.1 Create `src/bookbytes/models/user.py` with `User` model: `id` (UUID, PK), `email` (unique), `hashed_password`, `is_active` (default True), timestamps
  - [ ] 6.2 Create `src/bookbytes/repositories/user.py` with `UserRepository` extending base, adding `get_by_email()` method
  - [ ] 6.3 Generate migration for User model: `alembic revision --autogenerate -m "add_user_model"`
  - [ ] 6.4 Create `src/bookbytes/core/security.py` with password utilities: `hash_password(password: str) -> str` using passlib bcrypt, `verify_password(plain: str, hashed: str) -> bool`
  - [ ] 6.2 Add JWT utilities in `security.py`: `create_access_token(data: dict, expires_delta: timedelta | None) -> str` using python-jose, `decode_access_token(token: str) -> dict` with validation
  - [ ] 6.3 Define JWT payload structure in `security.py`: `TokenPayload` dataclass with `sub` (user_id), `exp`, `iat`, `scope` (default: "access")
  - [ ] 6.4 Create `src/bookbytes/schemas/auth.py` with schemas: `UserCreate(email: EmailStr, password: str)`, `UserLogin(email: EmailStr, password: str)`, `UserResponse(id, email, is_active, created_at)`, `TokenResponse(access_token: str, token_type: str = "bearer")`
  - [ ] 6.5 Create `src/bookbytes/api/v1/auth.py` with endpoints: `POST /auth/register` (create user, return UserResponse), `POST /auth/login` (validate credentials, return TokenResponse), `GET /auth/me` (return current user)
  - [ ] 6.6 Create auth dependency in `dependencies.py`: `get_current_user(token: str = Depends(oauth2_scheme))` that decodes JWT, fetches user from DB, raises 401 if invalid
  - [ ] 6.7 Create optional auth dependency: `get_current_user_optional()` that returns None if no token, for endpoints that work with or without auth
  - [ ] 6.8 Implement API key bypass for local dev: In `get_current_user()`, if `AUTH_MODE=api_key`, check `X-API-Key` header against `API_KEY` config, return a mock/system user
  - [ ] 6.9 Add `oauth2_scheme = OAuth2PasswordBearer(tokenUrl="/api/v1/auth/login")` in dependencies for Swagger UI integration
  - [ ] 6.10 Protect book and job endpoints: Add `current_user: User = Depends(get_current_user)` to all endpoints in `books.py` and `jobs.py`
  - [ ] 6.11 Filter jobs by user: Update `GET /jobs` to only return jobs where `job.user_id == current_user.id`
  - [ ] 6.12 Include auth router in v1 router with `/auth` prefix
  - [ ] 6.13 Test auth flow: Verify register → login → access protected endpoint with token works; verify 401 without token
  - [ ] 6.14 Create `tests/integration/test_auth_api.py` with tests: register new user, login with valid credentials, login with invalid credentials (401), access /me with token, access /me without token (401)

⚠️ Potential issue | 🟠 Major

Fix task numbering duplication in Phase 6.

Tasks 6.2–6.14 have conflicting/duplicate numbering. After 6.3 (line 208), the next task reverts to 6.2 instead of continuing to 6.4. This cascades through the entire authentication section, confusing task tracking and progress.

Apply this diff to correct the numbering sequence:

   - [ ] 6.1 Create `src/bookbytes/models/user.py` with `User` model: `id` (UUID, PK), `email` (unique), `hashed_password`, `is_active` (default True), timestamps
   - [ ] 6.2 Create `src/bookbytes/repositories/user.py` with `UserRepository` extending base, adding `get_by_email()` method
   - [ ] 6.3 Generate migration for User model: `alembic revision --autogenerate -m "add_user_model"`
-  - [ ] 6.2 Add JWT utilities in `security.py`: `create_access_token(data: dict, expires_delta: timedelta | None) -> str` using python-jose, `decode_access_token(token: str) -> dict` with validation
-  - [ ] 6.3 Define JWT payload structure in `security.py`: `TokenPayload` dataclass with `sub` (user_id), `exp`, `iat`, `scope` (default: "access")
-  - [ ] 6.4 Create `src/bookbytes/schemas/auth.py` with schemas: `UserCreate(email: EmailStr, password: str)`, `UserLogin(email: EmailStr, password: str)`, `UserResponse(id, email, is_active, created_at)`, `TokenResponse(access_token: str, token_type: str = "bearer")`
-  - [ ] 6.5 Create `src/bookbytes/api/v1/auth.py` with endpoints: `POST /auth/register` (create user, return UserResponse), `POST /auth/login` (validate credentials, return TokenResponse), `GET /auth/me` (return current user)
-  - [ ] 6.6 Create auth dependency in `dependencies.py`: `get_current_user(token: str = Depends(oauth2_scheme))` that decodes JWT, fetches user from DB, raises 401 if invalid
-  - [ ] 6.7 Create optional auth dependency: `get_current_user_optional()` that returns None if no token, for endpoints that work with or without auth
-  - [ ] 6.8 Implement API key bypass for local dev: In `get_current_user()`, if `AUTH_MODE=api_key`, check `X-API-Key` header against `API_KEY` config, return a mock/system user
-  - [ ] 6.9 Add `oauth2_scheme = OAuth2PasswordBearer(tokenUrl="/api/v1/auth/login")` in dependencies for Swagger UI integration
-  - [ ] 6.10 Protect book and job endpoints: Add `current_user: User = Depends(get_current_user)` to all endpoints in `books.py` and `jobs.py`
-  - [ ] 6.11 Filter jobs by user: Update `GET /jobs` to only return jobs where `job.user_id == current_user.id`
-  - [ ] 6.12 Include auth router in v1 router with `/auth` prefix
-  - [ ] 6.13 Test auth flow: Verify register → login → access protected endpoint with token works; verify 401 without token
-  - [ ] 6.14 Create `tests/integration/test_auth_api.py` with tests: register new user, login with valid credentials, login with invalid credentials (401), access /me with token, access /me without token (401)
+  - [ ] 6.4 Add JWT utilities in `security.py`: `create_access_token(data: dict, expires_delta: timedelta | None) -> str` using python-jose, `decode_access_token(token: str) -> dict` with validation
+  - [ ] 6.5 Define JWT payload structure in `security.py`: `TokenPayload` dataclass with `sub` (user_id), `exp`, `iat`, `scope` (default: "access")
+  - [ ] 6.6 Create `src/bookbytes/schemas/auth.py` with schemas: `UserCreate(email: EmailStr, password: str)`, `UserLogin(email: EmailStr, password: str)`, `UserResponse(id, email, is_active, created_at)`, `TokenResponse(access_token: str, token_type: str = "bearer")`
+  - [ ] 6.7 Create `src/bookbytes/api/v1/auth.py` with endpoints: `POST /auth/register` (create user, return UserResponse), `POST /auth/login` (validate credentials, return TokenResponse), `GET /auth/me` (return current user)
+  - [ ] 6.8 Create auth dependency in `dependencies.py`: `get_current_user(token: str = Depends(oauth2_scheme))` that decodes JWT, fetches user from DB, raises 401 if invalid
+  - [ ] 6.9 Create optional auth dependency: `get_current_user_optional()` that returns None if no token, for endpoints that work with or without auth
+  - [ ] 6.10 Implement API key bypass for local dev: In `get_current_user()`, if `AUTH_MODE=api_key`, check `X-API-Key` header against `API_KEY` config, return a mock/system user
+  - [ ] 6.11 Add `oauth2_scheme = OAuth2PasswordBearer(tokenUrl="/api/v1/auth/login")` in dependencies for Swagger UI integration
+  - [ ] 6.12 Protect book and job endpoints: Add `current_user: User = Depends(get_current_user)` to all endpoints in `books.py` and `jobs.py`
+  - [ ] 6.13 Filter jobs by user: Update `GET /jobs` to only return jobs where `job.user_id == current_user.id`
+  - [ ] 6.14 Include auth router in v1 router with `/auth` prefix
+  - [ ] 6.15 Test auth flow: Verify register → login → access protected endpoint with token works; verify 401 without token
+  - [ ] 6.16 Create `tests/integration/test_auth_api.py` with tests: register new user, login with valid credentials, login with invalid credentials (401), access /me with token, access /me without token (401)
🤖 Prompt for AI Agents
In tasks/backend-productionisation/tasks-backend-productionisation.md around
lines 204 to 222, the checklist numbering in Phase 6 duplicates (items after 6.3
revert to 6.2 and cascade), which breaks task sequencing; renumber the items
sequentially so Phase 6 reads 6.0 header, then 6.1 through 6.14 in order
(increment each duplicated 6.x entry following 6.3 to 6.4, 6.5, etc.) and ensure
cross-references (like endpoint numbers) match the new sequence.

…plement protocol-based services

- Move background jobs and processing tasks from audio-books-library to new audio-books-pipeline module
- Implement protocol-based abstraction for LLM and TTS services
- Update design doc with protocol interfaces and provider implementations
- Mark old service tasks as superseded in main productionisation tasks
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (7)
tasks/backend-productionisation/tasks-backend-productionisation.md (3)

250-250: Fix markdown emphasis formatting (duplicate of previous review feedback).

Spaces inside emphasis markers violate markdown lint standards (MD037). This was flagged in the previous review and remains unfixed.

Apply this diff:

-  - [x] 7.3 Create logging middleware in `main.py`: Log request start (method, path, request*id), log request end (status_code, duration_ms), bind request_id to structlog context *(DONE - moved to auxiliary foundation)\_
+  - [x] 7.3 Create logging middleware in `main.py`: Log request start (method, path, request*id), log request end (status_code, duration_ms), bind request_id to structlog context *(DONE - moved to auxiliary foundation)*

257-257: Fix markdown emphasis formatting (duplicate of previous review feedback).

Spaces inside emphasis markers violate markdown lint standards (MD037). This was flagged in the previous review and remains unfixed.

Apply this diff:

-  - [x] 7.10 Create `docker/docker-compose.yml` with services: `api` (uvicorn), `worker` (arq), `postgres` (postgres:16-alpine), `redis` (redis:7-alpine) with health checks, volumes, and proper depends*on conditions *(DONE - moved to auxiliary foundation)\_
+  - [x] 7.10 Create `docker/docker-compose.yml` with services: `api` (uvicorn), `worker` (arq), `postgres` (postgres:16-alpine), `redis` (redis:7-alpine) with health checks, volumes, and proper depends*on conditions *(DONE - moved to auxiliary foundation)*

232-245: Fix duplicate task numbering in Phase 6 (duplicate of previous review feedback).

Task numbering reverts to 6.2 after 6.3, cascading through the entire authentication section. This creates ambiguity when referencing tasks and breaks progress tracking. This was flagged in the previous review and remains unresolved.

Apply this diff to renumber Phase 6 sequentially:

   - [ ] 6.1 Create `src/bookbytes/models/user.py` with `User` model: `id` (UUID, PK), `email` (unique), `hashed_password`, `is_active` (default True), timestamps
   - [ ] 6.2 Create `src/bookbytes/repositories/user.py` with `UserRepository` extending base, adding `get_by_email()` method
   - [ ] 6.3 Generate migration for User model: `alembic revision --autogenerate -m "add_user_model"`
-  - [ ] 6.2 Add JWT utilities in `security.py`: `create_access_token(data: dict, expires_delta: timedelta | None) -> str` using python-jose, `decode_access_token(token: str) -> dict` with validation
-  - [ ] 6.3 Define JWT payload structure in `security.py`: `TokenPayload` dataclass with `sub` (user_id), `exp`, `iat`, `scope` (default: "access")
-  - [ ] 6.4 Create `src/bookbytes/schemas/auth.py` with schemas: `UserCreate(email: EmailStr, password: str)`, `UserLogin(email: EmailStr, password: str)`, `UserResponse(id, email, is_active, created_at)`, `TokenResponse(access_token: str, token_type: str = "bearer")`
-  - [ ] 6.5 Create `src/bookbytes/api/v1/auth.py` with endpoints: `POST /auth/register` (create user, return UserResponse), `POST /auth/login` (validate credentials, return TokenResponse), `GET /auth/me` (return current user)
-  - [ ] 6.6 Create auth dependency in `dependencies.py`: `get_current_user(token: str = Depends(oauth2_scheme))` that decodes JWT, fetches user from DB, raises 401 if invalid
-  - [ ] 6.7 Create optional auth dependency: `get_current_user_optional()` that returns None if no token, for endpoints that work with or without auth
-  - [ ] 6.8 Implement API key bypass for local dev: In `get_current_user()`, if `AUTH_MODE=api_key`, check `X-API-Key` header against `API_KEY` config, return a mock/system user
-  - [ ] 6.9 Add `oauth2_scheme = OAuth2PasswordBearer(tokenUrl="/api/v1/auth/login")` in dependencies for Swagger UI integration
-  - [ ] 6.10 Protect book and job endpoints: Add `current_user: User = Depends(get_current_user)` to all endpoints in `books.py` and `jobs.py`
-  - [ ] 6.11 Filter jobs by user: Update `GET /jobs` to only return jobs where `job.user_id == current_user.id`
-  - [ ] 6.12 Include auth router in v1 router with `/auth` prefix
-  - [ ] 6.13 Test auth flow: Verify register → login → access protected endpoint with token works; verify 401 without token
-  - [ ] 6.14 Create `tests/integration/test_auth_api.py` with tests: register new user, login with valid credentials, login with invalid credentials (401), access /me with token, access /me without token (401)
+  - [ ] 6.4 Add JWT utilities in `security.py`: `create_access_token(data: dict, expires_delta: timedelta | None) -> str` using python-jose, `decode_access_token(token: str) -> dict` with validation
+  - [ ] 6.5 Define JWT payload structure in `security.py`: `TokenPayload` dataclass with `sub` (user_id), `exp`, `iat`, `scope` (default: "access")
+  - [ ] 6.6 Create `src/bookbytes/schemas/auth.py` with schemas: `UserCreate(email: EmailStr, password: str)`, `UserLogin(email: EmailStr, password: str)`, `UserResponse(id, email, is_active, created_at)`, `TokenResponse(access_token: str, token_type: str = "bearer")`
+  - [ ] 6.7 Create `src/bookbytes/api/v1/auth.py` with endpoints: `POST /auth/register` (create user, return UserResponse), `POST /auth/login` (validate credentials, return TokenResponse), `GET /auth/me` (return current user)
+  - [ ] 6.8 Create auth dependency in `dependencies.py`: `get_current_user(token: str = Depends(oauth2_scheme))` that decodes JWT, fetches user from DB, raises 401 if invalid
+  - [ ] 6.9 Create optional auth dependency: `get_current_user_optional()` that returns None if no token, for endpoints that work with or without auth
+  - [ ] 6.10 Implement API key bypass for local dev: In `get_current_user()`, if `AUTH_MODE=api_key`, check `X-API-Key` header against `API_KEY` config, return a mock/system user
+  - [ ] 6.11 Add `oauth2_scheme = OAuth2PasswordBearer(tokenUrl="/api/v1/auth/login")` in dependencies for Swagger UI integration
+  - [ ] 6.12 Protect book and job endpoints: Add `current_user: User = Depends(get_current_user)` to all endpoints in `books.py` and `jobs.py`
+  - [ ] 6.13 Filter jobs by user: Update `GET /jobs` to only return jobs where `job.user_id == current_user.id`
+  - [ ] 6.14 Include auth router in v1 router with `/auth` prefix
+  - [ ] 6.15 Test auth flow: Verify register → login → access protected endpoint with token works; verify 401 without token
+  - [ ] 6.16 Create `tests/integration/test_auth_api.py` with tests: register new user, login with valid credentials, login with invalid credentials (401), access /me with token, access /me without token (401)
tasks/backend-productionisation/audio-books-library/tasks-audio-books-library.md (4)

207-209: Specify OpenLibrary User-Agent configuration fully (duplicate of previous review feedback).

Lines 207–209 mention the User-Agent header requirement but lack configuration/validation specification. Where is this value stored? How is it validated at runtime? Previous review flagged missing implementation details.

Add explicit configuration tasks:

- [ ] 3.C.2a Store User-Agent in `settings.OPENLIBRARY_USER_AGENT` (Pydantic Settings)
  - Format: `bookbytes/0.1 (+https://github.com/cryptus-neoxys/bookbytes; [email protected])`
  - Document in `.env.example`

- [ ] 3.C.2b Validate User-Agent format in settings with regex or enum checks

- [ ] 3.C.2c Add unit test verifying User-Agent header is included in OpenLibrary requests

This ensures the configuration is discoverable and validated at startup.
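
For illustration, a minimal Pydantic Settings sketch covering 3.C.2a/3.C.2b might look like the following; the setting name follows the suggestion above, while the regex and error message are assumptions rather than project conventions:

```python
import re

from pydantic import field_validator
from pydantic_settings import BaseSettings


class Settings(BaseSettings):
    # Expected shape: "<app>/<version> (+<repo URL>; <contact email>)"
    OPENLIBRARY_USER_AGENT: str

    @field_validator("OPENLIBRARY_USER_AGENT")
    @classmethod
    def _validate_user_agent(cls, value: str) -> str:
        # Reject malformed values at startup rather than at request time.
        if not re.match(r"^[\w.-]+/[\w.-]+ \(.+\)$", value):
            raise ValueError("must look like 'app/version (contact details)'")
        return value
```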


305-315: Verify Phase 3.F (Background Jobs) is properly tracked in deferred module (duplicate of previous review feedback).

Phase 3.F (lines 305–315) is deferred to audio-books-pipeline/tasks-audio-books-pipeline.md, but background jobs are necessary for audiobook processing and remain a critical dependency for the full productionisation workflow. Verify that:

  1. The deferred module contains complete task coverage for job infrastructure, processing, and cache invalidation
  2. The link is bidirectional (audio-books-pipeline module references this file)
  3. PR merge is not blocked by incomplete job setup in the target module

Run this verification:

#!/bin/bash
# Verify audio-books-pipeline module exists and has background jobs tasks

echo "=== Checking audio-books-pipeline module ==="
if [ -f "tasks/backend-productionisation/audio-books-pipeline/tasks-audio-books-pipeline.md" ]; then
  echo "✓ Module found"
  echo ""
  echo "=== Checking for Phase 3.F equivalents (job tasks) ==="
  grep -n "Job\|Background\|Worker\|Process" "tasks/backend-productionisation/audio-books-pipeline/tasks-audio-books-pipeline.md" | head -20
else
  echo "✗ Module not found"
fi

If the deferred module is missing or incomplete, explicitly defer in a follow-up issue with timeline.


87-89: Resolve UUIDv7 verification before marking dependent model tasks complete (duplicate of previous review feedback).

Task 3.0.4 is still marked incomplete [ ] while all downstream model creation tasks (3.A.1–3.A.9, 3.A-R.1–3.A-R.6) are marked complete [x]. This creates risk: if UUIDv7 sorting verification fails, the entire model foundation requires rework. Verification must complete before committing to the v7 implementation.

Complete 3.0.4 verification (test UUIDv7 sorting in PostgreSQL with native UUID type) and mark it complete before closing dependent model tasks. Run the verification steps:

  1. Insert sample UUIDv7 values with known timestamps
  2. Query ORDER BY id and confirm chronological order
  3. Document results and any configuration changes required

Update the checklist to mark 3.0.4 complete only after successful verification. If verification fails, revert dependent tasks to incomplete and plan rework.
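
A verification test along these lines would close 3.0.4; it assumes the `uuid6` package's `uuid7()` helper and the project's async `session` fixture, both of which may differ locally:

```python
import asyncio

import pytest
from sqlalchemy import text
from uuid6 import uuid7


@pytest.mark.asyncio
async def test_uuid7_ids_sort_chronologically(session) -> None:
    await session.execute(text("CREATE TEMP TABLE uuid7_check (id uuid PRIMARY KEY)"))
    inserted = []
    for _ in range(5):
        uid = uuid7()
        inserted.append(uid)
        await session.execute(
            text("INSERT INTO uuid7_check (id) VALUES (:id)"), {"id": uid}
        )
        await asyncio.sleep(0.01)  # guarantee distinct millisecond timestamps

    rows = await session.execute(text("SELECT id FROM uuid7_check ORDER BY id"))
    # Native uuid ordering must match insertion (i.e., chronological) order.
    assert [row[0] for row in rows] == inserted
```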


318-347: Entire Testing phase (3.G) is incomplete—blocks production readiness claim (duplicate of previous review feedback).

Phase 3.G (lines 318–347) is entirely unchecked [ ]. For a PR titled "productionise," comprehensive test coverage is non-negotiable. Without tests, the "production-ready" claim lacks evidence:

  • No cache service behavior verification (TTL jitter, stale-while-revalidate)
  • No OpenLibrary integration tests (mocked HTTP)
  • No library service workflow tests
  • No end-to-end API flow tests
  • No mock fixtures for external services

This was flagged as a critical blocker in the previous review and remains unresolved.

Either implement the listed tests in this PR or create a follow-up PR with explicit scope, timeline, and linked issue. Production code without test coverage creates compliance, reliability, and maintainability risk.

If deferring, add:

### Phase 3.G: Testing (Deferred to PR #XXX)

[Create linked issue/PR with:
- Tests for cache service (TTL, jitter, stale-while-revalidate)
- Tests for OpenLibrary client (mocked HTTP, error handling)
- Tests for library service workflows
- Integration tests for search API endpoints
- Mock fixtures under tests/mocks/openlibrary_responses.py
- Timeline: [specify target date]
]
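
If the tests are kept in this PR, a starting point for the TTL-jitter case could look like this sketch; `CacheService`, its constructor, and the ±10% jitter band are assumptions about the library's cache API, not its real interface:

```python
import fakeredis.aioredis
import pytest

from bookbytes.services.cache import CacheService  # assumed module path


@pytest.mark.asyncio
async def test_set_applies_ttl_jitter() -> None:
    redis = fakeredis.aioredis.FakeRedis()
    cache = CacheService(redis, base_ttl=600)  # hypothetical constructor

    await cache.set("search:abc", {"results": []})

    ttl = await redis.ttl("search:abc")
    # Jittered TTL should stay within ±10% of the 600s base.
    assert 540 <= ttl <= 660
```
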
🧹 Nitpick comments (5)
tasks/backend-productionisation/audio-books-pipeline/design-doc.md (4)

55-75: Add language specification to code fence for ASCII diagram.

The ASCII art diagram at line 55-74 is missing a language specification. While ASCII diagrams render without a language tag, adding one (e.g., ```text or ```plaintext) improves accessibility and satisfies linting requirements.

Apply this diff to add the language specification:

-```
+```text
 ┌─────────────────────────────────────────────────────────────┐
 │                    ProcessingService                         │

113-173: Add language specification to Python code fence.

The Python code block at lines 113-173 (LLMProvider protocol and related classes) is missing the language specification. Add ```python to enable syntax highlighting.

Apply this diff:

-```
+```python
 from typing import Protocol
 from pydantic import BaseModel

177-208: Add language specification to Python code fence.

The Python code block at lines 177-208 (TTSProvider protocol and TTSService) is missing the language specification. Add ```python.

Apply this diff:

-```
+```python
 from typing import Protocol, AsyncIterator

224-233: Add language specification to Python code fence.

The Python code block at lines 224-233 (TTSProvider abstract class example) is missing the language specification. Add ```python.

Apply this diff:

-```
+```python
 # Protocol-based interface for TTS
 class TTSProvider(Protocol):
tasks/backend-productionisation/audio-books-pipeline/prd-audio-books-pipeline.md (1)

135-147: Add language specification to code fence for ASCII diagram.

The ASCII architecture diagram at lines 135-147 is missing a language specification. Add ```text to comply with markdown linting and improve accessibility.

Apply this diff:

-```
+```text
 ┌─────────────────────────────────────────────────────────────┐
 │                    ProcessingService                         │
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 84977b1 and 090cce6.

📒 Files selected for processing (5)
  • tasks/backend-productionisation/audio-books-library/tasks-audio-books-library.md (1 hunks)
  • tasks/backend-productionisation/audio-books-pipeline/design-doc.md (1 hunks)
  • tasks/backend-productionisation/audio-books-pipeline/prd-audio-books-pipeline.md (1 hunks)
  • tasks/backend-productionisation/audio-books-pipeline/tasks-audio-books-pipeline.md (1 hunks)
  • tasks/backend-productionisation/tasks-backend-productionisation.md (1 hunks)
🧰 Additional context used
🪛 LanguageTool
tasks/backend-productionisation/audio-books-pipeline/prd-audio-books-pipeline.md

[style] ~46-~46: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...us}` immediately | | US2 | As a user, I want to poll job status to know when my audiobo...

(REP_WANT_TO_VB)


[style] ~47-~47: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ... | | US3 | As a user, I want to listen to generated chapter summaries ...

(REP_WANT_TO_VB)


[style] ~48-~48: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...RL | | US4 | As a user, I want to refresh an audiobook when I want update...

(REP_WANT_TO_VB)


[style] ~55-~55: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...e | | DS2 | As a developer, I want to switch TTS providers without changing b...

(REP_WANT_TO_VB)


[style] ~57-~57: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ... | | DS4 | As a developer, I want to test the pipeline without calling real ...

(REP_WANT_TO_VB)

tasks/backend-productionisation/tasks-backend-productionisation.md

[grammar] ~1-~1: Ensure spelling is correct
Context: # Tasks: Backend Productionisation PRD Reference: [prd-backend-production...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[uncategorized] ~106-~106: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...ker/docker-compose.yml up` to start the full stack - Environment variables are validated a...

(EN_COMPOUND_ADJECTIVE_INTERNAL)

🪛 markdownlint-cli2 (0.18.1)
tasks/backend-productionisation/audio-books-pipeline/prd-audio-books-pipeline.md

135-135: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

tasks/backend-productionisation/audio-books-pipeline/design-doc.md

9-9: Heading levels should only increment by one level at a time
Expected: h2; Actual: h3

(MD001, heading-increment)

🔇 Additional comments (3)
tasks/backend-productionisation/audio-books-pipeline/tasks-audio-books-pipeline.md (1)

1-212: Comprehensive task planning with clear phase structure. ✓

The document effectively organizes the audio books pipeline productionisation into six well-scoped phases, with actionable task lists, file inventories, and cross-references. The checkboxes and detailed sub-tasks provide a clear implementation roadmap.

tasks/backend-productionisation/audio-books-pipeline/design-doc.md (1)

1-315: Design document is comprehensive and well-architected. ✓

The document effectively justifies technology choices (OpenAI TTS, Instructor LLM, Protocol abstraction), provides clear service interfaces with examples, and outlines a realistic implementation roadmap. The Protocol-based design enables provider flexibility without coupling business logic to specific libraries.

tasks/backend-productionisation/audio-books-pipeline/prd-audio-books-pipeline.md (1)

1-184: PRD is comprehensive with clear user stories and measurable success criteria. ✓

The document effectively captures product goals, user stories (both API consumers and developers), and phased functional requirements. The non-goals section provides valuable scope clarification, and the open questions section raises important clarifications needed before implementation. Success metrics (>95% success rate, <5 min processing time) are realistic and measurable.


---

### Phase 0: Storage Service (Prerequisite)

⚠️ Potential issue | 🟡 Minor

Fix markdown heading hierarchy: h3 should be h2.

The document jumps from h1 (line 1) to h3 (line 9), skipping h2. Phase sections should use h2 (##) for consistency with markdown structure.

Apply this diff to fix the heading levels:

-### Phase 0: Storage Service (Prerequisite)
+## Phase 0: Storage Service (Prerequisite)

Repeat this fix for all subsequent phase headings (lines 18, 24, 30, 35, 40, 107, etc.).

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

-### Phase 0: Storage Service (Prerequisite)
+## Phase 0: Storage Service (Prerequisite)
🤖 Prompt for AI Agents
In
tasks/backend-productionisation/audio-books-pipeline/tasks-audio-books-pipeline.md
around line 9, the heading "### Phase 0: Storage Service (Prerequisite)" is
using h3 while the document jumps from h1 to h3; change this and all phase
headings (e.g., lines 9, 18, 24, 30, 35, 40, 107, etc.) from h3 (###) to h2 (##)
so phases are consistent with the top-level h1 — update each heading line to
start with "##" instead of "###".

… endpoints and completed tests

- Move endpoints 3.E.5 and 3.E.6 to audio-books-pipeline phase 1
- Mark test files 3.G.1 through 3.G.5 as completed
- Update test descriptions for clarity
Implement initial processing API endpoints including:
- POST /books/process - Start audiobook processing
- POST /books/{audio_book_id}/refresh - Refresh audiobook
- GET /jobs/{job_id} - Check job status

Add request/response schemas and comprehensive unit tests
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (4)
tasks/backend-productionisation/audio-books-library/tasks-audio-books-library.md (3)

202-210: Document where the OpenLibrary User-Agent value lives and how it’s validated.

3.C.2 now notes “Include app name and contact for API compliance”, but the task list still doesn’t say:

  • Which config setting holds this value (e.g., settings.OPENLIBRARY_USER_AGENT),
  • The expected format (sample string), and
  • That it’s covered by tests (e.g., httpx mock asserting the header).

Expanding 3.C.2 with those bullets will keep the design/tasks doc aligned with the implementation and the earlier review guidance.


342-353: Add explicit infrastructure/deployment tasks for the Redis/AOF configuration.

The Notes section documents Redis behavior (AOF with appendfsync everysec, allkeys-lru, 256mb limit), but there’s still no concrete checklist for wiring this into real deployments (Helm values, docker-compose, health checks, backup/restore strategy), as previously suggested.

Consider adding a “Phase 3.H: Infrastructure & Deployment” section with items like:

### Phase 3.H: Infrastructure & Deployment

- [ ] 3.H.1 Redis Helm values: allkeys-lru, 256mb, AOF appendfsync everysec
- [ ] 3.H.2 Update docker-compose.yml (Redis + PostgreSQL) for local dev
- [ ] 3.H.3 Add Redis health checks to Kubernetes manifests
- [ ] 3.H.4 Document Redis AOF backup/recovery strategy

so operational work is tracked alongside the library design.
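
To keep the documented policy honest once it is wired up, a small check script could assert the running Redis matches it; this is a sketch, and the URL and exact limits are assumptions taken from the Notes section:

```python
import asyncio

import redis.asyncio as aioredis


async def check_redis_policy(url: str = "redis://localhost:6379/0") -> None:
    r = aioredis.from_url(url)
    cfg: dict[str, str] = {}
    for key in ("maxmemory-policy", "maxmemory", "appendonly", "appendfsync"):
        cfg.update(await r.config_get(key))

    assert cfg["maxmemory-policy"] == "allkeys-lru"
    assert int(cfg["maxmemory"]) == 256 * 1024 * 1024  # 256mb
    assert cfg["appendonly"] == "yes"
    assert cfg["appendfsync"] == "everysec"


if __name__ == "__main__":
    asyncio.run(check_redis_policy())
```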


87-127: UUIDv7 sorting verification (3.0.4) still lags behind completed model tasks.

3.0.4 (“Verify UUIDv7 sorting works in PostgreSQL”) is still unchecked, but all downstream model/migration tasks in 3.A.1–3.A.9 are marked complete. If UUIDv7 ordering in native uuid columns doesn’t behave exactly as expected, it could impact querying, pagination, and any “ORDER BY id” assumptions baked into the new models.

Either:

  • Complete and document the 3.0.4 verification (against the actual Postgres setup) and mark it [x], or
  • If issues were found, revert any dependent tasks to [ ] until the model/migration design is adjusted.
tasks/backend-productionisation/audio-books-pipeline/tasks-audio-books-pipeline.md (1)

9-40: Fix markdown heading hierarchy for top-level Phases.

The document jumps from # to ### for Phase headings (lines 9, 18, 24, 30, 35, 40), which violates the heading increment rule (MD001) and was flagged in the previous review.

Consider promoting these overview phase headings to ## so the structure is:

  • # Tasks: Audio Books Pipeline
  • ## Phase 0/1/2/3/4/5 (overview)
  • ## Dependencies / Tasks
    • ### Phase 0/1/... (within Tasks section)

Example diff:

-### Phase 0: Storage Service (Prerequisite)
+## Phase 0: Storage Service (Prerequisite)

-### Phase 1: Processing API Endpoints
+## Phase 1: Processing API Endpoints

-### Phase 2: Job Infrastructure
+## Phase 2: Job Infrastructure

-### Phase 3: LLM Service
+## Phase 3: LLM Service

-### Phase 4: TTS Service
+## Phase 4: TTS Service

-### Phase 5: Processing Pipeline
+## Phase 5: Processing Pipeline
🧹 Nitpick comments (1)
backend/src/bookbytes/schemas/processing.py (1)

88-119: Consider typing job_type and status as enums instead of bare strings.

Right now job_type and status are plain str fields with descriptive docstrings. Once the Job model and its JobType/JobStatus enums exist, it would be safer to:

  • Reuse those enums (or Pydantic Literal types) here, and
  • Let the OpenAPI schema advertise the finite set of allowed values.

This avoids stringly‑typed status handling across API, services, and DB.
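
A sketch of the enum-typed variant, trimmed to the relevant fields (the real JobStatusResponse carries more):

```python
from pydantic import BaseModel

from bookbytes.models.job import JobStatus, JobType


class JobStatusResponse(BaseModel):
    # str-based enums serialise to their string values, so the wire format
    # is unchanged while OpenAPI now lists the finite set of allowed values.
    job_type: JobType
    status: JobStatus
    progress: int = 0
```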

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 090cce6 and 2ecddb1.

📒 Files selected for processing (6)
  • backend/src/bookbytes/api/v1/processing.py (1 hunks)
  • backend/src/bookbytes/api/v1/router.py (1 hunks)
  • backend/src/bookbytes/schemas/processing.py (1 hunks)
  • backend/tests/unit/test_processing_endpoints.py (1 hunks)
  • tasks/backend-productionisation/audio-books-library/tasks-audio-books-library.md (1 hunks)
  • tasks/backend-productionisation/audio-books-pipeline/tasks-audio-books-pipeline.md (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
backend/src/bookbytes/api/v1/processing.py (1)
backend/src/bookbytes/schemas/processing.py (4)
  • JobStatusResponse (88-137)
  • ProcessRequest (16-56)
  • ProcessResponse (68-85)
  • RefreshRequest (59-65)
backend/tests/unit/test_processing_endpoints.py (1)
backend/src/bookbytes/schemas/processing.py (3)
  • JobStatusResponse (88-137)
  • ProcessRequest (16-56)
  • RefreshRequest (59-65)
backend/src/bookbytes/schemas/processing.py (1)
backend/src/bookbytes/models/base.py (3)
  • id (47-52)
  • created_at (63-68)
  • updated_at (71-77)
🪛 markdownlint-cli2 (0.18.1)
tasks/backend-productionisation/audio-books-pipeline/tasks-audio-books-pipeline.md

9-9: Heading levels should only increment by one level at a time
Expected: h2; Actual: h3

(MD001, heading-increment)

🔇 Additional comments (3)
backend/src/bookbytes/api/v1/router.py (1)

6-15: v1 router aggregation looks clean and consistent.

The router composition is straightforward and correctly mounts the search and processing sub-routers under /books; no issues from a routing or structure perspective.

backend/src/bookbytes/schemas/processing.py (1)

16-56: ProcessRequest validation is clear and correctly constrained.

The “exactly one of edition_id or isbn” rule is implemented cleanly via the @model_validator(mode="after"), and the error messages are explicit enough for clients and tests to assert on. No changes needed here.
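
For readers skimming the review, the pattern in question is roughly this (a sketch; the error text in the actual schema differs):

```python
from uuid import UUID

from pydantic import BaseModel, model_validator


class ProcessRequest(BaseModel):
    edition_id: UUID | None = None
    isbn: str | None = None

    @model_validator(mode="after")
    def check_exactly_one_identifier(self) -> "ProcessRequest":
        # Both set or both missing is invalid: exactly one must be provided.
        if (self.edition_id is None) == (self.isbn is None):
            raise ValueError("Provide exactly one of edition_id or isbn")
        return self
```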

backend/tests/unit/test_processing_endpoints.py (1)

22-176: Nice coverage of schema validation and current endpoint contracts.

The tests thoroughly exercise ProcessRequest/RefreshRequest validation (including message contents) and assert the current 501 + 422 behaviors on the endpoints, which will make it easy to evolve these handlers later without breaking callers unnoticed.

Comment on lines +74 to +86
# TODO: Implement in Phase 5 (ProcessingService)
# 1. Resolve edition_id or look up by ISBN
# 2. Check if audiobook already exists for this edition
# 3. Create AudioBook record (status=PENDING)
# 4. Create Job record
# 5. Enqueue ARQ task
# 6. Return job_id

# Placeholder - will be replaced with actual implementation
raise HTTPException(
status_code=status.HTTP_501_NOT_IMPLEMENTED,
detail="Processing service not yet implemented. See Phase 3.1.5 tasks.",
)

⚠️ Potential issue | 🟡 Minor

Align 501 error messages with current task numbering/docs.

The placeholder exceptions reference “Phase 3.1.5 tasks” and “Phase 3.1.2 tasks”, but the task docs now organize this work under different phase IDs (e.g., Phase 2: Job Infrastructure, Phase 5: Processing Pipeline in the audio-books-pipeline tasks file). That mismatch will be confusing when someone follows the pointers.

Consider updating the detail strings to reference the actual sections, e.g.:

-        detail="Processing service not yet implemented. See Phase 3.1.5 tasks.",
+        detail=(
+            "Processing service not yet implemented. "
+            "See audio-books-pipeline Phase 5: Processing Pipeline tasks."
+        )
...
-        detail="Refresh service not yet implemented. See Phase 3.1.5 tasks.",
+        detail=(
+            "Refresh service not yet implemented. "
+            "See audio-books-pipeline Phase 5: Processing Pipeline tasks."
+        )
...
-        detail="Job repository not yet implemented. See Phase 3.1.2 tasks.",
+        detail=(
+            "Job repository not yet implemented. "
+            "See audio-books-pipeline Phase 2: Job Infrastructure tasks."
+        )

Adjust the text to whatever final section names you settle on, but keep it in sync with the task docs.

Also applies to: 120-131, 163-170

Comment on lines +311 to +339
### Phase 3.G: Testing

- [x] 3.G.1 Create `tests/unit/test_cache_service.py`

- Test cache get/set flow
- Test TTL jitter

- [x] 3.G.2 Create `tests/unit/test_openlibrary_service.py`

- Mock HTTP responses
- Test search, work details, ISBN collection

- [x] 3.G.3 Create `tests/unit/test_library_service.py`

- Test provider lookups
- Test work/edition storage

- [ ] 3.G.4 Create `tests/unit/test_repositories.py` <!-- TODO -->

- Test Work, Edition, BookProvider, AudioBook repositories

- [x] 3.G.5 Create `tests/integration/test_search_api.py`

- Test full search flow with mocked OpenLibrary
- Test caching behavior
- Test pagination

- [ ] 3.G.6 Add OpenLibrary mock responses to `tests/mocks/` <!-- TODO -->


⚠️ Potential issue | 🟡 Minor

Tighten up remaining Testing tasks 3.G.4 and 3.G.6 for production readiness.

Most of Phase 3.G is now complete, but:

  • 3.G.4 (tests/unit/test_repositories.py) and
  • 3.G.6 (OpenLibrary mock responses under tests/mocks/)

are still TODO. For a “productionise” track, it’d be good to either:

  • Implement these in this or the next immediate PR, or
  • Explicitly mark them as deferred with linked issue IDs so they don’t get lost.
🤖 Prompt for AI Agents
In
tasks/backend-productionisation/audio-books-library/tasks-audio-books-library.md
around lines 311–339, the checklist items 3.G.4 and 3.G.6 are still TODO; either
implement the missing unit tests and OpenLibrary mock fixtures or explicitly
defer them. If implementing: add tests/unit/test_repositories.py that exercises
Work, Edition, BookProvider, and AudioBook repositories (CRUD, relations, edge
cases, transactions and expected errors) and add corresponding deterministic
fixtures under tests/mocks/openlibrary/ (search responses, work detail
responses, ISBN collection entries) matching the shapes used by existing
unit/integration tests; ensure mocks are loaded by the test suite. If deferring:
create issue(s) in the repo, link their IDs in the checklist, assign an owner
and ETA, and update the markdown to mark 3.G.4 and 3.G.6 as deferred with those
issue links.

…sing

Implement generic job tracking model and audio book job relation table
to support background processing pipeline for audio book generation
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (7)
backend/src/bookbytes/models/job.py (1)

77-78: Consider adding a CheckConstraint for progress bounds.

The progress field is documented as "Completion percentage (0-100)" but has no database-level constraint to enforce this range. Invalid values could be persisted.

+from sqlalchemy import CheckConstraint, String
-from sqlalchemy import String
 from sqlalchemy.orm import Mapped, mapped_column
 
 ...
 
 class Job(UUIDPrimaryKeyMixin, TimestampMixin, Base):
     ...
+    __table_args__ = (
+        CheckConstraint("progress >= 0 AND progress <= 100", name="ck_jobs_progress_range"),
+    )
+
     # === Progress Tracking ===
     progress: Mapped[int] = mapped_column(nullable=False, default=0)
backend/src/bookbytes/models/audio_book_job.py (1)

51-53: Consider if eager loading both relationships is always needed.

Both job and audio_book relationships use lazy="joined", meaning they're always fetched via JOIN even when not accessed. If callers often need only one side, consider lazy="selectin" or lazy="raise" with explicit options(joinedload(...)) in queries for better performance control.

This is a minor optimization and acceptable if both are typically needed together.
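
If explicit loading is preferred, the opt-in pattern would look roughly like this; switching the relationship to `lazy="raise"` is an assumption about the desired design, not the current model:

```python
from uuid import UUID

from sqlalchemy import select
from sqlalchemy.orm import joinedload

from bookbytes.models.audio_book_job import AudioBookJob


def jobs_for_audiobook_stmt(audio_book_id: UUID):
    # The caller opts in to exactly the join it needs; with lazy="raise" on
    # the relationship, a forgotten joinedload fails fast instead of going N+1.
    return (
        select(AudioBookJob)
        .options(joinedload(AudioBookJob.job))
        .where(AudioBookJob.audio_book_id == audio_book_id)
    )
```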

backend/src/bookbytes/repositories/audio_book_job.py (1)

54-60: Redundant joinedload given model configuration.

AudioBookJob.audio_book already has lazy="joined" in the model definition, so the explicit joinedload here is redundant. Either remove it here or change the model to lazy="select" and keep explicit loading in queries.

     query = (
         select(AudioBookJob)
         .where(AudioBookJob.job_id == job_id)
-        .options(joinedload(AudioBookJob.audio_book))
     )
backend/src/bookbytes/repositories/job.py (4)

9-10: Prefer timezone.utc over UTC for broader Python-version compatibility

from datetime import UTC is only available from Python 3.11 onward; if you intend to support 3.10 or earlier, this import will fail. A more widely compatible option is datetime.timezone.utc. Consider either explicitly pinning Python ≥3.11 or switching to timezone.utc:

-from datetime import UTC, datetime
+from datetime import datetime, timezone
@@
-                started_at=datetime.now(UTC),
+                started_at=datetime.now(timezone.utc),
@@
-                completed_at=datetime.now(UTC),
+                completed_at=datetime.now(timezone.utc),
@@
-                completed_at=datetime.now(UTC),
+                completed_at=datetime.now(timezone.utc),

Please verify the target Python version in your runtime/container image and adjust accordingly.

Also applies to: 65-69, 120-123, 150-154


25-79: Tighten claim_next typing and consider handling optimistic-lock misses

Two small points here:

  • The job_type parameter is annotated as str | None, while get_by_job_type takes JobType and uses .value. For API consistency, consider either:

    • Accepting JobType | None here and using job_type.value, or
    • Explicitly documenting that claim_next expects the raw string value and updating the type hint to JobType | str | None if you want to support both.
  • When the UPDATE fails the optimistic-lock check (another worker claimed the job first), you return None even if other pending jobs exist. That’s correct but can cause extra “empty” polls under contention. If this becomes noticeable, you could loop a few times to try the next candidate before returning, or switch to a DB-level pattern (e.g., SELECT ... FOR UPDATE SKIP LOCKED) where supported.
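
For reference, the SKIP LOCKED variant might look like this sketch (PostgreSQL-specific, and it assumes the claim and the status update share one transaction):

```python
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession

from bookbytes.models.job import Job, JobStatus


async def claim_next_skip_locked(session: AsyncSession, worker_id: str) -> Job | None:
    stmt = (
        select(Job)
        .where(Job.status == JobStatus.PENDING.value)
        .order_by(Job.created_at)
        .limit(1)
        .with_for_update(skip_locked=True)  # skip rows other workers hold locks on
    )
    job = (await session.execute(stmt)).scalar_one_or_none()
    if job is not None:
        job.status = JobStatus.PROCESSING.value
        job.worker_id = worker_id
        await session.flush()
    return job
```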


107-159: Optionally guard completion/failure transitions by current status/worker

mark_completed and mark_failed currently update purely by job_id. If a stale worker or an out-of-band caller invokes these after the job has been retried, cancelled, or otherwise transitioned, the update will still succeed and overwrite the status/timestamps.

If you want stronger invariants, consider adding guards such as:

  • where(Job.status == JobStatus.PROCESSING.value) and/or
  • where(Job.worker_id == expected_worker_id) or a version check similar to claim_next.

This would make state transitions more robust against stale calls and better align with the “optimistic locking for concurrency control” description in the class docstring.
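
A guarded variant could look like this sketch; it assumes a `COMPLETED` status member exists and that the caller knows its own `worker_id`:

```python
from datetime import datetime, timezone
from uuid import UUID

from sqlalchemy import update
from sqlalchemy.ext.asyncio import AsyncSession

from bookbytes.models.job import Job, JobStatus


async def mark_completed_guarded(
    session: AsyncSession, job_id: UUID, worker_id: str
) -> bool:
    stmt = (
        update(Job)
        .where(
            Job.id == job_id,
            Job.status == JobStatus.PROCESSING.value,  # still in-flight
            Job.worker_id == worker_id,  # still owned by this worker
        )
        .values(
            status=JobStatus.COMPLETED.value,
            progress=100,
            completed_at=datetime.now(timezone.utc),
        )
    )
    result = await session.execute(stmt)
    await session.commit()
    return result.rowcount == 1  # zero means a stale or already-transitioned job
```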


195-253: Query patterns look good; ensure supporting DB indexes and optionally drop redundant list(...)

The status/job-type query helpers are straightforward and match expected usage patterns. Two minor suggestions:

  • For production scale, ensure the DB has appropriate composite indexes, e.g.:

    • (status, created_at) for get_by_status / get_pending_count.
    • (job_type, created_at DESC) for get_by_job_type.
      This will keep these queries cheap as the jobs table grows.
  • result.scalars().all() already returns a list; the outer list(...) call is redundant and can be dropped for slightly cleaner code.
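
A follow-up migration for the composite indexes could be as small as this sketch; the index names and column order are suggestions, not the project's current schema:

```python
from alembic import op
import sqlalchemy as sa


def upgrade() -> None:
    # Serves get_by_status / get_pending_count.
    op.create_index("ix_jobs_status_created_at", "jobs", ["status", "created_at"])
    # Serves get_by_job_type's newest-first listing.
    op.create_index(
        "ix_jobs_job_type_created_at",
        "jobs",
        ["job_type", sa.text("created_at DESC")],
    )


def downgrade() -> None:
    op.drop_index("ix_jobs_job_type_created_at", table_name="jobs")
    op.drop_index("ix_jobs_status_created_at", table_name="jobs")
```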

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2ecddb1 and 127c524.

📒 Files selected for processing (7)
  • backend/alembic/versions/2232afba1a25_add_jobs_and_audio_book_jobs_tables.py (1 hunks)
  • backend/src/bookbytes/models/__init__.py (1 hunks)
  • backend/src/bookbytes/models/audio_book_job.py (1 hunks)
  • backend/src/bookbytes/models/job.py (1 hunks)
  • backend/src/bookbytes/repositories/__init__.py (1 hunks)
  • backend/src/bookbytes/repositories/audio_book_job.py (1 hunks)
  • backend/src/bookbytes/repositories/job.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/src/bookbytes/repositories/__init__.py
🧰 Additional context used
🧬 Code graph analysis (3)
backend/src/bookbytes/repositories/audio_book_job.py (4)
backend/src/bookbytes/models/audio_book_job.py (1)
  • AudioBookJob (22-58)
backend/src/bookbytes/models/job.py (1)
  • Job (37-114)
backend/src/bookbytes/repositories/base.py (1)
  • BaseRepository (43-206)
backend/src/bookbytes/schemas/common.py (1)
  • limit (117-119)
backend/src/bookbytes/models/job.py (1)
backend/src/bookbytes/models/base.py (3)
  • Base (24-30)
  • UUIDPrimaryKeyMixin (33-52)
  • id (47-52)
backend/src/bookbytes/models/audio_book_job.py (3)
backend/src/bookbytes/models/base.py (2)
  • Base (24-30)
  • UUIDPrimaryKeyMixin (33-52)
backend/src/bookbytes/models/audio_book.py (1)
  • AudioBook (37-79)
backend/src/bookbytes/models/job.py (1)
  • Job (37-114)
🪛 Ruff (0.14.8)
backend/alembic/versions/2232afba1a25_add_jobs_and_audio_book_jobs_tables.py

8-8: Import from collections.abc instead: Sequence

Import from collections.abc

(UP035)


16-16: Use X | Y for type annotations

Convert to X | Y

(UP007)


17-17: Use X | Y for type annotations

Convert to X | Y

(UP007)


18-18: Use X | Y for type annotations

Convert to X | Y

(UP007)

🔇 Additional comments (8)
backend/src/bookbytes/models/job.py (3)

18-35: Well-structured enums for job status and type.

Good use of the `(str, Enum)` mixin pattern for JSON-serialization compatibility and database storage as strings. The enums provide type safety while remaining database-friendly.


85-86: The version field for optimistic locking is manual.

The docstring mentions "Optimistic lock for concurrent access" but this requires manual implementation in the repository layer to check and increment the version on updates. SQLAlchemy's built-in version_id_col provides automatic optimistic locking with StaleDataError on conflicts.

If manual control is intentional (e.g., for custom retry logic), this is fine. Otherwise, consider using SQLAlchemy's native mechanism.
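
For comparison, the native mechanism is a one-line mapper option; this is a standalone sketch, not the project's current Job model:

```python
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column


class Base(DeclarativeBase):
    pass


class Job(Base):
    __tablename__ = "jobs"

    id: Mapped[int] = mapped_column(primary_key=True)
    version: Mapped[int] = mapped_column(nullable=False)

    # SQLAlchemy now increments `version` on every flush and raises
    # StaleDataError when a concurrent UPDATE won the race.
    __mapper_args__ = {"version_id_col": version}
```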


103-113: Clean property implementations.

The is_terminal and can_retry properties correctly check against enum values and provide clear business logic encapsulation.

backend/alembic/versions/2232afba1a25_add_jobs_and_audio_book_jobs_tables.py (1)

59-68: Downgrade correctly reverses all changes.

The downgrade properly drops indices before tables and handles the dependency order correctly (audio_book_jobs before jobs).

backend/src/bookbytes/models/audio_book_job.py (1)

22-58: Well-designed association model.

Clean implementation of the 1:1 relationship between Job and AudioBook. The unique constraint on job_id correctly enforces the cardinality, and CASCADE deletes ensure referential integrity.

backend/src/bookbytes/repositories/audio_book_job.py (1)

62-106: Query implementations look correct.

The join queries for fetching jobs by audiobook are well-structured with proper ordering and limits.

backend/src/bookbytes/models/__init__.py (1)

1-43: Clean and well-organized module exports.

The __all__ is comprehensive, properly categorized by implementation phase, and includes helpful comments about design decisions (Redis-only caching) and future plans.

backend/src/bookbytes/repositories/job.py (1)

81-105: Progress updates look solid and defensive

The progress clamping (0–100) and optional current_step update are nice touches, and returning a boolean based on rowcount keeps the API simple. No changes needed here from my side.

Comment on lines +35 to +36
sa.Column('started_at', sa.DateTime(), nullable=True),
sa.Column('completed_at', sa.DateTime(), nullable=True),

⚠️ Potential issue | 🟡 Minor

Timezone inconsistency between timestamp columns.

started_at and completed_at use sa.DateTime() without timezone, while created_at and updated_at use sa.DateTime(timezone=True). This inconsistency can cause timezone-related bugs when comparing or displaying these timestamps together.

-    sa.Column('started_at', sa.DateTime(), nullable=True),
-    sa.Column('completed_at', sa.DateTime(), nullable=True),
+    sa.Column('started_at', sa.DateTime(timezone=True), nullable=True),
+    sa.Column('completed_at', sa.DateTime(timezone=True), nullable=True),

Also update the model in backend/src/bookbytes/models/job.py to use timezone-aware types:

from sqlalchemy import DateTime

started_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True), nullable=True)
completed_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True), nullable=True)
🤖 Prompt for AI Agents
In backend/alembic/versions/2232afba1a25_add_jobs_and_audio_book_jobs_tables.py
around lines 35-36, the started_at and completed_at columns are created as naive
DateTime() while created_at/updated_at use timezone-aware
DateTime(timezone=True); update these two Column definitions to use
sa.DateTime(timezone=True) to ensure consistency, and also update
backend/src/bookbytes/models/job.py to declare started_at and completed_at as
timezone-aware mapped_columns (DateTime(timezone=True)) so the ORM model and
migration match.

Comment on lines +25 to +43
async def create_link(
self,
job_id: UUID,
audio_book_id: UUID,
) -> AudioBookJob:
"""Create a link between a job and an audiobook.

Args:
job_id: The job's UUID
audio_book_id: The audiobook's UUID

Returns:
The created AudioBookJob link
"""
link = AudioBookJob(job_id=job_id, audio_book_id=audio_book_id)
self.session.add(link)
await self.session.commit()
await self.session.refresh(link)
return link

⚠️ Potential issue | 🟠 Major

Using commit() breaks transaction boundaries.

create_link calls session.commit() directly, unlike BaseRepository.create which uses flush(). This commits any pending changes from the caller's transaction, preventing atomic operations across multiple repository calls.

     async def create_link(
         self,
         job_id: UUID,
         audio_book_id: UUID,
     ) -> AudioBookJob:
         ...
         link = AudioBookJob(job_id=job_id, audio_book_id=audio_book_id)
         self.session.add(link)
-        await self.session.commit()
+        await self.session.flush()
         await self.session.refresh(link)
         return link

The caller should control when to commit the transaction.

🤖 Prompt for AI Agents
In backend/src/bookbytes/repositories/audio_book_job.py around lines 25 to 43,
the method calls self.session.commit() which breaks caller transaction
boundaries; change it to add the new AudioBookJob to the session, call await
self.session.flush() (not commit) to ensure the instance gets a database
identity, then await self.session.refresh(link) and return the link so the
caller retains control over committing the transaction.

Comment on lines +161 to +193
async def schedule_retry(self, job_id: UUID) -> bool:
"""Schedule a failed job for retry.

Increments retry_count and sets status back to pending.

Args:
job_id: The job's UUID

Returns:
True if retry scheduled, False if max retries exceeded
"""
# Get current job state
job = await self.get_by_id(job_id)
if not job or not job.can_retry:
return False

stmt = (
update(Job)
.where(Job.id == job_id)
.values(
status=JobStatus.PENDING.value,
retry_count=job.retry_count + 1,
worker_id=None,
started_at=None,
completed_at=None,
error_message=None,
error_code=None,
)
)
await self.session.execute(stmt)
await self.session.commit()

return True

⚠️ Potential issue | 🟡 Minor

Reset progress when scheduling a retry (and consider optimistic locking)

Semantic nit that can affect monitoring/UX:

  • schedule_retry resets status, worker, timestamps, and error fields, but leaves progress unchanged. A job that failed at 80% will then be PENDING with progress=80, which is misleading for dashboards and workers that treat progress as per-attempt. It’s usually cleaner to reset progress to 0 (or None) on retry:
         stmt = (
             update(Job)
             .where(Job.id == job_id)
             .values(
                 status=JobStatus.PENDING.value,
                 retry_count=job.retry_count + 1,
                 worker_id=None,
                 started_at=None,
                 completed_at=None,
                 error_message=None,
                 error_code=None,
+                progress=0,
             )
         )
  • If you might have multiple schedulers or racey retry triggers, you may also want to include an optimistic-lock condition (e.g., on version or retry_count) to avoid double-increments and scheduling a job that was already retried.
🤖 Prompt for AI Agents
In backend/src/bookbytes/repositories/job.py around lines 161 to 193,
schedule_retry currently resets status, worker, timestamps, and error fields but
leaves progress unchanged and is vulnerable to racey double-retries; modify the
update to also reset progress (set to 0 or None per schema/UX choice) and add an
optimistic-lock condition (e.g., include a WHERE clause matching the current
retry_count or version) so the retry_count is only incremented when the record
is still the same — return False if the conditional update affected no rows,
otherwise commit and return True.

Implement repositories for job management and job-audiobook relations with worker-safe operations.
Includes atomic job claiming, progress updates, and status transitions with optimistic locking.
@cryptus-neoxys cryptus-neoxys force-pushed the feat/productionise-and-saasify branch from 127c524 to e162290 Compare December 11, 2025 11:54
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
tasks/backend-productionisation/audio-books-pipeline/tasks-audio-books-pipeline.md (1)

9-9: Align phase heading levels with the top‑level H1.

The document jumps from the H1 on line 1 to H3 phase headings (e.g., line 9), which violates markdown heading‑increment rules and the style flagged by markdownlint. To keep a clean hierarchy, make each phase heading an H2:

-### Phase 0: Storage Service (Prerequisite)
+## Phase 0: Storage Service (Prerequisite)

-### Phase 1: Processing API Endpoints
+## Phase 1: Processing API Endpoints

-### Phase 2: Job Infrastructure
+## Phase 2: Job Infrastructure

-### Phase 3: LLM Service
+## Phase 3: LLM Service

-### Phase 4: TTS Service
+## Phase 4: TTS Service

-### Phase 5: Processing Pipeline
+## Phase 5: Processing Pipeline

-### Phase 6: Dependency Injection Setup
+## Phase 6: Dependency Injection Setup

Apply the same change to all phase headings so the structure is # (title) → ## (phases).

Also applies to: 18-18, 24-24, 30-30, 35-35, 40-40, 107-107, 132-132, 162-162, 183-183, 216-216

🧹 Nitpick comments (1)
backend/src/bookbytes/repositories/__init__.py (1)

6-13: Verify SoftDeleteRepository import source; consider importing from its own module.

The exported surface in __all__ looks coherent, but SoftDeleteRepository is imported from bookbytes.repositories.base while the contextual snippets indicate it’s defined in soft_delete_repository.py. If base.py does not re-export SoftDeleteRepository, this will raise an import error at package import time.

If SoftDeleteRepository lives only in soft_delete_repository.py, prefer importing it directly from there:

-from bookbytes.repositories.base import BaseRepository, SoftDeleteRepository
+from bookbytes.repositories.base import BaseRepository
+from bookbytes.repositories.soft_delete_repository import SoftDeleteRepository

Please double‑check which module actually defines SoftDeleteRepository and adjust accordingly.

Also applies to: 15-28

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 127c524 and e162290.

📒 Files selected for processing (4)
  • backend/src/bookbytes/repositories/__init__.py (1 hunks)
  • backend/src/bookbytes/repositories/audio_book_job.py (1 hunks)
  • backend/src/bookbytes/repositories/job.py (1 hunks)
  • tasks/backend-productionisation/audio-books-pipeline/tasks-audio-books-pipeline.md (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • backend/src/bookbytes/repositories/job.py
  • backend/src/bookbytes/repositories/audio_book_job.py
🧰 Additional context used
🧬 Code graph analysis (1)
backend/src/bookbytes/repositories/__init__.py (8)
backend/src/bookbytes/repositories/audio_book.py (1)
  • AudioBookRepository (15-151)
backend/src/bookbytes/repositories/audio_book_job.py (1)
  • AudioBookJobRepository (19-106)
backend/src/bookbytes/repositories/base.py (2)
  • BaseRepository (43-206)
  • SoftDeleteRepository (209-381)
backend/src/bookbytes/repositories/book_provider.py (1)
  • BookProviderRepository (15-149)
backend/src/bookbytes/repositories/chapter.py (1)
  • ChapterRepository (14-115)
backend/src/bookbytes/repositories/edition.py (1)
  • EditionRepository (14-90)
backend/src/bookbytes/repositories/job.py (1)
  • JobRepository (18-253)
backend/src/bookbytes/repositories/work.py (1)
  • WorkRepository (13-75)
🪛 markdownlint-cli2 (0.18.1)
tasks/backend-productionisation/audio-books-pipeline/tasks-audio-books-pipeline.md

9-9: Heading levels should only increment by one level at a time
Expected: h2; Actual: h3

(MD001, heading-increment)

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
backend/pyproject.toml (1)

10-10: Fix Python version mismatch between project requirement and tooling config (flagged in past review).

The project explicitly requires Python >=3.13 (line 10) and classifies as Python 3.13 (line 22), but Ruff (line 100) and mypy (line 143) are configured for Python 3.11. This creates a validation gap where linters cannot properly check code written for Python 3.13 features.

Update tooling to match:

-target-version = "py311"
+target-version = "py313"

-python_version = "3.11"
+python_version = "3.13"

Also applies to: 100-100, 143-143

🧹 Nitpick comments (1)
backend/pyproject.toml (1)

26-73: Consider pinning production dependency versions for reproducibility.

While most dependencies lack version constraints (allowing flexibility), production deployments typically benefit from pinned versions to ensure consistency across environments. Consider establishing a versioning strategy—either pinning all transitive dependencies via a lock file (e.g., uv.lock or poetry.lock), or at least pinning major versions of critical dependencies like sqlalchemy, fastapi, redis, and openai.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e162290 and c463422.

⛔ Files ignored due to path filters (1)
  • backend/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (1)
  • backend/pyproject.toml (1 hunks)
🔇 Additional comments (2)
backend/pyproject.toml (2)

26-73: Comprehensive dependency set for production-ready FastAPI backend.

The dependency organization is well-structured with clear categories and includes appropriate libraries for async database operations, background jobs, observability, and external integrations. Good use of modern async patterns (asyncpg, aioboto3, httpx) alongside structured logging and resilience patterns (tenacity).


150-156: Verify mypy overrides for untyped dependencies.

Three modules are configured to ignore missing imports: gtts, arq, and aioboto3. While this is appropriate for untyped libraries, confirm that these are the only modules without type stubs and that the override scope remains minimal to preserve strict type checking elsewhere.
