feat(app): productionise and initiate saasification #7
base: main
Conversation
…ons, including architecture, goals, user stories, and database schema
…onfiguration, and dependency management.
…nd testing
- Add structured logging with correlation ID support
- Implement custom exception hierarchy for consistent error responses
- Configure request ID middleware and logging middleware
- Update health check endpoints with request ID support
- Add Docker and docker-compose configuration for production deployment
- Set up test fixtures and mocks for integration testing
Update documentation to reflect architectural changes from Flask to FastAPI. Include new project structure, logging conventions, and testing guidelines.
add base models, repositories and async database setup
initialize alembic for async migrations
update health check to verify database connectivity
add integration tests for database operations
- Add Dockerfile.dev for development with hot reload support
- Create docker-compose.dev.yml with postgres and redis services
- Update main Dockerfile to include Readme.md and use shell form for CMD
- Add Readme.md to .dockerignore exceptions
update docker compose contexts and sources; delete old .env.example
The Makefile provides convenient commands for building, running, and managing the BookBytes application in both development and production environments. It includes targets for container management, logging, database operations, testing, and cleanup.
Update build context paths from '.' to '..' to correctly reference parent directory. Also adjust source volume mounts to use '../src' instead of './src' to maintain consistency with the new context paths.
Add comprehensive design document outlining architecture decisions, data models, caching strategy, and implementation phases for the enhanced book search flow in the audio books library. The document covers provider-agnostic storage, two-tier caching, and future multi-API support.
Restructure the backend productionisation tasks by extracting the audio books library functionality into a separate module. This includes:
- New dedicated task file with detailed implementation steps
- Updated models, repositories, services, and API endpoints
- Comprehensive testing plan
- Improved caching strategy with two-tier approach
add uuid6 dependency and pg_idkit extension for UUIDv7 support
implement all audio books library models (Work, Edition, AudioBook, Chapter, BookProvider, APICache)
update base model to use UUIDv7 primary keys
add pg_idkit extension for PostgreSQL UUIDv7 generation
export new models in __init__.py
…hing
Update documentation and models to reflect removal of PostgreSQL cache layer. Simplify language code length in Edition model from 7 to 3 characters.
Add start-dev-infra and start-prod-infra targets to run only database and redis services
Add repositories for audio books library entities including Work, Edition, Chapter, AudioBook and BookProvider with CRUD operations and specialized queries. Remove pg_idkit extension migration as UUIDv7 is now generated in Python.

feat(cache): implement Redis-based CacheService
Add CacheService with TTL-based caching, stale-while-revalidate support and pattern invalidation. Uses Redis with AOF persistence for durability.

docs: update task checklist with completed items
Mark all audio books library tasks as completed including model creation, repository implementation and cache service setup.
@coderabbitai review

✅ Actions performed: Review triggered.
Walkthrough
Adds a production-ready FastAPI backend: async SQLAlchemy models, repositories and Alembic migrations; Redis-backed CacheService and OpenLibrary integration; structured logging and Pydantic Settings; Docker/dev tooling and Makefile; API v1 search and processing endpoints with schemas; plus unit and integration tests and fixtures.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant API as FastAPI
    participant Cache as CacheService (Redis)
    participant OL as OpenLibraryService
    participant DB as Postgres / Repos
    Client->>API: POST /api/v1/books/search
    API->>Cache: get(search_key)
    alt cache hit & fresh
        Cache-->>API: data, needs_revalidation=false
        API-->>Client: 200 OK (cached data)
    else cache hit & stale
        Cache-->>API: data, needs_revalidation=true
        API-->>Client: 200 OK (stale data)
        API->>OL: async revalidate -> search_books(...)
        OL->>Cache: set(search_key, new_data)
    else cache miss
        API->>OL: search_books(...)
        OL->>OL: fetch & parse OpenLibrary
        OL->>Cache: set(search_key, parsed_data)
        OL-->>API: parsed_data
        API-->>Client: 200 OK (fresh data)
    end
    Note over OL,DB: persistence path
    OL->>DB: create/get Work & Edition via repositories
    DB-->>OL: persisted entities
```
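To make the stale-while-revalidate branch in the diagram concrete, here is a minimal Python sketch of the cache lookup. The names (`CacheService.get_with_freshness`, `search_openlibrary`, the key format) are illustrative assumptions, not the PR's actual API:

```python
import asyncio
import json
import time
from typing import Any

from redis.asyncio import Redis  # redis-py asyncio client


class CacheService:
    """Illustrative two-state cache: fresh for `ttl`, stale until `ttl + stale_ttl`."""

    def __init__(self, redis: Redis, ttl: int = 300, stale_ttl: int = 3600) -> None:
        self._redis = redis
        self._ttl = ttl
        self._stale_ttl = stale_ttl

    async def get_with_freshness(self, key: str) -> tuple[Any | None, bool]:
        """Return (value, needs_revalidation); (None, False) means a miss."""
        raw = await self._redis.get(key)
        if raw is None:
            return None, False
        entry = json.loads(raw)
        is_stale = time.time() - entry["stored_at"] > self._ttl
        return entry["data"], is_stale

    async def set(self, key: str, data: Any) -> None:
        entry = json.dumps({"data": data, "stored_at": time.time()})
        # Expiry covers both the fresh window and the stale grace period.
        await self._redis.set(key, entry, ex=self._ttl + self._stale_ttl)


async def search_openlibrary(query: str) -> dict:
    """Stand-in for the real OpenLibrary fetch-and-parse call."""
    return {"query": query, "docs": []}


async def search_books(cache: CacheService, query: str) -> Any:
    key = f"search:{query}"
    data, needs_revalidation = await cache.get_with_freshness(key)
    if data is not None:
        if needs_revalidation:
            # Serve stale data now; refresh in the background.
            asyncio.create_task(_revalidate(cache, key, query))
        return data
    data = await search_openlibrary(query)
    await cache.set(key, data)
    return data


async def _revalidate(cache: CacheService, key: str, query: str) -> None:
    await cache.set(key, await search_openlibrary(query))
```

The key design property is that a stale hit never blocks the client: the slow upstream call happens in a background task after the response is sent.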
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)
Actionable comments posted: 14
🧹 Nitpick comments (32)
backend/src/bookbytes/__init__.py (1)
1-3: Clean package metadata; optionally single-source the version. Docstring and `__version__` are clear and conventional; this is good as a minimal package initializer. If you expect the version to also live in `pyproject.toml`/`setup.cfg`, consider single-sourcing it (e.g., reading from packaging metadata or centralizing the value in one place) to avoid future drift. Not urgent for now.
.agent/rules/pythonic-fastapi-async.md (1)
8-36: Consider organizing guidance into sections for better scannability. The guidance content is comprehensive and substantively sound, but it's presented as a flat list of bullet points. Adding section headers would make it easier to navigate and reference related guidance.
For example:
```diff
 You are an expert in Python, FastAPI, and scalable API development.
+
+## Recommended Stack
+
 FastAPI
 Pydantic v2
 Async database libraries
 SQLAlchemy 2.0 (if using ORM features)
+
+## Coding Conventions
+
 Write concise, technical responses with accurate Python examples.
 Prefer iteration and modularization over code duplication.
 Favour composition over inheritance. Smart core, Thin interfaces.
 Use descriptive variable names with auxiliary verbs (e.g., is_active, has_permission).
 Use lowercase with underscores for directories and files (e.g., routers/user_routes.py).
 Favor named exports for routes and utility functions.
 Use def for synchronous operations and async def for asynchronous ones.
+
+## Type Hints and Validation
+
 Use type hints for all function signatures.
 Prefer Pydantic models over raw dictionaries for input validation.
 Write typed python code strictly, and avoid the use of `Any`
+
+## Error Handling
+
 Prioritize error handling and edge cases
 [... rest organized similarly]
```
backend/docker/Dockerfile (1)
72-73: Shell form CMD with env var interpolation may fail in slim base image. The CMD uses shell form to interpolate `${PORT}`, which requires `/bin/sh` to be present. Since the base image is `python:3.13-slim`, this typically works, but it's not guaranteed across all variants. For robustness, interpolate the PORT variable at image build time instead:

```diff
-ENV PATH="/app/.venv/bin:$PATH" \
+ENV PATH="/app/.venv/bin:$PATH" \
     PYTHONUNBUFFERED=1 \
     PYTHONDONTWRITEBYTECODE=1 \
     APP_ENV=production \
     HOST=0.0.0.0 \
-    PORT=8000
+    PORT=8000 \
+    UVICORN_CMD="uvicorn bookbytes.main:app --host 0.0.0.0 --port 8000"

-CMD uvicorn bookbytes.main:app --host 0.0.0.0 --port ${PORT}
+CMD ${UVICORN_CMD}
```

Alternatively, use exec form with a hardcoded port if the PORT environment variable is not meant to be overridden at runtime.
Verify that the `bookbytes.main:app` FastAPI application provides a `/health/live` endpoint as expected by the HEALTHCHECK directive (line 69).
tasks/backend-productionisation/audio-books-library/design-doc.md (2)
26-26: Add language specifiers to fenced code blocks per markdownlint (MD040). Fenced code blocks should specify a language for proper syntax highlighting and linting. Change the opening fence from ``` to ```json for the response example block (`{ "numFound": 629, ... }`). Similarly for:

- Line 76: use ```mermaid (if a diagram) or plain ``` for ASCII art
- Line 360: use plain ``` (text/example)
- Line 397: use plain ``` (text/diagram)

Also applies to: 76-76, 360-360, 397-397
332-332: Convert emphasis-based headings to proper markdown headings (MD036). Use heading syntax instead of bold emphasis for section titles:

```diff
-**Example Usage:**
+### Example Usage
```

And:

```diff
-**Key Points:**
+### Key Points
```

Also applies to: 584-584
.github/copilot-instructions.md (1)
12-26: Add language identifier to fenced code block. Per markdownlint MD040, fenced code blocks should have a language specified. For directory structures, use `text` or `plaintext`: change the opening fence from ``` to ```text before `main.py  # FastAPI app factory with lifespan, middleware, exception handlers`.
backend/src/bookbytes/models/base.py (1)
103-107: Remove redundant import inside method. `datetime` is already imported at the module level (line 17). The local import is unnecessary:

```diff
 def mark_deleted(self) -> None:
     """Mark this entity as soft-deleted."""
-    from datetime import UTC, datetime
-
-    self.deleted_at = datetime.now(UTC)
+    from datetime import UTC
+    self.deleted_at = datetime.now(UTC)
```

Or better, import `UTC` at the module level alongside `datetime`:

```diff
-from datetime import datetime
+from datetime import UTC, datetime
```

Then simplify the method:

```diff
 def mark_deleted(self) -> None:
     """Mark this entity as soft-deleted."""
-    from datetime import UTC, datetime
-
     self.deleted_at = datetime.now(UTC)
```
backend/tests/mocks/openlibrary_responses.py (1)
102-104: Consider adding documentation for empty response behavior. The empty dicts for `NOT_FOUND_RESPONSE` and `INVALID_ISBN_RESPONSE` appear intentional (the Open Library API returns `{}` for unknown ISBNs), but a brief comment would clarify this for future maintainers:

```diff
+# Open Library returns empty dict for ISBNs not in their database
 NOT_FOUND_RESPONSE: dict[str, Any] = {}
+# Invalid ISBN format also returns empty dict from the API
 INVALID_ISBN_RESPONSE: dict[str, Any] = {}
```
backend/src/bookbytes/repositories/base.py (2)
39-40: Remove unused TypeVar. `S` is defined but never used in this module:

```diff
 # Type variable for model classes
 T = TypeVar("T", bound=Base)
-
-# Type variable for soft-deletable models
-S = TypeVar("S", bound=SoftDeleteMixin)
```
209-218: Type parameter doesn't enforce the `SoftDeleteMixin` requirement. The docstring correctly notes models "must include SoftDeleteMixin," but the type system doesn't enforce this. The `type: ignore` comments throughout acknowledge this gap. For stricter typing (optional, Python typing limitations make this verbose):

```python
from typing import Protocol


class SoftDeletable(Protocol):
    deleted_at: datetime | None

# Then use intersection or runtime checks
```
type: ignorecomments are appropriate.env.example (1)
11-120: Clarify that these defaults are for local dev only (consider a separate `.env.dev`). Values like `DEBUG=true`, `AUTH_MODE=api_key`, `JWT_SECRET_KEY=dev-secret-key-change-in-production`, and the sample `API_KEY` are great for local setup but risky if copied verbatim into production. Consider either:

- Explicitly marking this file as development-only at the top, or
- Providing a second template (e.g. `.env.prod.example`) with safer defaults (`DEBUG=false`, `AUTH_MODE=jwt`, blank secrets).
backend/docker/docker-compose.dev.yml (1)
12-25: Avoid hardcoding credentials in compose; prefer env files or variables even for dev. `DATABASE_URL` (with user/password) and defaults for `API_KEY`, `JWT_SECRET_KEY`, and `OPENAI_API_KEY` are currently embedded in the compose file. For a cleaner security posture and to keep static analysis tools happy, consider:

- Moving these into a `.env`/`backend.env` file referenced via `env_file`, or
- Using placeholder vars here and letting developers define real values in their local `.env`.

This keeps the compose file non-sensitive and closer to how production will be configured.
backend/src/bookbytes/models/edition.py (1)
23-77: Edition model wiring looks solid; consider tightening a couple of invariants. The core mapping (PK mixins, `work_id` FK, `isbn` uniqueness, relationships to `Work`, `AudioBook`, and `BookProvider`) is clean and consistent with the other models. Two optional refinements you might consider (sketched after this list):

- Constrain `isbn_type` values: given the docstring promises `"isbn10"` or `"isbn13"`, a `CheckConstraint` or SQLAlchemy `Enum` would prevent accidental invalid values.
- One-to-one `audio_book` semantics: you already use `uselist=False` and `delete-orphan`; if you want to strictly enforce single-parent semantics at the ORM level, you can also set `single_parent=True` on the relationship (worth verifying against your SQLAlchemy version and desired behaviour).
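A minimal sketch of the `isbn_type` constraint; the surrounding model body here is illustrative, not the project's actual file:

```python
from sqlalchemy import CheckConstraint, String
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column


class Base(DeclarativeBase):
    pass


class Edition(Base):
    __tablename__ = "editions"
    __table_args__ = (
        # Reject anything other than the two documented values at the DB level.
        CheckConstraint(
            "isbn_type IN ('isbn10', 'isbn13')",
            name="ck_editions_isbn_type",
        ),
    )

    id: Mapped[int] = mapped_column(primary_key=True)
    isbn: Mapped[str] = mapped_column(String(13), unique=True)
    isbn_type: Mapped[str] = mapped_column(String(6))
```

A typed column (e.g. `sqlalchemy.Enum("isbn10", "isbn13", name="isbn_type")`) achieves the same goal while also surfacing invalid values on the Python side.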
8-55: Dev image is sound; consider pinninguvand reevaluating--no-devfor DX.This Dockerfile is well-structured (non-root user, isolated venv, curl for healthcheck). Two optional tweaks:
- Pin the
uvimage tag instead of:latestto avoid surprise breakages whenuvupdates.- Double‑check
--no-dev: for a development image you may actually want dev/test tooling (linters, pytest, etc.) inside the container. If your workflow runs those on the host only, this is fine; otherwise, dropping--no-devcan simplify containerized dev.backend/src/bookbytes/models/book_provider.py (1)
89-94: Index onentity_idmay be less useful without FK enforcement.The index
ix_book_providers_entity_lookupon(entity_type, entity_id)is meant for efficient lookups, but sinceentity_idis not a foreign key, queries will likely usework_idoredition_iddirectly via relationships. Consider indexingwork_idandedition_idinstead ifentity_idis removed.backend/tests/integration/test_database.py (1)
19-52: Test doesn't verify session cleanup on exceptions.The test covers the happy path but doesn't verify that the session properly rolls back on exceptions, which is part of the
get_async_sessioncontract (as shown in the relevant snippet fromdatabase.pylines 65-80).Consider adding a test case for exception handling:
@pytest.mark.asyncio async def test_database_session_rollback_on_error() -> None: """Test that session rolls back on exception.""" # ... setup similar to test_database_session_lifecycle ... with pytest.raises(ValueError): async for session in get_async_session(): await session.execute(text("SELECT 1")) raise ValueError("Simulated error") # Session should have rolled back automatically await close_db()tasks/backend-productionisation/audio-books-library/tasks-audio-books-library.md (1)
62-68: Specify language for the fenced code block. The folder structure code block at line 62 should have a language specifier for proper rendering and syntax highlighting: change the opening fence from ``` to ```text before the `bookbytes/ ├── alembic/` tree.
tasks/backend-productionisation/technical-considerations-backend-productionisation.md (1)
62-62: Specify language for the fenced code block. The folder structure block should have a language identifier for proper rendering: change the opening fence from ``` to ```text before the `bookbytes/` tree.
backend/tests/conftest.py (1)
74-76: Unused import. `asynccontextmanager` is imported but never used in this fixture.

```diff
-    from contextlib import asynccontextmanager
-
     from bookbytes.core.database import close_db, init_db
```
backend/src/bookbytes/core/logging.py (1)
94-114: Redundant processor in JSON formatter configuration. The `processors` list includes `JSONRenderer()` at line 99, and the `ProcessorFormatter` at line 102 creates another `JSONRenderer()` instance. The `processors` list is passed to `structlog.configure()` at line 118, which is correct, but creating two separate renderer instances is slightly wasteful. Consider reusing the same renderer instance:

```diff
 if settings.use_json_logs:
     # Production: JSON format for log aggregation
+    json_renderer = structlog.processors.JSONRenderer()
     processors: list[Processor] = [
         *shared_processors,
         structlog.processors.format_exc_info,
-        structlog.processors.JSONRenderer(),
+        json_renderer,
     ]
     formatter = structlog.stdlib.ProcessorFormatter(
-        processor=structlog.processors.JSONRenderer(),
+        processor=json_renderer,
         foreign_pre_chain=shared_processors,
     )
```
backend/src/bookbytes/main.py (2)
29-31: Logger initialized before `configure_logging()` is called. The module-level `logger = get_logger(__name__)` executes at import time, before `configure_logging()` runs during app lifespan. This logger instance will use structlog's default configuration, potentially causing inconsistent log formatting for early errors.
logger = get_logger(__name__)executes at import time, beforeconfigure_logging()runs during app lifespan. This logger instance will use structlog's default configuration, potentially causing inconsistent log formatting for early errors.Consider deferring logger acquisition or accepting this as a known limitation for module-level logging.
115-117: Redundant conditional expressions. The ternary expressions evaluate to the same value in both branches, making the conditionals pointless. Either remove them or implement the intended production behavior (e.g., disabling docs in production):

```diff
-    docs_url="/docs" if settings.is_development else "/docs",
-    redoc_url="/redoc" if settings.is_development else "/redoc",
-    openapi_url="/openapi.json" if settings.is_development else "/openapi.json",
+    docs_url="/docs" if settings.is_development else None,
+    redoc_url="/redoc" if settings.is_development else None,
+    openapi_url="/openapi.json" if settings.is_development else None,
```
backend/src/bookbytes/core/database.py (2)
149-162: Consider more resilient behavior and richer logging in health checks. `check_db_connection` is fine for a happy-path health check, but two small tweaks could improve operability:

- If the DB is not initialized, `get_engine()` will raise `RuntimeError`. For health endpoints, it can be nicer to map that to `False` with a clear log message rather than surfacing an unexpected 500 upstream.
- When logging failures, you may want to capture the stack trace for easier debugging, e.g. by adding `exc_info=True` or logging the exception object directly for your structlog configuration.

The existing handler:

```python
except Exception as e:
    logger.error("Database health check failed", error=str(e))
    return False
```

is fine, but consider including structured fields like `exc=str(e)` and any relevant context (e.g. DSN alias) if your logging pipeline supports it. Please confirm how your health endpoint wraps this function and whether surfacing an uninitialized engine as an exception vs. a simple `False` aligns with your desired health semantics.
165-183: Strengthen URL password masking using proper URL parsing. The current `_mask_password` implementation is simple and works for typical `user:password@host` URLs, but it can be tripped up by edge cases (e.g., `@` inside the password, different schemes, or additional `://` segments). Consider using `urllib.parse.urlsplit`/`urlunsplit` (or SQLAlchemy's URL utilities) to parse and reconstruct the URL while replacing only the password component; that will make masking more robust and less sensitive to unexpected URL shapes (see the sketch below).
urllib.parse.urlsplit/urlunsplit(or SQLAlchemy’s URL utilities) to parse and reconstruct the URL while replacing only the password component. That will make masking more robust and less sensitive to unexpected URL shapes.Please review how many different DSN shapes you expect to support (e.g., with query params, special characters in passwords) and whether it’s worth moving to a parser-based approach now.
Makefile (3)
1-50: Fix stale `.PHONY` entry and consider adding standard targets. Two small Makefile polish points:

- `.PHONY` lists `clean-docker-dev`, but there is no such target defined (you have `clean-dev`, `clean-dev-containers`, etc.). Either rename the entry to `clean-dev`, or add a `clean-docker-dev` alias target that delegates to `clean-dev`.
- Tools like `checkmake` are also flagging missing standard phony targets `all` and `clean`. If you care about those conventions, you could add simple wrappers:

```makefile
all: help

clean: clean-dev
```

These changes will make `make help` output and static analysis happier, but they're not functionally required.
115-120: Tests re-running `uv sync` on every invocation may be slow. Both `test` and `test-cov` run `uv sync --all-extras` every time. For frequent local test runs this can add noticeable latency. If your environment allows it, consider:

- A separate `bootstrap`/`deps` target that runs `uv sync`, and
- Making `test`/`test-cov` depend only on `uv run pytest...`.

Example:

```makefile
deps:
	cd $(BACKEND_DIR) && uv sync --all-extras

test: deps
	cd $(BACKEND_DIR) && uv run pytest
```

This keeps tests snappy once dependencies are up to date.
129-143: Improve shell portability in cleanup helpers. The cleanup targets work, but they rely on features that are not universally available in `/bin/sh` or on non-GNU systems:

- `clean-dev-images` uses `xargs -r`, which is GNU-specific and will fail on macOS/BSD `xargs`.
- `clean-dev-volumes` uses `read -p`, which is a Bash extension and not supported by all `/bin/sh` implementations (e.g., `dash`).

To make these targets more portable, you could:

- Either declare `SHELL := /bin/bash` at the top of the Makefile and require Bash, or
- Replace `xargs -r` with a more portable pattern (e.g., guarding with `test -n`, or using `xargs` without `-r` but with `|| true`), or use a small shell script.

These tweaks reduce surprising failures when contributors run cleanup commands on different platforms. Please confirm which shells/OSes you intend to support for development so you can decide whether to enforce Bash explicitly or aim for POSIX sh + GNU utils.
backend/src/bookbytes/schemas/common.py (5)
24-31: Consider actually using `BaseSchema` as the base for shared response models. `BaseSchema` provides useful common config (`from_attributes`, `populate_by_name`, `str_strip_whitespace`), but all schemas in this file currently inherit directly from `BaseModel`. If the intent is to have consistent behavior across API schemas, consider having at least the response-style models (`ErrorDetail`, `ErrorResponse`, `PaginatedResponse`, `UUIDMixin`/`TimestampMixin`/`UUIDTimestampMixin`, `HealthStatus`, `HealthCheckResponse`, `MessageResponse`) inherit from `BaseSchema` instead. That avoids repeating config and ensures ORM objects can be passed directly where appropriate.

Please verify how these schemas are instantiated elsewhere (e.g., from SQLAlchemy models) to confirm whether `from_attributes=True` is needed universally or only for a subset.
39-92: Error schema design is consistent and expressive. The `ErrorDetail`/`ErrorResponse` shapes, examples, and field descriptions are clear and aligned with problem-details-style error payloads. This should give clients a predictable error contract. No changes needed here; an optional future enhancement would be introducing an `Enum` for `code` values once you have a stable set of well-known error codes.

If you later introduce an `Enum` or `Literal` type for `code`, please double-check how that interacts with your OpenAPI generator and client code generation.
100-158: Double-check pagination `pages` semantics for empty result sets. `PaginationParams.offset`/`limit` and the `PaginatedResponse.create` factory look correct, but note the behavior for `total == 0`:

```python
pages = (total + size - 1) // size if total > 0 else 0
```

This yields `pages = 0` when there are no items. Some APIs report `pages = 0`, others prefer `1` to indicate "one empty page". Either is defensible, but it's worth confirming that:

- Your UI and client code expect `pages=0` in the empty case, and
- Any pagination controls handle that case cleanly.

If you want "always at least one page", you could adjust to:

```python
pages = max(1, (total + size - 1) // size)
```

Otherwise the current behavior is fine. Please check how existing or planned frontends interpret `pages=0` for empty collections before locking in this convention.
165-181: Clarify mixin role and consider aligning with `BaseSchema`. The Pydantic `UUIDMixin`/`TimestampMixin`/`UUIDTimestampMixin` nicely mirror your SQLAlchemy mixins conceptually, but a couple of small refinements may help:

- Since these classes already inherit from `BaseModel`, they are full models, not "pure mixins". That's fine, but it means consuming schemas will typically use multiple inheritance, e.g. `class BookOut(UUIDTimestampMixin, BaseSchema): ...`. Given that, you may want these mixins to extend `BaseSchema` instead of `BaseModel` to inherit the shared config automatically.
- Their names are identical to the SQLAlchemy mixins in `backend/src/bookbytes/models/base.py`. To avoid confusion at call sites, consider using explicit import aliases (`from ...models.base import UUIDMixin as SAUUIDMixin`) or renaming these to something like `UUIDSchemaMixin`/`TimestampSchemaMixin`.

Functionally this is fine as-is; these are mainly clarity and reuse improvements. Please verify how these mixins are composed in your actual response models to ensure MRO and `model_config` resolution behave as you expect under Pydantic v2.
189-224: Health schemas are flexible; you may want stronger typing for statuses. The health check schemas are practical and the examples match the field definitions. Two optional tweaks you might consider (sketched below):

- Replace the raw `str` + regex patterns with `Literal[...]` or small `Enum`s for `status` values (`"ok"`, `"degraded"`, `"error"`). That improves type checking and auto-complete in clients.
- For `checks: dict[str, HealthStatus | str]`, decide whether you truly need both rich `HealthStatus` and bare strings in the same map. If not, standardizing on `HealthStatus` will simplify consumer logic.

None of this is required for correctness; the current design is workable. If you introduce `Literal` or `Enum` types, please confirm compatibility with your OpenAPI tooling and any generated clients.
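A small sketch of the `Literal` variant, reusing the `HealthStatus`/`HealthCheckResponse` names from the review; the `detail` field is an illustrative placeholder:

```python
from typing import Literal

from pydantic import BaseModel

HealthState = Literal["ok", "degraded", "error"]


class HealthStatus(BaseModel):
    status: HealthState
    detail: str | None = None


class HealthCheckResponse(BaseModel):
    status: HealthState
    # Standardizing on HealthStatus (instead of HealthStatus | str)
    # keeps consumer logic to a single shape.
    checks: dict[str, HealthStatus]


# Pydantic rejects values outside the literal set at validation time:
HealthCheckResponse(
    status="ok",
    checks={"database": HealthStatus(status="ok")},
)
```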
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`backend/uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (53)
.agent/rules/pythonic-fastapi-async.md (1 hunk), .dockerignore (1 hunk), .env.example (0 hunks), .github/copilot-instructions.md (1 hunk), Makefile (1 hunk), backend/alembic.ini (1 hunk), backend/alembic/README (1 hunk), backend/alembic/env.py (1 hunk), backend/alembic/script.py.mako (1 hunk), backend/alembic/versions/001_pg_idkit_extension.py (1 hunk), backend/docker/Dockerfile (1 hunk), backend/docker/Dockerfile.dev (1 hunk), backend/docker/docker-compose.dev.yml (1 hunk), backend/docker/docker-compose.yml (1 hunk), backend/pyproject.toml (1 hunk), backend/src/bookbytes/__init__.py (1 hunk), backend/src/bookbytes/api/__init__.py (1 hunk), backend/src/bookbytes/api/v1/__init__.py (1 hunk), backend/src/bookbytes/config.py (1 hunk), backend/src/bookbytes/core/__init__.py (1 hunk), backend/src/bookbytes/core/database.py (1 hunk), backend/src/bookbytes/core/exceptions.py (1 hunk), backend/src/bookbytes/core/logging.py (1 hunk), backend/src/bookbytes/dependencies.py (1 hunk), backend/src/bookbytes/main.py (1 hunk), backend/src/bookbytes/models/__init__.py (1 hunk), backend/src/bookbytes/models/audio_book.py (1 hunk), backend/src/bookbytes/models/base.py (1 hunk), backend/src/bookbytes/models/book_provider.py (1 hunk), backend/src/bookbytes/models/chapter.py (1 hunk), backend/src/bookbytes/models/edition.py (1 hunk), backend/src/bookbytes/models/work.py (1 hunk), backend/src/bookbytes/repositories/__init__.py (1 hunk), backend/src/bookbytes/repositories/base.py (1 hunk), backend/src/bookbytes/schemas/__init__.py (1 hunk), backend/src/bookbytes/schemas/common.py (1 hunk), backend/src/bookbytes/services/__init__.py (1 hunk), backend/src/bookbytes/storage/__init__.py (1 hunk), backend/src/bookbytes/workers/__init__.py (1 hunk), backend/tests/__init__.py (1 hunk), backend/tests/conftest.py (1 hunk), backend/tests/integration/__init__.py (1 hunk), backend/tests/integration/test_database.py (1 hunk), backend/tests/integration/test_health.py (1 hunk), backend/tests/mocks/__init__.py (1 hunk), backend/tests/mocks/openai_responses.py (1 hunk), backend/tests/mocks/openlibrary_responses.py (1 hunk), env.example (1 hunk), tasks/backend-productionisation/audio-books-library/design-doc.md (1 hunk), tasks/backend-productionisation/audio-books-library/tasks-audio-books-library.md (1 hunk), tasks/backend-productionisation/prd-backend-productionisation.md (1 hunk), tasks/backend-productionisation/tasks-backend-productionisation.md (1 hunk), tasks/backend-productionisation/technical-considerations-backend-productionisation.md (1 hunk)
💤 Files with no reviewable changes (1)
- .env.example
🧰 Additional context used
🧬 Code graph analysis (14)
backend/src/bookbytes/models/edition.py (4)
- backend/src/bookbytes/models/base.py (4): Base (24-30), TimestampMixin (55-77), UUIDPrimaryKeyMixin (33-52), id (47-52)
- backend/src/bookbytes/models/audio_book.py (1): AudioBook (37-79)
- backend/src/bookbytes/models/book_provider.py (1): BookProvider (36-100)
- backend/src/bookbytes/models/work.py (1): Work (24-61)

backend/src/bookbytes/models/book_provider.py (3)
- backend/src/bookbytes/models/base.py (3): Base (24-30), TimestampMixin (55-77), UUIDPrimaryKeyMixin (33-52)
- backend/src/bookbytes/models/edition.py (1): Edition (23-80)
- backend/src/bookbytes/models/work.py (1): Work (24-61)

backend/src/bookbytes/models/audio_book.py (3)
- backend/src/bookbytes/models/base.py (5): Base (24-30), SoftDeleteMixin (80-111), TimestampMixin (55-77), UUIDPrimaryKeyMixin (33-52), id (47-52)
- backend/src/bookbytes/models/chapter.py (1): Chapter (21-66)
- backend/src/bookbytes/models/edition.py (1): Edition (23-80)

backend/src/bookbytes/dependencies.py (2)
- backend/src/bookbytes/config.py (2): Settings (53-254), get_settings (258-267)
- backend/src/bookbytes/core/database.py (1): get_async_session (66-81)

backend/src/bookbytes/models/__init__.py (6)
- backend/src/bookbytes/models/audio_book.py (2): AudioBook (37-79), AudioBookStatus (28-34)
- backend/src/bookbytes/models/base.py (4): Base (24-30), SoftDeleteMixin (80-111), TimestampMixin (55-77), UUIDPrimaryKeyMixin (33-52)
- backend/src/bookbytes/models/book_provider.py (2): BookProvider (36-100), BookProviderType (28-33)
- backend/src/bookbytes/models/chapter.py (1): Chapter (21-66)
- backend/src/bookbytes/models/edition.py (1): Edition (23-80)
- backend/src/bookbytes/models/work.py (1): Work (24-61)

backend/src/bookbytes/models/base.py (2)
- backend/src/bookbytes/schemas/common.py (1): TimestampMixin (171-175)
- backend/src/bookbytes/repositories/base.py (1): restore (309-321)

backend/src/bookbytes/core/logging.py (1)
- backend/src/bookbytes/config.py (3): Settings (53-254), get_settings (258-267), use_json_logs (244-246)

backend/tests/integration/test_database.py (5)
- backend/tests/conftest.py (1): async_client (62-91)
- backend/src/bookbytes/config.py (1): Settings (53-254)
- backend/src/bookbytes/core/database.py (3): close_db (134-146), get_async_session (66-81), init_db (84-131)
- backend/src/bookbytes/models/base.py (2): Base (24-30), TimestampMixin (55-77)
- backend/src/bookbytes/schemas/common.py (1): TimestampMixin (171-175)

backend/src/bookbytes/repositories/base.py (2)
- backend/src/bookbytes/models/base.py (4): Base (24-30), id (47-52), deleted_at (90-96), restore (109-111)
- backend/src/bookbytes/schemas/common.py (3): offset (112-114), limit (117-119), create (142-157)

backend/tests/integration/test_health.py (1)
- backend/tests/conftest.py (1): async_client (62-91)

backend/alembic/env.py (2)
- backend/src/bookbytes/config.py (1): get_settings (258-267)
- backend/src/bookbytes/models/base.py (1): Base (24-30)

backend/src/bookbytes/schemas/common.py (2)
- backend/src/bookbytes/repositories/base.py (1): create (122-134)
- backend/src/bookbytes/models/base.py (4): id (47-52), TimestampMixin (55-77), created_at (63-68), updated_at (71-77)

backend/src/bookbytes/core/database.py (2)
- backend/src/bookbytes/config.py (1): Settings (53-254)
- backend/src/bookbytes/core/logging.py (1): get_logger (145-159)

backend/tests/conftest.py (5)
- backend/src/bookbytes/config.py (1): Settings (53-254)
- backend/src/bookbytes/core/database.py (2): close_db (134-146), init_db (84-131)
- backend/src/bookbytes/core/logging.py (1): configure_logging (67-142)
- backend/alembic/env.py (1): get_url (35-38)
- backend/src/bookbytes/repositories/base.py (3): delete (361-370), exists (192-206), exists (338-358)
🪛 checkmake (0.2.2)
Makefile
[warning] 14-14: Target body for "help" exceeds allowed length of 5 (35).
(maxbodylength)
[warning] 1-1: Missing required phony target "all"
(minphony)
[warning] 1-1: Missing required phony target "clean"
(minphony)
[warning] 1-1: Missing required phony target "test"
(minphony)
🪛 Checkov (3.2.334)
backend/docker/docker-compose.yml
[medium] 29-30: Basic Auth Credentials
(CKV_SECRET_4)
backend/docker/docker-compose.dev.yml
[medium] 17-18: Basic Auth Credentials
(CKV_SECRET_4)
🪛 LanguageTool
.agent/rules/pythonic-fastapi-async.md
[grammar] ~15-~15: Ensure spelling is correct
Context: ...de duplication. Favour composition over inheritence. Smart core, Thin interfaces. Use descr...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~36-~36: Ensure spelling is correct
Context: ... library documentation for Data Models, Shemas, Path Operations, Middleware and for be...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
tasks/backend-productionisation/prd-backend-productionisation.md
[grammar] ~1-~1: Ensure spelling is correct
Context: # PRD: Backend Productionisation Document Version: 1.1 Created: 4...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[style] ~59-~59: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...diately | | US2 | As an API consumer, I want to poll job status, so I know when process...
(REP_WANT_TO_VB)
[style] ~68-~68: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...gres, Redis | | DS2 | As a developer, I want to run tests with mocked external services...
(REP_WANT_TO_VB)
[style] ~69-~69: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ... | | DS3 | As a developer, I want to add new features without touching unrel...
(REP_WANT_TO_VB)
[uncategorized] ~188-~188: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...e | | NG8 | Rate limiting implementation | Design only; im...
(EN_COMPOUND_ADJECTIVE_INTERNAL)
tasks/backend-productionisation/tasks-backend-productionisation.md
[grammar] ~1-~1: Ensure spelling is correct
Context: # Tasks: Backend Productionisation PRD Reference: [prd-backend-production...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[uncategorized] ~106-~106: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...ker/docker-compose.yml up` to start the full stack - Environment variables are validated a...
(EN_COMPOUND_ADJECTIVE_INTERNAL)
🪛 markdownlint-cli2 (0.18.1)
tasks/backend-productionisation/audio-books-library/design-doc.md
26-26: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
76-76: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
332-332: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
360-360: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
397-397: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
584-584: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
tasks/backend-productionisation/technical-considerations-backend-productionisation.md
62-62: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
.github/copilot-instructions.md
12-12: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
tasks/backend-productionisation/tasks-backend-productionisation.md
227-227: Spaces inside emphasis markers
(MD037, no-space-in-emphasis)
234-234: Spaces inside emphasis markers
(MD037, no-space-in-emphasis)
🔇 Additional comments (66)
backend/src/bookbytes/workers/__init__.py (1)
1-1: Looks good—simple and appropriate. This is a minimal package initializer with a clear docstring. Nothing to change here.
tasks/backend-productionisation/tasks-backend-productionisation.md (2)
1-120: Excellent task tracker structure and comprehensive roadmap. This document is well-organized and serves as a clear blueprint for backend productionization. The 7-phase structure is logical, task descriptions are actionable, and the practice of modularizing work (e.g., moving Phase 3 to a dedicated file) demonstrates good planning. Cross-references to related PRD and technical documentation are appropriate, and checkbox tracking enables progress visibility.

106-106: Use hyphen for compound adjective. Line 106 uses "full stack" as a compound adjective; this should be hyphenated for clarity and grammar correctness. Apply this diff:

```diff
- Use `docker-compose -f docker/docker-compose.yml up` to start the full stack
+ Use `docker-compose -f docker/docker-compose.yml up` to start the full-stack
```

Likely an incorrect or invalid review comment.
.dockerignore (1)
57-59: Defensive dual-case README inclusion is acceptable. Both `README.md` and `Readme.md` are included to handle potential case-sensitivity across different systems. Since `pyproject.toml` specifies `readme = "Readme.md"`, that variant is necessary. No changes needed.
backend/pyproject.toml (3)
115-120: Ruff configuration is pragmatic and well-reasoned. The selection of rules and ignores is sensible for a FastAPI project:

- Comprehensive error/warning coverage (E, W, F, I)
- Code quality checks (C4, B, UP, SIM)
- Intentional ignores: E501 for line length (handled by formatter), B008 for FastAPI's `Depends()`, ARG001 for common dependency patterns

125-134: Pytest configuration is solid for async-first testing. `asyncio_mode = "auto"` is appropriate for pytest-asyncio with FastAPI, and test discovery patterns are standard. No concerns here.

144-150: mypy overrides for untyped packages are reasonable. Excluding gtts, arq, and aioboto3 is a pragmatic choice given these packages lack complete type hints. The strict mode on the main codebase maintains type safety where it matters.
backend/docker/Dockerfile (1)
36-62: Excellent security and multi-stage practices. The Dockerfile follows container security best practices:
- Non-root user (bookbytes, UID/GID 1000) prevents privilege escalation
- Lean runtime image via multi-stage build
- Proper ownership of application directories
- HEALTHCHECK provides Docker daemon visibility into app readiness
- Environment variables well-structured
tasks/backend-productionisation/audio-books-library/design-doc.md (1)
1-890: Design document is comprehensive and technically sound. The document provides thorough guidance on:
- OpenLibrary API integration with rate limit awareness
- Provider-agnostic architecture via BookProvider table (excellent pattern for multi-provider support)
- Pragmatic cache strategy (Redis-only with AOF persistence, stale-while-revalidate)
- Clear data model relationships and migrations strategy
- Concrete API endpoint definitions
- Phased implementation roadmap with success metrics
The architecture decisions are well-reasoned, particularly:
- Provider-agnostic models — decouples core data from external APIs
- BookProvider mapping table — enables multi-provider support without schema migration
- Redis-only caching — simplifies architecture while maintaining sufficient durability via AOF
- Stale-while-revalidate pattern — balances freshness and performance
Only minor markdown linting issues to address; content is production-ready.
backend/docker/docker-compose.yml (2)
29-40: Development credentials are acceptable, but ensure production secrets are managed externally. The hardcoded database credentials (`bookbytes:bookbytes`) and dev-mode secrets (`dev-api-key-12345`, `dev-secret-key-change-in-production`) are fine for local development. The naming convention clearly signals these must be replaced in production. For production deployments, consider:

- Using Docker secrets or an external secrets manager
- Documenting the required environment variables in a deployment guide

104-137: PostgreSQL and Redis configurations are well-structured. Health checks, persistence via named volumes, and restart policies are properly configured. The `pg_isready` and `redis-cli ping` health checks are appropriate for dependency orchestration.
tasks/backend-productionisation/prd-backend-productionisation.md (1)
1-285: Comprehensive PRD with clear structure and requirements. The document effectively outlines the backend productionization strategy with well-defined goals, user stories, and phased implementation. The technology choices (FastAPI, SQLAlchemy 2.0 async, ARQ, structlog) align with modern Python async best practices.
.github/copilot-instructions.md (1)
28-79: Well-documented coding conventions and patterns. The instructions clearly document structlog usage, exception patterns, configuration access, and testing conventions. This provides excellent guidance for maintaining consistency across the codebase.
backend/src/bookbytes/models/base.py (1)
33-78: Well-designed mixins using SQLAlchemy 2.0 patterns. The UUIDv7 choice provides excellent B-tree index performance with time-sortable, distributed-safe identifiers. The `TimestampMixin` correctly uses timezone-aware `DateTime` with server-side defaults.
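For readers unfamiliar with the pattern, here is a sketch of what such a UUIDv7 primary-key mixin typically looks like; the `uuid6` package is mentioned in the commit log, but the exact project code may differ:

```python
from uuid import UUID

from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column
from uuid6 import uuid7  # time-ordered UUIDv7 generator


class Base(DeclarativeBase):
    pass


class UUIDPrimaryKeyMixin:
    """Primary keys that sort by creation time, friendly to B-tree indexes."""

    id: Mapped[UUID] = mapped_column(primary_key=True, default=uuid7)


class Work(UUIDPrimaryKeyMixin, Base):
    __tablename__ = "works"


# The first 48 bits of a UUIDv7 encode a millisecond timestamp, so new rows
# land on the "right edge" of the index instead of at random positions
# (the main drawback of UUIDv4 keys).
```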
backend/tests/mocks/openlibrary_responses.py (1)
127-160: Helper functions are well-designed for test flexibility. Supporting both ISBN-10 and ISBN-13 lookups for the same books enables realistic test scenarios. The return type distinction (`dict` vs `dict | None`) appropriately differentiates between API responses and parsed metadata.
backend/src/bookbytes/repositories/base.py (2)
136-149: Sequential refresh in `create_many` may impact bulk performance. The loop at lines 147-148 refreshes entities one-by-one after bulk insert. For large batches, this adds N round-trips to the database. If bulk inserts with generated fields become a performance concern, consider:

- Using a `RETURNING` clause in bulk operations (PostgreSQL-specific)
- Deferring refresh until entities are actually accessed

For now, this is acceptable as correctness is prioritized.

280-336: Solid soft-delete implementation with proper state checks. The `restore_by_id` correctly fetches with `include_deleted=True` and validates the entity was actually deleted before restoring (line 333). This prevents spurious operations on non-deleted entities.
backend/src/bookbytes/core/exceptions.py (7)
1-15: LGTM! Clear documentation and minimal imports. The module docstring provides excellent context for the exception hierarchy's purpose, and the import is appropriate for the type hints used throughout.
17-72: Excellent base exception design. The `BookBytesError` base class provides a clean foundation with:

- Class-level defaults for code, message, and status_code
- Flexible constructor for overrides
- Structured `to_dict()` serialization for API responses

The wrapping structure `{"error": {...}}` in `to_dict()` appears intentional for consistent API error responses.
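A rough sketch of the pattern the review describes; the `BookBytesError` name comes from the review, while the constructor signature and the `BookNotFoundError` subclass are illustrative assumptions:

```python
from typing import Any


class BookBytesError(Exception):
    """Base error: class-level defaults plus a structured to_dict() payload."""

    code: str = "internal_error"
    message: str = "An unexpected error occurred"
    status_code: int = 500

    def __init__(
        self,
        message: str | None = None,
        *,
        details: dict[str, Any] | None = None,
    ) -> None:
        self.message = message or self.message
        self.details = details or {}
        super().__init__(self.message)

    def to_dict(self) -> dict[str, Any]:
        # Wrapped under "error" so every API error shares one envelope.
        return {
            "error": {
                "code": self.code,
                "message": self.message,
                "details": self.details,
            }
        }


class BookNotFoundError(BookBytesError):
    code = "book_not_found"
    status_code = 404

    def __init__(self, book_id: str | None = None) -> None:
        details = {"book_id": book_id} if book_id else {}
        message = f"Book not found: {book_id}" if book_id else "Book not found"
        super().__init__(message, details=details)
```

Subclasses only override class attributes and enrich `details`, which is what keeps the hierarchy so compact.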
79-201: Well-structured NotFound error hierarchy. The five `NotFoundError` subclasses follow a consistent and clean pattern:

- Accept optional identifiers for context
- Build structured `details` payloads
- Enhance messages with identifier information when available
The pattern ensures error responses are both human-readable and machine-parseable.
208-236: Correct HTTP semantics for auth errors. The authentication/authorization error hierarchy properly distinguishes:
- 401 (AuthenticationError) for authentication failures
- 403 (AuthorizationError) for permission/authorization failures
Simple subclasses appropriately rely on class-level defaults without unnecessary constructor overrides.
243-283: Good validation error design with field context. The `ValidationError` base class adds a `field` parameter to help identify which field failed validation. `InvalidISBNError` correctly leverages this by passing `field="isbn"` to the parent, ensuring structured error responses include field context.
290-324: Appropriate external service error handling. Using HTTP 502 (Bad Gateway) for external service failures is correct. The four specific subclasses provide clear error codes for different service types (OpenAI, TTS, Metadata, Storage), enabling targeted error handling and monitoring.
331-391: Solid job error handling with useful context. The job error hierarchy uses appropriate HTTP status codes:
- 409 (Conflict) for duplicate job creation attempts
- 500 (Internal Server Error) for processing failures
Both exceptions capture relevant context (job IDs, failed steps, errors) that will be valuable for debugging and monitoring job pipelines.
backend/alembic.ini (1)
1-146: Alembic configuration looks consistent with the project layout. `script_location` and the logging sections match the `backend/alembic/` structure and async env.py approach; nothing blocking here.
backend/alembic/script.py.mako (1)
1-28: Template matches modern Alembic patterns; just be mindful of injected code indentation. The revision metadata and `upgrade`/`downgrade` scaffolding are standard and type-annotated nicely. When generating migrations, ensure `${upgrades}`/`${downgrades}` are rendered with appropriate leading indentation so multi-line operations align under the function body, but otherwise this looks good.
backend/alembic/versions/001_pg_idkit_extension.py (1)
1-34: Migration is straightforward; ensure operational readiness for `pg_idkit`. The `CREATE EXTENSION IF NOT EXISTS pg_idkit` / `DROP EXTENSION IF EXISTS pg_idkit` pair is clean and idempotent. The main remaining concern is operational:

- Confirm that your Postgres images/instances (including managed services) have `pg_idkit` available and that the migration user has permission to install/drop extensions, or document an alternative path for those environments.
backend/tests/mocks/openai_responses.py (1)
12-129: Mocks are comprehensive and well-structured for testing. The module nicely covers:

- Raw chat completion responses + parsed structures for chapter extraction.
- A realistic summary response and per-chapter `SAMPLE_SUMMARIES`.
- Common error shapes (rate limit, invalid key, context length) for negative-path tests.

This should make your OpenAI integration tests deterministic and expressive.
backend/src/bookbytes/models/book_provider.py (2)
28-34: LGTM! The `BookProviderType` enum follows good practices: inherits from `str, Enum` for JSON serialization compatibility, uses lowercase values, and includes a forward-looking comment for future providers.

78-87: LGTM! The relationships are correctly defined with explicit `foreign_keys` to disambiguate the multiple FKs to the same tables. The `back_populates` references align with the relationship definitions in `Work` and `Edition` models shown in the relevant snippets.
backend/tests/integration/test_database.py (2)
7-16: LGTM! Good integration test verifying the `/health/ready` endpoint includes database connectivity checks. The assertions cover the essential response structure.

55-66: LGTM! Simple but effective smoke test validating that the base classes and mixins expose the expected attributes. This catches import errors and basic structural issues early.
backend/tests/integration/test_health.py (3)
7-25: LGTM! Good coverage of both liveness and readiness probes with appropriate assertions. Note: `test_readiness_probe` has some overlap with `test_health_ready_checks_database` in `test_database.py`, but they test different aspects (general structure vs. database-specific checks), so both are valuable.

28-38: LGTM! Comprehensive test for the root endpoint, validating service metadata and navigation links. The assertions ensure API discoverability.

41-59: LGTM! Good observability tests. Testing both auto-generated and custom request ID propagation is important for distributed tracing and debugging in production.
backend/src/bookbytes/models/chapter.py (2)
21-60: LGTM! Well-designed Chapter model with:

- Appropriate FK with cascade delete and index for parent lookups
- `UniqueConstraint` ensuring chapter numbers are unique within an audiobook
- `Text` type for summary (unbounded length)
- Optional audio-related fields allowing chapters to exist before audio generation

The 1-indexed `chapter_number` convention is documented and the constraint naming is clear.

62-66: LGTM! The `__repr__` provides useful debugging information including the parent reference and identifying fields.
backend/src/bookbytes/models/__init__.py (2)
1-37: LGTM! Clean package initialization with:

- Explicit `__all__` for public API surface
- Organized imports with logical grouping
- Helpful comments indicating phase/roadmap context

16-16: The `Edition.book_providers` relationship configuration is correct. The current code properly sets `back_populates="edition"` in the `Edition.book_providers` relationship (line 74), which correctly references the `edition` attribute on the `BookProvider` model (lines 83-87). The bidirectional relationship is properly configured.
backend/src/bookbytes/models/audio_book.py (3)
28-35: LGTM! Clean enum definition. The `AudioBookStatus` enum inherits from both `str` and `Enum`, ensuring string serialization compatibility. The status values cover the complete audiobook generation lifecycle.

37-73: Consider adding a unique constraint on `edition_id` for one-to-one enforcement. The docstring and relevant snippets indicate this is a one-to-one relationship with `Edition` (Edition has `audio_book` singular, not `audio_books`). However, the current schema allows multiple `AudioBook` rows per `edition_id`. If truly one-to-one, add a unique constraint:

```diff
 edition_id: Mapped[UUID] = mapped_column(
     ForeignKey("editions.id", ondelete="CASCADE"),
     nullable=False,
     index=True,
+    unique=True,
 )
```

If multiple audiobooks per edition is intentional (e.g., different versions), then the relationship name in `Edition` should be pluralized and the current setup is fine; please clarify the intent.

75-79: LGTM! Useful `__repr__` for debugging. The repr includes essential fields (id, edition_id, status, version) for effective debugging.
tasks/backend-productionisation/audio-books-library/tasks-audio-books-library.md (2)
1-364: Well-structured task document with clear phase organization. The phased approach with clear task decomposition, relevant file references, and notes section provides good project guidance. The decision to use Redis-only caching (removing api_cache.py) is clearly documented.

86-88: Task 3.0.4 is administrative cleanup; UUIDv7 is already implemented and operational. UUIDv7 is already integrated into the codebase (backend/src/bookbytes/models/base.py) with the pg_idkit PostgreSQL extension installed via Alembic. The implementation is documented as time-sortable and is actively used by models. While explicit PostgreSQL sorting tests don't exist in test_database.py, the feature is already deployed. Mark this task complete or add a dedicated PostgreSQL sorting test if stricter verification is needed, but this is not a blocker for dependent features.
tasks/backend-productionisation/technical-considerations-backend-productionisation.md (2)
156-251: Database schema is well-designed with appropriate indexes and constraints. Good use of:

- UUID primary keys with `gen_random_uuid()`
- Proper foreign key relationships with `ON DELETE CASCADE`/`SET NULL`
ON DELETE CASCADE/SET NULL- Indexes on frequently queried columns (email, isbn, book_id, status)
- Check constraints (progress 0-100)
- Unique constraints where appropriate
380-387: Breaking changes are clearly documented. The migration notes properly call out breaking changes including API prefix changes, UUID-based book IDs, async processing, and mandatory authentication. This helps API consumers prepare for the migration.
backend/src/bookbytes/dependencies.py (3)
24-36: LGTM! Useful for accessing potentially modified settings. This pattern correctly retrieves settings from request state, which may have been modified during app startup lifespan.
67-82: Stub implementations are acceptable for phased development. The `# type: ignore` comments and `None` returns for unimplemented dependencies are appropriate placeholders. The TODO comments with phase references provide clear traceability.

Also applies to: 88-102, 109-147, 153-178
42-61: Double commit/rollback: `get_async_session` already handles transactions. Per the relevant snippet from `database.py`, `get_async_session()` already commits on success and rolls back on exception. Wrapping it again here causes:

- Double commit on success
- Potential issues if the inner generator already committed before the outer tries to commit

Simplify to just yield the session:

```diff
 async def get_db_session() -> AsyncGenerator[AsyncSession, None]:
-    """Get an async database session.
-
-    Yields a database session that automatically handles
-    commit on success and rollback on exception.
-
-    Yields:
-        AsyncSession: Database session
-    """
-
+    """Get an async database session."""
     from bookbytes.core.database import get_async_session

     async for session in get_async_session():
-        try:
-            yield session
-            await session.commit()
-        except Exception:
-            await session.rollback()
-            raise
+        yield session
```

Or, if you want transaction control at the dependency level, remove it from `get_async_session()` and keep it only here.

Likely an incorrect or invalid review comment.
backend/src/bookbytes/models/work.py (3)
41-41: Verify `default=list` behavior with SQLAlchemy. Using `default=list` (a callable) is correct and safe here; SQLAlchemy will call `list()` for each new instance, avoiding the mutable default argument pitfall. Good practice.
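A compact illustration of the distinction; the `authors` column name is a hypothetical example, not necessarily the project's field:

```python
from sqlalchemy import JSON
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column


class Base(DeclarativeBase):
    pass


class Work(Base):
    __tablename__ = "works"

    id: Mapped[int] = mapped_column(primary_key=True)
    # default=list passes the callable itself: SQLAlchemy invokes it per
    # INSERT, so rows never share one mutable [] instance.
    authors: Mapped[list[str]] = mapped_column(JSON, default=list)
```

Writing `default=[]` instead would hand SQLAlchemy a single shared list object, the classic mutable-default pitfall the review alludes to.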
47-58: Relationship configurations are well-designed.

- `lazy="selectin"` on `editions` prevents N+1 queries by batch-loading related editions
- `cascade="all, delete-orphan"` ensures proper cleanup when a Work is deleted
- Explicit `foreign_keys` on `book_providers` resolves potential ambiguity in polymorphic relationships
24-61: LGTM! Clean, well-documented model. The Work model follows established patterns from the base mixins, has appropriate field definitions, and clear docstrings explaining the domain concept. The provider-agnostic design via a separate `BookProvider` table is a good architectural choice.
backend/tests/conftest.py (2)
28-47: LGTM! Well-structured test settings fixture. The test settings appropriately override production defaults with test-specific values including in-memory SQLite, test API key, and console logging format.
147-218: LGTM! Well-designed mock service fixtures. The mock fixtures provide sensible default return values and use `AsyncMock` appropriately for async methods. This enables isolated unit testing without external dependencies.
backend/alembic/env.py (2)
35-38: LGTM! Clean database URL resolution. Properly delegates to the centralized settings configuration.
72-90: LGTM! Standard async Alembic pattern. The async migration runner correctly creates an engine with `NullPool` (appropriate for one-shot migration scripts), runs migrations synchronously within the async connection context, and properly disposes the engine afterward.
backend/src/bookbytes/core/logging.py (2)
31-46: LGTM! Clean correlation ID management. The `ContextVar`-based approach ensures proper isolation of correlation IDs across async request contexts.
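For reference, a minimal sketch of the pattern; the helper names here are illustrative, not the module's actual API:

```python
import uuid
from contextvars import ContextVar

# None when no request is in flight; each asyncio task sees its own value.
correlation_id_var: ContextVar[str | None] = ContextVar("correlation_id", default=None)


def bind_correlation_id(existing: str | None = None) -> str:
    """Set (or generate) the correlation ID for the current context."""
    cid = existing or uuid.uuid4().hex
    correlation_id_var.set(cid)
    return cid


def get_correlation_id() -> str | None:
    return correlation_id_var.get()
```

Because `ContextVar` values are task-local, concurrent requests handled by separate asyncio tasks never observe each other's IDs, which is exactly the isolation property the comment praises.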
145-159: LGTM! Simple logger factory. Properly wraps structlog's `get_logger` with clear documentation.
backend/src/bookbytes/main.py (2)
156-210: LGTM! Well-implemented request logging middleware. The middleware properly generates/propagates correlation IDs, logs request lifecycle with timing, handles exceptions, and cleans up the context variable in `finally`.

301-319: LGTM! Readiness probe with dependency checks. The readiness endpoint correctly checks database connectivity and returns structured health status. The TODO for Redis is appropriately noted.
backend/src/bookbytes/config.py (2)
backend/src/bookbytes/config.py (2)
257-267: LGTM! Properly cached settings factory. The `@lru_cache` decorator ensures settings are only read once from environment variables, improving performance for repeated access.
@lru_cachedecorator ensures settings are only read once from environment variables, improving performance for repeated access.backend/src/bookbytes/core/database.py (3)
39-63: Engine and session factory accessors are clear and safe. The global accessors correctly guard against use before initialization with explicit `RuntimeError`s, which is helpful for catching miswired startup. No changes needed here.
66-82: Session dependency pattern looks good. The `get_async_session` implementation matches the recommended FastAPI/SQLAlchemy async pattern: one session per request, automatic close via the context manager, and rollback on exceptions. This is solid as-is.
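The recommended pattern, sketched under the assumption (stated elsewhere in this review) that the dependency also commits on success; the DSN is a placeholder:

```python
from collections.abc import AsyncGenerator

from sqlalchemy.ext.asyncio import (
    AsyncSession,
    async_sessionmaker,
    create_async_engine,
)

engine = create_async_engine("sqlite+aiosqlite:///./dev.db")  # placeholder DSN
session_factory = async_sessionmaker(engine, expire_on_commit=False)


async def get_async_session() -> AsyncGenerator[AsyncSession, None]:
    """One session per request: commit on success, rollback on error."""
    async with session_factory() as session:
        try:
            yield session
            await session.commit()
        except Exception:
            await session.rollback()
            raise
```

The `async with` block guarantees the session is closed even if the request handler raises, which is the cleanup property the comment highlights.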
134-147: Shutdown path is idempotent and correct. `close_db` safely disposes the engine, clears globals, and logs both before and after, while being a no-op if the engine was never initialized. This is straightforward and robust.
backend/src/bookbytes/schemas/common.py (1)
232-242: MessageResponse is simple and sufficient. The `MessageResponse` wrapper with example metadata is straightforward and aligns with common "ack"/confirmation response patterns. Looks good as-is.
Excerpt from .agent/rules/pythonic-fastapi-async.md:

> Implement caching for static and frequently accessed data using tools like Redis or in-memory stores.
> Optimize data serialization and deserialization with Pydantic.
> Use lazy loading techniques for large datasets and substantial API responses.
> Refer to FastAPI, Pydantic, SQLAlachemy, other library documentation for Data Models, Shemas, Path Operations, Middleware and for best practices.
Fix SQLAlchemy typo.

Line 36 contains spelling errors: "SQLAlachemy" should be "SQLAlchemy" and "Shemas" should be "Schemas".

Apply this correction:

```diff
-Refer to FastAPI, Pydantic, SQLAlachemy, other library documentation for Data Models, Shemas, Path Operations, Middleware and for best practices.
+Refer to FastAPI, Pydantic, SQLAlchemy, other library documentation for Data Models, Schemas, Path Operations, Middleware and for best practices.
```
🧰 Tools
🪛 LanguageTool
[grammar] ~36-~36: Ensure spelling is correct
Context: ... library documentation for Data Models, Shemas, Path Operations, Middleware and for be...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
🤖 Prompt for AI Agents
In .agent/rules/pythonic-fastapi-async.md around line 36, fix the typo
"SQLAlachemy" to the correct spelling "SQLAlchemy" (and "Shemas" to "Schemas")
so the sentence references FastAPI, Pydantic, SQLAlchemy, and other library
documentation for Data Models, Schemas, Path Operations, Middleware, and best
practices.
```toml
version = "0.1.0"
description = "AI-powered book summarization and audio generation service"
readme = "Readme.md"
requires-python = ">=3.13"
```
🧩 Analysis chain

🏁 Script executed:

```shell
cd backend && head -150 pyproject.toml | cat -n
```

Repository: cryptus-neoxys/bookbytes

Length of output: 4280
Fix Python version mismatch between project requirement and tooling config.
The project explicitly requires Python >=3.13 (line 10) and lists Python 3.13 in classifiers (line 22), but Ruff (line 99) and mypy (line 137) are configured for Python 3.11. This creates a discrepancy where the linters may not properly validate code written for Python 3.13, including syntax or feature availability differences between these versions.
Update the tooling targets to match:
```diff
-target-version = "py311"
+target-version = "py313"
```

```diff
-python_version = "3.11"
+python_version = "3.13"
```

Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In backend/pyproject.toml around lines 10, 99, and 137 update tooling to match
the declared requires-python >=3.13: change any ruff target-version value to the
Python 3.13 token (e.g., "py313"), set mypy's python_version to "3.13" (or
equivalent toml value), and ensure classifiers (around line 22) list Python ::
3.13 so all project metadata and tooling targets are consistent.
```python
s3_url_expiry_seconds: int = Field(
    default=0,
    ge=0,
    description="S3 pre-signed URL expiry in seconds (0 = no expiry/public)",
)
```
🧩 Analysis chain
🌐 Web query:
AWS S3 presigned URL expiry 0 seconds behavior boto3
💡 Result:
Short answer: ExpiresIn=0 makes the presigned URL effectively expired immediately (or set to the current time) and thus unusable. In practice very small values (including 0) will fail due to immediate expiration, clock skew, or because the URL is also bounded by the lifetime of the credentials used to sign it (temporary STS/role credentials can expire much sooner). Use at least a 1+ second expiry and ensure the signing credentials have a longer lifetime and system clocks are synced. [1][2][3][4]
Sources:
- Boto3 generate_presigned_url docs (ExpiresIn is seconds). [1]
- AWS S3 presigned-URL doc (URL valid for specified period; also limited by credential lifetime). [2]
- Community answers showing premature expiry caused by temporary credentials and recommending non-temporary IAM credentials for longer windows. [3]
- Reports of very small expirations failing (clock skew / immediate expiry). [4]
The S3 pre-signed URL expiry field description is semantically incorrect.
AWS S3 pre-signed URLs with ExpiresIn=0 become immediately expired and unusable, not "public" or without expiry. The field description "0 = no expiry/public" is misleading and does not reflect AWS behavior. Either enforce a minimum positive value (e.g., gt=0) or update the description to accurately document that 0 results in an immediately expired URL.
🤖 Prompt for AI Agents
In backend/src/bookbytes/config.py around lines 168 to 172, the Field for
s3_url_expiry_seconds has an incorrect description ("0 = no expiry/public") that
contradicts AWS behavior. Either enforce a positive expiry or document 0
correctly: change the Field constraint to require a positive value (e.g., gt=0)
and update the description to "S3 pre-signed URL expiry in seconds (must be > 0)",
or keep ge=0 but change the description to "S3 pre-signed URL expiry in seconds
(0 = immediately expired/unusable); use a positive value for a valid expiry".
Pick one approach and make the corresponding change to the Field constraints and
description so the behavior is accurately represented.
```python
@field_validator("jwt_secret_key")
@classmethod
def validate_jwt_secret(cls, v: SecretStr) -> SecretStr:
    """Warn if using default JWT secret in non-dev environment."""
    # This is a soft validation - we log a warning but don't fail
    # The actual check happens at runtime in production
    return v
```
No-op validator with misleading docstring.
The validate_jwt_secret validator claims to "warn if using default JWT secret" but performs no validation or logging—it just returns the value unchanged.
Either implement the warning or remove the misleading docstring:
```diff
 @field_validator("jwt_secret_key")
 @classmethod
 def validate_jwt_secret(cls, v: SecretStr) -> SecretStr:
-    """Warn if using default JWT secret in non-dev environment."""
-    # This is a soft validation - we log a warning but don't fail
-    # The actual check happens at runtime in production
+    """Warn if using default JWT secret."""
+    import warnings
+
+    default_secret = "dev-secret-key-change-in-production-!!!"
+    if v.get_secret_value() == default_secret:
+        warnings.warn(
+            "Using default JWT secret key. Set JWT_SECRET_KEY env var in production.",
+            UserWarning,
+            stacklevel=2,
+        )
     return v
```
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In backend/src/bookbytes/config.py around lines 248 to 254, the validator
validate_jwt_secret is currently a no-op despite its docstring; update it to
accept a second parameter info: ValidationInfo (import ValidationInfo from
pydantic) and implement logic to detect the default secret (compare
v.get_secret_value() to your DEFAULT_JWT_SECRET or hardcoded default), and if
running in a non-development environment (check an env var like ENV or
settings), call logger.warning with a clear message including context (don't
raise); otherwise simply return v as SecretStr. If you prefer not to warn,
remove/adjust the docstring and the validator decorator accordingly so it isn’t
misleading.
```python
@pytest.fixture
async def async_client(
    app: FastAPI, test_settings: Settings
) -> AsyncGenerator[AsyncClient, None]:
    """Create an async HTTP client for testing.

    This client makes requests to the test app without starting a server.
    The app lifespan is triggered to initialize database and other resources.

    Usage:
        async def test_endpoint(async_client: AsyncClient):
            response = await async_client.get("/health/live")
            assert response.status_code == 200
    """
    from contextlib import asynccontextmanager

    from bookbytes.core.database import close_db, init_db
    from bookbytes.core.logging import configure_logging

    # Initialize resources that lifespan would normally handle
    configure_logging(test_settings)
    await init_db(test_settings)

    async with AsyncClient(
        transport=ASGITransport(app=app),
        base_url="http://test",
    ) as client:
        yield client

    # Cleanup
    await close_db()
```
Missing table creation for in-memory SQLite.
The async_client fixture initializes the database engine but doesn't create the schema tables. In-memory SQLite databases start empty, so tests requiring database access will fail with "table not found" errors.
```diff
 from bookbytes.core.database import close_db, init_db
 from bookbytes.core.logging import configure_logging
+from bookbytes.models import Base
+from bookbytes.core.database import get_engine

 # Initialize resources that lifespan would normally handle
 configure_logging(test_settings)
 await init_db(test_settings)

+# Create tables for in-memory SQLite
+engine = get_engine()
+async with engine.begin() as conn:
+    await conn.run_sync(Base.metadata.create_all)
+
 async with AsyncClient(
```

```python
@pytest.fixture
async def authenticated_client(
    app: FastAPI, test_settings: Settings
) -> AsyncGenerator[AsyncClient, None]:
    """Create an authenticated async client using API key.

    This client automatically includes the X-API-Key header.

    Usage:
        async def test_protected_endpoint(authenticated_client: AsyncClient):
            response = await authenticated_client.get("/api/v1/books")
            assert response.status_code == 200
    """
    async with AsyncClient(
        transport=ASGITransport(app=app),
        base_url="http://test",
        headers={"X-API-Key": test_settings.api_key.get_secret_value()},
    ) as client:
        yield client
```
authenticated_client skips database initialization.
Unlike async_client, this fixture doesn't call init_db() or configure_logging(). Tests using authenticated_client will fail if they hit endpoints that require database access.
Consider either making authenticated_client depend on async_client or duplicating the initialization logic.
```diff
 @pytest.fixture
 async def authenticated_client(
-    app: FastAPI, test_settings: Settings
+    async_client: AsyncClient, test_settings: Settings, app: FastAPI
 ) -> AsyncGenerator[AsyncClient, None]:
-    """Create an authenticated async client using API key.
-
-    This client automatically includes the X-API-Key header.
-
-    Usage:
-        async def test_protected_endpoint(authenticated_client: AsyncClient):
-            response = await authenticated_client.get("/api/v1/books")
-            assert response.status_code == 200
-    """
+    """Create an authenticated async client using API key."""
+    # Reuse async_client's initialized resources
     async with AsyncClient(
         transport=ASGITransport(app=app),
         base_url="http://test",
         headers={"X-API-Key": test_settings.api_key.get_secret_value()},
     ) as client:
         yield client
```

Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In backend/tests/conftest.py around lines 94 to 112, the authenticated_client
fixture creates an AsyncClient with an API key header but does not perform
database initialization or logging configuration like async_client does, causing
DB-dependent tests to fail; modify authenticated_client to reuse the
async_client fixture (add async_client: AsyncClient to its signature) and set
the X-API-Key header on that provided client before yielding, or alternatively
call the same init_db() and configure_logging() steps before creating the
AsyncClient so the database and logging are initialized the same way as
async_client.
```markdown
| ----- | ---------------------------------------------------------------------------- | --- |
| FR5.1 | Create abstract `StorageBackend` interface in `storage/base.py` |
```
Minor table formatting issue.
There's a stray pipe character causing malformed table syntax:
```diff
-| FR5.4 | Add storage backend selection via config: `STORAGE_BACKEND=local | s3` |
+| FR5.4 | Add storage backend selection via config: `STORAGE_BACKEND=local \| s3` |
```

The pipe inside the cell content needs escaping or the cell structure should be adjusted.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In tasks/backend-productionisation/prd-backend-productionisation.md around lines
137 to 138, the table row for FR5.1 contains an unescaped pipe character inside
the second cell which breaks the Markdown table; remove or escape the pipe
(e.g., replace with \|) or restructure the cell content (wrap text differently
or split into additional columns) so the table syntax is valid and the row
renders correctly.
```markdown
- [ ] 7.0 **Phase 7: Observability & Deployment**
  - [x] 7.1 Create `src/bookbytes/core/logging.py` with structlog configuration: Configure `structlog.configure()` with processors for JSON output (prod) or console (dev), add timestamp, log level, logger name _(DONE - moved to auxiliary foundation)_
  - [x] 7.2 Add correlation ID processor in logging: Extract `request_id` from context, add to all log entries _(DONE - moved to auxiliary foundation)_
  - [x] 7.3 Create logging middleware in `main.py`: Log request start (method, path, request*id), log request end (status_code, duration_ms), bind request_id to structlog context *(DONE - moved to auxiliary foundation)\_
```
Fix markdown formatting: remove spaces inside emphasis markers.
Lines 227 and 234 have broken emphasis markers (escaped underscores and mismatched `*`/`_`) that should be repaired per markdown lint standards.
Apply this diff:
```diff
-  - [x] 7.3 Create logging middleware in `main.py`: Log request start (method, path, request*id), log request end (status_code, duration_ms), bind request_id to structlog context *(DONE - moved to auxiliary foundation)\_
+  - [x] 7.3 Create logging middleware in `main.py`: Log request start (method, path, request_id), log request end (status_code, duration_ms), bind request_id to structlog context _(DONE - moved to auxiliary foundation)_
```

```diff
-  - [x] 7.10 Create `docker/docker-compose.yml` with services: `api` (uvicorn), `worker` (arq), `postgres` (postgres:16-alpine), `redis` (redis:7-alpine) with health checks, volumes, and proper depends*on conditions *(DONE - moved to auxiliary foundation)\_
+  - [x] 7.10 Create `docker/docker-compose.yml` with services: `api` (uvicorn), `worker` (arq), `postgres` (postgres:16-alpine), `redis` (redis:7-alpine) with health checks, volumes, and proper depends_on conditions _(DONE - moved to auxiliary foundation)_
```

Also applies to: 234-234
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)
227-227: Spaces inside emphasis markers
(MD037, no-space-in-emphasis)
🤖 Prompt for AI Agents
In tasks/backend-productionisation/tasks-backend-productionisation.md around
lines 227 and 234, the markdown uses broken emphasis markers (e.g.
`*(DONE - moved to auxiliary foundation)\_`); repair the emphasis so opening and
closing markers match and hug the text (e.g. change
`*(DONE - moved to auxiliary foundation)\_` to
`_(DONE - moved to auxiliary foundation)_`), and restore the escaped underscores
in identifiers such as `request_id` and `depends_on`, ensuring no extra spaces
sit between `*` or `_` and the emphasized text on both lines.
```yaml
- AUTH_MODE=api_key
- API_KEY=dev-api-key-12345
- JWT_SECRET_KEY=${JWT_SECRET_KEY:-dev-secret-key-change-in-prod}
```
Avoid default fallback for JWT_SECRET_KEY in Docker Compose.
The fallback dev-secret-key-change-in-prod creates a security risk—if the environment variable is unset, the application silently uses a weak, predictable secret. Consider failing fast instead:
```diff
-      - JWT_SECRET_KEY=${JWT_SECRET_KEY:-dev-secret-key-change-in-prod}
+      - JWT_SECRET_KEY=${JWT_SECRET_KEY:?JWT_SECRET_KEY must be set}
```

Alternatively, document clearly that this default is only for local development and must never be used in production.
🤖 Prompt for AI Agents
In
tasks/backend-productionisation/technical-considerations-backend-productionisation.md
around lines 410 to 412, the Docker Compose snippet provides a default fallback
JWT_SECRET_KEY ("dev-secret-key-change-in-prod") which is insecure; remove the
default fallback so the service fails fast when JWT_SECRET_KEY is not set (or
replace it with an explicit placeholder and a clear comment requiring the env be
provided in production), and update the text to state that a non-empty
JWT_SECRET_KEY is mandatory in production and must not rely on any development
default.
…onical models - Add OpenLibrary API configuration to settings - Implement OpenLibraryService with search and work details methods - Define canonical DTOs for provider-agnostic data representation - Integrate with CacheService for response caching - Add FastAPI dependency injection support - Update task documentation to reflect completion
Add research document outlining canonical data model and anti-corruption layer patterns for future multi-provider integration phase in BookBytes Audio Books Library. Includes cache strategy and trade-offs analysis.
… services test(cache): add unit tests for cache key generation, TTL handling, and operations test(openlibrary): add unit tests for search, work details, and error handling test(integration): add integration tests for openlibrary API with real network calls docs: update test markers in pyproject.toml for better test organization
Add API v1 router with search endpoints for books, works and ISBN lookup Include schemas for search requests/responses and unit tests
Actionable comments posted: 6
♻️ Duplicate comments (4)
backend/pyproject.toml (1)
99-99: Fix Python version mismatch between project requirement and tooling config.

The project explicitly requires Python >=3.13 (line 10) and lists Python 3.13 in classifiers (line 22), but Ruff (line 99) and mypy (line 142) are configured for Python 3.11. This creates a discrepancy where the linters may not properly validate code written for Python 3.13, including syntax or feature availability differences between these versions.
Update the tooling targets to match:
```diff
-target-version = "py311"
+target-version = "py313"

-python_version = "3.11"
+python_version = "3.13"
```

Also applies to: 142-142
backend/src/bookbytes/config.py (2)
151-172: Correct S3 pre-signed URL expiry semantics.
description="S3 pre-signed URL expiry in seconds (0 = no expiry/public)"is incorrect for AWS S3: an expiry of0makes the URL effectively immediately expired, not public or non-expiring. Either:
- Enforce positive expiry and update description, e.g.:
```diff
-    s3_url_expiry_seconds: int = Field(
-        default=0,
-        ge=0,
-        description="S3 pre-signed URL expiry in seconds (0 = no expiry/public)",
-    )
+    s3_url_expiry_seconds: int = Field(
+        default=3600,
+        ge=1,
+        description="S3 pre-signed URL expiry in seconds (must be > 0)",
+    )
```

or keep `ge=0` but document that 0 means "immediately expired/unusable" and ensure calling code treats 0 as a special case (e.g., don't generate a pre-signed URL at all).
265-271: `validate_jwt_secret` is still a no-op despite the docstring.

The validator claims to "Warn if using default JWT secret in non-dev environment" but currently just returns `v`. Either implement the warning (e.g., compare to the default and log/emit a warning when `app_env != DEVELOPMENT`) or remove/relax the docstring and validator to avoid implying checks that don't exist.

.agent/rules/pythonic-fastapi-async.md (1)
14-37: Fix typos in FastAPI/Python guidance.

A few spelling issues to clean up:

- Line 16: `inheritence` → `inheritance`
- Line 37: `SQLAlachemy` → `SQLAlchemy`
- Line 37: `Shemas` → `Schemas`
- Line 26: `basemodel` → `BaseModel`

For example:

```diff
- Favour composition over inheritence. Smart core, Thin interfaces.
+ Favour composition over inheritance. Smart core, Thin interfaces.
```

```diff
- Use functional components (plain functions) and Pydantic models/basemodel for consistent input/output validation and response schemas.
+ Use functional components (plain functions) and Pydantic models/BaseModel for consistent input/output validation and response schemas.
```

```diff
- Refer to FastAPI, Pydantic, SQLAlachemy, other library documentation for Data Models, Shemas, Path Operations, Middleware and for best practices.
+ Refer to FastAPI, Pydantic, SQLAlchemy, other library documentation for Data Models, Schemas, Path Operations, Middleware and for best practices.
```
🧹 Nitpick comments (10)
backend/pyproject.toml (1)
26-72: Consider pinning dependency versions for reproducibility.

All dependencies are unversioned (using latest available), which can lead to non-reproducible builds if breaking changes are introduced in transitive dependencies. For a production service, consider pinning at least major versions or using a lock file strategy (e.g., `pip-compile` or Poetry's lock file).

Example:

```diff
-    "fastapi",
-    "uvicorn[standard]",
+    "fastapi>=0.104.0",
+    "uvicorn[standard]>=0.24.0",
```

This is especially important for security-critical packages like `passlib`, `python-jose`, and database drivers.

backend/src/bookbytes/repositories/work.py (1)
30-36: Clarify author matching semantics in `get_by_title_author`.

The comment says "author overlap", but `Work.authors.contains(authors)` requires the stored JSON array to contain the full `authors` list (subset match), not just any overlap. If you intend "any matching author", consider either:

- Changing the query (e.g., `or_` over individual author contains checks), or
- Updating the docstring/comment to reflect the stricter subset requirement.
backend/alembic/versions/80de24a1ee9e_add_audio_books_library_models.py (1)
14-18: Verify `down_revision` and modernize type hints.

- `down_revision = None` makes this the base migration. If you already have earlier Alembic revisions, this will create a second head; update `down_revision` to the previous revision ID instead.
- Minor style nit: you can follow current Python/ruff guidance by importing `Sequence` from `collections.abc` and using `|` unions:

```diff
-from typing import Sequence, Union
+from collections.abc import Sequence
+
+from typing import Union  # or drop entirely if you switch to `|` unions

-down_revision: Union[str, Sequence[str], None] = None
-branch_labels: Union[str, Sequence[str], None] = None
-depends_on: Union[str, Sequence[str], None] = None
+down_revision: str | Sequence[str] | None = None
+branch_labels: str | Sequence[str] | None = None
+depends_on: str | Sequence[str] | None = None
```

backend/tests/unit/test_cache_service.py (1)
19-28: Verify `scan_iter` mock type matches async usage in `invalidate_pattern`.

The fixture mocks `scan_iter` as a synchronous iterator (`iter([])`), but in `test_invalidate_pattern` (lines 263-267), it's replaced with an async generator. This default mock would cause issues if any test relies on it directly without overriding.

Consider making the default mock an async iterator for consistency:

```diff
 @pytest.fixture
 def mock_redis() -> MagicMock:
     """Create a mock Redis client."""
     redis = MagicMock()
     redis.get = AsyncMock(return_value=None)
     redis.setex = AsyncMock(return_value=True)
     redis.delete = AsyncMock(return_value=1)
     redis.ttl = AsyncMock(return_value=3600)
-    redis.scan_iter = MagicMock(return_value=iter([]))
+    async def empty_scan_iter(match: str):
+        return
+        yield  # Makes this an async generator
+    redis.scan_iter = empty_scan_iter
     return redis
```

backend/tests/unit/test_openlibrary_service.py (1)
170-180: Consider initializing results directly in the constructor.

The mutation of `response.results` after construction (line 179) works but is less idiomatic. Consider initializing directly:

```diff
 def test_has_more_true(self) -> None:
     """Test has_more when more results exist."""
     response = SearchResponse(
-        results=[],
+        results=[MagicMock()] * 10,
         total_found=100,
         offset=0,
         limit=10,
     )
-    # At offset 0 with 10 results, 100 total -> has more
-    response.results = [MagicMock()] * 10
     assert response.has_more is True
```

backend/src/bookbytes/repositories/book_provider.py (1)
71-98: Callers should handle `IntegrityError` for duplicate provider keys.

The `UniqueConstraint` on `(provider, external_key)` will raise `IntegrityError` if the mapping already exists. Consider either:

- Documenting this in the docstring
- Using a `provider_key_exists()` check before calling, or
- Handling the exception at the service layer
This is a design choice, so just flagging for awareness.
backend/tests/integration/test_openlibrary_api.py (1)
283-295: `test_cache_key_is_deterministic` doesn't require the real API and should be a unit test.

This test only exercises the `CacheService.search_key()` static method and doesn't interact with the OpenLibrary API at all. Consider moving it to unit tests:

```diff
-@pytest.mark.integration
-@pytest.mark.external
-class TestOpenLibraryCacheIntegration:
+# In unit tests file
+class TestCacheServiceKeys:
     ...
-    @pytest.mark.asyncio
-    async def test_cache_key_is_deterministic(
-        self, openlibrary_service: OpenLibraryService
-    ) -> None:
+    def test_cache_key_is_deterministic(self) -> None:
         """Test that same search params produce same cache key."""
         key1 = CacheService.search_key(title="Python", author="Guido")
         key2 = CacheService.search_key(title="Python", author="Guido")
         key3 = CacheService.search_key(title="python", author="guido")

         assert key1 == key2
         assert key1 == key3  # Normalized
-
-        await openlibrary_service.close()
```

backend/src/bookbytes/services/cache.py (2)
134-154: Consider batching deletes in `invalidate_pattern` for better performance.

Deleting keys one-by-one in a loop can be slow for large key sets. Consider batching the deletes:

```diff
 async def invalidate_pattern(self, pattern: str) -> int:
     try:
         count = 0
-        async for key in self.redis.scan_iter(match=pattern):
-            await self.redis.delete(key)
-            count += 1
+        batch = []
+        async for key in self.redis.scan_iter(match=pattern):
+            batch.append(key)
+            if len(batch) >= 100:
+                await self.redis.delete(*batch)
+                count += len(batch)
+                batch = []
+        if batch:
+            await self.redis.delete(*batch)
+            count += len(batch)
         logger.debug("cache_pattern_invalidated", pattern=pattern, count=count)
         return count
```

This is optional if you don't expect large invalidation sets.
231-242: Consider normalizing ISBN input for consistent cache hits.

ISBNs can be formatted with or without hyphens (e.g., `978-0-13-468599-1` vs `9780134685991`). Without normalization, the same book could have multiple cache entries:

```diff
 @staticmethod
 def isbn_key(isbn: str) -> str:
-    return f"isbn:{isbn}"
+    # Remove hyphens and spaces for consistent keys
+    normalized = isbn.replace("-", "").replace(" ", "").strip()
+    return f"isbn:{normalized}"
```

backend/src/bookbytes/services/openlibrary.py (1)
430-440: Work details parsing returns empty data for several fields.

The `_parse_work_response` method returns empty/null values for `authors`, `isbn_list`, `edition_count`, and `first_publish_year`, with comments indicating these need separate API calls. This is fine as a simplification, but consider adding a TODO or tracking issue for completing this functionality.

```python
authors=[],  # Need separate author fetch - simplified for now
```

Do you want me to open an issue to track implementing the author/edition data fetching?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (19)
- .agent/rules/pythonic-fastapi-async.md (1 hunks)
- backend/alembic/versions/80de24a1ee9e_add_audio_books_library_models.py (1 hunks)
- backend/pyproject.toml (1 hunks)
- backend/src/bookbytes/config.py (1 hunks)
- backend/src/bookbytes/repositories/__init__.py (1 hunks)
- backend/src/bookbytes/repositories/audio_book.py (1 hunks)
- backend/src/bookbytes/repositories/book_provider.py (1 hunks)
- backend/src/bookbytes/repositories/chapter.py (1 hunks)
- backend/src/bookbytes/repositories/edition.py (1 hunks)
- backend/src/bookbytes/repositories/work.py (1 hunks)
- backend/src/bookbytes/services/__init__.py (1 hunks)
- backend/src/bookbytes/services/cache.py (1 hunks)
- backend/src/bookbytes/services/openlibrary.py (1 hunks)
- backend/tests/integration/test_openlibrary_api.py (1 hunks)
- backend/tests/unit/__init__.py (1 hunks)
- backend/tests/unit/test_cache_service.py (1 hunks)
- backend/tests/unit/test_openlibrary_service.py (1 hunks)
- tasks/backend-productionisation/audio-books-library/tasks-audio-books-library.md (1 hunks)
- tasks/knowledge/multi-provider-integration-patterns.md (1 hunks)
✅ Files skipped from review due to trivial changes (1)
- backend/tests/unit/__init__.py
🧰 Additional context used
🧬 Code graph analysis (9)
backend/src/bookbytes/repositories/book_provider.py (2)
- backend/src/bookbytes/models/book_provider.py (1)
  - `BookProvider` (36-100)
- backend/src/bookbytes/repositories/base.py (1)
  - `BaseRepository` (43-206)

backend/src/bookbytes/services/__init__.py (2)
- backend/src/bookbytes/services/cache.py (3)
  - `CacheService` (33-253), `get_cache_service` (277-289), `set_redis_client` (260-274)
- backend/src/bookbytes/services/openlibrary.py (8)
  - `BookSearchResult` (28-76), `OpenLibraryError` (151-154), `OpenLibraryRateLimitError` (157-160), `OpenLibraryService` (163-458), `SearchResponse` (132-143), `WorkDetails` (80-128), `get_openlibrary_service` (474-480), `set_openlibrary_service` (468-471)

backend/tests/unit/test_cache_service.py (2)
- backend/src/bookbytes/services/cache.py (9)
  - `CacheService` (33-253), `get` (69-96), `search_key` (191-229), `isbn_key` (232-241), `work_key` (244-253), `_jitter_ttl` (156-166), `_get_original_ttl` (168-184), `invalidate` (122-132), `invalidate_pattern` (134-154)
- backend/src/bookbytes/repositories/base.py (1)
  - `delete` (361-370)

backend/src/bookbytes/repositories/chapter.py (2)
- backend/src/bookbytes/repositories/base.py (2)
  - `BaseRepository` (43-206), `delete` (361-370)
- backend/src/bookbytes/schemas/common.py (2)
  - `offset` (112-114), `limit` (117-119)

backend/tests/unit/test_openlibrary_service.py (4)
- backend/src/bookbytes/services/openlibrary.py (13)
  - `BookSearchResult` (28-76), `OpenLibraryError` (151-154), `OpenLibraryRateLimitError` (157-160), `OpenLibraryService` (163-458), `SearchResponse` (132-143), `WorkDetails` (80-128), `has_more` (141-143), `search_books` (209-259), `get_work_details` (261-288), `get_all_isbns_for_work` (290-300), `_user_agent` (189-191), `_parse_search_response` (384-412), `_parse_work_response` (414-440)
- backend/tests/integration/test_openlibrary_api.py (2)
  - `mock_cache` (30-40), `openlibrary_service` (44-46)
- backend/src/bookbytes/services/cache.py (1)
  - `get` (69-96)
- backend/src/bookbytes/schemas/common.py (2)
  - `offset` (112-114), `limit` (117-119)

backend/src/bookbytes/repositories/edition.py (4)
- backend/src/bookbytes/models/edition.py (1)
  - `Edition` (23-80)
- backend/src/bookbytes/repositories/base.py (1)
  - `BaseRepository` (43-206)
- backend/src/bookbytes/schemas/common.py (2)
  - `offset` (112-114), `limit` (117-119)
- backend/src/bookbytes/models/base.py (1)
  - `id` (47-52)

backend/src/bookbytes/repositories/__init__.py (6)
- backend/src/bookbytes/repositories/audio_book.py (1)
  - `AudioBookRepository` (15-151)
- backend/src/bookbytes/repositories/base.py (2)
  - `BaseRepository` (43-206), `SoftDeleteRepository` (209-381)
- backend/src/bookbytes/repositories/book_provider.py (1)
  - `BookProviderRepository` (15-149)
- backend/src/bookbytes/repositories/chapter.py (1)
  - `ChapterRepository` (14-115)
- backend/src/bookbytes/repositories/edition.py (1)
  - `EditionRepository` (14-90)
- backend/src/bookbytes/repositories/work.py (1)
  - `WorkRepository` (13-75)

backend/src/bookbytes/repositories/work.py (2)
- backend/src/bookbytes/models/work.py (1)
  - `Work` (24-61)
- backend/src/bookbytes/repositories/base.py (1)
  - `BaseRepository` (43-206)

backend/src/bookbytes/services/openlibrary.py (1)
- backend/src/bookbytes/config.py (1)
  - `get_settings` (275-284)
🪛 LanguageTool
.agent/rules/pythonic-fastapi-async.md
[grammar] ~16-~16: Ensure spelling is correct
Context: ...de duplication. Favour composition over inheritence. Smart core, Thin interfaces. Use descr...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~37-~37: Ensure spelling is correct
Context: ... library documentation for Data Models, Shemas, Path Operations, Middleware and for be...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
🪛 markdownlint-cli2 (0.18.1)
tasks/knowledge/multi-provider-integration-patterns.md
31-31: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
68-68: Tables should be surrounded by blank lines
(MD058, blanks-around-tables)
96-96: Table column count
Expected: 3; Actual: 2; Too few cells, row will be missing data
(MD056, table-column-count)
97-97: Table column count
Expected: 3; Actual: 2; Too few cells, row will be missing data
(MD056, table-column-count)
🪛 Ruff (0.14.7)
backend/alembic/versions/80de24a1ee9e_add_audio_books_library_models.py
8-8: Import from collections.abc instead: Sequence
Import from collections.abc
(UP035)
16-16: Use X | Y for type annotations
Convert to X | Y
(UP007)
17-17: Use X | Y for type annotations
Convert to X | Y
(UP007)
18-18: Use X | Y for type annotations
Convert to X | Y
(UP007)
🔇 Additional comments (55)
backend/pyproject.toml (5)
9-9: Verify readme file path.

The readme is referenced as "Readme.md" (line 9). Ensure this file exists at the repository root and uses the correct filename (case-sensitive on case-sensitive filesystems).

95-96: LGTM! Packaging structure is clear.

The src-layout approach (`packages = ["src/bookbytes"]`) is well-established and prevents import issues during development.

103-120: LGTM! Comprehensive and pragmatic lint rules.

The selected lint rules (E, W, F, I, C4, B, UP, ARG, SIM) align well with modern Python practices. The ignore list is sensible, particularly B008 for FastAPI `Depends()` patterns and ARG001 for unused function arguments (common in dependency injection).

125-139: LGTM! Pytest configuration is well-suited for async testing.

The `asyncio_mode = "auto"` setting, test path definitions, and custom markers (integration, external, slow) provide good test organization and flexibility for the async-heavy codebase.

141-155: LGTM! Strict mypy config with appropriate overrides.

The strict mode enforcement (`disallow_untyped_defs`, `warn_return_any`, etc.) establishes a strong type-safety baseline. The overrides for untyped third-party packages (gtts, arq, aioboto3) are pragmatic and necessary.

Once the Python version mismatch is fixed, verify that mypy strict mode passes on the codebase to ensure all modules are properly typed.
backend/tests/unit/test_cache_service.py (5)
42-81: LGTM!

The cache key generation tests comprehensively cover normalization, determinism, and different key types. The assertions correctly validate the expected key format and hash length.

89-127: LGTM!

The TTL handling tests appropriately validate jitter bounds and key-type-based TTL resolution. Using slightly wider bounds (0.89–1.11) for the jitter test is a sensible choice to avoid flaky tests from floating-point edge cases.

135-190: LGTM!

The cache get tests thoroughly cover the key scenarios: cache misses, hits with fresh data, stale-while-revalidate triggering, and graceful error handling. The mock setup correctly simulates the Redis responses.

198-235: LGTM!

The cache set tests validate JSON serialization, TTL jitter application, and error resilience. The assertions on `call_args` correctly verify the data being passed to Redis.

243-282: LGTM!

The cache invalidation tests properly cover single-key deletion, pattern-based invalidation with an async generator mock, and error handling. The pattern test correctly simulates the scan-and-delete workflow.
backend/tests/unit/test_openlibrary_service.py (6)
24-80: LGTM!

The mock response data accurately reflects the OpenLibrary API structure, and the fixtures provide clean, reusable test setup. The default cache miss behavior is appropriate for testing API call paths.

88-165: LGTM!

The DTO serialization tests thoroughly verify field preservation through `to_dict` and `from_dict` operations, ensuring cached data can be reliably reconstructed.

198-286: LGTM!

The search tests comprehensively cover cache hit/miss scenarios, API interaction, and parameter passing. The use of `patch.object` for `_get_client` effectively isolates HTTP behavior.

293-342: LGTM!

The work details tests properly validate the cache-first pattern and API fallback behavior, with appropriate assertions on caching side effects.

350-414: LGTM!

The error handling tests thoroughly verify the exception mapping: 429 → `OpenLibraryRateLimitError`, 5xx → `OpenLibraryError`, and network failures → `OpenLibraryError`. This ensures callers can handle different failure modes appropriately.

422-496: LGTM!

The helper method and response parsing tests validate delegation patterns, User-Agent formatting, and edge cases in API response handling (cover URL construction, dict-format descriptions).
backend/src/bookbytes/repositories/edition.py (5)
1-15: LGTM!

The repository correctly extends `BaseRepository[Edition]`. Since Edition appears to use hard deletes (no soft-delete mixin mentioned), `BaseRepository` is the appropriate base class.

17-27: LGTM!

The `get_by_isbn` method is correctly implemented for querying by the unique, indexed ISBN field.

29-53: LGTM!

The `get_by_work_id` method correctly implements paginated retrieval with sensible ordering (newest editions first, nulls last).

55-76: LGTM!

The `get_latest_by_work` method correctly retrieves the most recent edition for a work in the specified language. The `nulls_last()` ordering ensures editions with known publish years are preferred.

78-90: LGTM!

The `isbn_exists` method efficiently checks for ISBN existence by selecting only the ID column with a limit of 1, avoiding unnecessary data transfer.

backend/src/bookbytes/repositories/chapter.py (5)
17-41: LGTM!

The `get_by_audio_book` method correctly retrieves chapters ordered by chapter number with pagination support.

43-62: LGTM!

The `get_by_number` method correctly retrieves a specific chapter using the unique constraint on `(audio_book_id, chapter_number)`.

64-80: LGTM!

The count query logic is correct. The inline import issue was addressed in the earlier comment.

82-98: LGTM!

The `get_total_duration` method correctly aggregates chapter durations with `coalesce` to handle the empty-set case gracefully.
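For context, a small sketch of a `coalesce`-wrapped aggregate; the table and column names are illustrative, not the project's models:

```python
from sqlalchemy import Column, Integer, MetaData, Table, func, select

metadata = MetaData()
chapters = Table(
    "chapters",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("audio_book_id", Integer),
    Column("duration_seconds", Integer),
)

# SUM over an empty set yields NULL; coalesce turns that into 0 so
# callers always receive an integer total.
stmt = select(
    func.coalesce(func.sum(chapters.c.duration_seconds), 0)
).where(chapters.c.audio_book_id == 1)
```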
100-115: LGTM!

The bulk delete implementation is appropriate for the regeneration use case. Using `result.rowcount` to return the deletion count is correct.

backend/src/bookbytes/repositories/audio_book.py (6)
1-16: LGTM!

The repository appropriately extends `SoftDeleteRepository` to preserve audiobook history, with clear documentation of the design rationale.

18-39: LGTM!

The `get_by_edition` method correctly implements soft-delete-aware querying with an optional flag to include deleted records.

41-66: LGTM!

The `get_by_status` method correctly filters by status with pagination and soft-delete awareness. Using `status.value` ensures proper comparison with the stored string value.

68-94: LGTM!

The `get_pending` and `get_processing` convenience methods cleanly delegate to `get_by_status` with appropriate defaults for processing workflows.

96-137: LGTM!

The status mutation methods (`mark_processing`, `mark_completed`, `mark_failed`) correctly update the audiobook state with appropriate side effects (clearing errors on success, recording errors on failure).

139-151: LGTM!

The `increment_version` method correctly implements the regeneration workflow: incrementing the version, resetting status to pending, and clearing any previous errors.

backend/src/bookbytes/repositories/book_provider.py (5)
1-16: LGTM!

Clean imports and class definition following the established repository pattern.

18-37: LGTM!

Query leverages the unique constraint on `(provider, external_key)` correctly.

39-69: LGTM!

The double filter on `entity_type` and `work_id`/`edition_id` is defensive and clarifies intent. The existing `ix_book_providers_entity_lookup` index on `(entity_type, entity_id)` should cover lookups via `entity_id`, though these queries filter by `work_id`/`edition_id` directly, which is fine given the FK columns.

100-127: LGTM!

Consistent with the `create_work_mapping` pattern. The same `IntegrityError` handling consideration applies.

129-149: LGTM!

Efficient existence check selecting only the `id` column with `limit(1)`.

backend/tests/integration/test_openlibrary_api.py (4)
1-23: LGTM!

Clear documentation on running and skipping integration tests.

162-204: LGTM!

Test logic for work details retrieval is sound. The same fixture cleanup note applies.

211-228: LGTM!

Simple and effective test for ISBN presence in search results.

235-250: LGTM!

Good test for error handling with invalid work keys.
backend/src/bookbytes/services/cache.py (6)
1-31: LGTM!

Excellent documentation explaining the caching strategy, key types, and TTLs.

33-67: LGTM!

Clear TTL constants and a sensible revalidation threshold.

69-96: LGTM!

Correct stale-while-revalidate implementation with fail-open error handling.

98-120: LGTM!

Auto-detected TTL with jitter is a good pattern for preventing cache stampede.

156-184: LGTM!

Good jitter implementation with sensible defaults for unknown key prefixes.
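For readers new to the technique, a hedged sketch of stale-while-revalidate with TTL jitter; the threshold and TTL values are assumptions, not the service's exact constants:

```python
import json
import random

from redis.asyncio import Redis

TTL_SECONDS = 86_400          # 24h base TTL (assumed)
REVALIDATE_BELOW = 3_600      # serve-but-refresh once under 1h remains


async def cache_get(redis: Redis, key: str) -> tuple[dict | None, bool]:
    """Return (value, needs_revalidation); fail open on Redis errors."""
    try:
        raw = await redis.get(key)
        if raw is None:
            return None, False
        remaining = await redis.ttl(key)
        # Stale-while-revalidate: still return the cached value, but
        # signal the caller to refresh it in the background.
        return json.loads(raw), 0 <= remaining < REVALIDATE_BELOW
    except Exception:
        # Fail open: treat Redis trouble as a cache miss.
        return None, False


def jittered_ttl(base: int = TTL_SECONDS, spread: float = 0.1) -> int:
    # +/-10% jitter keeps many keys from expiring at the same instant
    # (cache stampede) after a bulk warm-up.
    return int(base * random.uniform(1 - spread, 1 + spread))
```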
256-289: LGTM!

Standard FastAPI dependency injection pattern with clear initialization requirements.
backend/src/bookbytes/services/openlibrary.py (8)
1-19: LGTM!

Clean imports and clear module documentation.

27-77: LGTM!

Well-designed DTO with proper serialization methods and sensible defaults.

79-129: LGTM!

Consistent DTO design with `BookSearchResult`.

131-161: LGTM!

Clean pagination container and appropriate exception hierarchy.

163-207: LGTM!

Proper lazy HTTP client management with cleanup.
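A minimal sketch of the lazy-client pattern; the base URL, timeout, and header values are assumptions:

```python
import httpx


class OpenLibraryClient:
    def __init__(self) -> None:
        self._client: httpx.AsyncClient | None = None

    def _get_client(self) -> httpx.AsyncClient:
        # Created on first use so importing the module stays cheap.
        if self._client is None:
            self._client = httpx.AsyncClient(
                base_url="https://openlibrary.org",
                timeout=10.0,
                headers={"User-Agent": "bookbytes/0.1 (example)"},
            )
        return self._client

    async def close(self) -> None:
        # Idempotent cleanup; safe to call even if never used.
        if self._client is not None:
            await self._client.aclose()
            self._client = None
```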
261-288: LGTM!

Correct cache-first pattern for work details.

306-352: LGTM!

Comprehensive error handling with specific rate limit detection and appropriate logging.

465-480: LGTM!

Consistent FastAPI dependency injection pattern.
```python
from uuid import UUID

from sqlalchemy import select

from bookbytes.models.chapter import Chapter
from bookbytes.repositories.base import BaseRepository
```
🛠️ Refactor suggestion | 🟠 Major
Move inline imports to the top of the file.
The `func` and `delete` imports from SQLAlchemy are used in methods below but imported inline. This deviates from Python conventions and makes dependencies less visible.
```diff
 from uuid import UUID

-from sqlalchemy import select
+from sqlalchemy import delete, func, select

 from bookbytes.models.chapter import Chapter
 from bookbytes.repositories.base import BaseRepository
```

Then remove the inline imports at lines 73, 91, and 109.
🤖 Prompt for AI Agents
In backend/src/bookbytes/repositories/chapter.py around lines 6 to 11, the file
currently only imports select from SQLAlchemy while func and delete are imported
inline later; move those inline imports to the top of the file by adding them to
the existing SQLAlchemy import statement (i.e., import func and delete alongside
select) and then remove the inline import statements at lines 73, 91, and 109 so
all SQLAlchemy dependencies are declared at the module level.
```python
async def get_with_editions(self, work_id: Work) -> Work | None:
    """Get a work with its editions eagerly loaded.

    Args:
        work_id: Work UUID

    Returns:
        Work with editions loaded, or None
    """
    result = await self.session.execute(
        select(Work).options(selectinload(Work.editions)).where(Work.id == work_id)
    )
    return result.scalar_one_or_none()
```
Fix work_id type annotation in get_with_editions.
The parameter is annotated as `work_id: Work` but treated as an ID (`Work.id == work_id`) and documented as "Work UUID". This can mislead callers into passing a `Work` instance and cause runtime errors at the DB driver.
Change the signature to something like:
```diff
-async def get_with_editions(self, work_id: Work) -> Work | None:
+async def get_with_editions(self, work_id: UUID) -> Work | None:
```

(and import `UUID` if not already present).
🤖 Prompt for AI Agents
In backend/src/bookbytes/repositories/work.py around lines 38 to 50, the
parameter is incorrectly typed as work_id: Work while the code and docstring
treat it as a Work UUID; change the signature to accept a UUID (e.g., work_id:
UUID), update the docstring to reflect a UUID parameter if needed, and
add/import UUID from the uuid module at the top of the file (or use
typing.Union[UUID, str] if string IDs are accepted) so callers and the DB
comparison use the correct type.
```python
# Check cache first
cache_key = CacheService.search_key(
    title=title,
    author=author,
    publisher=publisher,
    language=language,
)

cached_data, needs_revalidation = await self.cache.get(cache_key)
if cached_data and not needs_revalidation:
    logger.debug("search_cache_hit", cache_key=cache_key)
    return self._parse_cached_search(cached_data)

# Cache miss or stale - fetch from API
logger.debug("search_cache_miss", cache_key=cache_key)
response = await self._fetch_search(
    title=title,
    author=author,
    publisher=publisher,
    language=language,
    offset=offset,
)

# Cache the response (fire and forget for now)
await self.cache.set(cache_key, self._serialize_search(response))

return response
```
Cache key doesn't include offset parameter, breaking pagination.
The cache key is generated without considering offset, so requesting page 2 (offset=20) will return the cached page 1 results. Either:
- Include offset in the cache key, or
- Only cache the first page (offset=0)
```diff
     cache_key = CacheService.search_key(
         title=title,
         author=author,
         publisher=publisher,
         language=language,
     )

-    cached_data, needs_revalidation = await self.cache.get(cache_key)
-    if cached_data and not needs_revalidation:
+    # Only use cache for first page
+    if offset == 0:
+        cached_data, needs_revalidation = await self.cache.get(cache_key)
+        if cached_data and not needs_revalidation:
+            logger.debug("search_cache_hit", cache_key=cache_key)
+            return self._parse_cached_search(cached_data)
+    else:
+        cached_data = None
```

Or modify `CacheService.search_key()` to accept an optional `offset` parameter.
🤖 Prompt for AI Agents
In backend/src/bookbytes/services/openlibrary.py around lines 233-259, the cache
key is generated without the offset causing paginated requests to collide; fix
by including the offset when building the key (call CacheService.search_key(...,
offset=offset)) and, if CacheService.search_key does not currently accept
offset, update its signature/implementation to include offset in the key
generation (or alternatively only cache when offset == 0); ensure cache.get/set
use the same key and the rest of the serialization logic remains unchanged.
```python
@pytest.fixture
async def skip_if_no_network():
    """Skip test if OpenLibrary API is unreachable."""
    is_reachable = await check_openlibrary_reachable()
    if not is_reachable:
        pytest.skip("OpenLibrary API is unreachable (network issue)")
    return True
```
`skip_if_no_network` fixture is defined but never used.
The tests don't reference this fixture, so they'll fail rather than skip when OpenLibrary is unreachable. Either:
- Add `skip_if_no_network` as a parameter to each test method, or
- Make the fixture `autouse=True` with appropriate scope
```diff
 @pytest.fixture
-async def skip_if_no_network():
+async def skip_if_no_network(request):
     """Skip test if OpenLibrary API is unreachable."""
     is_reachable = await check_openlibrary_reachable()
     if not is_reachable:
         pytest.skip("OpenLibrary API is unreachable (network issue)")
     return True
```

Then add it to test methods:
```python
async def test_search_returns_results(
    self, skip_if_no_network, openlibrary_service: OpenLibraryService
) -> None:
```

🤖 Prompt for AI Agents
In backend/tests/integration/test_openlibrary_api.py around lines 61 to 67, the
skip_if_no_network fixture is defined but never used so tests won't skip when
OpenLibrary is unreachable; fix by either (A) adding skip_if_no_network as a
parameter to every async test that relies on the network (e.g.
test_search_returns_results(self, skip_if_no_network, openlibrary_service:
OpenLibraryService) -> None) or (B) change the fixture to autouse=True (and
appropriate scope) so it runs automatically for the module, e.g. mark the
fixture as @pytest.fixture(autouse=True, scope="module") to ensure tests are
skipped when the API is unreachable.
```python
@pytest.mark.integration
@pytest.mark.external
class TestOpenLibrarySearchIntegration:
    """Integration tests for search_books against real API."""

    @pytest.mark.asyncio
    async def test_search_returns_results(
        self, openlibrary_service: OpenLibraryService
    ) -> None:
        """Test that search returns real results."""
        result = await openlibrary_service.search_books(title="Lord of the Rings")

        assert result.total_found > 0
        assert len(result.results) > 0

        # Verify first result has expected fields
        first = result.results[0]
        assert first.title  # Has a title
        assert first.external_work_key  # Has work key
        assert first.source_provider == "openlibrary"

        await openlibrary_service.close()
```
🛠️ Refactor suggestion | 🟠 Major
Resource cleanup should use fixture teardown, not inline close() calls.
Calling `close()` at the end of each test is fragile: if the test fails before reaching `close()`, the HTTP client leaks. Use fixture-based cleanup instead:
```diff
 @pytest.fixture
-def openlibrary_service(mock_cache) -> OpenLibraryService:
+async def openlibrary_service(mock_cache) -> OpenLibraryService:
     """Create OpenLibraryService with mock cache for real API testing."""
-    return OpenLibraryService(mock_cache)
+    service = OpenLibraryService(mock_cache)
+    yield service
+    await service.close()
```

Then remove all `await openlibrary_service.close()` calls from test methods.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In backend/tests/integration/test_openlibrary_api.py around lines 75 to 96, the
test is calling await openlibrary_service.close() inline which can leak the HTTP
client if the test fails; remove the inline close() calls from tests and instead
implement fixture-based teardown: change the openlibrary_service fixture to use
a yield pattern (create client, yield it, then close it in the fixture’s
finally/teardown block) so the client is always closed regardless of test
outcome, and update other integration tests to no longer call close().
| ``` | ||
| OpenLibrary API ──▶ ┌──────────────────┐ ◀── Google Books | ||
| │ Canonical Model │ | ||
| │ (BookSearchResult│ | ||
| │ Edition, Work) │ | ||
| └────────┬─────────┘ | ||
| ▼ | ||
| ┌──────────────────┐ | ||
| │ Redis Cache │ | ||
| │ (Canonical JSON) │ | ||
| └──────────────────┘ | ||
| ``` | ||
|
|
||
| ### Anti-Corruption Layer (ACL) | ||
|
|
||
| From Domain-Driven Design (DDD) - translation layer protecting domain from external system quirks. | ||
|
|
||
| ```python | ||
| class OpenLibraryAdapter: | ||
| """Transforms OpenLibrary responses to canonical models.""" | ||
|
|
||
| def to_search_result(self, raw: dict) -> BookSearchResult: | ||
| return BookSearchResult( | ||
| title=raw["title"], | ||
| authors=self._extract_authors(raw), | ||
| source_provider="openlibrary", | ||
| ... | ||
| ) | ||
| ``` | ||
|
|
||
| --- | ||
|
|
||
| ## Cache Key Strategy | ||
|
|
||
| ### Current (Phase 1 - Single Provider) | ||
|
|
||
| Provider-agnostic keys: | ||
| | Key | Description | | ||
| |-----|-------------| | ||
| | `search:{hash}` | Search results | | ||
| | `isbn:{isbn}` | Book by ISBN | | ||
| | `work:{identifier}` | Work details | | ||
|
|
||
| ### Future (Multi-Provider Phase) | ||
|
|
||
| Same keys, but cached data includes metadata: | ||
|
|
||
| ```python | ||
| { | ||
| "data": { ... }, # Canonical model | ||
| "metadata": { | ||
| "source_provider": "openlibrary", | ||
| "fetched_at": "2024-12-07T...", | ||
| } | ||
| } | ||
| ``` | ||
|
|
||
| --- | ||
|
|
||
| ## Trade-offs Analysis | ||
|
|
||
| | Trade-off | Decision | Rationale | | ||
| | ------------------- | ----------------------------------------- | ----------------------------------------- | | ||
| | First provider wins | ✅ Accepted | Book metadata is similar across providers | | ||
| | Transform overhead | ✅ Accepted | Happens once at fetch time | | ||
| | Provider debugging | Include `source_provider` in cached value | | ||
| | Schema evolution | Version canonical models carefully | |
Tighten Markdown formatting (code block + trade-offs table).
A few small tweaks will keep markdownlint and renderers happy:
1. Add a language to the ASCII diagram code fence:
````diff
-```
+```text
 OpenLibrary API ──▶ ┌──────────────────┐ ◀── Google Books
 ...
````
2. Ensure the trade-offs table has consistent columns (3 columns for all rows) and blank lines around it. For example, add rationales for the last two rows:
```diff
-| Provider debugging | Include `source_provider` in cached value |
-| Schema evolution | Version canonical models carefully |
+| Provider debugging | Include `source_provider` in cached value | Easier troubleshooting per cache entry |
+| Schema evolution | Version canonical models carefully | Avoid breaking consumers during changes |
```
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)
31-31: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
68-68: Tables should be surrounded by blank lines
(MD058, blanks-around-tables)
96-96: Table column count
Expected: 3; Actual: 2; Too few cells, row will be missing data
(MD056, table-column-count)
97-97: Table column count
Expected: 3; Actual: 2; Too few cells, row will be missing data
(MD056, table-column-count)
🤖 Prompt for AI Agents
In tasks/knowledge/multi-provider-integration-patterns.md around lines 31 to 97,
tighten markdown by adding a language identifier to the ASCII diagram code fence
(use ```text) and ensure the trade-offs table has exactly three columns for
every row and is separated from surrounding text by blank lines; update the last
two rows to include missing Rationale cells (e.g., "Included in cached metadata"
and "Manage via careful versioning") and add an empty line before and after the
table so markdownlint and renderers parse it correctly.
Actionable comments posted: 4
♻️ Duplicate comments (2)

backend/src/bookbytes/main.py (2)

49-51: Settings injection issue persists. The lifespan function still calls `get_settings()` directly, ignoring any custom settings passed to `create_app(settings)`. This breaks test scenarios where custom settings are injected. See the fix suggested in the previous review: store settings in `app.state._init_settings` before assigning lifespan, then read from `app.state` in lifespan.

147-153: CORS configuration issue persists. When `is_development` is `False`, `allow_origins=[]` blocks all cross-origin requests while `allow_credentials=True` remains set; this combination is ineffective. Configure explicit production origins via settings, as suggested in the previous review.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- backend/src/bookbytes/api/v1/router.py (1 hunks)
- backend/src/bookbytes/api/v1/search.py (1 hunks)
- backend/src/bookbytes/main.py (1 hunks)
- backend/src/bookbytes/schemas/search.py (1 hunks)
- backend/tests/unit/test_search_endpoints.py (1 hunks)
- tasks/backend-productionisation/audio-books-library/tasks-audio-books-library.md (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- tasks/backend-productionisation/audio-books-library/tasks-audio-books-library.md
🧰 Additional context used
🧬 Code graph analysis (1)
backend/tests/unit/test_search_endpoints.py (3)
- backend/src/bookbytes/api/v1/search.py (3)
  - get_openlibrary_service (36-40), search_books (60-134), get_work_details (149-189)
- backend/src/bookbytes/services/cache.py (2)
  - get_cache_service (277-289), get (69-96)
- backend/src/bookbytes/services/openlibrary.py (4)
  - BookSearchResult (28-76), SearchResponse (132-143), WorkDetails (80-128), close (203-207)
🔇 Additional comments (13)
backend/src/bookbytes/api/v1/router.py (1)

1-17: LGTM! Clean router aggregation following standard FastAPI patterns. The structure allows for easy extension with future sub-routers as indicated by the placeholder comments.

backend/tests/unit/test_search_endpoints.py (4)

21-83: Well-structured test fixtures. Good use of dependency overrides for isolating tests from real services. The cleanup in `test_app` ensures no state leakage between tests.

91-236: Comprehensive search endpoint test coverage. Good coverage of success paths, pagination, filters, validation errors, and empty results. The assertions verify both response structure and service invocation parameters.

244-267: Test correctly validates work details retrieval. The test properly validates the response structure. Note that the URL `works/works/OL27448W` is due to the `:path` parameter capturing `works/OL27448W`, which the endpoint normalizes to `/works/OL27448W`.

275-327: Good ISBN lookup test coverage. Tests validate both success and not-found scenarios. Consider adding a test for external service errors (502 response) in future iterations.

backend/src/bookbytes/main.py (4)

156-210: Well-implemented request logging middleware. Good use of correlation IDs for distributed tracing, proper timing measurements, and cleanup in the `finally` block. The middleware correctly propagates `X-Request-ID` to responses.

213-274: Solid exception handling implementation. Appropriate log levels based on status codes, structured error responses with request IDs, and no sensitive data leakage. The `exception()` call for unhandled errors ensures stack traces are captured.

277-341: Health endpoints follow Kubernetes best practices. Good separation of liveness (basic "am I running") and readiness (can I serve traffic) probes. The placeholder for Redis health check is appropriately marked with a TODO.

344-363: LGTM! Standard CLI setup with uvicorn. The module-level `app` instance is correctly created for the `"bookbytes.main:app"` import reference.

backend/src/bookbytes/api/v1/search.py (3)

36-40: LGTM! Clean dependency factory pattern for injecting the cache service into OpenLibraryService.

137-189: LGTM! Clean implementation with proper path parameter handling using the `:path` converter, work key normalization, structured error handling, and resource cleanup in the `finally` block.

224-226: The suggested fix is incorrect; the current implementation is the intended design. The `search_books()` method doesn't have an `isbn` parameter, only `title`, `author`, `publisher`, `language`, and `offset`. Searching by ISBN via the `title` parameter is the correct approach given the OpenLibrary API's flexible search capabilities. The suggested change to `isbn=clean_isbn` would fail at runtime. Likely an incorrect or invalid review comment.

backend/src/bookbytes/schemas/search.py (1)

228-290: LGTM! Well-defined schemas for job responses, refresh requests, and audiobook details with appropriate field constraints and documentation.
```python
    try:
        # Calculate offset for API call
        # We fetch 100 results max, then slice for pagination
        api_offset = ((page - 1) // 5) * 100  # Fetch next batch every 5 pages

        result = await openlibrary.search_books(
            title=request.title,
            author=request.author,
            publisher=request.publisher,
            language=request.language,
            offset=api_offset,
        )

        # Calculate slice for requested page
        # Within our 100-result batch, find the right slice
        batch_start = (page - 1) % 5 * page_size
        batch_end = batch_start + page_size
        page_results = result.results[batch_start:batch_end]
```
Pagination logic is broken for non-default page sizes.

The batch calculation assumes page_size=20 (5 pages × 20 = 100 per batch). When page_size differs, the slicing can be incorrect:

- `page=3, page_size=50`: `batch_start = (3-1) % 5 * 50 = 100`, which exceeds the 100-result batch, returning empty results.
- `page=2, page_size=100`: `batch_start = (2-1) % 5 * 100 = 100`, same issue.
Consider calculating based on actual item offsets:
```diff
-    # Calculate offset for API call
-    # We fetch 100 results max, then slice for pagination
-    api_offset = ((page - 1) // 5) * 100  # Fetch next batch every 5 pages
+    # Calculate offset for API call based on actual item positions
+    start_item = (page - 1) * page_size
+    api_offset = (start_item // 100) * 100  # Fetch batch containing start_item

     result = await openlibrary.search_books(
         title=request.title,
         author=request.author,
         publisher=request.publisher,
         language=request.language,
         offset=api_offset,
     )

-    # Calculate slice for requested page
-    # Within our 100-result batch, find the right slice
-    batch_start = (page - 1) % 5 * page_size
-    batch_end = batch_start + page_size
+    # Calculate slice within the batch
+    batch_start = start_item - api_offset
+    batch_end = batch_start + page_size
     page_results = result.results[batch_start:batch_end]
```

🤖 Prompt for AI Agents
In backend/src/bookbytes/api/v1/search.py around lines 79 to 96, the pagination
assumes a fixed page_size of 20 which breaks slicing when page_size differs;
replace the current batch math with calculations based on item offsets: compute
global_offset = (page - 1) * page_size, set api_offset = (global_offset // 100)
* 100 to fetch the correct 100-item batch, then compute batch_start =
global_offset % 100 and batch_end = batch_start + page_size and slice
result.results[batch_start:batch_end] (also guard the slice against result
length to avoid index errors).
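To sanity-check the corrected math against the failing cases cited above, a small illustrative script (values are hypothetical):

```python
def batch_slice(page: int, page_size: int) -> tuple[int, int, int]:
    """Return (api_offset, batch_start, batch_end) under the fixed scheme."""
    start_item = (page - 1) * page_size
    api_offset = (start_item // 100) * 100  # batch containing start_item
    batch_start = start_item - api_offset
    return api_offset, batch_start, batch_start + page_size


# page=3, page_size=50: previously sliced [100:150] of a 100-item batch (empty);
# now fetches the batch at offset 100 and slices [0:50].
assert batch_slice(3, 50) == (100, 0, 50)

# page=2, page_size=100: previously [100:200] (empty); now offset 100, slice [0:100].
assert batch_slice(2, 100) == (100, 0, 100)
```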
| docs_url="/docs" if settings.is_development else "/docs", | ||
| redoc_url="/redoc" if settings.is_development else "/redoc", | ||
| openapi_url="/openapi.json" if settings.is_development else "/openapi.json", |
No-op conditionals for API documentation URLs.

These conditionals always evaluate to the same value regardless of environment, e.g. `docs_url="/docs" if settings.is_development else "/docs"`. If the intent was to disable docs in production, use `None`:

```diff
-        docs_url="/docs" if settings.is_development else "/docs",
-        redoc_url="/redoc" if settings.is_development else "/redoc",
-        openapi_url="/openapi.json" if settings.is_development else "/openapi.json",
+        docs_url="/docs" if settings.is_development else None,
+        redoc_url="/redoc" if settings.is_development else None,
+        openapi_url="/openapi.json" if settings.is_development else None,
```

🤖 Prompt for AI Agents
In backend/src/bookbytes/main.py around lines 115 to 117, the
docs_url/redoc_url/openapi_url conditionals are no-ops (they return the same
string regardless of environment); change each conditional to return the string
when settings.is_development is true and None when false (e.g., docs_url="/docs"
if settings.is_development else None) so documentation endpoints are disabled in
production; update all three entries (docs_url, redoc_url, openapi_url)
accordingly and run the app to verify docs are reachable in development and
hidden in production.
```python
    model_config = ConfigDict(from_attributes=True)

    title: str = Field(..., description="Book title")
    authors: list[str] = Field(default_factory=list, description="Author names")
    first_publish_year: int | None = Field(None, description="First publication year")
    cover_url: str | None = Field(None, description="Cover image URL")
    isbn_list: list[str] = Field(
        default_factory=list, description="Available ISBNs (limited)"
    )
    edition_count: int = Field(0, description="Number of editions")
    subjects: list[str] = Field(default_factory=list, description="Subject categories")
    external_work_key: str = Field(..., description="OpenLibrary work key")

    model_config = ConfigDict(
        json_schema_extra={
            "example": {
                "title": "The Lord of the Rings",
                "authors": ["J. R. R. Tolkien"],
                "first_publish_year": 1954,
                "cover_url": "https://covers.openlibrary.org/b/id/258027-M.jpg",
                "isbn_list": ["9780618640157", "0618640150"],
                "edition_count": 120,
                "subjects": ["Fantasy", "Epic"],
                "external_work_key": "/works/OL27448W",
            }
        }
    )
```
Duplicate `model_config` overwrites `from_attributes=True`.

The second `model_config` assignment at line 75 completely overrides the first one at line 62, causing `from_attributes=True` to be lost.

Combine both into a single `ConfigDict`:
```diff
-    model_config = ConfigDict(from_attributes=True)
-
     title: str = Field(..., description="Book title")
     authors: list[str] = Field(default_factory=list, description="Author names")
     first_publish_year: int | None = Field(None, description="First publication year")
     cover_url: str | None = Field(None, description="Cover image URL")
     isbn_list: list[str] = Field(
         default_factory=list, description="Available ISBNs (limited)"
     )
     edition_count: int = Field(0, description="Number of editions")
     subjects: list[str] = Field(default_factory=list, description="Subject categories")
     external_work_key: str = Field(..., description="OpenLibrary work key")

     model_config = ConfigDict(
+        from_attributes=True,
         json_schema_extra={
             "example": {
                 ...
             }
         }
     )
```
🤖 Prompt for AI Agents
In backend/src/bookbytes/schemas/search.py around lines 62 to 88, there are two
model_config assignments which cause the second to overwrite the first (losing
from_attributes=True); merge them into a single ConfigDict that includes
from_attributes=True and the existing json_schema_extra example (remove the
duplicate assignment and keep one combined model_config).
```python
class ProcessBookRequest(BaseSchema):
    """Request to process a book into an audiobook.

    Either edition_id OR isbn must be provided (not both).
    """

    edition_id: UUID | None = Field(
        None, description="Internal edition UUID (if already in library)"
    )
    isbn: str | None = Field(
        None,
        min_length=10,
        max_length=17,
        description="ISBN to process (10 or 13 digits)",
    )

    model_config = ConfigDict(
        json_schema_extra={
            "example": {"isbn": "9780618640157"},
        }
    )
```
Missing validation for mutually exclusive fields.

The docstring states "Either edition_id OR isbn must be provided (not both)", but the schema allows both to be None or both to be set. Add a `model_validator` to enforce this:
```diff
+from pydantic import model_validator
+
 class ProcessBookRequest(BaseSchema):
     """Request to process a book into an audiobook.

     Either edition_id OR isbn must be provided (not both).
     """

     edition_id: UUID | None = Field(
         None, description="Internal edition UUID (if already in library)"
     )
     isbn: str | None = Field(
         None,
         min_length=10,
         max_length=17,
         description="ISBN to process (10 or 13 digits)",
     )

+    @model_validator(mode="after")
+    def check_edition_or_isbn(self) -> "ProcessBookRequest":
+        if self.edition_id is None and self.isbn is None:
+            raise ValueError("Either edition_id or isbn must be provided")
+        if self.edition_id is not None and self.isbn is not None:
+            raise ValueError("Provide either edition_id or isbn, not both")
+        return self
```

Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In backend/src/bookbytes/schemas/search.py around lines 205 to 225, the
ProcessBookRequest schema's docstring requires exactly one of edition_id or isbn
but there's no runtime enforcement; add a Pydantic model_validator inside the
class (use @model_validator(mode="after")) that checks the instance: if both
fields are None or both are non-None, raise a ValueError with a clear message
like "Either edition_id or isbn must be provided, but not both." Place the
validator method in the class body and keep the existing Field definitions and
model_config unchanged.
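Assuming the validator above is added, both invalid shapes fail at parse time; a brief usage sketch (the import of `ProcessBookRequest` from the schemas module is implied, and the error messages are illustrative):

```python
from uuid import uuid4

from pydantic import ValidationError

# Exactly one identifier: accepted.
ProcessBookRequest(isbn="9780618640157")

# Neither, or both: rejected before any service logic runs.
for bad in ({}, {"isbn": "9780618640157", "edition_id": uuid4()}):
    try:
        ProcessBookRequest(**bad)
    except ValidationError as exc:
        print(exc.errors()[0]["msg"])
```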
Add LibraryService class to manage book persistence with Work and Edition operations including:

- Finding works by provider
- Creating works from search results
- Storing editions with ISBN normalization
- Creating provider mappings

Include unit and integration tests for all service methods
Actionable comments posted: 6
🧹 Nitpick comments (6)

tasks/backend-productionisation/audio-books-library/tasks-audio-books-library.md (3)

1-10: Add a scope/status section clarifying which tasks are in-scope for this PR. The document marks many tasks as incomplete ([ ]): the entire Testing phase (3.G), Background Jobs (3.F), and partial API endpoints (3.E.5-3.E.6). For a PR titled "productionise," the scope should be explicit. Is this PR completing only the [x] tasks, or are incomplete sections deferred to follow-up PRs?

Add a "PR Scope" section near the overview:

```markdown
## PR Scope

This PR focuses on:

- [x] Phase 3.0: UUIDv7 foundation
- [x] Phase 3.A: Data Models
- [x] Phase 3.A-R: Repositories
- [x] Phase 3.B: Cache Service
- [x] Phase 3.C: OpenLibrary Service
- [x] Phase 3.D: Library Service
- [x] Phase 3.E: API endpoints (partial - 3.E.1 to 3.E.4, 3.E.7-3.E.8)
- [ ] Phase 3.F: Background Jobs (deferred)
- [ ] Phase 3.G: Testing (deferred)

See follow-up PR #X for testing and background job tasks.
```

194-196: Specify the cache key generation format and validation strategy. Task 3.B.6 mentions normalizing params and using SHA256 hash, but the exact format is undefined. Will keys be `"search:{sha256_hash}"` or `"{service}:{action}:{sha256_hash}"`? This affects cache pattern invalidation (line 187: `invalidate_pattern()`) and debugging.

Clarify the cache key format:

```markdown
### Cache Key Format

Keys follow the pattern: `{namespace}:{action}:{param_hash}`

Example: `search:books:a3f2c1e8...` (SHA256 of normalized params)

Normalization steps:

1. Lowercase all strings
2. Trim whitespace
3. Sort dict keys for consistent ordering
4. Serialize to JSON
5. SHA256(json_string)
```

125-128: Clarify migration rollback strategy and version compatibility. Tasks 3.A.8 and 3.A.9 reference Alembic migrations but don't specify:

- Version compatibility (can migrations rollback safely?)
- Data migration strategy for soft-delete enablement
- Testing of migration reversibility

Add migration guidance:

```markdown
- [x] 3.A.8 Generate migration: `alembic revision --autogenerate -m "add_audio_books_library_models"`
  - Verify migration is reversible: `alembic downgrade -1 && alembic upgrade +1`
  - Test on staging database before production deployment
  - Document any data cleanup required on rollback
```

backend/tests/unit/test_library_service.py (1)

295-322: Consider adding ISBN normalization test for `isbn_exists`. The tests cover both boolean outcomes, but unlike `find_by_isbn`, there's no test verifying that `isbn_exists` normalizes the input ISBN. The service implementation does normalize, so a test like `isbn_exists("978-0-618-64015-7")` would ensure consistency.

backend/src/bookbytes/services/library.py (2)

142-186: Consider adding provider metadata for consistency. `get_or_create_work` passes `provider_metadata` with `fetched_at` and `edition_count` to `create_work_mapping`, but `get_or_create_work_from_details` does not. Since `WorkDetails` includes `edition_count`, you could include it for consistency:

```diff
 await self.provider_repo.create_work_mapping(
     work_id=created_work.id,
     provider=self.PROVIDER_OPENLIBRARY,
     external_key=work_details.external_work_key,
+    provider_metadata={
+        "edition_count": work_details.edition_count,
+    },
 )
```

254-256: Consider validating ISBN length. The current logic defaults to `"isbn10"` for any ISBN that isn't 13 characters, which could mask invalid ISBNs. Consider adding validation:

```diff
 # Determine ISBN type
-isbn_type = "isbn13" if len(clean_isbn) == 13 else "isbn10"
+if len(clean_isbn) == 13:
+    isbn_type = "isbn13"
+elif len(clean_isbn) == 10:
+    isbn_type = "isbn10"
+else:
+    raise ValueError(f"Invalid ISBN length: {len(clean_isbn)} (expected 10 or 13)")
```

This would catch malformed ISBNs earlier rather than silently storing them.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- backend/src/bookbytes/services/library.py (1 hunks)
- backend/tests/integration/test_library_service.py (1 hunks)
- backend/tests/integration/test_search_endpoints.py (1 hunks)
- backend/tests/unit/test_library_service.py (1 hunks)
- tasks/backend-productionisation/audio-books-library/tasks-audio-books-library.md (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
backend/tests/unit/test_library_service.py (4)
- backend/src/bookbytes/models/edition.py (1)
  - Edition (23-80)
- backend/src/bookbytes/models/work.py (1)
  - Work (24-61)
- backend/src/bookbytes/services/library.py (5)
  - isbn_exists (287-297), find_work_by_provider (57-82), get_or_create_work (84-140), find_by_isbn (191-201), store_edition (219-285)
- backend/src/bookbytes/services/openlibrary.py (1)
  - BookSearchResult (28-76)

backend/tests/integration/test_search_endpoints.py (3)
- backend/src/bookbytes/services/cache.py (2)
  - get_cache_service (277-289), get (69-96)
- backend/src/bookbytes/services/openlibrary.py (4)
  - BookSearchResult (28-76), SearchResponse (132-143), WorkDetails (80-128), close (203-207)
- backend/tests/conftest.py (2)
  - app (56-58), async_client (62-91)
🔇 Additional comments (23)

tasks/backend-productionisation/audio-books-library/tasks-audio-books-library.md (1)

3-4: Provide the referenced design document in the PR or clarify its availability. Line 3 references `./design-doc.md` as a design reference, but this file is not included in the PR review. If it contains critical architectural decisions (data model relationships, caching strategy, API contract), reviewers need access to validate the implementation tasks. Confirm whether `design-doc.md` exists in the PR. If it does, provide it for review. If it's external or missing, either add it or remove the reference.

backend/tests/unit/test_library_service.py (6)

1-14: LGTM! Clean imports and well-organized module structure with clear docstring explaining the test purpose.

21-101: LGTM! Well-structured fixtures with proper async mock setup. The sample data fixtures use consistent UUIDs that properly link the work and edition entities.

109-148: LGTM! Good coverage of both the "not found" and "found" paths for `find_work_by_provider`. The mock setup correctly simulates the provider mapping lookup.

150-191: LGTM! Solid test coverage for the get-or-create pattern, correctly verifying that existing works are returned without duplication and new works trigger both entity creation and provider mapping.

198-226: LGTM! Good tests for ISBN normalization and successful lookup. The normalization test is particularly valuable for ensuring consistent ISBN handling.

228-293: LGTM! Comprehensive test coverage for `store_edition` including idempotent behavior, new edition creation, and optional provider mapping creation.

backend/tests/integration/test_library_service.py (5)

1-9: LGTM! Clear docstring that honestly acknowledges these are not true database integration tests. The run command is helpful for developers.

26-123: LGTM! Good fixture organization. Using `sample_work` as a dependency for `sample_edition` improves maintainability. The addition of `sample_work_details` enables testing the `get_or_create_work_from_details` code path.

130-191: LGTM! Good detailed assertions that verify the provider mapping kwargs. The `test_create_work_from_details` test provides coverage for the `WorkDetails` code path not tested in the unit tests.

198-285: LGTM! The `test_store_edition_normalizes_isbn` test adds valuable coverage by verifying ISBN normalization during edition storage. Good complementary testing alongside the unit tests.

292-332: LGTM! Focused tests for provider mapping scenarios with good assertion detail on the mapping kwargs.

backend/src/bookbytes/services/library.py (7)

1-52: LGTM! Clean module structure with clear docstrings. The constructor properly accepts the three repository dependencies, enabling testability through dependency injection.

57-83: LGTM! The null-safe check `if mapping and mapping.work_id` properly handles both missing mappings and edition-only mappings. Good use of debug logging.

84-141: LGTM! Well-implemented get-or-create pattern with appropriate logging. The provider metadata capture (fetched_at, edition_count) is useful for debugging and analytics.

191-202: LGTM! Clean ISBN normalization before lookup.

203-218: LGTM! Simple delegation with sensible default language.

287-298: LGTM! Consistent ISBN normalization pattern with other methods.

300-319: LGTM with a note on testing. This global singleton pattern is common in FastAPI applications. The clear `RuntimeError` message helps catch initialization issues early. For testing, ensure you call `set_library_service(None)` or use a fixture to reset state between tests to avoid cross-test pollution.

backend/tests/integration/test_search_endpoints.py (4)

25-72: LGTM! Fixtures are well-structured. The mock fixtures correctly use `AsyncMock` for async operations, and the sample data fixtures provide realistic test data that's reusable across multiple tests.

244-269: Good error response validation. The test correctly validates both the status code and the specific error code (`BOOK_NOT_FOUND`), ensuring the API returns structured error responses.

305-332: LGTM! Request ID testing is comprehensive. The tests properly verify both automatic request ID generation and preservation of custom request IDs, which is essential for distributed tracing.

195-195: Verify the URL path structure. The URL `/api/v1/books/works/works/OL27448W` contains "works" twice. Given that `external_work_key` is `/works/OL27448W` (line 71), verify whether the endpoint expects:
- The full key including "works/" prefix (current implementation), or
- Just the ID portion "OL27448W"
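For readers unfamiliar with the module-level singleton pattern referenced in the 300-319 comment above, a minimal sketch (names mirror the review; bodies are illustrative):

```python
_library_service: "LibraryService | None" = None


def set_library_service(service: "LibraryService | None") -> None:
    """Install (or reset, with None) the process-wide service instance."""
    global _library_service
    _library_service = service


def get_library_service() -> "LibraryService":
    """Return the installed instance, failing loudly if startup never ran."""
    if _library_service is None:
        raise RuntimeError(
            "LibraryService not initialized; call set_library_service() at startup"
        )
    return _library_service
```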
```python
        app.dependency_overrides[get_cache_service] = lambda: mock_cache
        app.dependency_overrides[get_openlibrary_service] = lambda: mock_openlibrary

        response = await async_client.post(
            "/api/v1/books/search",
            json={"title": "Lord of the Rings"},
        )

        assert response.status_code == 200
        data = response.json()
        assert data["total_found"] == 1
        assert len(data["results"]) == 1
        assert data["results"][0]["title"] == "The Lord of the Rings"

        app.dependency_overrides.clear()
```
🛠️ Refactor suggestion | 🟠 Major
Centralize dependency override cleanup in a fixture.
Manual cleanup with app.dependency_overrides.clear() appears in every test method (lines 114, 136, 169, 203, 242, 269, 297). If a test fails or raises an exception before reaching the cleanup line, overrides will persist and affect subsequent tests, causing flaky test failures.
Create an autouse fixture to ensure cleanup:
```python
@pytest.fixture(autouse=True)
def clear_overrides(app):
    """Automatically clear dependency overrides after each test."""
    yield
    app.dependency_overrides.clear()
```

Then remove all manual `app.dependency_overrides.clear()` calls from test methods.
🤖 Prompt for AI Agents
In backend/tests/integration/test_search_endpoints.py around lines 100 to 114,
tests manually call app.dependency_overrides.clear() at the end of each test
which can leak overrides if a test errors; add an autouse fixture in the test
module (or a conftest.py) that yields and clears app.dependency_overrides after
each test so cleanup always runs, then remove all manual
app.dependency_overrides.clear() calls from the individual tests.
- [ ] 3.0.4 Verify UUIDv7 sorting works in PostgreSQL
  - UUIDv7 is time-sortable when stored as native UUID type
🧩 Analysis chain

🏁 Script executed:

```bash
# First, check the task file to see the full context around lines 87-89
cat -n tasks/backend-productionisation/audio-books-library/tasks-audio-books-library.md | sed -n '80,150p'
```

Repository: cryptus-neoxys/bookbytes

Length of output: 2904

🏁 Script executed:

```bash
# Verify that UUIDv7 values sort correctly by time in PostgreSQL
# Check if uuid6 library is in pyproject.toml
rg -n "uuid6" --type toml

# Verify UUIDPrimaryKeyMixin uses uuid7
rg -A 5 "class UUIDPrimaryKeyMixin" --type python

# Check model files for uuid7 usage
fd -e py -x rg -l "uuid7" {} \;
```

Repository: cryptus-neoxys/bookbytes

Length of output: 141

🏁 Script executed:

```bash
# Find and check pyproject.toml
fd -name "pyproject.toml" -type f
```

Repository: cryptus-neoxys/bookbytes

Length of output: 174

🏁 Script executed:

```bash
# Check for uuid6 in all TOML files
fd -name "*.toml" -type f -exec cat {} \; | grep -i uuid6 -B 2 -A 2
```

Repository: cryptus-neoxys/bookbytes

Length of output: 174

🏁 Script executed:

```bash
# Search for UUIDPrimaryKeyMixin in the codebase
rg "UUIDPrimaryKeyMixin" -A 5 --type python
```

Repository: cryptus-neoxys/bookbytes

Length of output: 174

🏁 Script executed:

```bash
# Search for uuid7 usage in Python files
rg "uuid7" --type python -B 2 -A 2
```

Repository: cryptus-neoxys/bookbytes

Length of output: 174
Resolve task ordering: complete UUIDv7 sorting verification before marking dependent model tasks complete.
Task 3.0.4 (verify UUIDv7 sorting in PostgreSQL) is marked incomplete [ ], yet all dependent model creation tasks (3.A.1 through 3.A.9+) are marked complete [x]. This creates risk: if UUIDv7 sorting doesn't work as expected in PostgreSQL with the native UUID type, the entire model foundation may require rework.
Complete 3.0.4 verification and mark it complete before closing model creation tasks 3.A.1 onwards.
🤖 Prompt for AI Agents
tasks/backend-productionisation/audio-books-library/tasks-audio-books-library.md
around lines 87-89: Task 3.0.4 (Verify UUIDv7 sorting works in PostgreSQL) is
still unchecked while downstream model tasks 3.A.1+ are marked complete—ensure
3.0.4 is fully verified and passing before marking dependent tasks done. Run the
verification steps against the production PostgreSQL schema (store UUIDv7 as
native uuid, insert time-ordered samples, query ORDER BY the uuid column and
confirm chronological order), document the test results and any DB/config
changes, then update the checklist to mark 3.0.4 as complete only after
successful verification and revert any dependent model tasks to incomplete if
rework is required.
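A minimal Python-side check of the time-sortable property is straightforward with the `uuid6` package already added as a dependency; note this does not replace the PostgreSQL `ORDER BY` verification against a real database:

```python
import time

from uuid6 import uuid7

# Generate IDs across distinct timestamps; UUID ordering should
# match generation order for UUIDv7 values.
ids = []
for _ in range(5):
    ids.append(uuid7())
    time.sleep(0.002)  # ensure the millisecond timestamp advances

assert ids == sorted(ids)
```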
- [x] 3.C.2 Add User-Agent header
  - Include app name and contact for API compliance
Specify OpenLibrary User-Agent header configuration and runtime validation.
Lines 208-209 require including app name and contact in the User-Agent header, but there's no task or configuration reference. How is this value injected? Where is it stored? Should it be validated in tests?
Add configuration details:
- [x] 3.C.2 Add User-Agent header
  - Format: `bookbytes/0.1 (+https://github.com/cryptus-neoxys/bookbytes; [email protected])`
  - Stored in `settings.OPENLIBRARY_USER_AGENT` (Pydantic Settings)
  - Verified in unit tests with httpx mock

- [ ] 3.F.1 Create/update Job model for audiobook processing
  - Add audio_book_id FK

- [ ] 3.F.2 Update `workers/tasks.py` with audiobook processing
  - Fetch work details
  - Collect all ISBNs
  - Generate audio (existing pipeline)

- [ ] 3.F.3 Implement cache invalidation on job completion
  - Invalidate search caches containing this work
Flag incomplete Background Jobs phase (3.F) as a blocker for production readiness.
All tasks in Phase 3.F (Background Jobs) are marked [ ] (incomplete). This is critical for a "productionise" PR because:
- Without task queue integration, audiobook processing is missing
- Cache invalidation on job completion (3.F.3) is unimplemented, risking stale data
- Job model updates (3.F.1) may conflict with existing Job schema
Either complete Phase 3.F in this PR or explicitly defer it to a follow-up PR with clear tracking.
🤖 Prompt for AI Agents
In
tasks/backend-productionisation/audio-books-library/tasks-audio-books-library.md
around lines 307 to 319, Phase 3.F (Background Jobs) is left incomplete and must
be treated as a blocker: update the Job model to add audio_book_id FK in a
backward-compatible way (migration + nullable FK or separate new model to avoid
schema conflicts), implement audiobook processing in workers/tasks.py to fetch
work details, collect ISBNs and invoke the existing audio generation pipeline
(with error handling and retries), and add cache invalidation on job completion
that removes/refreshes any search caches containing the work; if you cannot
complete all items in this PR, explicitly defer them by creating tracker
tickets, add clear TODOs in the file referencing ticket IDs, and mark the PR as
blocked until those tickets are resolved.
- All models use `UUIDPrimaryKeyMixin` (now v7) and `TimestampMixin`
- AudioBook uses `SoftDeleteMixin` for soft deletion
- AudioBookRepository extends `SoftDeleteRepository`
- BookProvider is a sparse/polymorphic table (see design doc for query patterns)
- **Redis-only caching:** No PostgreSQL cache table, uses AOF persistence (`appendfsync everysec`)
- **Cache vs Library:** Search cache is transient (Redis), library data is permanent (Work/Edition in PostgreSQL)
- Redis memory policy: `allkeys-lru` with 256mb limit
- OpenLibrary requires User-Agent header to avoid rate limiting
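A startup check along these lines can assert the notes' Redis expectations (eviction policy and AOF fsync cadence) at runtime; a sketch using redis-py's asyncio client, with an illustrative URL and no retry handling:

```python
import asyncio

import redis.asyncio as redis


async def verify_redis_config(url: str = "redis://localhost:6379/0") -> None:
    """Fail fast if the Redis instance diverges from expected settings."""
    r = redis.from_url(url)
    try:
        policy = await r.config_get("maxmemory-policy")
        fsync = await r.config_get("appendfsync")
        assert policy.get("maxmemory-policy") == "allkeys-lru", policy
        assert fsync.get("appendfsync") == "everysec", fsync
    finally:
        await r.aclose()  # redis-py >= 5; older versions use close()


asyncio.run(verify_redis_config())
```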
Add missing infrastructure and deployment configuration tasks.
The notes mention Redis configuration (allkeys-lru with 256mb limit, AOF persistence with appendfsync everysec) but there are no corresponding tasks for Docker, Helm, or deployment infrastructure. This is critical for operational readiness.
Add a Phase 3.H section for infrastructure/deployment:
### Phase 3.H: Infrastructure & Deployment
- [ ] 3.H.1 Create Redis Helm values with allkeys-lru policy and AOF persistence
- [ ] 3.H.2 Update docker-compose.yml for local development (Redis + PostgreSQL)
- [ ] 3.H.3 Add Redis health checks to Kubernetes deployment
- [ ] 3.H.4 Document Redis backup/recovery strategy for AOF

🤖 Prompt for AI Agents
In
tasks/backend-productionisation/audio-books-library/tasks-audio-books-library.md
around lines 358 to 365, the notes list Redis configuration but lack concrete
infrastructure/deployment tasks; add a new "Phase 3.H: Infrastructure &
Deployment" section with checklist items to create Redis Helm values
(allkeys-lru, AOF appendfsync everysec), update docker-compose.yml for local dev
(Redis + PostgreSQL), add Redis health checks to the Kubernetes manifests, and
document Redis backup/recovery strategy for AOF so operational/deployment steps
align with the design notes.
Actionable comments posted: 2
🧹 Nitpick comments (1)
backend/Readme.md (1)
103-103: Consider relaxing the Python version requirement. The prerequisites mandate Python 3.13+, which is very recent (released October 2024) and may limit deployment environments and adoption. Most production systems still use Python 3.11 or 3.12. Document the minimum supported version more conservatively unless 3.13-specific features are critical.
Consider updating to "Python 3.11+" or "Python 3.12+" unless 3.13-specific features are essential. Document the rationale if a newer version is required.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- backend/Readme.md (1 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.18.1)
backend/Readme.md
13-13: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
36-36: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
75-75: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🔇 Additional comments (1)
backend/Readme.md (1)
1-229: Overall: Well-structured and comprehensive documentation. The README is thorough, covering architecture, setup, API endpoints, testing, and configuration clearly. The ASCII diagrams and mermaid flowchart are helpful. Bash examples are practical. Once the licensing and markdown linting issues are resolved, this documentation will serve the backend well.
## Architecture

```
🛠️ Refactor suggestion | 🟠 Major

Add language identifiers to fenced code blocks.

Three fenced code blocks are missing language specifications, causing markdownlint violations at lines 13, 36, and 75. While these are ASCII diagrams, specifying a language improves consistency and tooling compatibility.

Apply this diff:

````diff
-```
+```text
 ┌──────────────────────────────────────────────────────────────┐
 │                     BookBytes Library                        │
````

And similarly for lines 36 and 75:

````diff
-```
+```text
 Work (canonical book)
 ├── Edition (specific ISBN/format)
````

````diff
-```
+```text
 backend/
 ├── src/bookbytes/
````

Also applies to: 36-36, 75-75
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)
13-13: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
In backend/Readme.md around lines 13, 36 and 75, three fenced code blocks use
triple backticks with no language identifier; update each opening fence from ```
to ```text so the ASCII diagrams are marked as plain text (replace the fences at
those exact lines and keep the diagram content unchanged).
| `LOG_LEVEL` | `INFO` | Logging level |
| `LOG_FORMAT` | `json` | `json` or `console` |

## License
Critical: Establish a license before production release.
The README states "Not licensed yet", but this PR is for productionization and saasification. The project must have a clear license established before release to address intellectual property, liability, and usage rights.
Define and document a clear license (e.g., MIT, Apache 2.0, proprietary, AGPL, etc.) in the LICENSE file and update this section accordingly.
🤖 Prompt for AI Agents
In backend/Readme.md around line 229, the README currently says "Not licensed
yet" but the project is being productionized; add a LICENSE file with the chosen
license text (e.g., MIT, Apache-2.0, AGPL, or a proprietary license) and update
the README License section to reference and summarize that license. Commit the
LICENSE file at repo root containing the full license header, ensure package
metadata (package.json, setup.cfg, etc.) reflect the chosen license field, and
replace "Not licensed yet" with the license name and a short line like "See
LICENSE for full terms" plus any project-specific copyright.
…les for improved clarity and structure.
Actionable comments posted: 1
♻️ Duplicate comments (2)
.agent/rules/pythonic-fastapi-async.md (1)
16-16: Fix spelling errors in guidance document (unfixed from prior review). These typos were flagged in a previous review but remain uncorrected:

- Line 16: "Shemas" → "Schemas" and "SQLAlachemy" → "SQLAlchemy"
- Line 22: "inheritence" → "inheritance"

Apply these corrections:

```diff
-  Refer to FastAPI, Pydantic, SQLAlachemy, other library documentation for Data Models, Shemas, Path Operations, Middleware and for best practices.
+  Refer to FastAPI, Pydantic, SQLAlchemy, other library documentation for Data Models, Schemas, Path Operations, Middleware and for best practices.
```

```diff
-  Favour composition over inheritence. Smart core, Thin interfaces.
+  Favour composition over inheritance. Smart core, Thin interfaces.
```
tasks/backend-productionisation/tasks-backend-productionisation.md (1)
227-227: Fix markdown formatting in completed task annotations (duplicate of earlier feedback). Lines 227 and 234 still have incorrect emphasis marker syntax with spaces and stray escaped characters.

Apply this diff:

```diff
-  - [x] 7.3 Create logging middleware in `main.py`: Log request start (method, path, request*id), log request end (status_code, duration_ms), bind request_id to structlog context *(DONE - moved to auxiliary foundation)\_
+  - [x] 7.3 Create logging middleware in `main.py`: Log request start (method, path, request*id), log request end (status_code, duration_ms), bind request_id to structlog context *(DONE - moved to auxiliary foundation)*
```

```diff
-  - [x] 7.10 Create `docker/docker-compose.yml` with services: `api` (uvicorn), `worker` (arq), `postgres` (postgres:16-alpine), `redis` (redis:7-alpine) with health checks, volumes, and proper depends*on conditions *(DONE - moved to auxiliary foundation)\_
+  - [x] 7.10 Create `docker/docker-compose.yml` with services: `api` (uvicorn), `worker` (arq), `postgres` (postgres:16-alpine), `redis` (redis:7-alpine) with health checks, volumes, and proper depends*on conditions *(DONE - moved to auxiliary foundation)*
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- .agent/rules/pythonic-fastapi-async.md (1 hunks)
- tasks/backend-productionisation/tasks-backend-productionisation.md (1 hunks)
🧰 Additional context used
🪛 LanguageTool
tasks/backend-productionisation/tasks-backend-productionisation.md
[grammar] ~1-~1: Ensure spelling is correct
Context: # Tasks: Backend Productionisation PRD Reference: [prd-backend-production...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[uncategorized] ~106-~106: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...ker/docker-compose.yml up` to start the full stack - Environment variables are validated a...
(EN_COMPOUND_ADJECTIVE_INTERNAL)
.agent/rules/pythonic-fastapi-async.md
[grammar] ~16-~16: Ensure spelling is correct
Context: ... library documentation for Data Models, Shemas, Path Operations, Middleware and for be...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~22-~22: Ensure spelling is correct
Context: ... duplication. - Favour composition over inheritence. Smart core, Thin interfaces. ## Namin...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
🪛 markdownlint-cli2 (0.18.1)
tasks/backend-productionisation/tasks-backend-productionisation.md
227-227: Spaces inside emphasis markers
(MD037, no-space-in-emphasis)
234-234: Spaces inside emphasis markers
(MD037, no-space-in-emphasis)
🔇 Additional comments (1)
tasks/backend-productionisation/tasks-backend-productionisation.md (1)
106-106: Use hyphenation for compound adjective modifying noun. Line 106 uses "full stack" as a compound adjective; it should be hyphenated as "full-stack" when modifying the following noun.

Apply this diff:

```diff
-Use `docker-compose -f docker/docker-compose.yml up` to start the full stack
+Use `docker-compose -f docker/docker-compose.yml up` to start the full-stack
```

Alternatively, restructure to avoid the compound adjective:

```diff
-Use `docker-compose -f docker/docker-compose.yml up` to start the full stack
+Use `docker-compose -f docker/docker-compose.yml up` to run the entire stack
```
-Use `docker-compose -f docker/docker-compose.yml up` to start the full stack +Use `docker-compose -f docker/docker-compose.yml up` to run the entire stackLikely an incorrect or invalid review comment.
- [ ] 6.0 **Phase 6: JWT Authentication**

- [ ] 6.1 Create `src/bookbytes/models/user.py` with `User` model: `id` (UUID, PK), `email` (unique), `hashed_password`, `is_active` (default True), timestamps
- [ ] 6.2 Create `src/bookbytes/repositories/user.py` with `UserRepository` extending base, adding `get_by_email()` method
- [ ] 6.3 Generate migration for User model: `alembic revision --autogenerate -m "add_user_model"`
- [ ] 6.4 Create `src/bookbytes/core/security.py` with password utilities: `hash_password(password: str) -> str` using passlib bcrypt, `verify_password(plain: str, hashed: str) -> bool`
- [ ] 6.2 Add JWT utilities in `security.py`: `create_access_token(data: dict, expires_delta: timedelta | None) -> str` using python-jose, `decode_access_token(token: str) -> dict` with validation
- [ ] 6.3 Define JWT payload structure in `security.py`: `TokenPayload` dataclass with `sub` (user_id), `exp`, `iat`, `scope` (default: "access")
- [ ] 6.4 Create `src/bookbytes/schemas/auth.py` with schemas: `UserCreate(email: EmailStr, password: str)`, `UserLogin(email: EmailStr, password: str)`, `UserResponse(id, email, is_active, created_at)`, `TokenResponse(access_token: str, token_type: str = "bearer")`
- [ ] 6.5 Create `src/bookbytes/api/v1/auth.py` with endpoints: `POST /auth/register` (create user, return UserResponse), `POST /auth/login` (validate credentials, return TokenResponse), `GET /auth/me` (return current user)
- [ ] 6.6 Create auth dependency in `dependencies.py`: `get_current_user(token: str = Depends(oauth2_scheme))` that decodes JWT, fetches user from DB, raises 401 if invalid
- [ ] 6.7 Create optional auth dependency: `get_current_user_optional()` that returns None if no token, for endpoints that work with or without auth
- [ ] 6.8 Implement API key bypass for local dev: In `get_current_user()`, if `AUTH_MODE=api_key`, check `X-API-Key` header against `API_KEY` config, return a mock/system user
- [ ] 6.9 Add `oauth2_scheme = OAuth2PasswordBearer(tokenUrl="/api/v1/auth/login")` in dependencies for Swagger UI integration
- [ ] 6.10 Protect book and job endpoints: Add `current_user: User = Depends(get_current_user)` to all endpoints in `books.py` and `jobs.py`
- [ ] 6.11 Filter jobs by user: Update `GET /jobs` to only return jobs where `job.user_id == current_user.id`
- [ ] 6.12 Include auth router in v1 router with `/auth` prefix
- [ ] 6.13 Test auth flow: Verify register → login → access protected endpoint with token works; verify 401 without token
- [ ] 6.14 Create `tests/integration/test_auth_api.py` with tests: register new user, login with valid credentials, login with invalid credentials (401), access /me with token, access /me without token (401)
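As a reference for the JWT utility tasks above (6.2–6.3 in the duplicated numbering), the `python-jose` pieces can be as small as the following sketch; the hard-coded secret and default expiry are illustrative assumptions, not the planned implementation:

```python
from datetime import datetime, timedelta, timezone

from jose import JWTError, jwt

SECRET_KEY = "change-me"  # illustrative; load from settings in practice
ALGORITHM = "HS256"


def create_access_token(data: dict, expires_delta: timedelta | None = None) -> str:
    now = datetime.now(timezone.utc)
    claims = {
        **data,
        "iat": now,
        "exp": now + (expires_delta or timedelta(minutes=15)),
        "scope": data.get("scope", "access"),
    }
    return jwt.encode(claims, SECRET_KEY, algorithm=ALGORITHM)


def decode_access_token(token: str) -> dict:
    try:
        return jwt.decode(token, SECRET_KEY, algorithms=[ALGORITHM])
    except JWTError as exc:  # expired, tampered, or malformed
        raise ValueError("Invalid token") from exc


token = create_access_token({"sub": "user-123"})
assert decode_access_token(token)["sub"] == "user-123"
```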
Fix task numbering duplication in Phase 6.
Tasks 6.2–6.14 have conflicting/duplicate numbering. After 6.3 (line 208), the next task reverts to 6.2 instead of continuing to 6.4. This cascades through the entire authentication section, confusing task tracking and progress.
Apply this diff to correct the numbering sequence:
```diff
   - [ ] 6.1 Create `src/bookbytes/models/user.py` with `User` model: `id` (UUID, PK), `email` (unique), `hashed_password`, `is_active` (default True), timestamps
   - [ ] 6.2 Create `src/bookbytes/repositories/user.py` with `UserRepository` extending base, adding `get_by_email()` method
   - [ ] 6.3 Generate migration for User model: `alembic revision --autogenerate -m "add_user_model"`
-  - [ ] 6.2 Add JWT utilities in `security.py`: `create_access_token(data: dict, expires_delta: timedelta | None) -> str` using python-jose, `decode_access_token(token: str) -> dict` with validation
-  - [ ] 6.3 Define JWT payload structure in `security.py`: `TokenPayload` dataclass with `sub` (user_id), `exp`, `iat`, `scope` (default: "access")
-  - [ ] 6.4 Create `src/bookbytes/schemas/auth.py` with schemas: `UserCreate(email: EmailStr, password: str)`, `UserLogin(email: EmailStr, password: str)`, `UserResponse(id, email, is_active, created_at)`, `TokenResponse(access_token: str, token_type: str = "bearer")`
-  - [ ] 6.5 Create `src/bookbytes/api/v1/auth.py` with endpoints: `POST /auth/register` (create user, return UserResponse), `POST /auth/login` (validate credentials, return TokenResponse), `GET /auth/me` (return current user)
-  - [ ] 6.6 Create auth dependency in `dependencies.py`: `get_current_user(token: str = Depends(oauth2_scheme))` that decodes JWT, fetches user from DB, raises 401 if invalid
-  - [ ] 6.7 Create optional auth dependency: `get_current_user_optional()` that returns None if no token, for endpoints that work with or without auth
-  - [ ] 6.8 Implement API key bypass for local dev: In `get_current_user()`, if `AUTH_MODE=api_key`, check `X-API-Key` header against `API_KEY` config, return a mock/system user
-  - [ ] 6.9 Add `oauth2_scheme = OAuth2PasswordBearer(tokenUrl="/api/v1/auth/login")` in dependencies for Swagger UI integration
-  - [ ] 6.10 Protect book and job endpoints: Add `current_user: User = Depends(get_current_user)` to all endpoints in `books.py` and `jobs.py`
-  - [ ] 6.11 Filter jobs by user: Update `GET /jobs` to only return jobs where `job.user_id == current_user.id`
-  - [ ] 6.12 Include auth router in v1 router with `/auth` prefix
-  - [ ] 6.13 Test auth flow: Verify register → login → access protected endpoint with token works; verify 401 without token
-  - [ ] 6.14 Create `tests/integration/test_auth_api.py` with tests: register new user, login with valid credentials, login with invalid credentials (401), access /me with token, access /me without token (401)
+  - [ ] 6.4 Add JWT utilities in `security.py`: `create_access_token(data: dict, expires_delta: timedelta | None) -> str` using python-jose, `decode_access_token(token: str) -> dict` with validation
+  - [ ] 6.5 Define JWT payload structure in `security.py`: `TokenPayload` dataclass with `sub` (user_id), `exp`, `iat`, `scope` (default: "access")
+  - [ ] 6.6 Create `src/bookbytes/schemas/auth.py` with schemas: `UserCreate(email: EmailStr, password: str)`, `UserLogin(email: EmailStr, password: str)`, `UserResponse(id, email, is_active, created_at)`, `TokenResponse(access_token: str, token_type: str = "bearer")`
+  - [ ] 6.7 Create `src/bookbytes/api/v1/auth.py` with endpoints: `POST /auth/register` (create user, return UserResponse), `POST /auth/login` (validate credentials, return TokenResponse), `GET /auth/me` (return current user)
+  - [ ] 6.8 Create auth dependency in `dependencies.py`: `get_current_user(token: str = Depends(oauth2_scheme))` that decodes JWT, fetches user from DB, raises 401 if invalid
+  - [ ] 6.9 Create optional auth dependency: `get_current_user_optional()` that returns None if no token, for endpoints that work with or without auth
+  - [ ] 6.10 Implement API key bypass for local dev: In `get_current_user()`, if `AUTH_MODE=api_key`, check `X-API-Key` header against `API_KEY` config, return a mock/system user
+  - [ ] 6.11 Add `oauth2_scheme = OAuth2PasswordBearer(tokenUrl="/api/v1/auth/login")` in dependencies for Swagger UI integration
+  - [ ] 6.12 Protect book and job endpoints: Add `current_user: User = Depends(get_current_user)` to all endpoints in `books.py` and `jobs.py`
+  - [ ] 6.13 Filter jobs by user: Update `GET /jobs` to only return jobs where `job.user_id == current_user.id`
+  - [ ] 6.14 Include auth router in v1 router with `/auth` prefix
+  - [ ] 6.15 Test auth flow: Verify register → login → access protected endpoint with token works; verify 401 without token
+  - [ ] 6.16 Create `tests/integration/test_auth_api.py` with tests: register new user, login with valid credentials, login with invalid credentials (401), access /me with token, access /me without token (401)
```
In tasks/backend-productionisation/tasks-backend-productionisation.md around
lines 204 to 222, the checklist numbering in Phase 6 duplicates (items after 6.3
revert to 6.2 and cascade), which breaks task sequencing; renumber the items
sequentially so Phase 6 reads 6.0 header, then 6.1 through 6.14 in order
(increment each duplicated 6.x entry following 6.3 to 6.4, 6.5, etc.) and ensure
cross-references (like endpoint numbers) match the new sequence.
…plement protocol-based services

- Move background jobs and processing tasks from audio-books-library to new audio-books-pipeline module
- Implement protocol-based abstraction for LLM and TTS services
- Update design doc with protocol interfaces and provider implementations
- Mark old service tasks as superseded in main productionisation tasks
Actionable comments posted: 1
♻️ Duplicate comments (7)
tasks/backend-productionisation/tasks-backend-productionisation.md (3)
250-250: Fix markdown emphasis formatting (duplicate of previous review feedback). Spaces inside emphasis markers violate markdown lint standards (MD037). This was flagged in the previous review and remains unfixed.

Apply this diff:

```diff
-  - [x] 7.3 Create logging middleware in `main.py`: Log request start (method, path, request*id), log request end (status_code, duration_ms), bind request_id to structlog context *(DONE - moved to auxiliary foundation)\_
+  - [x] 7.3 Create logging middleware in `main.py`: Log request start (method, path, request*id), log request end (status_code, duration_ms), bind request_id to structlog context *(DONE - moved to auxiliary foundation)*
```

257-257: Fix markdown emphasis formatting (duplicate of previous review feedback). Spaces inside emphasis markers violate markdown lint standards (MD037). This was flagged in the previous review and remains unfixed.

Apply this diff:

```diff
-  - [x] 7.10 Create `docker/docker-compose.yml` with services: `api` (uvicorn), `worker` (arq), `postgres` (postgres:16-alpine), `redis` (redis:7-alpine) with health checks, volumes, and proper depends*on conditions *(DONE - moved to auxiliary foundation)\_
+  - [x] 7.10 Create `docker/docker-compose.yml` with services: `api` (uvicorn), `worker` (arq), `postgres` (postgres:16-alpine), `redis` (redis:7-alpine) with health checks, volumes, and proper depends*on conditions *(DONE - moved to auxiliary foundation)*
```

232-245: Fix duplicate task numbering in Phase 6 (duplicate of previous review feedback). Task numbering reverts to 6.2 after 6.3, cascading through the entire authentication section. This creates ambiguity when referencing tasks and breaks progress tracking. This was flagged in the previous review and remains unresolved. Apply the renumbering diff shown in the earlier comment on this file (items 6.2–6.14 become 6.4–6.16).

tasks/backend-productionisation/audio-books-library/tasks-audio-books-library.md (4)
207-209: Specify OpenLibrary User-Agent configuration fully (duplicate of previous review feedback).

Lines 207–209 mention the User-Agent header requirement but lack a configuration/validation specification. Where is this value stored? How is it validated at runtime? The previous review flagged these missing implementation details.
Add explicit configuration tasks:
- [ ] 3.C.2a Store User-Agent in `settings.OPENLIBRARY_USER_AGENT` (Pydantic Settings)
  - Format: `bookbytes/0.1 (+https://github.com/cryptus-neoxys/bookbytes; [email protected])`
  - Document in `.env.example`
- [ ] 3.C.2b Validate User-Agent format in settings with regex or enum checks
- [ ] 3.C.2c Add unit test verifying User-Agent header is included in OpenLibrary requests

This ensures the configuration is discoverable and validated at startup.
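For illustration, a minimal sketch of such a validated setting with pydantic-settings; the field name, default, and regex are assumptions rather than the project's confirmed configuration (the contact address is a placeholder, since the real one is redacted above):

```python
# Hedged sketch: validate the OpenLibrary User-Agent at startup.
# Field name, format, and contact are illustrative assumptions.
import re

from pydantic import field_validator
from pydantic_settings import BaseSettings

# Loose "app/version (+url; contact)" shape.
_UA_PATTERN = re.compile(r"^\S+/\S+ \(\+\S+; \S+\)$")


class Settings(BaseSettings):
    OPENLIBRARY_USER_AGENT: str = (
        "bookbytes/0.1 (+https://github.com/cryptus-neoxys/bookbytes; dev@example.com)"
    )

    @field_validator("OPENLIBRARY_USER_AGENT")
    @classmethod
    def _check_user_agent(cls, value: str) -> str:
        # Fail fast at startup if the header is malformed.
        if not _UA_PATTERN.match(value):
            raise ValueError("Expected format: 'app/version (+url; contact)'")
        return value
```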
305-315: Verify Phase 3.F (Background Jobs) is properly tracked in deferred module (duplicate of previous review feedback).

Phase 3.F is deferred to `audio-books-pipeline/tasks-audio-books-pipeline.md` (lines 305–315), which is necessary for audiobook processing. Background jobs are a critical dependency for the full productionisation workflow. Verify that:
- The deferred module contains complete task coverage for job infrastructure, processing, and cache invalidation
- The link is bidirectional (audio-books-pipeline module references this file)
- PR merge is not blocked by incomplete job setup in the target module
Run this verification:
```bash
#!/bin/bash
# Verify audio-books-pipeline module exists and has background jobs tasks
echo "=== Checking audio-books-pipeline module ==="
if [ -f "tasks/backend-productionisation/audio-books-pipeline/tasks-audio-books-pipeline.md" ]; then
  echo "✓ Module found"
  echo ""
  echo "=== Checking for Phase 3.F equivalents (job tasks) ==="
  grep -n "Job\|Background\|Worker\|Process" "tasks/backend-productionisation/audio-books-pipeline/tasks-audio-books-pipeline.md" | head -20
else
  echo "✗ Module not found"
fi
```

If the deferred module is missing or incomplete, explicitly defer in a follow-up issue with timeline.
87-89: Resolve UUIDv7 verification before marking dependent model tasks complete (duplicate of previous review feedback).

Task 3.0.4 is still marked incomplete (`[ ]`) while all downstream model creation tasks (3.A.1–3.A.9, 3.A-R.1–3.A-R.6) are marked complete (`[x]`). This creates risk: if UUIDv7 sorting verification fails, the entire model foundation requires rework. Verification must complete before committing to the v7 implementation.

Complete 3.0.4 verification (test UUIDv7 sorting in PostgreSQL with native UUID type) and mark it complete before closing dependent model tasks. Run the verification steps:
- Insert sample UUIDv7 values with known timestamps
- Query `ORDER BY id` and confirm chronological order
- Document results and any configuration changes required
Update the checklist to mark 3.0.4 complete only after successful verification. If verification fails, revert dependent tasks to incomplete and plan rework.
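A small sketch of what the 3.0.4 check could look like in Python, assuming the `uuid6` package this PR adds; the PostgreSQL half of the verification is described in the trailing comment:

```python
# Sketch: sanity-check that UUIDv7 values sort chronologically, since the
# timestamp lives in the most significant bits of the UUID.
import time

import uuid6


def check_uuid7_ordering() -> None:
    ids = []
    for _ in range(5):
        ids.append(uuid6.uuid7())
        time.sleep(0.01)  # force distinct millisecond timestamps
    assert ids == sorted(ids), "UUIDv7 values did not sort chronologically"


# Against PostgreSQL, the equivalent step is inserting rows with these ids
# and confirming `SELECT id FROM <table> ORDER BY id` returns insertion order.
check_uuid7_ordering()
```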
318-347: Entire Testing phase (3.G) is incomplete; blocks production readiness claim (duplicate of previous review feedback).

Phase 3.G (lines 318–347) is entirely unchecked (`[ ]`). For a PR titled "productionise," comprehensive test coverage is non-negotiable. Without tests, the "production-ready" claim lacks evidence:
- No cache service behavior verification (TTL jitter, stale-while-revalidate; a jitter sketch follows this comment)
- No OpenLibrary integration tests (mocked HTTP)
- No library service workflow tests
- No end-to-end API flow tests
- No mock fixtures for external services
This was flagged as a critical blocker in the previous review and remains unresolved.
Either implement the listed tests in this PR or create a follow-up PR with explicit scope, timeline, and linked issue. Production code without test coverage creates compliance, reliability, and maintainability risk.
If deferring, add:
```markdown
### Phase 3.G: Testing (Deferred to PR #XXX)

[Create linked issue/PR with:
- Tests for cache service (TTL, jitter, stale-while-revalidate)
- Tests for OpenLibrary client (mocked HTTP, error handling)
- Tests for library service workflows
- Integration tests for search API endpoints
- Mock fixtures under tests/mocks/openlibrary_responses.py
- Timeline: [specify target date]
]
```
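For context on the jitter bullet above, TTL jitter is typically a tiny helper; a hedged sketch of what such a helper and its test might look like (the real CacheService API in this repo may differ):

```python
# Illustrative only: jittered TTLs spread key expirations so a burst of
# cached entries does not all expire (and refetch) at the same instant.
import random


def ttl_with_jitter(base_ttl: int, jitter_ratio: float = 0.1) -> int:
    jitter = int(base_ttl * jitter_ratio)
    return base_ttl + random.randint(-jitter, jitter)


def test_ttl_jitter_stays_within_bounds() -> None:
    for _ in range(1000):
        assert 270 <= ttl_with_jitter(300) <= 330
```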
🧹 Nitpick comments (5)
tasks/backend-productionisation/audio-books-pipeline/design-doc.md (4)
55-75: Add language specification to code fence for ASCII diagram.

The ASCII art diagram at lines 55–74 is missing a language specification. While ASCII diagrams render without a language tag, adding one (e.g., `` ```text `` or `` ```plaintext ``) improves accessibility and satisfies linting requirements.

Apply this diff to add the language specification:

````diff
-```
+```text
 ┌─────────────────────────────────────────────────────────────┐
 │                      ProcessingService                      │
````
113-173: Add language specification to Python code fence.

The Python code block at lines 113-173 (LLMProvider protocol and related classes) is missing the language specification. Add `` ```python `` to enable syntax highlighting.

Apply this diff:

````diff
-```
+```python
 from typing import Protocol
 from pydantic import BaseModel
````
177-208: Add language specification to Python code fence.

The Python code block at lines 177-208 (TTSProvider protocol and TTSService) is missing the language specification. Add `` ```python ``.

Apply this diff:

````diff
-```
+```python
 from typing import Protocol, AsyncIterator
````
224-233: Add language specification to Python code fence.

The Python code block at lines 224-233 (TTSProvider abstract class example) is missing the language specification. Add `` ```python ``.

Apply this diff:

````diff
-```
+```python
 # Protocol-based interface for TTS
 class TTSProvider(Protocol):
````

tasks/backend-productionisation/audio-books-pipeline/prd-audio-books-pipeline.md (1)
135-147: Add language specification to code fence for ASCII diagram.

The ASCII architecture diagram at lines 135-147 is missing a language specification. Add `` ```text `` to comply with markdown linting and improve accessibility.

Apply this diff:

````diff
-```
+```text
 ┌─────────────────────────────────────────────────────────────┐
 │                      ProcessingService                      │
````
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- `tasks/backend-productionisation/audio-books-library/tasks-audio-books-library.md` (1 hunks)
- `tasks/backend-productionisation/audio-books-pipeline/design-doc.md` (1 hunks)
- `tasks/backend-productionisation/audio-books-pipeline/prd-audio-books-pipeline.md` (1 hunks)
- `tasks/backend-productionisation/audio-books-pipeline/tasks-audio-books-pipeline.md` (1 hunks)
- `tasks/backend-productionisation/tasks-backend-productionisation.md` (1 hunks)
🧰 Additional context used
🪛 LanguageTool
tasks/backend-productionisation/audio-books-pipeline/prd-audio-books-pipeline.md
[style] ~46-~46: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...us}` immediately | | US2 | As a user, I want to poll job status to know when my audiobo...
(REP_WANT_TO_VB)
[style] ~47-~47: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ... | | US3 | As a user, I want to listen to generated chapter summaries ...
(REP_WANT_TO_VB)
[style] ~48-~48: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...RL | | US4 | As a user, I want to refresh an audiobook when I want update...
(REP_WANT_TO_VB)
[style] ~55-~55: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...e | | DS2 | As a developer, I want to switch TTS providers without changing b...
(REP_WANT_TO_VB)
[style] ~57-~57: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ... | | DS4 | As a developer, I want to test the pipeline without calling real ...
(REP_WANT_TO_VB)
tasks/backend-productionisation/tasks-backend-productionisation.md
[grammar] ~1-~1: Ensure spelling is correct
Context: # Tasks: Backend Productionisation PRD Reference: [prd-backend-production...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[uncategorized] ~106-~106: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...ker/docker-compose.yml up` to start the full stack - Environment variables are validated a...
(EN_COMPOUND_ADJECTIVE_INTERNAL)
🪛 markdownlint-cli2 (0.18.1)
tasks/backend-productionisation/audio-books-pipeline/prd-audio-books-pipeline.md
135-135: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
tasks/backend-productionisation/audio-books-pipeline/design-doc.md
9-9: Heading levels should only increment by one level at a time
Expected: h2; Actual: h3
(MD001, heading-increment)
🔇 Additional comments (3)
tasks/backend-productionisation/audio-books-pipeline/tasks-audio-books-pipeline.md (1)
1-212: Comprehensive task planning with clear phase structure. ✓

The document effectively organizes the audio books pipeline productionization into six well-scoped phases, with actionable task lists, file inventories, and cross-references. The checkboxes and detailed sub-tasks provide a clear implementation roadmap.
tasks/backend-productionisation/audio-books-pipeline/design-doc.md (1)
1-315: Design document is comprehensive and well-architected. ✓

The document effectively justifies technology choices (OpenAI TTS, Instructor LLM, Protocol abstraction), provides clear service interfaces with examples, and outlines a realistic implementation roadmap. The Protocol-based design enables provider flexibility without coupling business logic to specific libraries.
tasks/backend-productionisation/audio-books-pipeline/prd-audio-books-pipeline.md (1)
1-184: PRD is comprehensive with clear user stories and measurable success criteria. ✓

The document effectively captures product goals, user stories (both API consumers and developers), and phased functional requirements. The non-goals section provides valuable scope clarification, and the open questions section raises important clarifications needed before implementation. Success metrics (>95% success rate, <5 min processing time) are realistic and measurable.
```markdown
### Phase 0: Storage Service (Prerequisite)
```
Fix markdown heading hierarchy: h3 should be h2.
The document jumps from h1 (line 1) to h3 (line 9), skipping h2. Phase sections should use h2 (##) for consistency with markdown structure.
Apply this diff to fix the heading levels:
```diff
-### Phase 0: Storage Service (Prerequisite)
+## Phase 0: Storage Service (Prerequisite)
```

Repeat this fix for all subsequent phase headings (lines 18, 24, 30, 35, 40, 107, etc.).
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
-### Phase 0: Storage Service (Prerequisite)
+## Phase 0: Storage Service (Prerequisite)
```
🤖 Prompt for AI Agents
```text
In tasks/backend-productionisation/audio-books-pipeline/tasks-audio-books-pipeline.md
around line 9, the heading "### Phase 0: Storage Service (Prerequisite)" is
using h3 while the document jumps from h1 to h3; change this and all phase
headings (e.g., lines 9, 18, 24, 30, 35, 40, 107, etc.) from h3 (###) to h2 (##)
so phases are consistent with the top-level h1: update each heading line to
start with "##" instead of "###".
```
… endpoints and completed tests

- Move endpoints 3.E.5 and 3.E.6 to audio-books-pipeline phase 1
- Mark test files 3.G.1 through 3.G.5 as completed
- Update test descriptions for clarity
Implement initial processing API endpoints including:
- POST /books/process - Start audiobook processing
- POST /books/{audio_book_id}/refresh - Refresh audiobook
- GET /jobs/{job_id} - Check job status
Add request/response schemas and comprehensive unit tests
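As a flavor of what such a request schema can enforce, here is a minimal sketch of an "exactly one of edition_id or isbn" rule in Pydantic v2; it mirrors the ProcessRequest validation discussed in the review below, but field names and the exact rule are assumptions, not the PR's verbatim code:

```python
# Hedged sketch (not the PR's exact code): a request schema where callers
# must supply exactly one of edition_id or isbn.
from uuid import UUID

from pydantic import BaseModel, model_validator


class ProcessRequest(BaseModel):
    edition_id: UUID | None = None
    isbn: str | None = None

    @model_validator(mode="after")
    def exactly_one_identifier(self) -> "ProcessRequest":
        # True when both are set or both are missing.
        if (self.edition_id is None) == (self.isbn is None):
            raise ValueError("Provide exactly one of edition_id or isbn")
        return self
```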
Actionable comments posted: 2
♻️ Duplicate comments (4)
tasks/backend-productionisation/audio-books-library/tasks-audio-books-library.md (3)
202-210: Document where the OpenLibrary User-Agent value lives and how it's validated.

3.C.2 now notes "Include app name and contact for API compliance", but the task list still doesn't say:
- Which config setting holds this value (e.g., `settings.OPENLIBRARY_USER_AGENT`),
- The expected format (sample string), and
- That it’s covered by tests (e.g., httpx mock asserting the header).
Expanding 3.C.2 with those bullets will keep the design/tasks doc aligned with the implementation and the earlier review guidance.
342-353: Add explicit infrastructure/deployment tasks for the Redis/AOF configuration.

The Notes section documents Redis behavior (AOF with `appendfsync everysec`, `allkeys-lru`, 256mb limit), but there's still no concrete checklist for wiring this into real deployments (Helm values, docker-compose, health checks, backup/restore strategy), as previously suggested.

Consider adding a "Phase 3.H: Infrastructure & Deployment" section with items like:
```markdown
### Phase 3.H: Infrastructure & Deployment

- [ ] 3.H.1 Redis Helm values: allkeys-lru, 256mb, AOF appendfsync everysec
- [ ] 3.H.2 Update docker-compose.yml (Redis + PostgreSQL) for local dev
- [ ] 3.H.3 Add Redis health checks to Kubernetes manifests
- [ ] 3.H.4 Document Redis AOF backup/recovery strategy
```

so operational work is tracked alongside the library design.
87-127: UUIDv7 sorting verification (3.0.4) still lags behind completed model tasks.

3.0.4 ("Verify UUIDv7 sorting works in PostgreSQL") is still unchecked, but all downstream model/migration tasks in 3.A.1–3.A.9 are marked complete. If UUIDv7 ordering in native `uuid` columns doesn't behave exactly as expected, it could impact querying, pagination, and any "ORDER BY id" assumptions baked into the new models.

Either:

- Complete and document the 3.0.4 verification (against the actual Postgres setup) and mark it `[x]`, or
- If issues were found, revert any dependent tasks to `[ ]` until the model/migration design is adjusted.

tasks/backend-productionisation/audio-books-pipeline/tasks-audio-books-pipeline.md (1)
9-40: Fix markdown heading hierarchy for top-level Phases.

The document jumps from `#` to `###` for Phase headings (lines 9, 18, 24, 30, 35, 40), which violates the heading increment rule (MD001) and was flagged in the previous review.

Consider promoting these overview phase headings to `##` so the structure is:

- `#` Tasks: Audio Books Pipeline
- `##` Phase 0/1/2/3/4/5 (overview)
- `##` Dependencies / Tasks
- `###` Phase 0/1/... (within Tasks section)

Example diff:

```diff
-### Phase 0: Storage Service (Prerequisite)
+## Phase 0: Storage Service (Prerequisite)
-### Phase 1: Processing API Endpoints
+## Phase 1: Processing API Endpoints
-### Phase 2: Job Infrastructure
+## Phase 2: Job Infrastructure
-### Phase 3: LLM Service
+## Phase 3: LLM Service
-### Phase 4: TTS Service
+## Phase 4: TTS Service
-### Phase 5: Processing Pipeline
+## Phase 5: Processing Pipeline
```
🧹 Nitpick comments (1)
backend/src/bookbytes/schemas/processing.py (1)
88-119: Consider typing `job_type` and `status` as enums instead of bare strings.

Right now `job_type` and `status` are plain `str` fields with descriptive docstrings. Once the Job model and its `JobType`/`JobStatus` enums exist, it would be safer to:

- Reuse those enums (or Pydantic `Literal` types) here, and
- Let the OpenAPI schema advertise the finite set of allowed values.
This avoids stringly‑typed status handling across API, services, and DB.
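A rough sketch of that suggestion, assuming Pydantic v2; the enum members shown are illustrative, not the project's confirmed value set:

```python
# Sketch: enum-backed schema fields so OpenAPI advertises the allowed values.
# JobStatus members here are illustrative; reuse the real models' enums.
from enum import Enum

from pydantic import BaseModel


class JobStatus(str, Enum):
    PENDING = "pending"
    PROCESSING = "processing"
    COMPLETED = "completed"
    FAILED = "failed"


class JobStatusResponse(BaseModel):
    status: JobStatus  # serializes as its string value in JSON
    job_type: str      # could likewise become a JobType enum
```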
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- `backend/src/bookbytes/api/v1/processing.py` (1 hunks)
- `backend/src/bookbytes/api/v1/router.py` (1 hunks)
- `backend/src/bookbytes/schemas/processing.py` (1 hunks)
- `backend/tests/unit/test_processing_endpoints.py` (1 hunks)
- `tasks/backend-productionisation/audio-books-library/tasks-audio-books-library.md` (1 hunks)
- `tasks/backend-productionisation/audio-books-pipeline/tasks-audio-books-pipeline.md` (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
backend/src/bookbytes/api/v1/processing.py (1)
backend/src/bookbytes/schemas/processing.py (4)
`JobStatusResponse` (88-137), `ProcessRequest` (16-56), `ProcessResponse` (68-85), `RefreshRequest` (59-65)
backend/tests/unit/test_processing_endpoints.py (1)
backend/src/bookbytes/schemas/processing.py (3)
`JobStatusResponse` (88-137), `ProcessRequest` (16-56), `RefreshRequest` (59-65)
backend/src/bookbytes/schemas/processing.py (1)
backend/src/bookbytes/models/base.py (3)
`id` (47-52), `created_at` (63-68), `updated_at` (71-77)
🪛 markdownlint-cli2 (0.18.1)
tasks/backend-productionisation/audio-books-pipeline/tasks-audio-books-pipeline.md
9-9: Heading levels should only increment by one level at a time
Expected: h2; Actual: h3
(MD001, heading-increment)
🔇 Additional comments (3)
backend/src/bookbytes/api/v1/router.py (1)
6-15: v1 router aggregation looks clean and consistent.

The router composition is straightforward and correctly mounts the search and processing sub-routers under `/books`; no issues from a routing or structure perspective.

backend/src/bookbytes/schemas/processing.py (1)
16-56: ProcessRequest validation is clear and correctly constrained.

The "exactly one of edition_id or isbn" rule is implemented cleanly via the `@model_validator(mode="after")`, and the error messages are explicit enough for clients and tests to assert on. No changes needed here.

backend/tests/unit/test_processing_endpoints.py (1)
22-176: Nice coverage of schema validation and current endpoint contracts.

The tests thoroughly exercise `ProcessRequest`/`RefreshRequest` validation (including message contents) and assert the current 501 + 422 behaviors on the endpoints, which will make it easy to evolve these handlers later without breaking callers unnoticed.
```python
    # TODO: Implement in Phase 5 (ProcessingService)
    # 1. Resolve edition_id or look up by ISBN
    # 2. Check if audiobook already exists for this edition
    # 3. Create AudioBook record (status=PENDING)
    # 4. Create Job record
    # 5. Enqueue ARQ task
    # 6. Return job_id

    # Placeholder - will be replaced with actual implementation
    raise HTTPException(
        status_code=status.HTTP_501_NOT_IMPLEMENTED,
        detail="Processing service not yet implemented. See Phase 3.1.5 tasks.",
    )
```
Align 501 error messages with current task numbering/docs.
The placeholder exceptions reference “Phase 3.1.5 tasks” and “Phase 3.1.2 tasks”, but the task docs now organize this work under different phase IDs (e.g., Phase 2: Job Infrastructure, Phase 5: Processing Pipeline in the audio-books-pipeline tasks file). That mismatch will be confusing when someone follows the pointers.
Consider updating the detail strings to reference the actual sections, e.g.:
- detail="Processing service not yet implemented. See Phase 3.1.5 tasks.",
+ detail=(
+ "Processing service not yet implemented. "
+ "See audio-books-pipeline Phase 5: Processing Pipeline tasks."
+ )
...
- detail="Refresh service not yet implemented. See Phase 3.1.5 tasks.",
+ detail=(
+ "Refresh service not yet implemented. "
+ "See audio-books-pipeline Phase 5: Processing Pipeline tasks."
+ )
...
- detail="Job repository not yet implemented. See Phase 3.1.2 tasks.",
+ detail=(
+ "Job repository not yet implemented. "
+ "See audio-books-pipeline Phase 2: Job Infrastructure tasks."
+ )Adjust the text to whatever final section names you settle on, but keep it in sync with the task docs.
Also applies to: 120-131, 163-170
```markdown
### Phase 3.G: Testing

- [x] 3.G.1 Create `tests/unit/test_cache_service.py`
  - Test cache get/set flow
  - Test TTL jitter
- [x] 3.G.2 Create `tests/unit/test_openlibrary_service.py`
  - Mock HTTP responses
  - Test search, work details, ISBN collection
- [x] 3.G.3 Create `tests/unit/test_library_service.py`
  - Test provider lookups
  - Test work/edition storage
- [ ] 3.G.4 Create `tests/unit/test_repositories.py` <!-- TODO -->
  - Test Work, Edition, BookProvider, AudioBook repositories
- [x] 3.G.5 Create `tests/integration/test_search_api.py`
  - Test full search flow with mocked OpenLibrary
  - Test caching behavior
  - Test pagination
- [ ] 3.G.6 Add OpenLibrary mock responses to `tests/mocks/` <!-- TODO -->
```
Tighten up remaining Testing tasks 3.G.4 and 3.G.6 for production readiness.
Most of Phase 3.G is now complete, but:
- 3.G.4 (`tests/unit/test_repositories.py`) and
- 3.G.6 (OpenLibrary mock responses under `tests/mocks/`)
are still TODO. For a “productionise” track, it’d be good to either:
- Implement these in this or the next immediate PR, or
- Explicitly mark them as deferred with linked issue IDs so they don’t get lost.
🤖 Prompt for AI Agents
```text
In tasks/backend-productionisation/audio-books-library/tasks-audio-books-library.md
around lines 311–339, the checklist items 3.G.4 and 3.G.6 are still TODO; either
implement the missing unit tests and OpenLibrary mock fixtures or explicitly
defer them. If implementing: add tests/unit/test_repositories.py that exercises
Work, Edition, BookProvider, and AudioBook repositories (CRUD, relations, edge
cases, transactions and expected errors) and add corresponding deterministic
fixtures under tests/mocks/openlibrary/ (search responses, work detail
responses, ISBN collection entries) matching the shapes used by existing
unit/integration tests; ensure mocks are loaded by the test suite. If deferring:
create issue(s) in the repo, link their IDs in the checklist, assign an owner
and ETA, and update the markdown to mark 3.G.4 and 3.G.6 as deferred with those
issue links.
```
…sing Implement generic job tracking model and audio book job relation table to support background processing pipeline for audio book generation
Actionable comments posted: 3
🧹 Nitpick comments (7)
backend/src/bookbytes/models/job.py (1)
77-78: Consider adding a CheckConstraint for progress bounds.

The `progress` field is documented as "Completion percentage (0-100)" but has no database-level constraint to enforce this range. Invalid values could be persisted.

```diff
+from sqlalchemy import CheckConstraint, String
-from sqlalchemy import String
 from sqlalchemy.orm import Mapped, mapped_column
 ...
 class Job(UUIDPrimaryKeyMixin, TimestampMixin, Base):
     ...
+    __table_args__ = (
+        CheckConstraint("progress >= 0 AND progress <= 100", name="ck_jobs_progress_range"),
+    )
+
     # === Progress Tracking ===
     progress: Mapped[int] = mapped_column(nullable=False, default=0)
```

backend/src/bookbytes/models/audio_book_job.py (1)
51-53: Consider if eager loading both relationships is always needed.

Both `job` and `audio_book` relationships use `lazy="joined"`, meaning they're always fetched via JOIN even when not accessed. If callers often need only one side, consider `lazy="selectin"` or `lazy="raise"` with explicit `options(joinedload(...))` in queries for better performance control.

This is a minor optimization and acceptable if both are typically needed together.

backend/src/bookbytes/repositories/audio_book_job.py (1)
backend/src/bookbytes/repositories/audio_book_job.py (1)
54-60: Redundantjoinedloadgiven model configuration.
AudioBookJob.audio_bookalready haslazy="joined"in the model definition, so the explicitjoinedloadhere is redundant. Either remove it here or change the model tolazy="select"and keep explicit loading in queries.query = ( select(AudioBookJob) .where(AudioBookJob.job_id == job_id) - .options(joinedload(AudioBookJob.audio_book)) )backend/src/bookbytes/repositories/job.py (4)
9-10: Prefer `timezone.utc` over `UTC` for broader Python-version compatibility

`from datetime import UTC` is only available from Python 3.11 onward; if you intend to support 3.10 or earlier, this import will fail. A more widely compatible option is `datetime.timezone.utc`. Consider either explicitly pinning Python ≥3.11 or switching to `timezone.utc`:

```diff
-from datetime import UTC, datetime
+from datetime import datetime, timezone
@@
-        started_at=datetime.now(UTC),
+        started_at=datetime.now(timezone.utc),
@@
-        completed_at=datetime.now(UTC),
+        completed_at=datetime.now(timezone.utc),
@@
-        completed_at=datetime.now(UTC),
+        completed_at=datetime.now(timezone.utc),
```

Please verify the target Python version in your runtime/container image and adjust accordingly.
Also applies to: 65-69, 120-123, 150-154
25-79: Tighten `claim_next` typing and consider handling optimistic-lock misses

Two small points here:

- The `job_type` parameter is annotated as `str | None`, while `get_by_job_type` takes `JobType` and uses `.value`. For API consistency, consider either:
  - Accepting `JobType | None` here and using `job_type.value`, or
  - Explicitly documenting that `claim_next` expects the raw string value and updating the type hint to `JobType | str | None` if you want to support both.
- When the `UPDATE` fails the optimistic-lock check (another worker claimed the job first), you return `None` even if other pending jobs exist. That's correct but can cause extra "empty" polls under contention. If this becomes noticeable, you could loop a few times to try the next candidate before returning, or switch to a DB-level pattern (e.g., `SELECT ... FOR UPDATE SKIP LOCKED`) where supported.
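For reference, a sketch of what that SKIP LOCKED variant could look like in SQLAlchemy 2.0 async style; the column names follow the Job model in this PR, but treat it as an illustration rather than the final implementation:

```python
# Illustrative SKIP LOCKED claim: competing workers skip rows another
# transaction has already locked, so no optimistic-lock retry is needed.
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession

from bookbytes.models.job import Job, JobStatus  # paths follow this PR


async def claim_next_skip_locked(session: AsyncSession) -> Job | None:
    stmt = (
        select(Job)
        .where(Job.status == JobStatus.PENDING.value)
        .order_by(Job.created_at)
        .limit(1)
        .with_for_update(skip_locked=True)
    )
    job = (await session.execute(stmt)).scalar_one_or_none()
    if job is not None:
        job.status = JobStatus.PROCESSING.value  # row is locked; safe to claim
    return job
```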
107-159: Optionally guard completion/failure transitions by current status/worker

`mark_completed` and `mark_failed` currently update purely by `job_id`. If a stale worker or an out-of-band caller invokes these after the job has been retried, cancelled, or otherwise transitioned, the update will still succeed and overwrite the status/timestamps. If you want stronger invariants, consider adding guards such as:

- `where(Job.status == JobStatus.PROCESSING.value)` and/or
- `where(Job.worker_id == expected_worker_id)`, or
- a `version` check similar to `claim_next`.

This would make state transitions more robust against stale calls and better align with the "optimistic locking for concurrency control" description in the class docstring.
195-253: Query patterns look good; ensure supporting DB indexes and optionally drop redundant `list(...)`

The status/job-type query helpers are straightforward and match expected usage patterns. Two minor suggestions:

- For production scale, ensure the DB has appropriate composite indexes, e.g. `(status, created_at)` for `get_by_status`/`get_pending_count` and `(job_type, created_at DESC)` for `get_by_job_type`. This will keep these queries cheap as the jobs table grows.
- `result.scalars().all()` already returns a list; the outer `list(...)` call is redundant and can be dropped for slightly cleaner code.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
- `backend/alembic/versions/2232afba1a25_add_jobs_and_audio_book_jobs_tables.py` (1 hunks)
- `backend/src/bookbytes/models/__init__.py` (1 hunks)
- `backend/src/bookbytes/models/audio_book_job.py` (1 hunks)
- `backend/src/bookbytes/models/job.py` (1 hunks)
- `backend/src/bookbytes/repositories/__init__.py` (1 hunks)
- `backend/src/bookbytes/repositories/audio_book_job.py` (1 hunks)
- `backend/src/bookbytes/repositories/job.py` (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- `backend/src/bookbytes/repositories/__init__.py`
🧰 Additional context used
🧬 Code graph analysis (3)
backend/src/bookbytes/repositories/audio_book_job.py (4)
- backend/src/bookbytes/models/audio_book_job.py (1): `AudioBookJob` (22-58)
- backend/src/bookbytes/models/job.py (1): `Job` (37-114)
- backend/src/bookbytes/repositories/base.py (1): `BaseRepository` (43-206)
- backend/src/bookbytes/schemas/common.py (1): `limit` (117-119)

backend/src/bookbytes/models/job.py (1)

- backend/src/bookbytes/models/base.py (3): `Base` (24-30), `UUIDPrimaryKeyMixin` (33-52), `id` (47-52)

backend/src/bookbytes/models/audio_book_job.py (3)

- backend/src/bookbytes/models/base.py (2): `Base` (24-30), `UUIDPrimaryKeyMixin` (33-52)
- backend/src/bookbytes/models/audio_book.py (1): `AudioBook` (37-79)
- backend/src/bookbytes/models/job.py (1): `Job` (37-114)
🪛 Ruff (0.14.8)
backend/alembic/versions/2232afba1a25_add_jobs_and_audio_book_jobs_tables.py
8-8: Import from collections.abc instead: Sequence
Import from collections.abc
(UP035)
16-16: Use X | Y for type annotations
Convert to X | Y
(UP007)
17-17: Use X | Y for type annotations
Convert to X | Y
(UP007)
18-18: Use X | Y for type annotations
Convert to X | Y
(UP007)
🔇 Additional comments (8)
backend/src/bookbytes/models/job.py (3)
18-35: Well-structured enums for job status and type.

Good use of the `(str, Enum)` pattern for JSON serialization compatibility and database storage as strings. The enums provide type safety while remaining database-friendly.
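A quick illustration of why the `(str, Enum)` mixin is convenient; this is standard-library behavior, nothing project-specific:

```python
# (str, Enum) members ARE strings, so they compare and JSON-encode as their values.
import json
from enum import Enum


class JobStatus(str, Enum):
    PENDING = "pending"
    COMPLETED = "completed"


assert JobStatus.PENDING == "pending"                 # plain string comparison
assert json.dumps(JobStatus.PENDING) == '"pending"'   # serializes as the value
```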
85-86: The `version` field for optimistic locking is manual.

The docstring mentions "Optimistic lock for concurrent access" but this requires manual implementation in the repository layer to check and increment the version on updates. SQLAlchemy's built-in `version_id_col` provides automatic optimistic locking with `StaleDataError` on conflicts.

If manual control is intentional (e.g., for custom retry logic), this is fine. Otherwise, consider using SQLAlchemy's native mechanism.
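For comparison, a minimal sketch of SQLAlchemy's native mechanism; the table and column names are simplified, not the PR's actual model:

```python
# Sketch: SQLAlchemy-managed optimistic locking. On concurrent modification,
# flush/commit raises StaleDataError instead of silently overwriting.
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column


class Base(DeclarativeBase):
    pass


class VersionedJob(Base):
    __tablename__ = "versioned_jobs"

    id: Mapped[int] = mapped_column(primary_key=True)
    version: Mapped[int] = mapped_column(nullable=False)

    # SQLAlchemy increments `version` on every UPDATE and adds it to the
    # WHERE clause, so stale writers fail loudly.
    __mapper_args__ = {"version_id_col": version}
```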
103-113: Clean property implementations.

The `is_terminal` and `can_retry` properties correctly check against enum values and provide clear business logic encapsulation.

backend/alembic/versions/2232afba1a25_add_jobs_and_audio_book_jobs_tables.py (1)
59-68: Downgrade correctly reverses all changes.

The downgrade properly drops indices before tables and handles the dependency order correctly (audio_book_jobs before jobs).
backend/src/bookbytes/models/audio_book_job.py (1)
22-58: Well-designed association model.

Clean implementation of the 1:1 relationship between Job and AudioBook. The unique constraint on `job_id` correctly enforces the cardinality, and CASCADE deletes ensure referential integrity.

backend/src/bookbytes/repositories/audio_book_job.py (1)
62-106: Query implementations look correct.

The join queries for fetching jobs by audiobook are well-structured with proper ordering and limits.
backend/src/bookbytes/models/__init__.py (1)
1-43: Clean and well-organized module exports.

The `__all__` is comprehensive, properly categorized by implementation phase, and includes helpful comments about design decisions (Redis-only caching) and future plans.
81-105: Progress updates look solid and defensiveThe progress clamping (
0–100) and optionalcurrent_stepupdate are nice touches, and returning a boolean based onrowcountkeeps the API simple. No changes needed here from my side.
```python
sa.Column('started_at', sa.DateTime(), nullable=True),
sa.Column('completed_at', sa.DateTime(), nullable=True),
```
Timezone inconsistency between timestamp columns.
`started_at` and `completed_at` use `sa.DateTime()` without timezone, while `created_at` and `updated_at` use `sa.DateTime(timezone=True)`. This inconsistency can cause timezone-related bugs when comparing or displaying these timestamps together.
```diff
-    sa.Column('started_at', sa.DateTime(), nullable=True),
-    sa.Column('completed_at', sa.DateTime(), nullable=True),
+    sa.Column('started_at', sa.DateTime(timezone=True), nullable=True),
+    sa.Column('completed_at', sa.DateTime(timezone=True), nullable=True),
```
+ sa.Column('completed_at', sa.DateTime(timezone=True), nullable=True),Also update the model in backend/src/bookbytes/models/job.py to use timezone-aware types:
```python
from sqlalchemy import DateTime

started_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True), nullable=True)
completed_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True), nullable=True)
```

🤖 Prompt for AI Agents
```text
In backend/alembic/versions/2232afba1a25_add_jobs_and_audio_book_jobs_tables.py
around lines 35-36, the started_at and completed_at columns are created as naive
DateTime() while created_at/updated_at use timezone-aware DateTime(timezone=True);
update these two Column definitions to use sa.DateTime(timezone=True) to ensure
consistency, and also update backend/src/bookbytes/models/job.py to declare
started_at and completed_at as timezone-aware mapped_columns
(DateTime(timezone=True)) so the ORM model and migration match.
```
```python
async def create_link(
    self,
    job_id: UUID,
    audio_book_id: UUID,
) -> AudioBookJob:
    """Create a link between a job and an audiobook.

    Args:
        job_id: The job's UUID
        audio_book_id: The audiobook's UUID

    Returns:
        The created AudioBookJob link
    """
    link = AudioBookJob(job_id=job_id, audio_book_id=audio_book_id)
    self.session.add(link)
    await self.session.commit()
    await self.session.refresh(link)
    return link
```
Using commit() breaks transaction boundaries.
create_link calls session.commit() directly, unlike BaseRepository.create which uses flush(). This commits any pending changes from the caller's transaction, preventing atomic operations across multiple repository calls.
```diff
 async def create_link(
     self,
     job_id: UUID,
     audio_book_id: UUID,
 ) -> AudioBookJob:
     ...
     link = AudioBookJob(job_id=job_id, audio_book_id=audio_book_id)
     self.session.add(link)
-    await self.session.commit()
+    await self.session.flush()
     await self.session.refresh(link)
     return link
```

The caller should control when to commit the transaction.
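To make the benefit concrete, here is a hypothetical service-layer caller under that convention; the repository method names and signatures are assumptions, not the PR's actual API:

```python
# With flush()-only repositories, a service can compose several writes into
# one transaction and commit once, so the job and its link land atomically.
from uuid import UUID

from sqlalchemy.ext.asyncio import AsyncSession


async def start_processing(
    session: AsyncSession,
    job_repo: "JobRepository",            # hypothetical signatures
    link_repo: "AudioBookJobRepository",
    job_data: dict,
    audio_book_id: UUID,
) -> None:
    job = await job_repo.create(**job_data)             # flushes, job.id is set
    await link_repo.create_link(job.id, audio_book_id)  # flushes only
    await session.commit()                              # single atomic commit
```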
🤖 Prompt for AI Agents
```text
In backend/src/bookbytes/repositories/audio_book_job.py around lines 25 to 43,
the method calls self.session.commit() which breaks caller transaction
boundaries; change it to add the new AudioBookJob to the session, call await
self.session.flush() (not commit) to ensure the instance gets a database
identity, then await self.session.refresh(link) and return the link so the
caller retains control over committing the transaction.
```
```python
async def schedule_retry(self, job_id: UUID) -> bool:
    """Schedule a failed job for retry.

    Increments retry_count and sets status back to pending.

    Args:
        job_id: The job's UUID

    Returns:
        True if retry scheduled, False if max retries exceeded
    """
    # Get current job state
    job = await self.get_by_id(job_id)
    if not job or not job.can_retry:
        return False

    stmt = (
        update(Job)
        .where(Job.id == job_id)
        .values(
            status=JobStatus.PENDING.value,
            retry_count=job.retry_count + 1,
            worker_id=None,
            started_at=None,
            completed_at=None,
            error_message=None,
            error_code=None,
        )
    )
    await self.session.execute(stmt)
    await self.session.commit()

    return True
```
Reset progress when scheduling a retry (and consider optimistic locking)
Semantic nit that can affect monitoring/UX: `schedule_retry` resets status, worker, timestamps, and error fields, but leaves `progress` unchanged. A job that failed at 80% will then be `PENDING` with `progress=80`, which is misleading for dashboards and workers that treat progress as per-attempt. It's usually cleaner to reset progress to 0 (or `None`) on retry:
```diff
 stmt = (
     update(Job)
     .where(Job.id == job_id)
     .values(
         status=JobStatus.PENDING.value,
         retry_count=job.retry_count + 1,
         worker_id=None,
         started_at=None,
         completed_at=None,
         error_message=None,
         error_code=None,
+        progress=0,
     )
 )
```

- If you might have multiple schedulers or racey retry triggers, you may also want to include an optimistic-lock condition (e.g., on `version` or `retry_count`) to avoid double-increments and scheduling a job that was already retried.
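A sketch of that guarded variant, combining both suggestions; illustrative only, and the choice of WHERE condition (`version` vs. `retry_count`) is a design decision:

```python
# Race-safe retry sketch: the WHERE clause pins the retry_count we observed,
# so a concurrent scheduler leaves rowcount at 0 and we report failure
# instead of double-incrementing.
from uuid import UUID

from sqlalchemy import update
from sqlalchemy.ext.asyncio import AsyncSession

from bookbytes.models.job import Job, JobStatus  # paths follow this PR


async def schedule_retry(session: AsyncSession, job_id: UUID, seen_retry_count: int) -> bool:
    stmt = (
        update(Job)
        .where(Job.id == job_id, Job.retry_count == seen_retry_count)
        .values(
            status=JobStatus.PENDING.value,
            retry_count=seen_retry_count + 1,
            progress=0,  # reset per-attempt progress
        )
    )
    result = await session.execute(stmt)
    if result.rowcount == 0:  # another scheduler already retried this job
        return False
    await session.commit()
    return True
```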
🤖 Prompt for AI Agents
```text
In backend/src/bookbytes/repositories/job.py around lines 161 to 193,
schedule_retry currently resets status, worker, timestamps, and error fields but
leaves progress unchanged and is vulnerable to racey double-retries; modify the
update to also reset progress (set to 0 or None per schema/UX choice) and add an
optimistic-lock condition (e.g., include a WHERE clause matching the current
retry_count or version) so the retry_count is only incremented when the record
is still the same; return False if the conditional update affected no rows,
otherwise commit and return True.
```
Implement repositories for job management and job-audiobook relations with worker-safe operations. Includes atomic job claiming, progress updates, and status transitions with optimistic locking.
127c524 to
e162290
Compare
Actionable comments posted: 0
♻️ Duplicate comments (1)
tasks/backend-productionisation/audio-books-pipeline/tasks-audio-books-pipeline.md (1)
9-9: Align phase heading levels with the top-level H1.

The document jumps from the H1 on line 1 to H3 phase headings (e.g., line 9), which violates markdown heading-increment rules and the style flagged by markdownlint. To keep a clean hierarchy, make each phase heading an H2:
```diff
-### Phase 0: Storage Service (Prerequisite)
+## Phase 0: Storage Service (Prerequisite)
-### Phase 1: Processing API Endpoints
+## Phase 1: Processing API Endpoints
-### Phase 2: Job Infrastructure
+## Phase 2: Job Infrastructure
-### Phase 3: LLM Service
+## Phase 3: LLM Service
-### Phase 4: TTS Service
+## Phase 4: TTS Service
-### Phase 5: Processing Pipeline
+## Phase 5: Processing Pipeline
-### Phase 6: Dependency Injection Setup
+## Phase 6: Dependency Injection Setup
```

Apply the same change to all phase headings so the structure is:
`#` (title) → `##` (phases).

Also applies to: 18-18, 24-24, 30-30, 35-35, 40-40, 107-107, 132-132, 162-162, 183-183, 216-216
🧹 Nitpick comments (1)
backend/src/bookbytes/repositories/__init__.py (1)
6-13: Verify `SoftDeleteRepository` import source; consider importing from its own module.

The exported surface in `__all__` looks coherent, but `SoftDeleteRepository` is imported from `bookbytes.repositories.base` while the contextual snippets indicate it's defined in `soft_delete_repository.py`. If `base.py` does not re-export `SoftDeleteRepository`, this will raise an import error at package import time. If `SoftDeleteRepository` lives only in `soft_delete_repository.py`, prefer importing it directly from there:

```diff
-from bookbytes.repositories.base import BaseRepository, SoftDeleteRepository
+from bookbytes.repositories.base import BaseRepository
+from bookbytes.repositories.soft_delete_repository import SoftDeleteRepository
```

Please double-check which module actually defines `SoftDeleteRepository` and adjust accordingly.

Also applies to: 15-28
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- `backend/src/bookbytes/repositories/__init__.py` (1 hunks)
- `backend/src/bookbytes/repositories/audio_book_job.py` (1 hunks)
- `backend/src/bookbytes/repositories/job.py` (1 hunks)
- `tasks/backend-productionisation/audio-books-pipeline/tasks-audio-books-pipeline.md` (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- backend/src/bookbytes/repositories/job.py
- backend/src/bookbytes/repositories/audio_book_job.py
🧰 Additional context used
🧬 Code graph analysis (1)
backend/src/bookbytes/repositories/__init__.py (8)
- backend/src/bookbytes/repositories/audio_book.py (1): `AudioBookRepository` (15-151)
- backend/src/bookbytes/repositories/audio_book_job.py (1): `AudioBookJobRepository` (19-106)
- backend/src/bookbytes/repositories/base.py (2): `BaseRepository` (43-206), `SoftDeleteRepository` (209-381)
- backend/src/bookbytes/repositories/book_provider.py (1): `BookProviderRepository` (15-149)
- backend/src/bookbytes/repositories/chapter.py (1): `ChapterRepository` (14-115)
- backend/src/bookbytes/repositories/edition.py (1): `EditionRepository` (14-90)
- backend/src/bookbytes/repositories/job.py (1): `JobRepository` (18-253)
- backend/src/bookbytes/repositories/work.py (1): `WorkRepository` (13-75)
🪛 markdownlint-cli2 (0.18.1)
tasks/backend-productionisation/audio-books-pipeline/tasks-audio-books-pipeline.md
9-9: Heading levels should only increment by one level at a time
Expected: h2; Actual: h3
(MD001, heading-increment)
Actionable comments posted: 0
♻️ Duplicate comments (1)
backend/pyproject.toml (1)
10-10: Fix Python version mismatch between project requirement and tooling config (flagged in past review).

The project explicitly requires Python >=3.13 (line 10) and classifies as Python 3.13 (line 22), but Ruff (line 100) and mypy (line 143) are configured for Python 3.11. This creates a validation gap where linters cannot properly check code written for Python 3.13 features.
Update tooling to match:
```diff
-target-version = "py311"
+target-version = "py313"

-python_version = "3.11"
+python_version = "3.13"
```

Also applies to: 100-100, 143-143
🧹 Nitpick comments (1)
backend/pyproject.toml (1)
26-73: Consider pinning production dependency versions for reproducibility.

While most dependencies lack version constraints (allowing flexibility), production deployments typically benefit from pinned versions to ensure consistency across environments. Consider establishing a versioning strategy: either pin all transitive dependencies via a lock file (e.g., `uv.lock` or `poetry.lock`), or at least pin major versions of critical dependencies like `sqlalchemy`, `fastapi`, `redis`, and `openai`.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`backend/uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (1)
- `backend/pyproject.toml` (1 hunks)
🔇 Additional comments (2)
backend/pyproject.toml (2)
26-73: Comprehensive dependency set for production-ready FastAPI backend.

The dependency organization is well-structured with clear categories and includes appropriate libraries for async database operations, background jobs, observability, and external integrations. Good use of modern async patterns (asyncpg, aioboto3, httpx) alongside structured logging and resilience patterns (tenacity).
150-156: Verify mypy overrides for untyped dependencies.

Three modules are configured to ignore missing imports: `gtts`, `arq`, and `aioboto3`. While this is appropriate for untyped libraries, confirm that these are the only modules without type stubs and that the override scope remains minimal to preserve strict type checking elsewhere.
Summary by CodeRabbit
New Features
Performance
Data & Background
Infrastructure & DevOps
Testing
Documentation