Small code-health tidy-ups (batch) #51

@fmasi

Description

From the 2026-06-10 codebase-health review. Priority: LOW — batch these into adjacent work rather than as standalone tasks.

  • Empty src/query/__init__.py — export the public API (build_hybrid_searcher, fusion, rerankers, thread expansion) with __all__ so the subsystem has a visible contract.
  • src/eval/__init__.py — one-paragraph docstring stating eval/ is offline analysis/benchmarking, not imported by production code.
  • The three profile.py files (src/profile.py = CorpusProfile config · src/pipeline/profile.py = profiling stage · src/ingest/profile.py = percentile helpers) — roles are genuinely distinct; add one-line cross-referencing docstrings. Optional: rename src/ingest/profile.py to chunk_sizing.py.
  • attachments: make plaintext._decode public (decode_text), since handlers/html.py imports it across a module boundary; align HtmlHandler's import/error split with the other handlers' BINARY-vs-ERROR pattern.
  • src/query/thread_expand.py:61-82 — wrap the raw Qdrant scroll() in an injectable fetcher when next touched, for unit-testability.
  • Twin loops src/llm/thread_summaries.py:59-94 and src/llm/onboard_pass.py:30-65 — near-identical thread-walk orchestration differing only in caching, plus cross-module imports of the private helpers _as_dict/_dkey/_tid. Either extract one loop with a persistence callback or promote the helpers to public names; decide when the thread-context prompt next changes.
  • src/config/settings.py — 7 scattered os.getenv() calls; fold these into #39 (Unified config file: file < env/.env < CLI precedence, consolidating RAG_* + attachment + MCP settings) rather than fixing them separately.
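
For the thread_expand.py item, a minimal sketch of what an injectable fetcher could look like. Everything named here is hypothetical (ThreadFetcher, make_qdrant_fetcher, expand_thread, the "thread_id" and "ts" payload keys); the real function in src/query/thread_expand.py may have a different shape. The only library call used is QdrantClient.scroll(), which returns a (points, next_offset) pair.

```python
from typing import Any, Callable

# Hypothetical seam: a fetcher maps a thread id to the payloads of its messages.
ThreadFetcher = Callable[[str], list[dict[str, Any]]]


def make_qdrant_fetcher(client: Any, collection: str, page_size: int = 256) -> ThreadFetcher:
    """Wrap QdrantClient.scroll() behind the ThreadFetcher seam.

    The "thread_id" payload key is an assumption for illustration.
    """
    # Imported lazily so unit tests that inject a fake never need qdrant_client.
    from qdrant_client import models

    def fetch(thread_id: str) -> list[dict[str, Any]]:
        thread_filter = models.Filter(
            must=[
                models.FieldCondition(
                    key="thread_id",
                    match=models.MatchValue(value=thread_id),
                )
            ]
        )
        payloads: list[dict[str, Any]] = []
        offset = None
        while True:
            # scroll() pages through matches; next offset is None on the last page.
            points, offset = client.scroll(
                collection_name=collection,
                scroll_filter=thread_filter,
                limit=page_size,
                offset=offset,
                with_payload=True,
            )
            payloads.extend(p.payload or {} for p in points)
            if offset is None:
                return payloads

    return fetch


def expand_thread(thread_id: str, fetch: ThreadFetcher) -> list[dict[str, Any]]:
    """Return a thread's messages ordered by the (hypothetical) "ts" key."""
    return sorted(fetch(thread_id), key=lambda m: m.get("ts", 0))


# In a unit test, a plain dict lookup stands in for Qdrant.
fake_store = {"t1": [{"id": "b", "ts": 2}, {"id": "a", "ts": 1}]}
ordered = expand_thread("t1", lambda tid: fake_store.get(tid, []))
```

Production code would pass make_qdrant_fetcher(client, collection) instead of the lambda, so the expansion logic itself never touches the network.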

Metadata

Assignees

No one assigned

    Labels

    code-health: Tidiness / refactoring findings from codebase-health reviews
