| 1 | +# Session Handover — Extralit OSS Audit & Issue Filing |
| 2 | + |
| 3 | +**Date:** 2026-05-10 |
| 4 | +**Branch at start:** `sparshr04/github-copilot-auth-ui` |
| 5 | +**Outcome:** 11 GitHub issues filed (#203–#213) covering security, CI, docs, backend/frontend architecture, and a new schema-agnostic data model. |
| 6 | + |
| 7 | +--- |
| 8 | + |
| 9 | +## What we worked on and what got done |
| 10 | + |
| 11 | +1. **Full-repo OSS audit** across four areas, run as parallel Explore agents: |
| 12 | + - Documentation (root + per-component) |
| 13 | + - Backend (`extralit-server/`) |
| 14 | + - Frontend (`extralit-frontend/`) |
| 15 | + - CI/CD (`.github/workflows/`, Docker, Tilt, compose) |
| 16 | + |
| 17 | +2. **Deeper structural audit** focused on architecture, organization, and cross-cutting issues (layering, god modules, coexisting patterns, monorepo coordination, type drift). |
| 18 | + |
| 19 | +3. **Architecture redesign for the frontend**, with Vuex+Pinia preserved: |
| 20 | + - Drafted a feature-sliced layout (`src/features/<x>/{api,store,types,components,composables}.ts`) to replace the heavy DDD pattern in `/v1`. |
| 21 | + - Strangler migration policy: legacy `/v1` and `/store` frozen; new work goes in `features/`. |
| 22 | + |
| 23 | +4. **Schema-agnostic data-model design** (architecture only, not the schema language): |
| 24 | + - One `payloads` table with JSONB `data`, immutable versioned `schemas` resource. |
| 25 | + - `SchemaEngine` interface for pluggable validation. |
| 26 | + - Validation runs once at the write boundary; queryable fields via projections. |
| 27 | + |
| 28 | +5. **Distillation + filing**: deduped all findings into 27 candidate issues presented as a checkbox list. User selected 11. All filed against `Extralit/extralit` as #203–#213. |
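The data-model design in item 4 can be sketched in a few lines. This is a minimal in-memory illustration, not the design filed in #209: the names `RequiredFieldsEngine`, `PayloadStore`, `register_schema`, and `write` are hypothetical, the toy engine only checks a `required` list, and a Python list stands in for the `payloads` table with its JSONB `data` column.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Protocol


class SchemaEngine(Protocol):
    """Pluggable validator; returns error messages (empty list means valid)."""

    def validate(self, schema: dict[str, Any], data: dict[str, Any]) -> list[str]: ...


class RequiredFieldsEngine:
    """Hypothetical toy engine: only checks that 'required' fields are present."""

    def validate(self, schema: dict[str, Any], data: dict[str, Any]) -> list[str]:
        return [
            f"missing field: {name}"
            for name in schema.get("required", [])
            if name not in data
        ]


@dataclass(frozen=True)
class Schema:
    """Immutable, versioned schema record."""

    id: str
    version: int
    definition: dict[str, Any]


class PayloadStore:
    """In-memory stand-in for the schemas and payloads tables (assumption)."""

    def __init__(self, engine: SchemaEngine) -> None:
        self._engine = engine
        self._schemas: dict[tuple[str, int], Schema] = {}
        self._payloads: list[dict[str, Any]] = []

    def register_schema(self, schema: Schema) -> None:
        key = (schema.id, schema.version)
        if key in self._schemas:
            # Published versions never change; callers must bump the version.
            raise ValueError("schema versions are immutable; publish a new version")
        self._schemas[key] = schema

    def write(self, schema_id: str, version: int, data: dict[str, Any]) -> dict[str, Any]:
        # The only place validation runs: the write boundary.
        schema = self._schemas[(schema_id, version)]
        errors = self._engine.validate(schema.definition, data)
        if errors:
            raise ValueError("; ".join(errors))
        row = {"schema_id": schema_id, "schema_version": version, "data": dict(data)}
        self._payloads.append(row)
        return row

    def read_all(self) -> list[dict[str, Any]]:
        # Reads trust stored rows and do not re-validate.
        return list(self._payloads)
```

Swapping the schema language would then mean supplying another object with the same `validate` method, which is the sense in which the architecture stays opaque to the schema language.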
| 29 | + |
| 30 | +--- |
| 31 | + |
| 32 | +## What worked and what didn't |
| 33 | + |
| 34 | +### Worked |
| 35 | +- **Parallel Explore agents** for the four audit areas — produced ~2000 words of concrete findings with file:line references in one round. |
| 36 | +- **User-driven prioritization** via the checkbox interview pattern — let the user pick from a deduped list rather than receiving a top-down recommendation. |
| 37 | +- **Direct `gh issue create`** in parallel once the label mapping was known — all 11 issues filed in one batch. |
| 38 | + |
| 39 | +### Didn't work / got fixed |
| 40 | +- **`gh` permission prompts timed out twice** before the first issue could be filed. Fix: user retried; once approved, subsequent calls in the same session were auto-allowed. |
| 41 | +- **Initial `gh issue create` failed**: used invented labels (`ci`, `docker`, `bug`, `architecture`, `tooling`, `lint`, `cleanup`, `repo`) that don't exist in the repo. |
| 42 | + - Fix: ran `gh label list` once, then mapped requested labels to the existing set: `infrastructure`, `deployment`, `refactor`, `documentation`, `good first issue`, `backend`, `frontend`, `epic`. |
| 43 | +- **Audit produced surface-level findings on first pass** (security/bugs only). Fix: user pushed for "structural, organizational, architectural" — second pass with re-scoped agents surfaced the high-leverage items (DDD overhead, type duplication, layering violations). |
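The label fix above amounts to resolving requested labels against the repo's real label set before calling `gh issue create`. A minimal sketch follows, assuming the existing set has already been copied from `gh label list` output. The function name `resolve_labels` and the alias pairs are illustrative assumptions; the session's actual invented-to-existing mapping is not recorded here.

```python
# Existing labels as reported in this handover (from `gh label list`).
EXISTING = {
    "infrastructure", "deployment", "refactor", "documentation",
    "good first issue", "backend", "frontend", "epic",
}

# Hypothetical alias table; the real mapping used in the session is not recorded.
ALIASES = {
    "ci": "infrastructure",
    "docker": "deployment",
    "architecture": "refactor",
    "cleanup": "refactor",
}


def resolve_labels(requested, existing, aliases):
    """Map requested labels onto existing ones; collect any that cannot be mapped."""
    resolved, unknown = [], []
    for label in requested:
        # Use the label as-is if it exists, otherwise try its alias.
        target = label if label in existing else aliases.get(label)
        if target in existing:
            if target not in resolved:  # keep order, drop duplicates
                resolved.append(target)
        else:
            unknown.append(label)
    return resolved, unknown


resolved, unknown = resolve_labels(
    ["ci", "cleanup", "architecture", "backend", "lint"], EXISTING, ALIASES
)
print(resolved)  # ['infrastructure', 'refactor', 'backend']
print(unknown)   # ['lint']
```

Since an unknown label fails the whole `gh issue create` call, anything left in `unknown` should be dropped or mapped by hand before filing.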
| 44 | + |
| 45 | +--- |
| 46 | + |
| 47 | +## Key decisions made and why |
| 48 | + |
| 49 | +| Decision | Why | |
| 50 | +|---|---| |
| 51 | +| **Keep both Vuex and Pinia** | User constraint. Avoids big-bang migration risk; new code uses Pinia, legacy untouched. | |
| 52 | +| **Replace DDD with feature-sliced** | DDD imposes 4–6 file hops per call (DI registry → repo → use-case → view-model → component). The feature-sliced layout optimizes for human and agent ergonomics: one folder = one feature = everything you need. | |
| 53 | +| **Strangler migration, not rewrite** | `/v1` is too large to port at once. Rule: only migrate when feature is already being touched substantively. | |
| 54 | +| **Schema-agnostic via JSONB + versioned schemas** | Avoids per-shape table migrations as users define new extraction schemas. Architecture is opaque to schema language — JSON Schema today, swappable. | |
| 55 | +| **Validation at boundary only** | Trust internal data after write. Avoids defensive re-validation in handlers. Tradeoff: every write path must go through the boundary, since Postgres does not enforce JSONB shape by itself. | |
| 56 | +| **File 11 issues, not all 27** | User curated. Higher signal-to-noise; everything filed has a real owner intent behind it. | |
| 57 | +| **Map invented labels to existing ones** | Avoid label sprawl. Repo already has a small, coherent label set; respect it. | |
| 58 | + |
| 59 | +--- |
| 60 | + |
| 61 | +## Lessons learned and gotchas |
| 62 | + |
| 63 | +- **Always run `gh label list` before `gh issue create --label …`** — invented labels fail the whole call (no partial creation). |
| 64 | +- **First-pass audits skew toward security/bugs.** To surface architectural issues, the prompt has to explicitly exclude surface-level findings and name the structural categories you want. |
| 65 | +- **`gh` permission prompts can time out** in this environment. If the first call times out, the user has to actively click approve; subsequent calls in the same session reuse the grant. |
| 66 | +- **Heavy DDD in a JS/TS codebase is expensive for LLMs**: the `useResolve()` indirection means an agent reading a component must traverse DI registrations to find the implementation. Plain imports + colocation drastically reduce cognitive load. |
| 67 | +- **Three-way type drift (server / SDK / frontend) is the single biggest invisible risk.** Each side hand-codes the same types. Worth its own issue: so far the concept only appears in the #209/#210 follow-ups and has not been filed as a standalone issue. |
| 68 | +- **Argilla→Extralit rebrand is half-done in frontend only.** SDK and server are clean. Filed as #211. |
| 69 | + |
| 70 | +--- |
| 71 | + |
| 72 | +## Clear next steps |
| 73 | + |
| 74 | +### Immediate (security — not filed but flagged in audit) |
| 75 | +1. **Rotate compromised secrets**: `extralit-server/.env.dev` (JWT key, MinIO creds), `docker-compose.yaml` Postgres/MinIO passwords, `constants.py` `DEFAULT_PASSWORD`/`DEFAULT_API_KEY`. Treat as leaked. |
| 76 | +2. **Remove wildcard CORS** in `api/handlers/v1/files.py:62,135`. |
| 77 | +3. **Sanitize `v-html`** sites in frontend (5+ components). |
| 78 | +4. Consider filing these as issues — they were in the deduped list as **S1, S2, S3** but not selected. |
| 79 | + |
| 80 | +### Filed (just need owners) |
| 81 | +- **Quick wins first** (low risk, high signal): #203 (uv cache), #204 (docs), #210 (uv workspaces), #211 (rebrand sweep), #212 (untrack editor dirs), #213 (bd decision). |
| 82 | +- **Architecture sequence**: #206 (ARCHITECTURE.md) → #207 (ESLint rules) → #208 (reference feature). These three should land in order. |
| 83 | +- **Backend cleanup**: #205 (route handlers through `contexts/`). |
| 84 | +- **Big new capability**: #209 (schemas + payloads tables). Foundation only; downstream issues will be needed for migration of existing entities, indexed projections, and frontend `<SchemaForm>`. |
| 85 | + |
| 86 | +### Not filed but worth considering |
| 87 | +- **OpenAPI codegen for frontend types** — kills the three-way type drift. Was R2 in the dedup list. |
| 88 | +- **SDK ↔ Server contract tests** — was R3. |
| 89 | +- **`continue-on-error` removal in CI** — was C1; concrete security hardening. |
| 90 | +- **Top-level `permissions:` on workflows** — was C2. |
| 91 | + |
| 92 | +--- |
| 93 | + |
| 94 | +## Map of important files |
| 95 | + |
| 96 | +### Audit context (read these to understand what was reviewed) |
| 97 | +- `extralit-server/src/extralit_server/api/handlers/v1/files.py` — wildcard CORS bug |
| 98 | +- `extralit-server/src/extralit_server/constants.py:24-26` — committed default credentials |
| 99 | +- `extralit-server/.env.dev` — committed secrets (rotate) |
| 100 | +- `extralit-server/src/extralit_server/models/database.py:38` — model→schema layering violation |
| 101 | +- `extralit-server/src/extralit_server/contexts/workflows.py` — 909 LOC god module |
| 102 | +- `extralit-server/src/extralit_server/search_engine/commons.py` — 1010 LOC god module |
| 103 | +- `extralit-frontend/v1/` — legacy DDD; frozen under new architecture |
| 104 | +- `extralit-frontend/components/.../RenderTable.vue` — 1207 LOC god component |
| 105 | +- `extralit-frontend/nuxt.config.ts:80` — `disableVuex: false` (both Vuex and Pinia loaded) |
| 106 | +- `extralit-frontend/tsconfig.json:11` — `strict: false` |
| 107 | +- `docker-compose.yaml` — hardcoded passwords, floating `latest` tags, no healthchecks |
| 108 | +- `.github/workflows/extralit.yml`, `extralit-server.yml` — `continue-on-error` on tests |
| 109 | +- `AGENTS.md` — Python version mismatch with pyproject.toml; references nonexistent per-component AGENTS.md |
| 110 | + |
| 111 | +### New files to be created (per filed issues) |
| 112 | +- `extralit-frontend/ARCHITECTURE.md` (#206) |
| 113 | +- `extralit-frontend/src/features/<reference-feature>/` (#208) |
| 114 | +- `extralit-frontend/src/features/<x>/{api,store,types}.ts` pattern (#206 documents, #208 exemplifies) |
| 115 | +- `extralit-server/src/extralit_server/.../schemas/` and `payloads/` modules (#209) |
| 116 | +- Root `pyproject.toml` with `[tool.uv.workspace]` (#210) |
| 117 | +- `examples/README.md` (#204) |
| 118 | +- Per-component `AGENTS.md` files or removed references (#204) |
| 119 | + |
| 120 | +### Generated this session |
| 121 | +- `HANDOVER.md` (this file) |
| 122 | + |
| 123 | +### Issue references |
| 124 | +- All filed issues: https://github.com/Extralit/extralit/issues/203 through /213 |