
fix: import list-marker stripping eats leading numbers in resume content#354

Open
Mubashirrrr wants to merge 1 commit into JOYCEQL:main from Mubashirrrr:fix/import-list-marker-strip-eats-leading-numbers


@Mubashirrrr


Problem

When importing a resume via the AI / PDF path, list-style fields (experience details, project descriptions, education descriptions, skills) are corrupted whenever a line legitimately starts with a number. Examples of what the import produces today:

"3.5 GPA"                     -> "GPA"
"5 years of experience"       -> "years of experience"
"10x improvement in latency"  -> "x improvement in latency"
"1000+ users served"          -> "+ users served"
"2019 - present"              -> "present"

This is silent data loss: the user uploads a resume and the imported copy is missing leading numbers that are very common in resume bullets (GPAs, years of experience, metrics, dates).

Root cause

toStringArray() in src/app/app/dashboard/resumes/utils.ts splits a multi-line string into items and strips a leading list marker from each line with:

line.replace(/^[-*•\d.)\s]+/, "")

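\d, . and ) sit in the same character class as the bullet characters and whitespace, with a + quantifier, so the regex greedily removes any leading run of digits, dots, parens, dashes and spaces, not just an actual list marker. Any line beginning with a number loses that number, and, as "2019 - present" shows, anything after it that also falls in the class (here " - ") goes with it.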
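To see the over-match concretely, the original pattern can be run on the sample lines from the Problem section. This is a standalone TypeScript sketch rather than the repo's code; OLD_LIST_MARKER_REGEX and the samples array are names introduced here for illustration, and the pattern is copied from the commit message below (including the • bullet).

```typescript
// Original pattern from toStringArray(): digits, "." and ")" share one greedy
// character class with the bullet characters and whitespace.
const OLD_LIST_MARKER_REGEX = /^[-*•\d.)\s]+/;

const samples: string[] = [
  "3.5 GPA",
  "5 years of experience",
  "10x improvement in latency",
  "1000+ users served",
  "2019 - present",
];

// Strip the "marker" from each line the way the old code did.
const oldStripped: string[] = samples.map((line) =>
  line.replace(OLD_LIST_MARKER_REGEX, ""),
);

samples.forEach((line, i) => {
  // e.g. "3.5 GPA" -> "GPA"
  console.log(`${JSON.stringify(line)} -> ${JSON.stringify(oldStripped[i])}`);
});
```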

This runs on the real import path: handlePdfFileChange -> /api/resume-import -> createResumeFromAIResult -> toListHtml -> toStringArray (the AI is also prompted to return description/details as text, which toStringArray handles via its string branch).

Fix

Replace the over-greedy class with a regex that strips only genuine list markers:

const LIST_MARKER_PREFIX_REGEX = /^\s*(?:[-*•]+\s*|\d+[.)]\s+)/;

After optional leading whitespace, it matches only:
  • bullets (-, *, •, optionally repeated), or
  • an ordered marker (1., 2)), i.e. digits followed by . or ) and then whitespace.

Bare leading numbers are now preserved, while "- item", "1. item" and "2) item" are still stripped exactly as before. The one other intentional behavior change is that a no-space token such as "1.Built" is now kept as-is (it is ambiguous and far more likely to be real content than a marker).
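The fixed pattern can be exercised the same way. The sketch below assumes a plain TypeScript/Node environment and applies the regex through a hypothetical stripMarker helper (not a function in the repo) to genuine markers, to lines that merely start with a number, and to the ambiguous "1.Built" case.

```typescript
// Fixed pattern: a run of bullets, or digits + "." / ")" followed by whitespace.
const LIST_MARKER_PREFIX_REGEX = /^\s*(?:[-*•]+\s*|\d+[.)]\s+)/;

// Hypothetical helper for illustration only.
const stripMarker = (line: string): string =>
  line.replace(LIST_MARKER_PREFIX_REGEX, "");

// Genuine list markers are still removed (including before numeric content).
const markers: string[] = ["- item", "* item", "• item", "1. item", "2) item", "1. 3.5 GPA"].map(stripMarker);

// Lines that merely start with a number are left intact.
const numericLines: string[] = [
  "3.5 GPA",
  "5 years of experience",
  "10x improvement in latency",
  "1000+ users served",
  "2019 - present",
];
const content: string[] = numericLines.map(stripMarker);

// No whitespace after "1.", so this is treated as content, not a marker.
const ambiguous: string = stripMarker("1.Built");

console.log(markers, content, ambiguous);
```

Because the ordered-marker branch requires whitespace after the . or ), a decimal such as "3.5" can never satisfy it, which is what keeps GPAs and similar figures intact.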

Test

The repo had no test runner, so this PR adds a minimal vitest setup:

  • vitest devDependency + "test": "vitest run" script
  • vitest.config.ts reusing the existing vite-tsconfig-paths for @/ alias resolution (no other plugins, so it stays fast and isolated)
  • src/app/app/dashboard/resumes/utils.test.ts covering the regression plus guards that real markers are still stripped and the array/empty branches behave.
pnpm test
 ✓ src/app/app/dashboard/resumes/utils.test.ts (4 tests)
 Test Files  1 passed (1)
      Tests  4 passed (4)

Verified fail-before / pass-after: with the original regex the regression test fails with ["GPA", "years of experience with Python", "x improvement in latency", "+ users served", "present"]; with the fix all 4 tests pass. The 3 guard tests pass in both states, confirming marker-stripping is unchanged.

Repro

  1. Import a resume (PDF/AI) whose bullets start with numbers, e.g. 5 years of experience, 3.5 GPA, 10x faster.
  2. Open the imported resume — the leading numbers are gone.

Or directly: toStringArray("3.5 GPA\n5 years of experience") returns ["GPA", "years of experience"] before the fix and ["3.5 GPA", "5 years of experience"] after.
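For poking at the string branch outside the app, here is a hypothetical, self-contained approximation of toStringArray() using the fixed regex. The real function in src/app/app/dashboard/resumes/utils.ts may differ: the name toStringArraySketch, the array handling, the trimming and the blank-line filtering are all assumptions made for this sketch. It does reproduce the direct repro above.

```typescript
const LIST_MARKER_PREFIX_REGEX = /^\s*(?:[-*•]+\s*|\d+[.)]\s+)/;

// Hypothetical approximation of toStringArray(); branch details are assumptions.
function toStringArraySketch(value: unknown): string[] {
  // Empty or missing input -> no items.
  if (value == null || value === "") return [];

  // Array branch (assumed): stringify each entry, trim, drop blanks.
  if (Array.isArray(value)) {
    return value
      .map((item) => String(item).trim())
      .filter((item) => item.length > 0);
  }

  // String branch: one item per line, with only a genuine list marker removed.
  return String(value)
    .split(/\r?\n/)
    .map((line) => line.replace(LIST_MARKER_PREFIX_REGEX, "").trim())
    .filter((line) => line.length > 0);
}

console.log(toStringArraySketch("3.5 GPA\n5 years of experience"));
console.log(toStringArraySketch("- Led migration\n2) 10x faster builds"));
```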

Note: pnpm-lock.yaml changes are purely additive (vitest and its transitive deps); no existing dependency versions were modified.

toStringArray() turns a multi-line string into list items and strips a
leading list marker from each line. The strip regex /^[-*•\d.)\s]+/ put
\d, "." and ")" in the same greedy character class, so it removed any run
of leading digits/dots/parens, not just an actual list marker.

This silently corrupted real resume content on AI/PDF import (the path
createResumeFromAIResult -> toListHtml -> toStringArray), e.g.
  "3.5 GPA"                         -> "GPA"
  "5 years of experience"          -> "years of experience"
  "10x improvement in latency"     -> "x improvement in latency"
  "1000+ users served"             -> "+ users served"
  "2019 - present"                 -> "present"

Replace it with /^\s*(?:[-*•]+\s*|\d+[.)]\s+)/, which strips only genuine
markers: bullets ("- * •") or ordered markers ("1." / "2)") followed by
whitespace. Bare leading numbers are preserved, while "- item",
"1. item" and "2) item" are still stripped as before.

Adds a vitest regression test (the repo had no test setup) plus a "test"
script and a minimal vitest config that reuses the existing
vite-tsconfig-paths for "@/" alias resolution.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>