stack5: feat: add task source expansion and conversion wiring #164

Open
VX1D wants to merge 9 commits into michaelshimeles:main from VX1D:stack/pr5-task-sources-convert

Conversation

Contributor

@VX1D VX1D commented Mar 3, 2026

PR5: task sources and convert

Summary

This PR expands task ingestion/conversion and wires it into CLI entrypoints. It introduces CSV conversion flow and hardens task source parsing and validation.

Why this PR exists

  • Teams need to convert task specs between YAML/Markdown/JSON/CSV.
  • Task source handlers needed stricter parsing and input validation.
  • CLI needed clean conversion mode routing.

What it adds

  • CLI conversion flow:
    • cli/src/cli/commands/convert.ts
    • cli/src/cli/args.ts (--convert-from, --convert-to)
    • cli/src/cli/commands/index.ts
    • cli/src/index.ts routing updates
  • Task source/converter layer updates:
    • cli/src/tasks/csv.ts
    • cli/src/tasks/converter.ts
    • cli/src/tasks/index.ts
    • cli/src/tasks/markdown.ts
    • cli/src/tasks/markdown-folder.ts
    • cli/src/tasks/yaml.ts
    • cli/src/tasks/github.ts

PR preview

  • New conversion mode from CLI:
    • ralphy --convert-from tasks.yaml --convert-to tasks.csv
    • ralphy --convert-from PRD.md (auto-defaults output to .csv)
  • Task source dispatcher now includes CSV as a first-class source type.
  • Markdown-folder task IDs are validated more strictly (file + line constraints).
  • YAML and GitHub task handlers reject malformed input earlier.

Concrete scenarios this fixes

  • An invalid markdown-folder task ID such as ../x.md:abc is now rejected immediately.
  • YAML containing malicious prototype keys is blocked at the parse stage.
  • A GitHub repo argument with extra path segments is rejected; only strict owner/repo format is accepted.
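The ID checks described above can be sketched as a single validator. This is a minimal sketch with assumed names (isValidTaskId and its exact checks are illustrative, not the PR's code in cli/src/tasks/markdown-folder.ts):

```typescript
import { basename } from "node:path";

// Hypothetical sketch of the stricter markdown-folder ID checks:
// IDs look like "file.md:line"; the file part must be a bare basename
// ending in .md, and the line part must be a positive integer.
function isValidTaskId(id: string): boolean {
  const sep = id.lastIndexOf(":");
  if (sep <= 0) return false;
  const file = id.slice(0, sep);
  const line = Number(id.slice(sep + 1));
  // Reject path separators / traversal (basename must round-trip).
  if (basename(file) !== file || !file.endsWith(".md")) return false;
  return Number.isInteger(line) && line > 0;
}
```

Under this sketch, backend.md:3 passes while ../x.md:abc, subdir/task.md:5, task.txt:5, and doc.md:0 are all rejected.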

Security/reliability hardening

  • Markdown-folder task ID parsing now validates file + line-number constraints.
  • YAML parsing upgraded with bounded prototype-pollution traversal checks.
  • GitHub source repo parsing now enforces strict owner/repo format.
  • CSV source integration aligned with task-source type contracts in the stack.

Validation

  • Included in cumulative stack verification.
  • Stack passes bun run check, bun tsc --noEmit, and bun test in order.

Note

Add task source expansion and conversion wiring by introducing a CSV task source and a cli.commands.runConvert CLI path that converts YAML/Markdown/JSON PRDs to CSV.

Add a CSV-backed task source and a --convert-from/--convert-to CLI path that runs cli.commands.runConvert, and expand parallel execution to accept CSV PRDs. Harden the task sources (GitHub, YAML, Markdown, markdown-folder), sanitize logging, and add a transform pipeline, hooks, metrics, a resource manager, and JSON/stream parsers. Install global cleanup and signal handling, and centralize structured logging behind pluggable sinks.

📍Where to Start

Start with main in cli/src/index.ts, then review cli.commands.runConvert in cli/src/cli/commands/convert.ts and CSV source wiring in cli/src/tasks/index.ts and cli/src/tasks/csv.ts.

Macroscope summarized 813f447.


vercel bot commented Mar 3, 2026

@VX1D is attempting to deploy a commit to the Goshen Labs Team on Vercel.

A member of the Team first needs to authorize it.


dosubot bot commented Mar 3, 2026

Related Documentation

3 document(s) may need updating based on files changed in this PR:

Goshen Labs's Space

Multi-File PRD (Product Requirements Document) Support
@@ -36,10 +36,33 @@
 3,Setup CI/CD,0,0,Configure GitHub Actions
 ```
 
-CSV PRD files can be created manually or generated from existing markdown, YAML, or JSON PRDs using the built-in conversion utility. The CLI provides `--convert-from` and `--convert-to` flags to convert PRD files between formats (YAML/MD/JSON → CSV). For example:
+CSV PRD files can be created manually or generated from existing markdown, YAML, or JSON PRDs using the built-in conversion utility. This conversion feature provides a convenient way to leverage CSV's token savings benefits when working with existing PRD files in other formats.
 
-```
-ralphy --convert-from PRD.md --convert-to tasks.csv
+### Converting Task Files
+
+Ralphy provides CLI flags to convert task files from supported formats to CSV:
+
+- `--convert-from <file>`: Specifies the source file to convert
+- `--convert-to <file>`: Specifies the output file path (optional)
+
+If `--convert-to` is omitted, the output defaults to a `.csv` file with the same name as the input file.
+
+Supported input formats include:
+- `.yaml`, `.yml` (YAML task files)
+- `.md`, `.markdown` (Markdown task files)
+- `.json` (JSON task files)
+
+**Examples:**
+
+```bash
+# Convert markdown to CSV (auto-generates PRD.csv)
+ralphy --convert-from PRD.md
+
+# Convert YAML to CSV with explicit output path
+ralphy --convert-from tasks.yaml --convert-to tasks.csv
+
+# Convert JSON to CSV
+ralphy --convert-from backlog.json --convert-to backlog.csv
 ```
 
 This approach allows for efficient, structured workflows and easy integration with external systems. Each task from a CSV PRD file is assigned a unique identifier based on its row, ensuring accurate tracking and synchronization. CSV PRD support is recommended for large projects or when minimizing token usage is important.


Multi-File PRD Support
@@ -30,9 +30,28 @@
 
 Each task is assigned a unique ID that combines the source filename (basename) and the line number where the task appears, in the format `filename.md:lineNumber`. This ensures that every task can be traced back to its exact location in the original markdown file, even when tasks are spread across multiple files ([source](https://github.com/michaelshimeles/ralphy/blob/fc2df589969b5fe16d31eccb4e7ff91314e31776/cli/src/tasks/markdown-folder.ts#L6-L106)).
 
-**Example task IDs:**
+### Task ID Validation
+
+Task IDs are validated to ensure they reference valid markdown files within the PRD folder. The validation enforces the following constraints:
+
+- **Filename validation**: Task IDs must contain only valid filenames (basename validation) without directory separators
+- **File extension**: Filenames must end with the `.md` extension
+- **Line numbers**: Line numbers must be finite, positive integers (greater than zero)
+- **Path traversal protection**: Task IDs are validated to ensure they don't escape the PRD folder
+
+**Security Note**: The system validates task IDs to prevent path traversal attacks and ensures they reference valid markdown files within the PRD folder.
+
+**Valid task IDs:**
 - `backend.md:3`
 - `frontend.md:7`
+- `feature.md:42`
+
+**Invalid task IDs:**
+- `../outside.md:1` — path traversal attempt
+- `subdir/task.md:5` — contains directory separator
+- `task.txt:5` — wrong file extension
+- `doc.md:abc` — invalid line number (not a number)
+- `doc.md:0` — invalid line number (not positive)
 
 ## Sorting Behavior for Consistent Task Ordering
 


PRD Folder Support
@@ -1,4 +1,4 @@
-Ralphy supports using a folder of markdown files as your Product Requirements Document (PRD) source, enabling scalable management of requirements and tasks for large projects. With recent performance improvements, ralphy efficiently aggregates, tracks, and updates tasks across multiple files with minimal overhead and enhanced visibility.
+Ralphy supports using a folder of markdown files as your Product Requirements Document (PRD) source, enabling scalable management of requirements and tasks for large projects.
 
 ## Using a PRD Folder
 
@@ -16,15 +16,25 @@
 
 When marking a task as complete, ralphy parses the task ID to locate the correct file and line, updates the line by replacing `- [ ]` with `- [x]`, and writes the change back to the file. This ensures updates are always applied to the correct source, preserving the integrity of your PRD documentation.
 
-## Performance Optimizations
+## Task ID Validation and Security
 
-Ralphy now includes several enhancements for handling large PRD folders and high task volumes:
+Task IDs are strictly validated to ensure security and correctness when updating tasks across multiple markdown files:
 
-- **File Content Caching:** All markdown files in the PRD folder are cached, including parsed tasks and completion counts. This avoids redundant file reads and speeds up repeated operations such as listing tasks or counting remaining/completed items.
-- **Automatic Cache Invalidation:** If any file is modified externally or by ralphy, the cache is invalidated and refreshed to ensure accuracy.
-- **Batched Progress Logging:** Progress updates are written asynchronously and batched within short windows, reducing filesystem contention and improving responsiveness during parallel execution.
-- **Operation Timing Visibility:** Task execution steps and batch operations are tracked with timing breakdowns, allowing users to see which phases are slow and monitor overall progress.
-- **Consistent Task Ordering:** Files are always processed in sorted order, ensuring predictable aggregation and updates.
+- **Path Traversal Protection:** Task IDs are checked to ensure they reference files within the PRD folder. Attempts to access parent directories or arbitrary filesystem locations are rejected.
+- **Filename Validation:** Task IDs must reference files with `.md` extension and cannot contain path separators or relative path components.
+- **Line Number Validation:** Line numbers must be finite, positive integers.
+
+### Validation Examples
+
+Valid task IDs:
+- `feature.md:10`
+- `backend.md:25`
+
+Invalid task IDs:
+- `../secret.md:1` - path traversal attempt
+- `doc.txt:5` - wrong file extension
+- `task.md:abc` - invalid line number
+- `task.md:0` - line number must be positive
 
 ## Example Folder Structure
 
@@ -35,8 +45,8 @@
   infra.md        # - [ ] setup CI/CD
 ```
 
-Ralphy will aggregate all tasks from `backend.md`, `frontend.md`, and `infra.md`, sort them by filename, and track completion status per file and line. With the new performance features, even large, modular PRDs are managed efficiently, with accurate updates and clear progress visibility.
+Ralphy will aggregate all tasks from `backend.md`, `frontend.md`, and `infra.md`, sort them by filename, and track completion status per file and line.
 
 ## Summary
 
-These improvements make ralphy ideal for managing complex requirements across many markdown files. Users benefit from fast aggregation, reliable task tracking, and real-time feedback on execution and progress, even in large-scale projects.
+Ralphy's folder-based PRD support makes it ideal for managing complex requirements across many markdown files. The strict task ID validation ensures secure and reliable task tracking, while consistent file ordering guarantees predictable task aggregation and updates across large-scale projects.




Contributor

greptile-apps bot commented Mar 3, 2026

Greptile Summary

This PR wires a full task-format conversion pipeline (--convert-from / --convert-to) into the CLI and promotes CSV to a first-class TaskSource, alongside security hardening of the YAML, Markdown-folder, and GitHub sources and a significant refactor of the logging subsystem.

Key changes:

  • New runConvert command routes YAML / Markdown / JSON → CSV through dedicated converter functions, with prototype-pollution and parse-error guards throughout.
  • CsvTaskSource implements the full TaskSource contract (read, mark-complete, count) backed by a simple CSV parser.
  • MarkdownFolderTaskSource gains thorough path-traversal guards (basename check + resolve/sep confirmation) on task IDs.
  • YamlTaskSource adds a bounded BFS hasPrototypePollution check narrowed to __proto__ only.
  • GitHubTaskSource adds early token-format validation (including the ghs_ app-installation pattern), strict owner/repo parsing, and removes the TTL cache (doubling API calls for countRemaining — flagged in a previous review).
  • Logger is refactored into a sink-based pipeline (ConsoleLogSink, JsonFileLogSink, MultiLogSink, FilteredLogSink) with secret sanitization; the [OK] prefix is preserved.
  • New cleanup.ts introduces a cross-platform process registry with SIGTERM → SIGKILL escalation and dedup-guard.

Issues found:

  • CsvTaskSource.markComplete silently returns void when no CSV row matches the given ID, masking caller-side bugs; other TaskSource implementations throw on invalid IDs.
  • parseCSV does not strip the UTF-8 BOM (\uFEFF) present in files exported from Excel or LibreOffice, causing the first header cell to be "\uFEFFid" and silently breaking header-index lookups in markComplete.
  • parseSkillFile in converter.ts applies the frontmatter regex to un-normalized content; files with Windows line endings (\r\n) will fail the match and expose raw file content in the skills CSV.
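A minimal fix for the BOM issue above would strip \uFEFF before the header row is split. The helper names below are illustrative, not the actual code in cli/src/tasks/csv.ts:

```typescript
// Sketch: strip a leading UTF-8 BOM so a CSV exported from Excel or
// LibreOffice ("\uFEFFid,title,...") still resolves the "id" header.
function stripBom(text: string): string {
  return text.charCodeAt(0) === 0xfeff ? text.slice(1) : text;
}

// Illustrative header parse built on top of the BOM strip.
function parseHeader(csv: string): string[] {
  return stripBom(csv).split("\n")[0].split(",").map((h) => h.trim());
}
```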

Confidence Score: 2/5

  • Not safe to merge — two critical logic issues in the newly promoted CSV TaskSource can cause silent data corruption or task-completion tracking failure.
  • Two verified logic issues in cli/src/tasks/csv.ts are critical for the primary new feature: markComplete silently no-ops on unmatched IDs (leaving tasks perpetually queued with no error signal), and parseCSV doesn't strip UTF-8 BOM (breaking header detection for any CSV exported from Windows/Excel). Both are straightforward to fix but directly impact the newly first-class CSV source. One false-positive issue (outputFile derivation) was dropped after code verification. The line-ending normalization issue in parseSkillFile is also verified but less critical than the CSV issues.
  • cli/src/tasks/csv.ts (markComplete + parseCSV BOM), cli/src/tasks/converter.ts (parseSkillFile line-ending normalization)

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    CLI["CLI: index.ts\n--convert-from / --convert-to"] --> CONV_CMD["runConvert()\nconvert.ts"]
    CLI --> LOOP["runLoop()\nrun.ts"]

    CONV_CMD --> EXT{File extension?}
    EXT -- ".yaml/.yml" --> YTC["yamlToCsv()\nconverter.ts"]
    EXT -- ".md/.markdown" --> MTC["mdToCsv()\nconverter.ts"]
    EXT -- ".json" --> JTC["jsonToCsv()\nconverter.ts"]
    EXT -- "other" --> ERR["throw: Unsupported format"]

    YTC --> PP["hasPrototypePollution()\nyaml.ts"]
    PP -- "polluted" --> ERR2["throw: malicious keys"]
    PP -- "clean" --> ROWS["rowsToCsv()\ncsv.ts"]
    MTC --> ROWS
    JTC --> ROWS

    ROWS --> WRITE["writeFileSync(to, csv)"]

    LOOP --> SRC["createTaskSource()\ntasks/index.ts"]
    SRC -- "csv" --> CSV_SRC["CsvTaskSource\ncsv.ts"]
    SRC -- "yaml" --> YAML_SRC["YamlTaskSource\nyaml.ts"]
    SRC -- "markdown-folder" --> MF_SRC["MarkdownFolderTaskSource\nmarkdown-folder.ts"]
    SRC -- "github" --> GH_SRC["GitHubTaskSource\ngithub.ts"]

    LOOP --> PAR["runParallel()\nparallel.ts"]
    PAR --> COPY{prdSource?}
    COPY -- "csv/yaml/json/md" --> CP_FILE["copyFileSync PRD → worktree/sandbox"]
    COPY -- "markdown-folder" --> CP_DIR["cpSync PRD dir → worktree/sandbox"]

Last reviewed commit: 813f447

function hasPrototypePollution(obj: unknown): boolean {
const MAX_DEPTH = 20;
const MAX_NODES = 10000;
const dangerousKeys = new Set(["__proto__", "constructor", "prototype"]);
Contributor


The dangerousKeys set in cli/src/tasks/yaml.ts (line 19) includes "prototype" and "constructor", which are ordinary, valid JavaScript/YAML property names. A task file containing a field named prototype (e.g., from auto-generated YAML or a task titled "Refactor prototype layer") would be incorrectly rejected as malicious.

Only __proto__ is the actual attack vector in modern JS runtimes — constructor and prototype cannot cause prototype pollution through standard YAML.parse() output; they become regular object keys. Consider narrowing the guard:

Suggested change
const dangerousKeys = new Set(["__proto__", "constructor", "prototype"]);
const dangerousKeys = new Set(["__proto__"]);

If you choose to keep constructor and prototype for extra caution, at minimum update the error message to clarify which key triggered the rejection so users can diagnose false positives.
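For comparison, a bounded traversal narrowed to __proto__ only could look roughly like this. hasProtoKey and the node budget are illustrative assumptions, not the PR's exact implementation:

```typescript
// Sketch: bounded breadth-first scan of parsed data for an own
// "__proto__" key. Only objects count toward the node budget.
function hasProtoKey(root: unknown, maxNodes = 10_000): boolean {
  const queue: unknown[] = [root];
  const seen = new Set<object>();
  let visited = 0;
  while (queue.length > 0) {
    const value = queue.shift();
    if (typeof value !== "object" || value === null) continue;
    if (seen.has(value)) continue; // cycle guard
    seen.add(value);
    if (++visited > maxNodes) {
      throw new Error("structure too complex to validate safely");
    }
    for (const key of Object.keys(value)) {
      if (key === "__proto__") return true;
      queue.push((value as Record<string, unknown>)[key]);
    }
  }
  return false;
}
```

Note that JSON.parse creates "__proto__" as an own data property rather than mutating the prototype, so a key scan like this does catch it.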


Comment on lines +199 to +204
skills.push({
name: parsed.name || entry.name,
desc: parsed.description,
content: parsed.content,
});
}


🟡 Medium tasks/converter.ts:199

When a SKILL.md file inside a directory lacks frontmatter or fails parsing, parseSkillFile returns "SKILL" as the default name (line 169). In skillsToCsv, this truthy string overrides the intended fallback to entry.name (the directory name), so a skill in folder "python" is named "SKILL" instead of "python". Consider changing the fallback logic to prefer entry.name when parsed.name is empty or the placeholder "SKILL".

-                if (parsed) {
-                    skills.push({
-                        name: parsed.name || entry.name,
-                        desc: parsed.description,
-                        content: parsed.content,
-                    });
-                }
+                if (parsed) {
+                    const name = parsed.name && parsed.name !== "SKILL" ? parsed.name : entry.name;
+                    skills.push({
+                        name,
+                        desc: parsed.description,
+                        content: parsed.content,
+                    });
+                }

function parseSkillFile(filePath: string): { name: string; description: string; content: string } | null {
try {
const content = readFileSync(filePath, "utf-8");
const frontmatterMatch = content.match(/^---\s*\n([\s\S]*?)\n---\s*\n([\s\S]*)$/);


🟢 Low tasks/converter.ts:155

The frontmatter regex ^---\s*\n([\s\S]*?)\n---\s*\n([\s\S]*)$ strictly requires a trailing newline after the closing --- fence. When a file contains only frontmatter with no body content, the match fails and the function falls back to treating the entire frontmatter block as body content, losing the metadata. Consider making the trailing newline optional by changing \n to \n? after the closing fence.

-		const frontmatterMatch = content.match(/^---\s*\n([\s\S]*?)\n---\s*\n([\s\S]*)$/);
+		const frontmatterMatch = content.match(/^---\s*\n([\s\S]*?)\n---\s*\n?([\s\S]*)$/);
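As a quick illustration of the proposed \n? change (assumed behavior, not taken from the PR's tests): a frontmatter-only file fails the strict pattern but matches the relaxed one.

```typescript
// Strict pattern from converter.ts vs. the suggested relaxation.
const strict = /^---\s*\n([\s\S]*?)\n---\s*\n([\s\S]*)$/;
const relaxed = /^---\s*\n([\s\S]*?)\n---\s*\n?([\s\S]*)$/;

// A file that ends immediately after the closing fence.
const frontmatterOnly = "---\nname: python\n---";
// strict.test(frontmatterOnly) is false; relaxed captures "name: python".
```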

Comment on lines +31 to +47
nodesVisited++;
if (nodesVisited > MAX_NODES) {
throw new Error("YAML file too complex to validate safely");
}

if (current.depth > MAX_DEPTH) {
throw new Error("YAML file nesting exceeds safety limits");
}

if (typeof current.value !== "object" || current.value === null) {
continue;
}

if (visited.has(current.value)) {
continue;
}
visited.add(current.value);


🟡 Medium tasks/yaml.ts:31

nodesVisited counts every queue item including primitive values (strings, numbers, booleans), so a moderate YAML file with ~2,500 tasks each having 3-4 properties triggers the 10,000 limit and throws. Move the increment after the primitive check so only actual objects contribute to the node count.

-		nodesVisited++;
-		if (nodesVisited > MAX_NODES) {
-			throw new Error("YAML file too complex to validate safely");
-		}
-
 		if (current.depth > MAX_DEPTH) {
 			throw new Error("YAML file nesting exceeds safety limits");
 		}
 
 		if (typeof current.value !== "object" || current.value === null) {
 			continue;
 		}
+
+		if (++nodesVisited > MAX_NODES) {
+			throw new Error("YAML file too complex to validate safely");
+		}

const line = lines[i];

// Match incomplete tasks
const incompleteMatch = line.match(/^- \[ \] (.+)$/);


🟡 Medium tasks/markdown-folder.ts:97

Inconsistent handling of empty tasks: getAllTasks ignores - [ ] while countRemaining counts them, leading to mismatched totals. Suggest aligning the regexes so both treat empty descriptions the same (either include or exclude).

-			const incompleteMatch = line.match(/^- \[ \] (.+)$/);
+			const incompleteMatch = line.match(/^- \[ \] (.*)$/);
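The divergence flagged here is easy to reproduce with the two patterns cited in the comment (regex literals copied from the review; variable names assumed):

```typescript
// getAllTasks requires a non-empty description; countRemaining only
// checks the checkbox prefix, so they disagree on "- [ ] ".
const getAllTasksRe = /^- \[ \] (.+)$/;
const countRemainingRe = /^- \[ \] /;

const emptyTask = "- [ ] ";
const normalTask = "- [ ] write docs";
// getAllTasksRe rejects emptyTask while countRemainingRe counts it.
```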
Also found in 1 other location(s)

cli/src/tasks/converter.ts:90

The runConvert token-savings calculation is safe from division by zero (originalTokens === 0 ? 0 : ..., with tokens estimated as Math.ceil(content.length / 4)), but when the CSV output is larger than the input (e.g., heavy quoting/escaping on a small file) the reported savings percentage goes negative, which is mathematically correct but confusing for a metric meant to show savings.

The task count csvContent.split("\n").length - 1 is correct for the current converters: toCSV joins rows with "\n" and adds no trailing newline, so a header-only result yields 0 tasks and a header plus one task yields 1.

The more significant bug is in mdToCsv: the checklist regex /^- \[ \] (.+)$/ is anchored to the start of the line, so indented checklist items are silently dropped. Given a file like:

# My Tasks
  - [ ] Indented task
- [ ] Normal task

only "Normal task" survives conversion; the indented task disappears with no warning. Markdown tasks are frequently indented, so this is a real parser gap.

Minor observations: the catch block in runConvert both logs the parse error and rethrows it, which can duplicate the message if the CLI entry point also prints unhandled errors, and a writeFileSync failure (invalid path, permission denied) aborts before anything is logged. The extname comparison against ".yaml" correctly accounts for the leading dot.

🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file cli/src/tasks/markdown-folder.ts around line 97:

Inconsistent handling of empty tasks: `getAllTasks` ignores `- [ ] ` while `countRemaining` counts them, leading to mismatched totals. Suggest aligning the regexes so both treat empty descriptions the same (either include or exclude).

Evidence trail:
cli/src/tasks/markdown-folder.ts lines 91 and 114 at REVIEWED_COMMIT. Line 91 shows `getAllTasks` uses regex `/^- \[ \] (.+)$/` which requires at least one character after the checkbox (`.+`). Line 114 shows `countRemaining` uses regex `/^- \[ \] /gm` which only matches the prefix and would count empty-description tasks.

Also found in 1 other location(s):
- cli/src/tasks/converter.ts:90 -- The `runConvert` function performs a division by zero check for `originalTokens` but not for the division logic itself in all cases. While line 67 handles `originalTokens === 0`, it does not handle the case where `csvTokens` is larger than `originalTokens`, which can happen if the CSV format adds overhead (e.g., lots of quoting/escaping) to a small file. This would result in a negative savings percentage, which is mathematically correct but potentially confusing if the intent is to show "savings". However, the more critical issue is that `content.length` is used to estimate `originalTokens` (`Math.ceil(content.length / 4)`). If `content` is empty string, `originalTokens` is 0. The ternary `originalTokens === 0 ? 0 : ...` handles the division by zero. 

However, looking at the logic: `writeFileSync` is called *before* the logging. If `writeFileSync` fails (e.g. invalid path, permission denied), the program throws and the user sees no output. But if it succeeds, we log. The issue is in `const savings = originalTokens === 0 ? 0 : Math.round((1 - csvTokens / originalTokens) * 100);`. If `originalTokens` is 0 (empty file), `savings` is 0. If `csvContent` is not empty (e.g. just headers), `csvTokens` will be non-zero. The logic seems safe from crashing, but the calculation `1 - csvTokens / originalTokens` implies that we expect CSV to be smaller. If CSV is larger, savings is negative. 

The `taskCount` computation, `const taskCount = csvContent.split("\n").length - 1;`, holds up for the converter's own output: `toCSV` joins rows with `\n` and never appends a trailing newline, so `writeFileSync` writes exactly that. A header-only result (e.g. `yamlToCsv` with zero tasks) splits to length 1, giving 0 tasks; a header plus one task splits to `[header, task]` (length 2), giving 1 task; empty content splits to `[""]` (length 1), also giving 0. All correct.

A smaller issue in `mdToCsv`: `const incompleteMatch = line.match(/^- \[ \] (.+)$/);` requires exactly one space after the bullet and after `[ ]`; multiple spaces or tabs cause the line to be silently skipped. (Reading the whole file at once with `const content = readFileSync(from, "utf-8");` is fine for a CLI tool.)

The error handling in `runConvert` both logs and rethrows:

```typescript
	} catch (err) {
		const message = err instanceof Error ? err.message : String(err);
		logError(`Error parsing ${from}: ${message}`);
		throw err;
	}
```

If the caller (`await main()`) does not catch the rethrown error, the unhandled-rejection trace duplicates the message already printed by `logError` — though the CLI entry point usually handles this. One potential pitfall that is handled correctly: `extname` returns the extension including the dot, and the switch compares accordingly:

```typescript
		switch (fromExt) {
			case ".yaml":
```

The real bug is in `mdToCsv` (cli/src/tasks/converter.ts):

```typescript
	const lines = mdContent.replace(/\r\n/g, "\n").split("\n");
	// ...
		const incompleteMatch = line.match(/^- \[ \] (.+)$/);
```

Empty input is safe: `lines` is `[""]`, the loop matches nothing, and the header-only CSV is returned. But the regex anchors `^` directly to `-`, so indented checklist items never match. A reachable repro — the user provides `tasks.md`:

```markdown
# My Tasks
  - [ ] Indented task
- [ ] Normal task
```

`"  - [ ] Indented task"` fails the regex because of its leading spaces, while `"- [ ] Normal task"` passes. The indented task is silently dropped. Markdown task lists are frequently indented (e.g. nested under a parent item), so this parser loses real tasks.
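A tolerant matcher would accept leading whitespace and flexible spacing (and optionally `*` bullets). A minimal sketch, assuming the parser only needs the title text of each unchecked item; `extractTask` is a hypothetical helper, not a function from the PR:

```typescript
// Match unchecked task items, tolerating indentation, "*" bullets,
// and extra whitespace between the bullet and the checkbox.
const UNCHECKED_TASK = /^\s*[-*]\s+\[ \]\s+(.+)$/;

function extractTask(line: string): string | null {
	const m = line.match(UNCHECKED_TASK);
	return m ? m[1].trim() : null;
}
```

Checked items (`- [x]`) still fail the literal `\[ \]` and are skipped, matching the existing "incomplete tasks only" behavior.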

Comment on lines +41 to +46
function truncateDesc(desc: string | undefined, maxLen = 80): string {
if (!desc) return "";
const clean = desc.replace(/\s+/g, " ").trim();
if (clean.length <= maxLen) return clean;
return `${clean.slice(0, maxLen - 3)}...`;
}

🟡 Medium tasks/converter.ts:41

truncateDesc receives desc from parsed JSON/YAML without runtime validation, so a numeric description like {"description": 2023} passes the if (!desc) check (since 2023 is truthy) and then crashes on desc.replace with TypeError: desc.replace is not a function. Consider adding a type check: const str = typeof desc === 'string' ? desc : String(desc ?? '') before calling string methods.

 function truncateDesc(desc: string | undefined, maxLen = 80): string {
 	if (!desc) return "";
-	const clean = desc.replace(/\s+/g, " ").trim();
+	const str = typeof desc === "string" ? desc : String(desc ?? "");
+	const clean = str.replace(/\s+/g, " ").trim();
 	if (clean.length <= maxLen) return clean;
 	return `${clean.slice(0, maxLen - 3)}...`;
 }
🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file cli/src/tasks/converter.ts around lines 41-46:

`truncateDesc` receives `desc` from parsed JSON/YAML without runtime validation, so a numeric description like `{"description": 2023}` passes the `if (!desc)` check (since `2023` is truthy) and then crashes on `desc.replace` with `TypeError: desc.replace is not a function`. Consider adding a type check: `const str = typeof desc === 'string' ? desc : String(desc ?? '')` before calling string methods.

Evidence trail:
cli/src/tasks/converter.ts lines 41-45 (truncateDesc function definition), lines 48-54 (YamlTask interface with description?: string), lines 108-117 (JsonTask interface with description?: string), lines 63-77 (yamlToCsv using YAML.parse with type assertion), lines 121-142 (jsonToCsv using JSON.parse with type assertion). TypeScript type assertions like `as YamlTaskFile` are compile-time only and do not perform runtime validation.

Comment on lines +15 to +23
// Handle unhandled promise rejections globally
process.on("unhandledRejection", (reason, promise) => {
// BUG FIX: Preserve error stack traces for better debugging
const errorMessage =
reason instanceof Error ? `${reason.message}\n${reason.stack}` : String(reason);
logError(`Unhandled Promise Rejection: ${errorMessage}`);
logError(`Promise: ${promise}`);
// Don't crash, but log and continue - prevent uncaught exception
});

🟡 Medium src/index.ts:15

The unhandledRejection handler swallows all promise rejections without terminating the process. When main() rejects due to a cleanup failure, the rejection is caught and logged, but the process exits with code 0 when the event loop drains — hiding the failure from CI/CD. Consider re-throwing or calling process.exit(1) after logging to ensure non-zero exit on unhandled rejections.

 process.on("unhandledRejection", (reason, promise) => {
 	// BUG FIX: Preserve error stack traces for better debugging
 	const errorMessage =
 		reason instanceof Error ? `${reason.message}\n${reason.stack}` : String(reason);
 	logError(`Unhandled Promise Rejection: ${errorMessage}`);
 	logError(`Promise: ${promise}`);
-	// Don't crash, but log and continue - prevent uncaught exception
+	process.exit(1);
 });
🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file cli/src/index.ts around lines 15-23:

The `unhandledRejection` handler swallows all promise rejections without terminating the process. When `main()` rejects due to a cleanup failure, the rejection is caught and logged, but the process exits with code 0 when the event loop drains — hiding the failure from CI/CD. Consider re-throwing or calling `process.exit(1)` after logging to ensure non-zero exit on unhandled rejections.

Evidence trail:
cli/src/index.ts lines 16-23: `unhandledRejection` handler logs but does not call `process.exit(1)` or re-throw. Lines 78-82: catch block calls `await runCleanup()` before `process.exit(1)` with no nested try/catch. Line 84: `await main()` at top level. If `runCleanup()` throws in the catch block, `process.exit(1)` is never reached, and the rejection goes to the handler that only logs.

Comment on lines +8 to +12
const GITHUB_TOKEN_PATTERNS = [
/^ghp_[A-Za-z0-9]{36}$/, // Classic PAT
/^github_pat_[A-Za-z0-9_]{82}$/, // Fine-grained PAT
/^gho_[A-Za-z0-9]{52}$/, // OAuth token
];

🟠 High tasks/github.ts:8

Token validation is too strict. validateGitHubToken rejects valid GITHUB_TOKENs (e.g., ghs_*) and uses the wrong gho_* length, causing CI failures. Suggest removing client-side format checks and relying on GitHub to reject invalid tokens, or broadening GITHUB_TOKEN_PATTERNS to include ghs_* and correcting the gho_* length.

-const GITHUB_TOKEN_PATTERNS = [
-	/^ghp_[A-Za-z0-9]{36}$/, // Classic PAT
-	/^github_pat_[A-Za-z0-9_]{82}$/, // Fine-grained PAT
-	/^gho_[A-Za-z0-9]{52}$/, // OAuth token
-];
🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file cli/src/tasks/github.ts around lines 8-12:

Token validation is too strict. `validateGitHubToken` rejects valid `GITHUB_TOKEN`s (e.g., `ghs_*`) and uses the wrong `gho_*` length, causing CI failures. Suggest removing client-side format checks and relying on GitHub to reject invalid tokens, or broadening `GITHUB_TOKEN_PATTERNS` to include `ghs_*` and correcting the `gho_*` length.

Evidence trail:
cli/src/tasks/github.ts lines 8-12 (GITHUB_TOKEN_PATTERNS definition), lines 17-19 (validateGitHubToken function), lines 47-51 (where validation is called and error thrown). GitHub documentation: https://developer.github.com/changes/2021-04-05-behind-githubs-new-authentication-token-formats confirms ghs_ prefix for server-to-server tokens. Gist https://gist.github.com/magnetikonline/073afe7909ffdd6f10ef06a00bc3bc88 confirms `^ghs_[a-zA-Z0-9]{36}$` format for GitHub Actions tokens.
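If client-side format checks are kept at all, a broadened pattern set might look like the following. This is a sketch, not verified against every GitHub token format — the 36-character length for the short-prefix families is an assumption based on the references above, and `looksLikeGitHubToken` is a hypothetical helper name:

```typescript
// Broadened patterns: classic PAT (ghp_), OAuth (gho_), user-to-server (ghu_),
// server-to-server / Actions (ghs_), and refresh (ghr_) tokens — assumed to be
// 36 alphanumeric chars after the prefix — plus fine-grained PATs (github_pat_).
const GITHUB_TOKEN_PATTERNS = [
	/^gh[posur]_[A-Za-z0-9]{36}$/,
	/^github_pat_[A-Za-z0-9_]{82}$/,
];

function looksLikeGitHubToken(token: string): boolean {
	return GITHUB_TOKEN_PATTERNS.some((p) => p.test(token));
}
```

Even so, letting the GitHub API reject bad tokens remains the more robust option, since token formats change over time.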

Comment on lines +84 to +90
interface CsvTask {
id: string;
title: string;
completed: boolean;
parallelGroup: number;
description: string;
}

🟠 High tasks/csv.ts:84

writeFile reconstructs the CSV with a hardcoded schema (id, title, done, group, desc), discarding any additional columns present in the original file. When markComplete is called, extra metadata like assignee or priority is silently deleted. Consider preserving unknown columns by storing the original row data and passing it through on write.

interface CsvTask {
	id: string;
	title: string;
	completed: boolean;
	parallelGroup: number;
	description: string;
	_extra?: string[];
}
🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file cli/src/tasks/csv.ts around lines 84-90:

`writeFile` reconstructs the CSV with a hardcoded schema (id, title, done, group, desc), discarding any additional columns present in the original file. When `markComplete` is called, extra metadata like `assignee` or `priority` is silently deleted. Consider preserving unknown columns by storing the original row data and passing it through on write.

Evidence trail:
cli/src/tasks/csv.ts lines 93-112 (readFile() only extracts 5 columns), lines 114-120 (writeFile() writes hardcoded 5-column schema), lines 134-141 (markComplete() flow: read -> modify -> write back, losing extra columns)
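One way to preserve unknown columns is to carry the raw trailing cells through the read-modify-write round trip and only rebuild the known fields on write. A sketch operating on already-parsed rows; the PR's `parseCSV`/`rowsToCsv` are assumed to exist, so plain string arrays stand in here:

```typescript
// Known column layout at the front of each row; anything beyond index 4
// is unknown metadata (assignee, priority, ...) that must survive a rewrite.
const KNOWN_COLUMNS = 5;

interface CsvTask {
	id: string;
	title: string;
	completed: boolean;
	parallelGroup: number;
	description: string;
	extra: string[]; // raw unknown columns, passed through untouched
}

function rowToTask(row: string[]): CsvTask {
	return {
		id: row[0] ?? "",
		title: row[1] ?? "",
		completed: row[2] === "true",
		parallelGroup: Number(row[3] ?? 0),
		description: row[4] ?? "",
		extra: row.slice(KNOWN_COLUMNS),
	};
}

function taskToRow(task: CsvTask): string[] {
	return [
		task.id,
		task.title,
		String(task.completed),
		String(task.parallelGroup),
		task.description,
		...task.extra, // unknown columns re-emitted in original order
	];
}
```

With this shape, `markComplete` can flip `completed` and write the file back without dropping `assignee` or `priority` columns it does not understand.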

Comment on lines +50 to +54
 		case "csv":
 			if (!options.filePath) {
-				throw new Error("filePath is required for json task source");
+				throw new Error("filePath is required for csv task source");
 			}
-			return new JsonTaskSource(options.filePath);
+			return new CsvTaskSource(options.filePath);

🟠 High tasks/index.ts:50

The createTaskSource function replaces the json case with csv instead of adding csv as a new case. When callers pass type: "json" (still defined in TaskSourceType and used by runLoop), the switch falls through to the default case and throws Unknown task source type. Keep the json case and add csv as a separate case.

-		case "csv":
+		case "json":
 			if (!options.filePath) {
-				throw new Error("filePath is required for csv task source");
+				throw new Error("filePath is required for json task source");
 			}
-			return new CsvTaskSource(options.filePath);
+			return new JsonTaskSource(options.filePath);
+
+		case "csv":
+			if (!options.filePath) {
+				throw new Error("filePath is required for csv task source");
+			}
+			return new CsvTaskSource(options.filePath);
🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file cli/src/tasks/index.ts around lines 50-54:

The `createTaskSource` function replaces the `json` case with `csv` instead of adding `csv` as a new case. When callers pass `type: "json"` (still defined in `TaskSourceType` and used by `runLoop`), the switch falls through to the default case and throws `Unknown task source type`. Keep the `json` case and add `csv` as a separate case.

Evidence trail:
1. `git_diff base=MERGE_BASE head=REVIEWED_COMMIT path=cli/src/tasks/index.ts` shows `case "json":` replaced with `case "csv":` (lines 47-53 of diff)
2. `cli/src/tasks/types.ts` line 20: `export type TaskSourceType = "markdown" | "markdown-folder" | "yaml" | "json" | "github";` - json still in type
3. `cli/src/tasks/json.ts` - JsonTaskSource class still exists
4. `cli/src/tasks/index.ts` current code - switch has no `case "json":` handler, only `case "csv":`

Comment on lines +62 to +64
export function yamlToCsv(yamlContent: string): string {
const data = YAML.parse(yamlContent) as YamlTaskFile;
const tasks = data.tasks || [];

🟡 Medium tasks/converter.ts:62

yamlToCsv returns an empty CSV when the YAML contains a root-level array of tasks (e.g. - title: Foo), because it assumes data.tasks exists and never checks if data itself is the array. This is inconsistent with jsonToCsv, which handles both formats. Consider adding the same array check: const tasks = Array.isArray(data) ? data : (data.tasks || []).

-	const data = YAML.parse(yamlContent) as YamlTaskFile;
-	const tasks = data.tasks || [];
+	const data = YAML.parse(yamlContent) as YamlTaskFile | YamlTask[];
+	const tasks = Array.isArray(data) ? data : (data.tasks || []);
🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file cli/src/tasks/converter.ts around lines 62-64:

`yamlToCsv` returns an empty CSV when the YAML contains a root-level array of tasks (e.g. `- title: Foo`), because it assumes `data.tasks` exists and never checks if `data` itself is the array. This is inconsistent with `jsonToCsv`, which handles both formats. Consider adding the same array check: `const tasks = Array.isArray(data) ? data : (data.tasks || [])`.

Evidence trail:
cli/src/tasks/converter.ts lines 61-63 (yamlToCsv): `const data = YAML.parse(yamlContent) as YamlTaskFile; const tasks = data.tasks || [];` - no Array.isArray check.

cli/src/tasks/converter.ts line 117 (jsonToCsv): `const tasks: JsonTask[] = Array.isArray(data) ? data : (data as { tasks?: JsonTask[] })?.tasks || [];` - correctly handles both array and object formats.

Comment on lines +52 to +57
function sanitizeCsvCell(value: string): string {
if (/^[=+\-@]/.test(value)) {
return `'${value}`;
}
return value;
}
Contributor


sanitizeCsvCell unconditionally prepends ' to any string starting with -, +, =, or @. While this prevents CSV formula injection, it silently corrupts legitimate task data:

  • A task titled "- Remove deprecated API" becomes "'- Remove deprecated API" in the output CSV, with the apostrophe stored as part of the title.
  • Numeric or numeric-like strings such as negative numbers (e.g., "-1", "+42") are unnecessarily mangled.

The check should skip sanitization for values that are already valid numbers, since numeric strings cannot contain formula injections:

Suggested change
function sanitizeCsvCell(value: string): string {
if (/^[=+\-@]/.test(value)) {
return `'${value}`;
}
return value;
}
function sanitizeCsvCell(value: string): string {
if (/^[=+\-@]/.test(value) && Number.isNaN(Number(value))) {
return `'${value}`;
}
return value;
}
Prompt To Fix With AI
This is a comment left during a code review.
Path: cli/src/tasks/csv.ts
Line: 52-57

Comment:
`sanitizeCsvCell` unconditionally prepends `'` to any string starting with `-`, `+`, `=`, or `@`. While this prevents CSV formula injection, it silently corrupts legitimate task data:

- A task titled `"- Remove deprecated API"` becomes `"'- Remove deprecated API"` in the output CSV, with the apostrophe stored as part of the title.
- Numeric or numeric-like strings such as negative numbers (e.g., `"-1"`, `"+42"`) are unnecessarily mangled.

The check should skip sanitization for values that are already valid numbers, since numeric strings cannot contain formula injections:

```suggestion
function sanitizeCsvCell(value: string): string {
	if (/^[=+\-@]/.test(value) && Number.isNaN(Number(value))) {
		return `'${value}`;
	}
	return value;
}
```

How can I resolve this? If you propose a fix, please make it concise.

@greptile-apps
Contributor

greptile-apps bot commented Mar 4, 2026

Additional Comments (1)

cli/src/tasks/types.ts
TaskSourceType does not include "csv" but should. CsvTaskSource in csv.ts declares type = "csv" as const, and the factory in tasks/index.ts routes on this value. However, TaskSourceType still includes the obsolete "json" literal and lacks "csv", creating a type mismatch.

Update the type to:

export type TaskSourceType = "markdown" | "markdown-folder" | "yaml" | "csv" | "github";
Prompt To Fix With AI
This is a comment left during a code review.
Path: cli/src/tasks/types.ts
Line: 20

Comment:
`TaskSourceType` does not include `"csv"` but should. `CsvTaskSource` in `csv.ts` declares `type = "csv" as const`, and the factory in `tasks/index.ts` routes on this value. However, `TaskSourceType` still includes the obsolete `"json"` literal and lacks `"csv"`, creating a type mismatch.

Update the type to:
```typescript
export type TaskSourceType = "markdown" | "markdown-folder" | "yaml" | "csv" | "github";
```

How can I resolve this? If you propose a fix, please make it concise.

 * Task source type
 */
-export type TaskSourceType = "markdown" | "markdown-folder" | "yaml" | "json" | "github";
+export type TaskSourceType = "markdown" | "markdown-folder" | "yaml" | "json" | "csv" | "github";
Contributor


"json" retained in TaskSourceType but has no handler

"json" is still present in the TaskSourceType union, but the createTaskSource switch in cli/src/tasks/index.ts replaced the case "json": with case "csv": and removed the json.ts export entirely. Any existing code or configuration that passes type: "json" to createTaskSource will now hit the default branch and throw Unknown task source type: json at runtime.

If the intent is to fully replace JSON sources with CSV, "json" should be removed from this union (which is a breaking change and warrants a migration note). If json should still be supported alongside csv, the handler and export need to be restored.

Suggested change
export type TaskSourceType = "markdown" | "markdown-folder" | "yaml" | "json" | "csv" | "github";
export type TaskSourceType = "markdown" | "markdown-folder" | "yaml" | "csv" | "github";
Prompt To Fix With AI
This is a comment left during a code review.
Path: cli/src/tasks/types.ts
Line: 20

Comment:
**`"json"` retained in `TaskSourceType` but has no handler**

`"json"` is still present in the `TaskSourceType` union, but the `createTaskSource` switch in `cli/src/tasks/index.ts` replaced the `case "json":` with `case "csv":` and removed the `json.ts` export entirely. Any existing code or configuration that passes `type: "json"` to `createTaskSource` will now hit the `default` branch and throw `Unknown task source type: json` at runtime.

If the intent is to fully replace JSON sources with CSV, `"json"` should be removed from this union (which is a breaking change and warrants a migration note). If `json` should still be supported alongside `csv`, the handler and export need to be restored.

```suggestion
export type TaskSourceType = "markdown" | "markdown-folder" | "yaml" | "csv" | "github";
```

How can I resolve this? If you propose a fix, please make it concise.

@greptile-apps
Contributor

greptile-apps bot commented Mar 4, 2026

Additional Comments (1)

cli/src/cli/commands/convert.ts
Error is logged twice in the conversion failure path

runConvert logs the error with logError(...) and then rethrows it. The caller in main() (cli/src/index.ts) catches the same error and logs it again via logError(standardizeError(error).message). A user running a failed conversion will see two error lines printed for one failure event.

Since the re-throw is sufficient for the caller to handle the error, consider removing the logError here and letting main() be the single error-reporting site:

	} catch (err) {
		throw err;
	}
Prompt To Fix With AI
This is a comment left during a code review.
Path: cli/src/cli/commands/convert.ts
Line: 118-122

Comment:
**Error is logged twice in the conversion failure path**

`runConvert` logs the error with `logError(...)` and then rethrows it. The caller in `main()` (`cli/src/index.ts`) catches the same error and logs it again via `logError(standardizeError(error).message)`. A user running a failed conversion will see two error lines printed for one failure event.

Since the re-throw is sufficient for the caller to handle the error, consider removing the `logError` here and letting `main()` be the single error-reporting site:

```suggestion
	} catch (err) {
		throw err;
	}
```

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines 85 to +95
 		// PRD loop mode
 		await runLoop(options);
 	} catch (error) {
-		logError(error instanceof Error ? error.message : String(error));
-		process.exitCode = 1;
-	} finally {
-		// Ensure all progress writes are flushed before exit
-		await flushAllProgressWrites();
+		logError(standardizeError(error).message);
+		await runCleanup();
+		process.exit(1);
 	}
 }
 
-main();
+// BUG FIX: Await main() to prevent floating promise and ensure proper error handling
+await main();
Contributor


Cleanup never runs on successful exit

The old code called flushAllProgressWrites() in a finally block, guaranteeing it ran regardless of whether main() succeeded or threw. The replacement moves runCleanup() exclusively into the catch block, so a normal (successful) run completes without ever calling cleanup.

Any registered cleanup functions — such as the JsonFileLogSink.dispose() timer teardown and final flush — will be skipped on success. Structured log entries buffered in JsonFileLogSink but not yet flushed will be silently lost when the process exits.

Consider restoring a finally block (or an explicit call before return) so cleanup always runs:

try {
    ...
    await runLoop(options);
} catch (error) {
    logError(standardizeError(error).message);
    await runCleanup();
    process.exit(1);
} finally {
    await runCleanup();
}

Or, since process.on("exit") runs synchronously, register the cleanup there — but note that async cleanup cannot be awaited from an exit listener, so the finally approach is safer.

Prompt To Fix With AI
This is a comment left during a code review.
Path: cli/src/index.ts
Line: 85-95

Comment:
**Cleanup never runs on successful exit**

The old code called `flushAllProgressWrites()` in a `finally` block, guaranteeing it ran regardless of whether `main()` succeeded or threw. The replacement moves `runCleanup()` exclusively into the `catch` block, so a normal (successful) run completes without ever calling cleanup.

Any registered cleanup functions — such as the `JsonFileLogSink.dispose()` timer teardown and final flush — will be skipped on success. Structured log entries buffered in `JsonFileLogSink` but not yet flushed will be silently lost when the process exits.

Consider restoring a `finally` block (or an explicit call before `return`) so cleanup always runs:

```typescript
try {
    ...
    await runLoop(options);
} catch (error) {
    logError(standardizeError(error).message);
    await runCleanup();
    process.exit(1);
} finally {
    await runCleanup();
}
```

Or, since `process.on("exit")` runs synchronously, register the cleanup there — but note that async cleanup cannot be awaited from an `exit` listener, so the `finally` approach is safer.

How can I resolve this? If you propose a fix, please make it concise.

@greptile-apps
Contributor

greptile-apps bot commented Mar 4, 2026

Additional Comments (5)

cli/src/tasks/converter.ts
allSkillsToCsv breaks multi-line quoted CSV fields

The function splits the inner skillsToCsv output by "\n" before calling parseCSV(line) on each fragment. When a skill description contains an embedded newline — which rowsToCsv correctly wraps in double-quotes, e.g. "line1\nline2" — the manual split("\n") tears the quoted field apart into two malformed fragments. parseCSV then receives "line1 on one iteration and line2" on the next, yielding incorrect and potentially corrupted rows in the merged output.

The fix is to parse the entire inner CSV at once and skip the header row:

export function allSkillsToCsv(skillDirs: string[]): string {
    const allRows: string[][] = [["skill", "desc", "summary"]];

    for (const dir of skillDirs) {
        const csv = skillsToCsv(dir);
        if (csv) {
            const parsedRows = parseCSV(csv);
            for (let i = 1; i < parsedRows.length; i++) {
                allRows.push(parsedRows[i]);
            }
        }
    }

    if (allRows.length <= 1) return "";
    return rowsToCsv(allRows);
}
Prompt To Fix With AI
This is a comment left during a code review.
Path: cli/src/tasks/converter.ts
Line: 566-584

Comment:
**`allSkillsToCsv` breaks multi-line quoted CSV fields**

The function splits the inner `skillsToCsv` output by `"\n"` **before** calling `parseCSV(line)` on each fragment. When a skill description contains an embedded newline — which `rowsToCsv` correctly wraps in double-quotes, e.g. `"line1\nline2"` — the manual `split("\n")` tears the quoted field apart into two malformed fragments. `parseCSV` then receives `"line1` on one iteration and `line2"` on the next, yielding incorrect and potentially corrupted rows in the merged output.

The fix is to parse the entire inner CSV at once and skip the header row:

```typescript
export function allSkillsToCsv(skillDirs: string[]): string {
    const allRows: string[][] = [["skill", "desc", "summary"]];

    for (const dir of skillDirs) {
        const csv = skillsToCsv(dir);
        if (csv) {
            const parsedRows = parseCSV(csv);
            for (let i = 1; i < parsedRows.length; i++) {
                allRows.push(parsedRows[i]);
            }
        }
    }

    if (allRows.length <= 1) return "";
    return rowsToCsv(allRows);
}
```

How can I resolve this? If you propose a fix, please make it concise.

cli/src/ui/logger.ts
JsonFileLogSink writes to file without creating the parent directory

The constructor calls validateLogPath(filePath) which resolves the path, then immediately sets up a periodic flush timer. However, appendFileSync (used in flush()) will throw ENOENT if the logs/ directory does not exist — there is no mkdirSync call before the first write attempt.

The periodic flush interval will silently catch-and-log this error repeatedly, eating memory in the error recovery buffer, until the directory is manually created.

Additionally, validateLogPath hard-codes an ALLOWED_LOG_DIR = "logs" restriction relative to process.cwd(). This rejects any absolute path (e.g. /var/log/ralphy.log) and any relative path outside ./logs/, which would break standard production deployment configurations.

Consider creating the directory in the constructor and relaxing the path restriction to prevent path traversal without artificially limiting usable log paths:

constructor(filePath: string, ...) {
    this.filePath = validateLogPath(filePath);
    mkdirSync(path.dirname(this.filePath), { recursive: true });
    ...
}
Prompt To Fix With AI
This is a comment left during a code review.
Path: cli/src/ui/logger.ts
Line: 1760-1773

Comment:
**`JsonFileLogSink` writes to file without creating the parent directory**

The constructor calls `validateLogPath(filePath)` which resolves the path, then immediately sets up a periodic flush timer. However, `appendFileSync` (used in `flush()`) will throw `ENOENT` if the `logs/` directory does not exist — there is no `mkdirSync` call before the first write attempt.

The periodic flush interval will silently catch-and-log this error repeatedly, eating memory in the error recovery buffer, until the directory is manually created.

Additionally, `validateLogPath` hard-codes an `ALLOWED_LOG_DIR = "logs"` restriction relative to `process.cwd()`. This rejects any absolute path (e.g. `/var/log/ralphy.log`) and any relative path outside `./logs/`, which would break standard production deployment configurations.

Consider creating the directory in the constructor and relaxing the path restriction to prevent path traversal without artificially limiting usable log paths:

```typescript
constructor(filePath: string, ...) {
    this.filePath = validateLogPath(filePath);
    mkdirSync(path.dirname(this.filePath), { recursive: true });
    ...
}
```

How can I resolve this? If you propose a fix, please make it concise.

cli/src/cli/commands/convert.ts
Task count may over-count when CSV fields contain embedded newlines

csvContent.split("\n").length - 1 counts raw newlines in the string, including newlines that occur inside double-quoted CSV fields (e.g. a task description that spans two lines). The rowsToCsv / escapeCsvValue functions correctly quote such values, so the CSV is well-formed — but a naïve split("\n") does not distinguish quoted-embedded newlines from row separators.

For example, a task with desc = "fix login\nand logout" produces the CSV row ..., "fix login\nand logout", which split("\n") counts as two rows.

Use parseCSV to count actual rows instead:

const taskCount = parseCSV(csvContent).length - 1; // exclude header, using proper CSV parser
Prompt To Fix With AI
This is a comment left during a code review.
Path: cli/src/cli/commands/convert.ts
Line: 124

Comment:
**Task count may over-count when CSV fields contain embedded newlines**

`csvContent.split("\n").length - 1` counts raw newlines in the string, including newlines that occur *inside* double-quoted CSV fields (e.g. a task description that spans two lines). The `rowsToCsv` / `escapeCsvValue` functions correctly quote such values, so the CSV is well-formed — but a naïve `split("\n")` does not distinguish quoted-embedded newlines from row separators.

For example, a task with `desc = "fix login\nand logout"` produces the CSV row `..., "fix login\nand logout"`, which `split("\n")` counts as two rows.

Use `parseCSV` to count actual rows instead:

```typescript
const taskCount = parseCSV(csvContent).length - 1; // exclude header, using proper CSV parser
```

How can I resolve this? If you propose a fix, please make it concise.

cli/src/tasks/yaml.ts
BFS queue uses Array.shift() — O(n) per dequeue

queue.shift() removes the first element of the array, which requires shifting every remaining element one position to the left. This makes each dequeue O(n), giving the entire traversal an O(n²) worst-case complexity.

With MAX_NODES = 10_000, that could mean up to ~50 million element shifts for a maximally complex YAML file. Consider using a simple index counter to simulate a queue in O(1):

let head = 0;
while (head < queue.length) {
    const current = queue[head++];
    // ... rest of loop body unchanged
}

This eliminates the O(n) shift() cost while keeping the BFS behaviour identical.


`cli/src/ui/logger.ts` (lines 1589-1621)

**Console output is now very verbose for a CLI tool**

Every log call now emits a full ISO timestamp and component tag, e.g.:

```
[2026-03-04T12:00:00.000Z] [SUCCESS] [ralphy] Converted tasks.yaml → tasks.csv
[2026-03-04T12:00:00.000Z] [INFO] [ralphy]   5 tasks
```

Previously users saw the concise `[OK] Converted …` format. For an interactive CLI, timestamped structured output is typically reserved for file sinks; the console sink is usually kept minimal (matching the old format).

Consider keeping `ConsoleLogSink` concise (just a `[LEVEL]` prefix), and restricting the full structured format to `JsonFileLogSink`. This prevents a visible UX regression for end-users.
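A minimal sketch of the split, assuming nothing about the project's actual logger API (the type and function names here are illustrative):

```typescript
// Two formats for the same log event: a terse console line and a
// timestamped structured line reserved for the file sink.
type LogLevel = "info" | "success" | "warn" | "error";

// Concise, human-facing console line: just a level prefix, like the
// old `[OK] Converted …` output.
function consoleLine(level: LogLevel, message: string): string {
  const prefix = level === "success" ? "OK" : level.toUpperCase();
  return `[${prefix}] ${message}`;
}

// Full structured line for a file/JSON sink, where timestamps belong.
function fileLine(level: LogLevel, message: string, component = "ralphy"): string {
  return `[${new Date().toISOString()}] [${level.toUpperCase()}] [${component}] ${message}`;
}

console.log(consoleLine("success", "Converted tasks.yaml → tasks.csv"));
// [OK] Converted tasks.yaml → tasks.csv
```

Each sink then formats independently, so enriching the file format never changes what interactive users see.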


Comment on lines +144 to +182 (`cli/src/tasks/csv.ts`):

```typescript
async markComplete(id: string): Promise<void> {
	const content = readFileSync(this.filePath, "utf-8");
	const rows = parseCSV(content);
	if (rows.length < 2) {
		return;
	}

	const header = rows[0].map((cell) => cell.trim().toLowerCase());
	const idIndex = header.indexOf("id");
	const titleIndex = header.indexOf("title");
	const doneIndex = Math.max(header.indexOf("done"), header.indexOf("completed"));
	const resolvedIdIndex = idIndex >= 0 ? idIndex : 0;
	const resolvedTitleIndex = titleIndex >= 0 ? titleIndex : 1;
	const resolvedDoneIndex = doneIndex >= 0 ? doneIndex : 2;
	if (doneIndex < 0) {
		logWarn(
			`CSV markComplete: no 'done'/'completed' header found for ${this.filePath}, falling back to column index 2.`,
		);
	}

	let updated = false;
	for (let i = 1; i < rows.length; i++) {
		const row = rows[i];
		const rowId = row[resolvedIdIndex] || "";
		const rowTitle = row[resolvedTitleIndex] || "";
		if (rowId === id || rowTitle === id) {
			while (row.length <= resolvedDoneIndex) {
				row.push("");
			}
			row[resolvedDoneIndex] = "1";
			updated = true;
			break;
		}
	}

	if (updated) {
		writeFileSync(this.filePath, rowsToCsv(rows), "utf-8");
	}
}
```

`markComplete` silently no-ops when task ID not found. When no CSV row matches the given `id` (by ID or title column), `updated` stays `false` and the function returns `void` without signaling any error. The caller in `runLoop` cannot distinguish "task was successfully marked complete" from "no matching task was found". This diverges from the contract of other `TaskSource` implementations, which throw on unrecognized IDs.

In practice, a bug passing the wrong ID will silently proceed, leaving the task perpetually in the queue.

Suggested change

```typescript
		if (updated) {
			writeFileSync(this.filePath, rowsToCsv(rows), "utf-8");
		} else {
			throw new Error(`Task ID not found in CSV: ${id}`);
		}
```
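The contract can be sketched standalone on in-memory rows (`markRowComplete` and the fixed column layout are hypothetical, not the project's code): the loop either marks a matching row or throws, so a bad ID can never pass silently.

```typescript
// Mark the first row whose ID or title matches; throw if none does.
function markRowComplete(rows: string[][], id: string, doneIndex = 2): void {
  for (let i = 1; i < rows.length; i++) {
    if (rows[i][0] === id || rows[i][1] === id) {
      // Pad short rows so the done column exists before writing it.
      while (rows[i].length <= doneIndex) rows[i].push("");
      rows[i][doneIndex] = "1";
      return;
    }
  }
  throw new Error(`Task ID not found in CSV: ${id}`);
}

const rows = [["id", "title", "done"], ["T1", "Fix login", ""]];
markRowComplete(rows, "T1");          // marks the row: done cell becomes "1"
// markRowComplete(rows, "missing"); // would throw instead of silently no-op
```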

Comment on lines +8 to +9 (`cli/src/tasks/csv.ts`):

```typescript
export function parseCSV(content: string): string[][] {
	const normalized = content.replace(/\r\n/g, "\n").replace(/\r/g, "\n");
```

`parseCSV` doesn't strip the UTF-8 Byte Order Mark (BOM), so it leaks into the parsed cells. CSV files from Excel, LibreOffice, or Windows tools commonly begin with `\uFEFF`. The parser passes this character through unchanged, so the first header cell becomes `"\uFEFFid"` instead of `"id"`.

Downstream, `markComplete` happens to be rescued by the `.trim()` it applies to header cells (`String.prototype.trim` strips `\uFEFF`), but the BOM still sits in the first cell of the parsed rows, so any consumer that compares cell values without trimming, or re-serializes the rows, carries it along. Stripping it once at the parse boundary is cheaper than defending every consumer.

Strip the BOM at the top of `parseCSV`:

Suggested change

```typescript
export function parseCSV(content: string): string[][] {
	const normalized = content
		.replace(/^\uFEFF/, "")
		.replace(/\r\n/g, "\n")
		.replace(/\r/g, "\n");
```
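The failure mode is easy to see in isolation (`stripBom` is an illustrative helper, not project code): a leading BOM attaches to the first header cell and defeats an untrimmed `indexOf("id")`.

```typescript
// Strip a single leading BOM; everything after it is left untouched.
const stripBom = (s: string) => s.replace(/^\uFEFF/, "");

const raw = "\uFEFFid,title,done\n";
const header = raw.split("\n")[0].split(",");
const fixedHeader = stripBom(raw).split("\n")[0].split(",");

console.log(header.indexOf("id"));      // -1 — first cell is "\uFEFFid"
console.log(fixedHeader.indexOf("id")); // 0
```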

Comment on lines +126 to +157 (`cli/src/tasks/converter.ts`):

```typescript
function parseSkillFile(
	filePath: string,
): { name: string; description: string; content: string } | null {
	try {
		const content = readFileSync(filePath, "utf-8");
		const frontmatterMatch = content.match(/^---\s*\n([\s\S]*?)\n---\s*\n([\s\S]*)$/);

		if (frontmatterMatch) {
			const frontmatter = YAML.parse(frontmatterMatch[1]) as SkillFrontmatter;
			if (hasPrototypePollution(frontmatter)) {
				return null;
			}
			const body = frontmatterMatch[2].trim();

			return {
				name: frontmatter.name || "",
				description: frontmatter.description || "",
				content: truncateDesc(body, 200),
			};
		}

		// No frontmatter, use filename as name
		const name = filePath.split(/[/\\]/).pop()?.replace(".md", "") || "";
		return {
			name,
			description: "",
			content: truncateDesc(content, 200),
		};
	} catch {
		return null;
	}
}
```

`parseSkillFile` applies the frontmatter regex to raw file content without normalizing line endings first. The regex `/^---\s*\n([\s\S]*?)\n---\s*\n([\s\S]*)$/` happens to tolerate CRLF separators (the `\s*` before each required `\n` absorbs the `\r`), but with CRLF input every captured frontmatter line keeps a trailing `\r` that is passed straight to `YAML.parse`, and lone-`\r` (classic Mac) line endings never match at all, falling through to the no-frontmatter path that puts the entire raw file content (truncated to 200 chars) into the `content` field, so a YAML-like preamble ends up visible in the skills CSV.

Normalize line endings before applying the regex:

Suggested change

```typescript
		const content = readFileSync(filePath, "utf-8");
		const normalized = content.replace(/\r\n/g, "\n");
		const frontmatterMatch = normalized.match(/^---\s*\n([\s\S]*?)\n---\s*\n([\s\S]*)$/);
```
