Skip to content

fix: grep -c pattern produces "0\n0" when no matches found#251

Open
visigoth wants to merge 1 commit intofrankbria:mainfrom
visigoth:bug/completion-pattern-fixes
Open

fix: grep -c pattern produces "0\n0" when no matches found#251
visigoth wants to merge 1 commit intofrankbria:mainfrom
visigoth:bug/completion-pattern-fixes

Conversation

@visigoth
Copy link
Copy Markdown
Contributor

@visigoth visigoth commented Apr 11, 2026

Summary

grep -c writes 0 to stdout even when it exits non-zero (no matches). The pattern $(grep -c ... || echo "0") therefore concatenates both outputs into "0\n0", which breaks any arithmetic or comparison that consumes the result.

This PR replaces that pattern across the codebase with the safe form:

var=$(grep -c ...) || var=0

The fix is applied in:

  • ralph_loop.sh
  • lib/response_analyzer.sh
  • create_files.sh
  • ralph_enable_ci.sh

Why it matters

Anywhere a grep -c count flowed into [[ $count -ge N ]] or arithmetic could silently misbehave when there were zero matches — "0\n0" is not a valid integer and triggers shell errors or wrong branches in exit detection, circuit breaker logic, and response analysis.

Test plan

  • npm test passes locally
  • Manually exercise a Ralph loop iteration where grep targets produce zero matches and confirm no 0\n0 integer-expression errors appear in the logs
  • Confirm exit detection / circuit breaker counters still increment correctly when matches do exist

Summary by CodeRabbit

  • New Features

    • Added graceful stop support with a --stop CLI flag for controlled application shutdown.
  • Bug Fixes

    • Ensured counters and status checks reliably default to zero when no matches are found, preventing spurious failures.
    • Improved handling of no-match scenarios for several monitoring and analysis flows.
  • Improvements

    • Made background sleeps and monitoring operations interruptible for faster responsiveness to interrupts.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 11, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 1b8bd5e1-10fa-494c-a6ad-3f7b133d0a0b

📥 Commits

Reviewing files that changed from the base of the PR and between bf4797a and e0bdff1.

📒 Files selected for processing (4)
  • create_files.sh
  • lib/response_analyzer.sh
  • ralph_enable_ci.sh
  • ralph_loop.sh
✅ Files skipped from review due to trivial changes (1)
  • lib/response_analyzer.sh
🚧 Files skipped from review as they are similar to previous changes (2)
  • ralph_enable_ci.sh
  • create_files.sh

Walkthrough

Replaced several instances of grep-based count fallbacks that emitted "0" (using || echo "0") with explicit variable defaulting (|| var=0) across multiple shell scripts; no functional control-flow changes or exported symbol modifications were introduced.

Changes

Cohort / File(s) Summary
Grep-fallback fixes — create_files & loop
create_files.sh, ralph_loop.sh
Replaced occurrences of `...
Grep-fallback fixes — enable CI
ralph_enable_ci.sh
Same pattern: replaced `
Grep-fallback fixes — response analyzer
lib/response_analyzer.sh
Removed `

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

Poem

🐰
I nudged the greps to set defaults true,
No stdout zeros hopping through,
Variables tidy, counts behaving,
Quiet shells now, no echo misbehaving —
A small carrot-cheer for code anew.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main bug being fixed: the grep -c pattern producing "0\n0" when no matches are found, which is the core issue addressed across all changed files.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
ralph_loop.sh (1)

1572-1617: ⚠️ Potential issue | 🔴 Critical

Remove duplicate pipeline execution and enable pipefail in the live-mode subshell.

Lines 1572–1574 run a foreground pipeline, and then lines 1579–1588 run the same pipeline again in a backgrounded subshell. This causes two Claude invocations per loop iteration. Additionally, the subshell lacks set -o pipefail, so if portable_timeout times out (exit code 124) or jq fails, the final tee command's exit code (0) masks those failures, incorrectly reporting success to the error-handling logic below (e.g., lines 1596, 1603).

Suggested fix
-        portable_timeout ${timeout_seconds}s stdbuf -oL "${LIVE_CMD_ARGS[@]}" \
-            < /dev/null 2>"$stderr_file" | stdbuf -oL tee "$output_file" | stdbuf -oL jq --unbuffered -j "$jq_filter" 2>/dev/null | tee "$LIVE_LOG_FILE"
-
         # Run pipeline in a subshell so ^C can interrupt via our trap handler.
         # Without this, bash waits for the full pipeline to finish even after SIGINT.
         # The subshell runs in its own process group, so kill_child_processes() can
         # terminate the entire pipeline tree (claude + tee + jq + tee).
         (
+            set -o pipefail
             portable_timeout ${timeout_seconds}s stdbuf -oL "${LIVE_CMD_ARGS[@]}" \
                 < /dev/null 2>"$stderr_file" | stdbuf -oL tee "$output_file" | stdbuf -oL jq --unbuffered -j "$jq_filter" 2>/dev/null | tee "$LIVE_LOG_FILE"
         ) &
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ralph_loop.sh` around lines 1572 - 1617, There are two problems: the pipeline
is executed twice (first as a foreground pipeline using portable_timeout
${timeout_seconds}s stdbuf ... | jq ... | tee "$LIVE_LOG_FILE", then again
inside the backgrounded subshell) and the subshell lacks pipefail so failures
from portable_timeout or jq can be masked by the final tee; fix by removing the
initial standalone pipeline invocation (the one that pipes directly to jq and
tee before the subshell) so only the subshell run using portable_timeout
"${LIVE_CMD_ARGS[@]}" executes, and add set -o pipefail (or set -euo pipefail if
preferred) at the start of the subshell body that runs
portable_timeout/tee/jq/tee so the subshell returns a failure when any pipeline
stage fails and the existing wait $pipeline_pid / exit_code logic (and
subsequent checks against stderr_file, output_file, and timeout handling) will
see the correct exit status.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ralph_loop.sh`:
- Around line 2555-2559: In the --stop case (where touch "$RALPH_DIR/.stop" is
used) ensure the script verifies that the stop file was actually created and
fails when it cannot be written: attempt to create the file, check the command's
exit status (or test -w/createability of $RALPH_DIR), and if creation fails
print an error to stderr and exit non‑zero (e.g., exit 1); only print the
success message and exit 0 when touch succeeds. Reference the --stop branch and
the touch "$RALPH_DIR/.stop" operation when making the change.
- Around line 2044-2060: The kill_child_processes function uses a process-group
kill that can accidentally signal unrelated shells; change it to directly
terminate the known child PID captured as pipeline_pid (used when launching the
live-mode subshell) instead of using kill -- -"$my_pid". Specifically, after
pkill -P "$my_pid" and the pkill -KILL -P "$my_pid" steps, check that
pipeline_pid is non-empty and a different PID than $$, then send TERM (and if
needed KILL) to "$pipeline_pid" with the same 2>/dev/null || true safety; remove
the final kill -- -"$my_pid" invocation. Ensure all references are to the
kill_child_processes function and the pipeline_pid variable.

---

Outside diff comments:
In `@ralph_loop.sh`:
- Around line 1572-1617: There are two problems: the pipeline is executed twice
(first as a foreground pipeline using portable_timeout ${timeout_seconds}s
stdbuf ... | jq ... | tee "$LIVE_LOG_FILE", then again inside the backgrounded
subshell) and the subshell lacks pipefail so failures from portable_timeout or
jq can be masked by the final tee; fix by removing the initial standalone
pipeline invocation (the one that pipes directly to jq and tee before the
subshell) so only the subshell run using portable_timeout "${LIVE_CMD_ARGS[@]}"
executes, and add set -o pipefail (or set -euo pipefail if preferred) at the
start of the subshell body that runs portable_timeout/tee/jq/tee so the subshell
returns a failure when any pipeline stage fails and the existing wait
$pipeline_pid / exit_code logic (and subsequent checks against stderr_file,
output_file, and timeout handling) will see the correct exit status.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 488af2f0-6568-43e7-aef9-cfa07fc9e5f5

📥 Commits

Reviewing files that changed from the base of the PR and between 2265569 and bf4797a.

📒 Files selected for processing (4)
  • create_files.sh
  • lib/response_analyzer.sh
  • ralph_enable_ci.sh
  • ralph_loop.sh

Comment thread ralph_loop.sh Outdated
Comment on lines +2044 to +2060
# Kill all child processes recursively
# Uses pkill -P to walk the process tree from our PID downward
kill_child_processes() {
local my_pid=$$

# First try SIGTERM for graceful shutdown
pkill -TERM -P "$my_pid" 2>/dev/null || true

# Brief grace period for processes to exit
sleep 0.5

# Then SIGKILL anything still alive
pkill -KILL -P "$my_pid" 2>/dev/null || true

# Also kill any processes in our process group that aren't us
# This catches grandchildren that were reparented
kill -- -"$my_pid" 2>/dev/null || true
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# First, check the function context and how it's used
cat ralph_loop.sh | head -2100 | tail -100

Repository: frankbria/ralph-claude-code

Length of output: 3874


🏁 Script executed:

# Find how Claude is spawned and if PIDs are tracked
rg -n "claude|spawn|pid|PGID" ralph_loop.sh | head -50

Repository: frankbria/ralph-claude-code

Length of output: 3481


🏁 Script executed:

# Check if child PID tracking exists anywhere in the script
rg -n "child.*pid|spawn.*pid" ralph_loop.sh -i

Repository: frankbria/ralph-claude-code

Length of output: 53


🏁 Script executed:

# Examine the actual kill_child_processes context - when it's called
rg -n "kill_child_processes" ralph_loop.sh -B 5 -A 2

Repository: frankbria/ralph-claude-code

Length of output: 1675


🏁 Script executed:

# Verify the process group behavior - what does $$ actually represent
rg -n "PGID|process.group|setpgrp|setsid" ralph_loop.sh -i

Repository: frankbria/ralph-claude-code

Length of output: 287


Use direct child process termination instead of process group signaling.

The kill -- -"$my_pid" command is unsafe. It treats $$ (the script's PID) as a process group ID, but these are different. If ralph_loop.sh is backgrounded or run in certain contexts, the PGID may differ from the PID, causing this command to signal the parent shell or unrelated sibling processes.

The code already captures pipeline_pid (line 1583) when spawning the live mode subshell and uses it with wait. Instead of relying on pkill -P plus the dangerous group signal, directly terminate the captured child PID and let pkill -P handle any subprocesses spawned by it.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ralph_loop.sh` around lines 2044 - 2060, The kill_child_processes function
uses a process-group kill that can accidentally signal unrelated shells; change
it to directly terminate the known child PID captured as pipeline_pid (used when
launching the live-mode subshell) instead of using kill -- -"$my_pid".
Specifically, after pkill -P "$my_pid" and the pkill -KILL -P "$my_pid" steps,
check that pipeline_pid is non-empty and a different PID than $$, then send TERM
(and if needed KILL) to "$pipeline_pid" with the same 2>/dev/null || true
safety; remove the final kill -- -"$my_pid" invocation. Ensure all references
are to the kill_child_processes function and the pipeline_pid variable.

Comment thread ralph_loop.sh Outdated
Comment on lines +2555 to +2559
--stop)
# Create stop file to gracefully stop the loop after the current iteration
touch "$RALPH_DIR/.stop"
echo -e "\033[0;33m⏹ Stop requested — Ralph will exit after the current loop completes\033[0m"
exit 0
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Fail --stop when the stop file cannot be created.

Right now this prints success and exits 0 even outside a Ralph project or when .ralph/ is not writable.

Suggested fix
         --stop)
             # Create stop file to gracefully stop the loop after the current iteration
-            touch "$RALPH_DIR/.stop"
+            if [[ ! -d "$RALPH_DIR" ]]; then
+                echo "Error: '$RALPH_DIR' not found. Run this from a Ralph project." >&2
+                exit 1
+            fi
+            if ! touch "$RALPH_DIR/.stop"; then
+                echo "Error: failed to create '$RALPH_DIR/.stop'" >&2
+                exit 1
+            fi
             echo -e "\033[0;33m⏹ Stop requested — Ralph will exit after the current loop completes\033[0m"
             exit 0
             ;;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ralph_loop.sh` around lines 2555 - 2559, In the --stop case (where touch
"$RALPH_DIR/.stop" is used) ensure the script verifies that the stop file was
actually created and fails when it cannot be written: attempt to create the
file, check the command's exit status (or test -w/createability of $RALPH_DIR),
and if creation fails print an error to stderr and exit non‑zero (e.g., exit 1);
only print the success message and exit 0 when touch succeeds. Reference the
--stop branch and the touch "$RALPH_DIR/.stop" operation when making the change.

@visigoth
Copy link
Copy Markdown
Contributor Author

this PR consists of a single commit that was extracted from a branch with other work, so the commit itself has changes beyond what is described. i will rework it to be more scoped.

grep -c outputs "0" to stdout even when exiting with code 1 (no matches).
The old pattern $(grep -c ... || echo "0") concatenated both outputs,
producing "0\n0" which breaks arithmetic expressions.

Fixed across all files: ralph_loop.sh, create_files.sh, response_analyzer.sh,
ralph_enable_ci.sh. Uses var=$(grep -c ...) || var=0 pattern instead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@visigoth visigoth force-pushed the bug/completion-pattern-fixes branch from bf4797a to e0bdff1 Compare April 12, 2026 01:08
@visigoth
Copy link
Copy Markdown
Contributor Author

the commit has been constrained down to the PR's actual purpose. the other fixes will come from future PRs.

Copy link
Copy Markdown
Owner

@frankbria frankbria left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the grep -c fix — that's a real bug and the fix pattern is correct. Before this can merge there are a few issues to resolve:

Blocking

Double Claude invocation in live mode (critical)
CodeRabbit flagged that lines ~1572–1574 run the streaming pipeline as a foreground command, and then lines ~1579–1588 run the same pipeline again inside a backgrounded subshell. This means two Claude processes fire per loop iteration. The fix is to remove the foreground invocation and keep only the subshell, adding set -o pipefail inside the subshell so exit codes from portable_timeout and jq aren't masked by tee:

-        portable_timeout ${timeout_seconds}s stdbuf -oL "${LIVE_CMD_ARGS[@]}" \
-            < /dev/null 2>"$stderr_file" | stdbuf -oL tee "$output_file" | stdbuf -oL jq --unbuffered -j "$jq_filter" 2>/dev/null | tee "$LIVE_LOG_FILE"
-
         (
+            set -o pipefail
             portable_timeout ${timeout_seconds}s stdbuf -oL "${LIVE_CMD_ARGS[@]}" \
                 < /dev/null 2>"$stderr_file" | stdbuf -oL tee "$output_file" | stdbuf -oL jq --unbuffered -j "$jq_filter" 2>/dev/null | tee "$LIVE_LOG_FILE"
         ) &

--stop flag missing error check
If $RALPH_DIR doesn't exist or isn't writable, touch "$RALPH_DIR/.stop" silently fails but the script exits 0 with a success message. Add a check:

if ! touch "$RALPH_DIR/.stop" 2>/dev/null; then
    echo "Error: could not write stop file to $RALPH_DIR" >&2
    exit 1
fi

kill_child_processes process-group kill
kill -- -"$my_pid" can signal unrelated shells that share the same process group. Prefer targeting $pipeline_pid directly (TERM then KILL) instead of the group kill.

Also: please rebase

PR #256 (a 1-line crash fix in ralph_loop.sh) is being merged first. Please rebase this branch onto main after that lands to avoid a messy merge.

Once the blocking items are fixed and the rebase is done, happy to re-review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants