ci: add workflow to strip template boilerplate from PR bodies#130

Open
binaryaaron wants to merge 1 commit into main from binaryaaron/clean-pr-body-comments

Conversation

@binaryaaron
Collaborator

@binaryaaron binaryaaron commented Feb 26, 2026

Summary

The PR template uses HTML comments and checklist sections for author guidance. GitHub's squash-merge copies the full PR body into the commit message, polluting git history with template boilerplate.

This workflow fires once when a PR is opened, extracts only the Summary section content (plus git trailers), and writes the cleaned body back via the GitHub API. The template stays intact in the editor when creating the PR; after submission, the body shows only what matters for the commit message.

The extraction regex was tested against all 11 affected commits in the repo history. It handles: multi-line HTML comments, inline <!-- --> in code spans (preserved), # Summary and ## Summary headings, trailer deduplication, and GitHub separator lines.
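
The pipeline described above can be sketched roughly as follows. This is a minimal sketch, not the workflow's actual script: the comment-stripping regex, the trailer regex, and the summary filter are the ones quoted in the review comments on this PR, while the function name `cleanPrBody` and the Summary-heading extraction in step 2 are assumptions made for illustration.

```javascript
// Sketch of the PR-body cleaning pipeline. `cleanPrBody` is a hypothetical name.
function cleanPrBody(body) {
  // 1. Strip HTML comments that start a line (regex quoted in the review).
  //    Inline `<!-- -->` inside a code span does not start a line, so it survives.
  const stripped = body.replace(/^[ \t]*<!--[\s\S]*?-->[ \t]*\n?/gm, '');

  // 2. Extract the Summary section.
  //    ASSUMPTION: this heading logic is illustrative; the PR's own
  //    extraction step is not shown in the conversation.
  const lines = stripped.split('\n');
  const start = lines.findIndex(l => /^#{1,2}\s+Summary\s*$/i.test(l));
  let summaryLines;
  if (start === -1) {
    summaryLines = lines; // no Summary heading: fall back to the stripped body
  } else {
    const rest = lines.slice(start + 1);
    const end = rest.findIndex(l => /^#{1,6}\s+\S/.test(l)); // next heading
    summaryLines = end === -1 ? rest : rest.slice(0, end);
  }

  // 3. Collect unique git trailers from the full body (regex quoted in the review).
  const trailerRe = /^(Signed-off-by|Co-authored-by):\s+.+$/gm;
  const seen = new Set();
  const trailers = [];
  for (const [line] of stripped.matchAll(trailerRe)) {
    if (!seen.has(line)) {
      seen.add(line);
      trailers.push(line);
    }
  }

  // 4. Remove trailers and GitHub separator lines from the summary text.
  const summary = summaryLines
    .filter(l => !l.match(/^(Signed-off-by|Co-authored-by):\s+/) && l.trim() !== '---------')
    .join('\n')
    .trim();

  return trailers.length ? summary + '\n\n' + trailers.join('\n') : summary;
}
```

Trailers are collected from the whole body rather than from the Summary section alone, so a sign-off placed below a checklist section is still carried into the cleaned body.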

Test plan

  • Open a test PR using the template and verify the workflow trims the body to Summary + trailers
  • Verify inline <!-- --> in code spans is not stripped
  • Verify a PR without a Summary heading still gets a reasonable body

@binaryaaron binaryaaron force-pushed the binaryaaron/clean-pr-body-comments branch 2 times, most recently from c5a9a0d to 4483b38 Compare February 26, 2026 17:49
The PR template uses HTML comments and checklist sections for author
guidance. GitHub's squash-merge copies the full PR body into the commit
message, polluting git history with template boilerplate. This workflow
fires once when a PR is opened, extracts only the Summary section
content (plus git trailers), and writes the cleaned body back. The
template stays intact in the editor when creating the PR; after
submission, the body shows only what matters for the commit message.

Tested the extraction regex against all 11 affected commits in the
repo's history. Handles: multi-line HTML comments, inline `<!-- -->`
in code spans (preserved), `# Summary` and `## Summary` headings,
trailer deduplication, and GitHub separator lines.

Signed-off-by: aagonzales <aagonzales@nvidia.com>
Made-with: Cursor
@binaryaaron binaryaaron force-pushed the binaryaaron/clean-pr-body-comments branch from 4483b38 to b60cc88 Compare February 26, 2026 18:36
@binaryaaron binaryaaron changed the title ci: add workflow to strip HTML comments from PR bodies ci: add workflow to strip template boilerplate from PR bodies Feb 26, 2026
@binaryaaron binaryaaron marked this pull request as ready for review February 27, 2026 00:46
@binaryaaron binaryaaron requested a review from a team as a code owner February 27, 2026 00:46
Copilot AI review requested due to automatic review settings February 27, 2026 00:46
Contributor

Copilot AI left a comment

Pull request overview

This PR adds a GitHub Actions workflow that automatically cleans PR bodies when PRs are opened. The workflow strips HTML comments and template boilerplate sections, keeping only the Summary section content plus git trailers (Signed-off-by, Co-authored-by). This prevents GitHub's squash-merge from polluting commit messages with template markup.

Changes:

  • Adds .github/workflows/clean-pr-body.yml workflow that triggers on PR open events
  • Implements JavaScript-based text processing to extract Summary sections and git trailers
  • Updates PR bodies via GitHub API to remove template boilerplate

Comment on lines +1 to +2
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0

Copilot AI Feb 27, 2026

The copyright header format differs from other workflow files in this repository. Other workflow files use the standard Apache 2.0 header with # comment style, not the SPDX-FileCopyrightText format. For consistency, this should follow the same format as files like ci-checks.yml, copyright-check.yml, and conventional-commit.yml.

Suggested change
- # SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
- # SPDX-License-Identifier: Apache-2.0
+ # Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ #
+ # Licensed under the Apache License, Version 2.0 (the "License");
+ # you may not use this file except in compliance with the License.
+ # You may obtain a copy of the License at
+ #
+ # http://www.apache.org/licenses/LICENSE-2.0
+ #
+ # Unless required by applicable law or agreed to in writing, software
+ # distributed under the License is distributed on an "AS IS" BASIS,
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ # See the License for the specific language governing permissions and
+ # limitations under the License.

Copilot uses AI. Check for mistakes.
let cleaned = summary;
if (trailers.length) cleaned += '\n\n' + trailers.join('\n');

if (cleaned !== body.trim()) {

Copilot AI Feb 27, 2026

The comparison cleaned !== body.trim() may trigger unnecessary API calls. cleaned is built from summary.trim(), while the other side is body.trim(), so any remaining whitespace difference (e.g., multiple newlines collapsed to double) makes the two strings unequal even when they are semantically equivalent, and the workflow updates the PR anyway. Consider comparing both after normalizing whitespace, or compare against the original body without trimming.

Suggested change
- if (cleaned !== body.trim()) {
+ if (cleaned.replace(/\s+$/, '') !== body.replace(/\s+$/, '')) {
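
The suggestion above only strips trailing whitespace. If the goal is to skip the update whenever the two bodies differ only in whitespace, one option is to normalize both sides first. A minimal sketch, where `normalize` and `needsUpdate` are hypothetical helper names that do not appear in the PR:

```javascript
// Hypothetical helpers: compare two bodies after whitespace normalization.
function normalize(text) {
  return text
    .replace(/\r\n/g, '\n')      // unify line endings
    .replace(/[ \t]+$/gm, '')    // drop trailing spaces on every line
    .replace(/\n{3,}/g, '\n\n')  // collapse runs of blank lines to one
    .trim();
}

function needsUpdate(cleaned, body) {
  return normalize(cleaned) !== normalize(body);
}
```

Whether whitespace-only differences should suppress the update is a judgment call: skipping saves an API call, but leaves the original spacing in the eventual commit message.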

}

// 3. Collect unique git trailers from the full body.
const trailerRe = /^(Signed-off-by|Co-authored-by):\s+.+$/gm;

Copilot AI Feb 27, 2026

The trailer regex on line 48 only matches Signed-off-by and Co-authored-by trailers, but Git and GitHub support additional trailers like Reviewed-by, Tested-by, Acked-by, and others. Consider whether these should also be preserved. If they should be excluded intentionally, the current implementation is correct, but if they should be preserved, the regex should be expanded to match all common Git trailers.
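
If the additional trailers should be preserved, the pattern could be built from a list of keys. A sketch under that assumption: the key list below is illustrative rather than exhaustive, and `TRAILER_KEYS` and `collectTrailers` are hypothetical names.

```javascript
// Illustrative key list (ASSUMPTION: adjust to the trailers the repo actually uses).
const TRAILER_KEYS = ['Signed-off-by', 'Co-authored-by', 'Reviewed-by', 'Tested-by', 'Acked-by'];
const trailerRe = new RegExp(`^(${TRAILER_KEYS.join('|')}):\\s+.+$`, 'gm');

// Return each distinct trailer line once, in order of first appearance.
function collectTrailers(text) {
  const seen = new Set();
  const trailers = [];
  for (const match of text.matchAll(trailerRe)) {
    const line = match[0]; // full trailer line
    if (!seen.has(line)) {
      seen.add(line);
      trailers.push(line);
    }
  }
  return trailers;
}
```

Keeping an explicit allow-list, rather than matching any `Key: value` line, avoids treating ordinary prose such as `Note: ...` as a trailer.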

Comment on lines +70 to +76
await github.rest.pulls.update({
owner,
repo,
pull_number: pr_number,
body: cleaned,
});
core.info(`Cleaned PR #${pr_number} body to Summary + trailers`);

Copilot AI Feb 27, 2026

The API call to update the PR body (line 70) has no error handling. If the API call fails (e.g., due to rate limiting, network issues, or permission problems), the workflow will fail silently or with an unclear error. Consider wrapping this in a try-catch block and providing informative error messages, or at least logging when the update succeeds vs. fails.

Suggested change
- await github.rest.pulls.update({
-   owner,
-   repo,
-   pull_number: pr_number,
-   body: cleaned,
- });
- core.info(`Cleaned PR #${pr_number} body to Summary + trailers`);
+ try {
+   await github.rest.pulls.update({
+     owner,
+     repo,
+     pull_number: pr_number,
+     body: cleaned,
+   });
+   core.info(`Cleaned PR #${pr_number} body to Summary + trailers`);
+ } catch (error) {
+   const message = error && error.message ? error.message : String(error);
+   core.error(`Failed to clean PR #${pr_number} body: ${message}`);
+   core.setFailed(`Failed to update PR body for #${pr_number}`);
+ }

const trailerRe = /^(Signed-off-by|Co-authored-by):\s+.+$/gm;
const seen = new Set();
const trailers = [];
for (const [line] of stripped.matchAll(trailerRe)) {

Copilot AI Feb 27, 2026

The code destructures only the first element from the match array on line 51: for (const [line] of stripped.matchAll(trailerRe)). However, matchAll returns match objects where the full match is at index 0, and capture groups follow. Since the regex /^(Signed-off-by|Co-authored-by):\s+.+$/gm has one capture group, match[0] is the full match (the entire trailer line), and match[1] is the captured trailer type. Using [line] gets the full match, which is correct. However, it would be clearer to use [fullLine] or match[0] to make the intent more obvious.
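
A small illustration of what each position in a `matchAll` result holds for this regex. The sample trailer text is made up for the example.

```javascript
const trailerRe = /^(Signed-off-by|Co-authored-by):\s+.+$/gm;
const text = 'Signed-off-by: A <a@example.com>';

const results = [];
for (const [fullLine, key] of text.matchAll(trailerRe)) {
  // fullLine is match[0] (the entire trailer line);
  // key is match[1] (the captured trailer type).
  results.push({ fullLine, key });
}
```

Naming both elements in the destructuring pattern makes it explicit that the first one is the whole line, which is the value the deduplication step needs.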

// 4. Remove trailers and GitHub separator lines from summary text.
summary = summary
.split('\n')
.filter(l => !l.match(/^(Signed-off-by|Co-authored-by):\s+/) && l.trim() !== '---------')

Copilot AI Feb 27, 2026

The separator line filter on line 61 only checks for lines that are exactly --------- after trimming. However, GitHub and Markdown may use different numbers of dashes for horizontal rules (typically 3 or more dashes: ---, ----, etc.). This overly specific check may not catch all separator variations. Consider using a regex like /^-{3,}$/ to match any line with 3 or more dashes, or clarify why only 9 dashes specifically should be removed.

Suggested change
- .filter(l => !l.match(/^(Signed-off-by|Co-authored-by):\s+/) && l.trim() !== '---------')
+ .filter(l => !l.match(/^(Signed-off-by|Co-authored-by):\s+/) && !l.trim().match(/^-{3,}$/))
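
To see the effect of the broader pattern, here is a sketch of the suggested filter as a standalone function. `dropTrailersAndSeparators` and the two predicate names are hypothetical.

```javascript
const isTrailer = l => /^(Signed-off-by|Co-authored-by):\s+/.test(l);
const isSeparator = l => /^-{3,}$/.test(l.trim()); // three or more dashes, nothing else

// Remove trailer lines and dash-only separator lines from the summary text.
function dropTrailersAndSeparators(summary) {
  return summary
    .split('\n')
    .filter(l => !isTrailer(l) && !isSeparator(l))
    .join('\n')
    .trim();
}
```

One trade-off to weigh: a line of three or more dashes is also a Markdown horizontal rule and a setext heading underline, so the broader pattern would remove those if an author wrote them deliberately in the Summary. The exact nine-dash check avoids that at the cost of missing other lengths.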

name: Clean PR Body

on:
pull_request:

Copilot AI Feb 27, 2026

The workflow triggers on pull_request with type opened and requires pull-requests: write permission. For PRs from forks, the GITHUB_TOKEN is typically read-only on pull_request events regardless of the permissions the workflow requests, so the call that updates the PR body will fail for fork-based PRs. Consider using pull_request_target instead, which runs in the context of the base repository and has write permissions. However, be cautious with pull_request_target, as it poses security risks if user-controlled code is executed. In this case, since the workflow only reads the PR body and updates it via the GitHub API (no code checkout or execution), pull_request_target should be safe, provided the body is read from the event payload rather than interpolated into the script text.

Suggested change
- pull_request:
+ pull_request_target:

Comment on lines +26 to +27
// preserving inline references like `<!-- -->` in code spans.
let stripped = body.replace(/^[ \t]*<!--[\s\S]*?-->[ \t]*\n?/gm, '');

Copilot AI Feb 27, 2026

The HTML comment stripping regex /^[ \t]*<!--[\s\S]*?-->[ \t]*\n?/gm has an issue: it will strip comments that appear at the start of a line within code blocks. For example, in a Markdown code block showing HTML/XML examples, a line starting with <!-- would be removed even though it's legitimate content, not template boilerplate. The regex should avoid matching inside code fences (delimited by triple backticks). Consider processing the body to skip code blocks, or use a more sophisticated approach that preserves code block contents.

Suggested change
- // preserving inline references like `<!-- -->` in code spans.
- let stripped = body.replace(/^[ \t]*<!--[\s\S]*?-->[ \t]*\n?/gm, '');
+ // preserving inline references like `<!-- -->` in code spans,
+ // and avoiding matches inside fenced code blocks (``` ... ```).
+ const commentRe = /^[ \t]*<!--[\s\S]*?-->[ \t]*\n?/gm;
+ const segments = body.split(/(```[\s\S]*?```)/);
+ let stripped = segments
+   .map((segment, idx) => {
+     // Captured fenced code blocks appear at odd indices and start with ```
+     if (idx % 2 === 1 && segment.startsWith('```')) {
+       return segment;
+     }
+     return segment.replace(commentRe, '');
+   })
+   .join('');
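
The split-on-fences idea can be checked in isolation with a small function. This is a sketch of the same approach, wrapped under the hypothetical name `stripCommentsOutsideFences`. It assumes fences are written with triple backticks and are properly closed.

```javascript
// Strip line-leading HTML comments, leaving fenced code blocks untouched.
function stripCommentsOutsideFences(body) {
  const commentRe = /^[ \t]*<!--[\s\S]*?-->[ \t]*\n?/gm;
  // The capturing group makes split() keep each fenced block as its own
  // element, so fenced blocks land at odd indices of the result.
  return body
    .split(/(```[\s\S]*?```)/)
    .map((segment, idx) => (idx % 2 === 1 ? segment : segment.replace(commentRe, '')))
    .join('');
}
```

Known gaps in this sketch: an unterminated fence is treated as ordinary text, and `~~~` fences or indented code blocks are not recognized, so comments inside them would still be stripped.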

Collaborator

@mckornfield mckornfield left a comment

nice, fancy
