ci: add workflow to strip template boilerplate from PR bodies #130

Open

binaryaaron wants to merge 1 commit into main from binaryaaron/clean-pr-body-comments

Collaborator

binaryaaron commented Feb 26, 2026 •

edited


Summary

The PR template uses HTML comments and checklist sections for author guidance. GitHub's squash-merge copies the full PR body into the commit message, polluting git history with template boilerplate.

This workflow fires once when a PR is opened, extracts only the Summary section content (plus git trailers), and writes the cleaned body back via the GitHub API. The template stays intact in the editor when creating the PR; after submission, the body shows only what matters for the commit message.

The extraction regex was tested against all 11 affected commits in the repo history. It handles: multi-line HTML comments, inline `<!-- -->` in code spans (preserved), `# Summary` and `## Summary` headings, trailer deduplication, and GitHub separator lines.
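The extraction steps listed above can be sketched as a standalone function. This is a minimal sketch under stated assumptions, not the workflow's actual script: the name `cleanPrBody`, the heading-matching logic, and the fallback for a body without a Summary heading are hypothetical; the comment-stripping and trailer patterns are the ones quoted in the review comments on this PR.

```javascript
// Hypothetical helper: reduce a PR body to its Summary section plus trailers.
function cleanPrBody(body) {
  // 1. Strip HTML comments that start a line (multi-line ones included).
  //    An inline `<!-- -->` inside a code span does not start a line, so it survives.
  const stripped = body.replace(/^[ \t]*<!--[\s\S]*?-->[ \t]*\n?/gm, '');

  // 2. Take the text under "# Summary" or "## Summary" up to the next heading.
  //    Assumption: with no Summary heading, fall back to the whole stripped body.
  const lines = stripped.split('\n');
  const start = lines.findIndex(l => /^#{1,2}\s+Summary\s*$/.test(l));
  let summaryLines = lines;
  if (start !== -1) {
    const rest = lines.slice(start + 1);
    const end = rest.findIndex(l => /^#{1,6}\s+\S/.test(l));
    summaryLines = end === -1 ? rest : rest.slice(0, end);
  }

  // 3. Collect unique git trailers from the full body.
  const trailerRe = /^(Signed-off-by|Co-authored-by):\s+.+$/gm;
  const trailers = [...new Set([...stripped.matchAll(trailerRe)].map(m => m[0].trim()))];

  // 4. Drop trailers and GitHub separator lines from the summary text.
  const summary = summaryLines
    .filter(l => !/^(Signed-off-by|Co-authored-by):\s+/.test(l) && l.trim() !== '---------')
    .join('\n')
    .trim();

  return trailers.length ? `${summary}\n\n${trailers.join('\n')}` : summary;
}
```

Keeping the logic in a pure function like this would make it possible to exercise the regexes against sample bodies without calling the GitHub API.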

Test plan

Open a test PR using the template and verify the workflow trims the body to Summary + trailers
Verify inline `<!-- -->` in code spans is not stripped
Verify a PR without a Summary heading still gets a reasonable body

binaryaaron force-pushed the binaryaaron/clean-pr-body-comments branch 2 times, most recently from c5a9a0d to 4483b38 Compare

February 26, 2026 17:49


          ci: add workflow to strip template boilerplate from PR bodies

b60cc88

The PR template uses HTML comments and checklist sections for author
guidance. GitHub's squash-merge copies the full PR body into the commit
message, polluting git history with template boilerplate. This workflow
fires once when a PR is opened, extracts only the Summary section
content (plus git trailers), and writes the cleaned body back. The
template stays intact in the editor when creating the PR; after
submission, the body shows only what matters for the commit message.

Tested the extraction regex against all 11 affected commits in the
repo's history. Handles: multi-line HTML comments, inline `<!-- -->`
in code spans (preserved), `# Summary` and `## Summary` headings,
trailer deduplication, and GitHub separator lines.

Signed-off-by: aagonzales <aagonzales@nvidia.com>
Made-with: Cursor

binaryaaron force-pushed the binaryaaron/clean-pr-body-comments branch from 4483b38 to b60cc88 Compare

February 26, 2026 18:36

binaryaaron changed the title ~~ci: add workflow to strip HTML comments from PR bodies~~ ci: add workflow to strip template boilerplate from PR bodies

binaryaaron marked this pull request as ready for review

February 27, 2026 00:46

binaryaaron requested a review from a team as a code owner

February 27, 2026 00:46

Copilot AI review requested due to automatic review settings

February 27, 2026 00:46

Copilot started reviewing on behalf of binaryaaron

February 27, 2026 00:46

binaryaaron requested a review from mckornfield

February 27, 2026 00:46

Copilot AI reviewed


Contributor

Copilot AI left a comment

Pull request overview

This PR adds a GitHub Actions workflow that automatically cleans PR bodies when PRs are opened. The workflow strips HTML comments and template boilerplate sections, keeping only the Summary section content plus git trailers (Signed-off-by, Co-authored-by). This prevents GitHub's squash-merge from polluting commit messages with template markup.

Changes:

Adds .github/workflows/clean-pr-body.yml workflow that triggers on PR open events
Implements JavaScript-based text processing to extract Summary sections and git trailers
Updates PR bodies via GitHub API to remove template boilerplate


.github/workflows/clean-pr-body.yml

Comment on lines +1 to +2

		# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
		# SPDX-License-Identifier: Apache-2.0

Copilot AI Feb 27, 2026

The copyright header format differs from other workflow files in this repository. Other workflow files use the standard Apache 2.0 header with # comment style, not the SPDX-FileCopyrightText format. For consistency, this should follow the same format as files like ci-checks.yml, copyright-check.yml, and conventional-commit.yml.

Suggested change

      
-   # SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
-   # SPDX-License-Identifier: Apache-2.0
+   # Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+   #
+   # Licensed under the Apache License, Version 2.0 (the "License");
+   # you may not use this file except in compliance with the License.
+   # You may obtain a copy of the License at
+   #
+   #     http://www.apache.org/licenses/LICENSE-2.0
+   #
+   # Unless required by applicable law or agreed to in writing, software
+   # distributed under the License is distributed on an "AS IS" BASIS,
+   # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   # See the License for the specific language governing permissions and
+   # limitations under the License.

Copilot uses AI. Check for mistakes.

.github/workflows/clean-pr-body.yml

+                          let cleaned = summary;
+                          if (trailers.length) cleaned += '\n\n' + trailers.join('\n');
+                          if (cleaned !== body.trim()) {

Copilot AI Feb 27, 2026

The comparison cleaned !== body.trim() may cause unnecessary API calls when the only differences between the cleaned text and the original body are in whitespace. The issue is that cleaned is built from summary.trim(), which removes trailing whitespace, but it is compared against body.trim(). If the original body differs in leading/trailing whitespace or in internal whitespace (e.g., multiple newlines collapsed to double), the two strings will never compare equal, so the update runs even when the bodies are semantically equivalent. Consider comparing both after normalizing whitespace, or compare against the original body without trimming.

Suggested change

      
-   if (cleaned !== body.trim()) {
+   if (cleaned.replace(/\s+$/, '') !== body.replace(/\s+$/, '')) {
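The "normalize both sides" option mentioned in the comment could look roughly like the following. This is only a sketch: `normalize` and `needsUpdate` are hypothetical names, and which whitespace differences count as insignificant is an assumption that the project would need to decide.

```javascript
// Hypothetical normalizer: unify line endings, trim line ends,
// collapse runs of blank lines, and trim the whole text.
function normalize(text) {
  return text
    .replace(/\r\n/g, '\n')
    .replace(/[ \t]+$/gm, '')
    .replace(/\n{3,}/g, '\n\n')
    .trim();
}

// True only when the cleaned body differs in more than whitespace.
const needsUpdate = (cleaned, body) => normalize(cleaned) !== normalize(body);
```

With a check like this, the API call would be skipped for bodies that already consist of only a summary and trailers.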


.github/workflows/clean-pr-body.yml

+                          }
+                          // 3. Collect unique git trailers from the full body.
+                          const trailerRe = /^(Signed-off-by|Co-authored-by):\s+.+$/gm;

Copilot AI Feb 27, 2026

The trailer regex on line 48 only matches Signed-off-by and Co-authored-by trailers, but Git and GitHub support additional trailers like Reviewed-by, Tested-by, Acked-by, and others. Consider whether these should also be preserved. If they should be excluded intentionally, the current implementation is correct, but if they should be preserved, the regex should be expanded to match all common Git trailers.
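If the additional trailers should be preserved, one option is to build the pattern from a list of keys. A minimal sketch, assuming the key list below is the desired set; `TRAILER_KEYS`, `trailerRegex`, and `collectTrailers` are illustrative names, not part of the PR.

```javascript
// Hypothetical: trailer keys to keep; adjust to the project's policy.
const TRAILER_KEYS = ['Signed-off-by', 'Co-authored-by', 'Reviewed-by', 'Tested-by', 'Acked-by'];

// Builds /^(Signed-off-by|Co-authored-by|...):\s+.+$/gm from the list above.
function trailerRegex() {
  return new RegExp(`^(${TRAILER_KEYS.join('|')}):\\s+.+$`, 'gm');
}

// Returns each distinct trailer line once, in order of first appearance.
function collectTrailers(text) {
  const seen = new Set();
  const trailers = [];
  for (const match of text.matchAll(trailerRegex())) {
    const line = match[0].trim();
    if (!seen.has(line)) {
      seen.add(line);
      trailers.push(line);
    }
  }
  return trailers;
}
```

Driving the regex from a list keeps the set of preserved trailers in one place, so the collection step and the summary filter cannot drift apart.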


.github/workflows/clean-pr-body.yml

Comment on lines +70 to +76

+                            await github.rest.pulls.update({
+                              owner,
+                              repo,
+                              pull_number: pr_number,
+                              body: cleaned,
+                            });
+                            core.info(`Cleaned PR #${pr_number} body to Summary + trailers`);

Copilot AI Feb 27, 2026

The API call to update the PR body (line 70) has no error handling. If the API call fails (e.g., due to rate limiting, network issues, or permission problems), the workflow will fail silently or with an unclear error. Consider wrapping this in a try-catch block and providing informative error messages, or at least logging when the update succeeds vs. fails.

Suggested change

      
-   await github.rest.pulls.update({
-     owner,
-     repo,
-     pull_number: pr_number,
-     body: cleaned,
-   });
-   core.info(`Cleaned PR #${pr_number} body to Summary + trailers`);
+   try {
+     await github.rest.pulls.update({
+       owner,
+       repo,
+       pull_number: pr_number,
+       body: cleaned,
+     });
+     core.info(`Cleaned PR #${pr_number} body to Summary + trailers`);
+   } catch (error) {
+     const message = error && error.message ? error.message : String(error);
+     core.error(`Failed to clean PR #${pr_number} body: ${message}`);
+     core.setFailed(`Failed to update PR body for #${pr_number}`);
+   }


.github/workflows/clean-pr-body.yml

+                          const trailerRe = /^(Signed-off-by|Co-authored-by):\s+.+$/gm;
+                          const seen = new Set();
+                          const trailers = [];
+                          for (const [line] of stripped.matchAll(trailerRe)) {

Copilot AI Feb 27, 2026

The code destructures only the first element from the match array on line 51: for (const [line] of stripped.matchAll(trailerRe)). However, matchAll returns match objects where the full match is at index 0, and capture groups follow. Since the regex /^(Signed-off-by|Co-authored-by):\s+.+$/gm has one capture group, match[0] is the full match (the entire trailer line), and match[1] is the captured trailer type. Using [line] gets the full match, which is correct. However, it would be clearer to use [fullLine] or match[0] to make the intent more obvious.
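The shape of a `matchAll` result described above can be checked with a short snippet. The sample trailer text is made up for illustration, and the names `fullLine` and `trailerKey` are only suggestions.

```javascript
const trailerRe = /^(Signed-off-by|Co-authored-by):\s+.+$/gm;
const text = 'Signed-off-by: A Dev <a@example.com>';

// Each item yielded by matchAll is array-like: [fullMatch, group1, ...].
const [match] = [...text.matchAll(trailerRe)];

// Destructuring the first element gives the whole trailer line;
// the second element is the captured trailer key.
const [fullLine, trailerKey] = match;
```

Here `fullLine` is the entire trailer line and `trailerKey` is `Signed-off-by`, so the `[line]` destructuring in the workflow does pick up the full match.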


.github/workflows/clean-pr-body.yml

+                          // 4. Remove trailers and GitHub separator lines from summary text.
+                          summary = summary
+                            .split('\n')
+                            .filter(l => !l.match(/^(Signed-off-by|Co-authored-by):\s+/) && l.trim() !== '---------')

Copilot AI Feb 27, 2026

The separator line filter on line 61 only checks for lines that are exactly --------- after trimming. However, GitHub and Markdown may use different numbers of dashes for horizontal rules (typically 3 or more dashes: ---, ----, etc.). This overly specific check may not catch all separator variations. Consider using a regex like /^-{3,}$/ to match any line with 3 or more dashes, or clarify why only 9 dashes specifically should be removed.

Suggested change

      
-   .filter(l => !l.match(/^(Signed-off-by|Co-authored-by):\s+/) && l.trim() !== '---------')
+   .filter(l => !l.match(/^(Signed-off-by|Co-authored-by):\s+/) && !l.trim().match(/^-{3,}$/))
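A quick way to see what the wider pattern would and would not remove. This is an illustrative sketch; `isSeparator` and the sample lines are made up for the example.

```javascript
// Hypothetical helper: true for lines made only of three or more dashes.
const isSeparator = line => /^-{3,}$/.test(line.trim());

const lines = ['Summary text', '---', '---------', '-- not a rule', 'a --- b'];

// Keeps prose and lines that merely contain dashes; drops pure dash rules.
const kept = lines.filter(l => !isSeparator(l));
```

One thing to weigh before widening the filter: in Markdown, a `---` line directly under a line of text is a setext heading underline, so this pattern would drop those lines as well.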


.github/workflows/clean-pr-body.yml

+              name: Clean PR Body
+              on:
+                pull_request:

Copilot AI Feb 27, 2026

The workflow triggers on pull_request with type opened and requires pull-requests: write permission. For PRs from forks, the workflow runs with a read-only GITHUB_TOKEN, which typically cannot write to the base repository. This means the workflow will fail when trying to update the PR body for fork-based PRs. Consider using pull_request_target instead, which runs in the context of the base repository and has write permissions. However, be cautious with pull_request_target, as it poses security risks if user-controlled code is executed. In this case, since the workflow only reads the PR body and updates it via the GitHub API (no code checkout or execution), pull_request_target should be safe.

Suggested change

      
-   pull_request:
+   pull_request_target:


.github/workflows/clean-pr-body.yml

Comment on lines +26 to +27

		//    preserving inline references like `<!-- -->` in code spans.
		let stripped = body.replace(/^[ \t]*<!--[\s\S]*?-->[ \t]*\n?/gm, '');

Copilot AI Feb 27, 2026

The HTML comment stripping regex /^[ \t]*<!--[\s\S]*?-->[ \t]*\n?/gm has an issue: it will strip comments that appear at the start of a line within code blocks. For example, in a Markdown code block showing HTML/XML examples, a line starting with <!-- would be removed even though it's legitimate content, not template boilerplate. The regex should avoid matching inside code fences (delimited by triple backticks). Consider processing the body to skip code blocks, or use a more sophisticated approach that preserves code block contents.

Suggested change

      
-   //    preserving inline references like `<!-- -->` in code spans.
-   let stripped = body.replace(/^[ \t]*<!--[\s\S]*?-->[ \t]*\n?/gm, '');
+   //    preserving inline references like `<!-- -->` in code spans,
+   //    and avoiding matches inside fenced code blocks (``` ... ```).
+   const commentRe = /^[ \t]*<!--[\s\S]*?-->[ \t]*\n?/gm;
+   const segments = body.split(/(```[\s\S]*?```)/);
+   let stripped = segments
+     .map((segment, idx) => {
+       // Captured fenced code blocks appear at odd indices and start with ```
+       if (idx % 2 === 1 && segment.startsWith('```')) {
+         return segment;
+       }
+       return segment.replace(commentRe, '');
+     })
+     .join('');


mckornfield approved these changes


Collaborator

mckornfield left a comment

nice, fancy
