ci: add workflow to strip template boilerplate from PR bodies#130
binaryaaron wants to merge 1 commit into main
Force-pushed from c5a9a0d to 4483b38.
The PR template uses HTML comments and checklist sections for author guidance. GitHub's squash-merge copies the full PR body into the commit message, polluting git history with template boilerplate.

This workflow fires once when a PR is opened, extracts only the Summary section content (plus git trailers), and writes the cleaned body back. The template stays intact in the editor when creating the PR; after submission, the body shows only what matters for the commit message.

Tested the extraction regex against all 11 affected commits in the repo's history. Handles: multi-line HTML comments, inline `<!-- -->` in code spans (preserved), `# Summary` and `## Summary` headings, trailer deduplication, and GitHub separator lines.

Signed-off-by: aagonzales <aagonzales@nvidia.com>
Made-with: Cursor
Force-pushed from 4483b38 to b60cc88.
Pull request overview
This PR adds a GitHub Actions workflow that automatically cleans PR bodies when PRs are opened. The workflow strips HTML comments and template boilerplate sections, keeping only the Summary section content plus git trailers (Signed-off-by, Co-authored-by). This prevents GitHub's squash-merge from polluting commit messages with template markup.
Changes:
- Adds `.github/workflows/clean-pr-body.yml` workflow that triggers on PR open events
- Implements JavaScript-based text processing to extract Summary sections and git trailers
- Updates PR bodies via GitHub API to remove template boilerplate
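The steps listed above can be sketched end to end as a single function. This is a minimal sketch, not the PR's actual script: the function name `cleanPrBody` is hypothetical, and the Summary-extraction rule in step 2 (take everything under `# Summary` or `## Summary` up to the next heading) is an assumption, since the PR's real extraction regex is not shown in the review excerpts. The comment-stripping and trailer regexes are the ones quoted in the review.

```javascript
// Hypothetical helper: reduce a PR body to its Summary text plus unique trailers.
function cleanPrBody(body) {
  // 1. Strip HTML comments that begin a line (they may span several lines).
  const stripped = body.replace(/^[ \t]*<!--[\s\S]*?-->[ \t]*\n?/gm, '');

  // 2. ASSUMPTION: the Summary runs from a "# Summary" / "## Summary" heading
  //    to the next heading. Without such a heading, keep the whole body.
  const lines = stripped.split('\n');
  const start = lines.findIndex(l => /^#{1,2}\s+Summary\s*$/.test(l));
  let summaryLines = lines;
  if (start !== -1) {
    summaryLines = [];
    for (const l of lines.slice(start + 1)) {
      if (/^#{1,6}\s/.test(l)) break; // next heading ends the section
      summaryLines.push(l);
    }
  }

  // 3. Collect unique trailers from the full (comment-stripped) body.
  const trailerRe = /^(Signed-off-by|Co-authored-by):\s+.+$/gm;
  const trailers = [...new Set([...stripped.matchAll(trailerRe)].map(m => m[0].trim()))];

  // 4. Drop trailers and the 9-dash separator line from the summary text.
  const summary = summaryLines
    .filter(l => !/^(Signed-off-by|Co-authored-by):\s+/.test(l) && l.trim() !== '---------')
    .join('\n')
    .trim();

  return trailers.length ? `${summary}\n\n${trailers.join('\n')}` : summary;
}
```

Keeping the transformation in a pure function like this makes it testable outside of a workflow run; the API call that writes the result back stays a separate step.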
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
The copyright header format differs from other workflow files in this repository. Other workflow files use the standard Apache 2.0 header with # comment style, not the SPDX-FileCopyrightText format. For consistency, this should follow the same format as files like ci-checks.yml, copyright-check.yml, and conventional-commit.yml.
Suggested change:
- # SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
- # SPDX-License-Identifier: Apache-2.0
+ # Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ #
+ # Licensed under the Apache License, Version 2.0 (the "License");
+ # you may not use this file except in compliance with the License.
+ # You may obtain a copy of the License at
+ #
+ # http://www.apache.org/licenses/LICENSE-2.0
+ #
+ # Unless required by applicable law or agreed to in writing, software
+ # distributed under the License is distributed on an "AS IS" BASIS,
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ # See the License for the specific language governing permissions and
+ # limitations under the License.
let cleaned = summary;
if (trailers.length) cleaned += '\n\n' + trailers.join('\n');

if (cleaned !== body.trim()) {
The comparison `cleaned !== body.trim()` may cause unnecessary API calls when the only difference is whitespace. `cleaned` is built from `summary.trim()`, which removes leading and trailing whitespace, and it is compared against `body.trim()`. If the original body differs only in internal whitespace (e.g., multiple newlines collapsed to two), the two strings will never be equal, so the update runs even when the bodies are semantically equivalent. Consider comparing both after normalizing whitespace, or compare against the original body without trimming.
Suggested change:
- if (cleaned !== body.trim()) {
+ if (cleaned.replace(/\s+$/, '') !== body.replace(/\s+$/, '')) {
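One way to compare "after normalizing whitespace", as the comment proposes, is to run both strings through the same normalizer before testing for inequality. The sketch below is illustrative only: `normalizeBody` and `needsUpdate` are hypothetical names, and the specific rules (CRLF to LF, trailing spaces removed, runs of blank lines collapsed) are assumptions about which differences should count as insignificant.

```javascript
// Hypothetical normalizer: whitespace-only differences should compare equal.
function normalizeBody(text) {
  return text
    .replace(/\r\n/g, '\n')      // treat CRLF and LF line endings alike
    .replace(/[ \t]+$/gm, '')    // drop trailing spaces/tabs on every line
    .replace(/\n{3,}/g, '\n\n')  // collapse runs of blank lines to one
    .trim();
}

// True only when the cleaned body differs in more than whitespace.
function needsUpdate(cleaned, body) {
  return normalizeBody(cleaned) !== normalizeBody(body);
}
```

The normalized strings are used only for the comparison; the value sent to the API would still be the unmodified `cleaned` text, so blank lines inside code blocks are not altered.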
}

// 3. Collect unique git trailers from the full body.
const trailerRe = /^(Signed-off-by|Co-authored-by):\s+.+$/gm;
The trailer regex on line 48 only matches Signed-off-by and Co-authored-by trailers, but Git and GitHub support additional trailers like Reviewed-by, Tested-by, Acked-by, and others. Consider whether these should also be preserved. If they should be excluded intentionally, the current implementation is correct, but if they should be preserved, the regex should be expanded to match all common Git trailers.
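If additional trailers should be preserved, one option is to build the regex from an allowlist so the set of keys is easy to extend. This is a sketch under assumptions: `TRAILER_KEYS` and `collectTrailers` are hypothetical names, and the particular keys listed are just the ones mentioned in this comment, not a list the PR author has agreed to.

```javascript
// Hypothetical allowlist; extend or shrink as the project prefers.
const TRAILER_KEYS = ['Signed-off-by', 'Co-authored-by', 'Reviewed-by', 'Tested-by', 'Acked-by'];

// Collect unique trailer lines, in first-seen order.
function collectTrailers(text, keys = TRAILER_KEYS) {
  // Keys contain only letters and hyphens, so no regex escaping is needed.
  const re = new RegExp(`^(${keys.join('|')}):[ \\t]+.+$`, 'gm');
  const seen = new Set();
  const trailers = [];
  for (const match of text.matchAll(re)) {
    const line = match[0].trim();
    if (!seen.has(line)) {
      seen.add(line);
      trailers.push(line);
    }
  }
  return trailers;
}
```

An allowlist keeps ordinary `Word: text` lines in the Summary from being mistaken for trailers, which a fully generic `Key: value` pattern would risk.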
await github.rest.pulls.update({
  owner,
  repo,
  pull_number: pr_number,
  body: cleaned,
});
core.info(`Cleaned PR #${pr_number} body to Summary + trailers`);
The API call to update the PR body (line 70) has no error handling. If the API call fails (e.g., due to rate limiting, network issues, or permission problems), the workflow will fail silently or with an unclear error. Consider wrapping this in a try-catch block and providing informative error messages, or at least logging when the update succeeds vs. fails.
Suggested change:
- await github.rest.pulls.update({
-   owner,
-   repo,
-   pull_number: pr_number,
-   body: cleaned,
- });
- core.info(`Cleaned PR #${pr_number} body to Summary + trailers`);
+ try {
+   await github.rest.pulls.update({
+     owner,
+     repo,
+     pull_number: pr_number,
+     body: cleaned,
+   });
+   core.info(`Cleaned PR #${pr_number} body to Summary + trailers`);
+ } catch (error) {
+   const message = error && error.message ? error.message : String(error);
+   core.error(`Failed to clean PR #${pr_number} body: ${message}`);
+   core.setFailed(`Failed to update PR body for #${pr_number}`);
+ }
const trailerRe = /^(Signed-off-by|Co-authored-by):\s+.+$/gm;
const seen = new Set();
const trailers = [];
for (const [line] of stripped.matchAll(trailerRe)) {
The loop on line 51, `for (const [line] of stripped.matchAll(trailerRe))`, destructures only the first element of each match. `matchAll` yields match arrays in which index 0 is the full match and the capture groups follow. Since the regex `/^(Signed-off-by|Co-authored-by):\s+.+$/gm` has one capture group, `match[0]` is the entire trailer line and `match[1]` is the captured trailer type, so `[line]` correctly picks up the full line. Naming the variable `fullLine`, or using `match[0]`, would make the intent more obvious.
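To make the destructuring explicit, a small sketch can name both elements of each match. The function name `parseTrailers` and the returned object shape are hypothetical; the regex is the one from the workflow.

```javascript
// Hypothetical helper: return each trailer's key and its full line.
function parseTrailers(text) {
  const trailerRe = /^(Signed-off-by|Co-authored-by):\s+.+$/gm;
  const parsed = [];
  // Index 0 of each match is the whole line; index 1 is the capture group.
  for (const [fullLine, key] of text.matchAll(trailerRe)) {
    parsed.push({ key, line: fullLine });
  }
  return parsed;
}
```

Destructuring both positions documents the match layout in the code itself, even when only `fullLine` ends up being used.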
// 4. Remove trailers and GitHub separator lines from summary text.
summary = summary
  .split('\n')
  .filter(l => !l.match(/^(Signed-off-by|Co-authored-by):\s+/) && l.trim() !== '---------')
The separator line filter on line 61 only checks for lines that are exactly --------- after trimming. However, GitHub and Markdown may use different numbers of dashes for horizontal rules (typically 3 or more dashes: ---, ----, etc.). This overly specific check may not catch all separator variations. Consider using a regex like /^-{3,}$/ to match any line with 3 or more dashes, or clarify why only 9 dashes specifically should be removed.
Suggested change:
- .filter(l => !l.match(/^(Signed-off-by|Co-authored-by):\s+/) && l.trim() !== '---------')
+ .filter(l => !l.match(/^(Signed-off-by|Co-authored-by):\s+/) && !l.trim().match(/^-{3,}$/))
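The behavior of the broader `/^-{3,}$/` check can be seen in isolation with a short sketch. The helper names `isSeparatorLine` and `dropSeparators` are hypothetical and exist only for illustration.

```javascript
// Hypothetical predicate: a line made up of three or more dashes only.
function isSeparatorLine(line) {
  return /^-{3,}$/.test(line.trim());
}

// Remove every separator line from a block of text.
function dropSeparators(text) {
  return text
    .split('\n')
    .filter(l => !isSeparatorLine(l))
    .join('\n');
}
```

One trade-off to weigh: the broader pattern also removes a deliberate Markdown horizontal rule (`---`) that an author wrote inside the Summary, which the exact 9-dash check would leave alone.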
name: Clean PR Body

on:
  pull_request:
The workflow triggers on `pull_request` with type `opened` and requires `pull-requests: write` permission. For PRs from forks, GitHub gives `pull_request` workflows a read-only `GITHUB_TOKEN` by default, regardless of the `permissions` block, so the call that updates the PR body will fail for fork-based PRs. Consider using `pull_request_target` instead, which runs in the context of the base repository and can be granted write permissions. Be cautious with `pull_request_target`: it poses security risks if untrusted code from the PR is checked out or executed. Since this workflow only reads the PR body and updates it via the GitHub API (no code checkout or execution), `pull_request_target` should be safe here.
Suggested change:
-   pull_request:
+   pull_request_target:
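If the workflow stays on `pull_request`, an alternative is to detect fork PRs in the script and skip the update rather than fail. The sketch below is an assumption about how that check could look, not part of the PR: `isForkPullRequest` is a hypothetical helper, and it relies on the `head.repo.full_name` and `base.repo.full_name` fields of the pull request payload.

```javascript
// Hypothetical helper: does the PR's head branch live in a different repository?
function isForkPullRequest(pr) {
  const headRepo = pr.head && pr.head.repo;
  // A missing head repo (e.g. a deleted fork) is treated as a fork to be safe.
  if (!headRepo) return true;
  return headRepo.full_name !== pr.base.repo.full_name;
}
```

Inside a github-script step this would be called as `isForkPullRequest(context.payload.pull_request)`, logging a notice and returning early when it is true. The cost is that fork PRs keep their template boilerplate, so this is a fallback rather than a fix.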
// preserving inline references like `<!-- -->` in code spans.
let stripped = body.replace(/^[ \t]*<!--[\s\S]*?-->[ \t]*\n?/gm, '');
The HTML comment stripping regex /^[ \t]*<!--[\s\S]*?-->[ \t]*\n?/gm has an issue: it will strip comments that appear at the start of a line within code blocks. For example, in a Markdown code block showing HTML/XML examples, a line starting with <!-- would be removed even though it's legitimate content, not template boilerplate. The regex should avoid matching inside code fences (delimited by triple backticks). Consider processing the body to skip code blocks, or use a more sophisticated approach that preserves code block contents.
Suggested change:
- // preserving inline references like `<!-- -->` in code spans.
- let stripped = body.replace(/^[ \t]*<!--[\s\S]*?-->[ \t]*\n?/gm, '');
+ // preserving inline references like `<!-- -->` in code spans,
+ // and avoiding matches inside fenced code blocks (``` ... ```).
+ const commentRe = /^[ \t]*<!--[\s\S]*?-->[ \t]*\n?/gm;
+ const segments = body.split(/(```[\s\S]*?```)/);
+ let stripped = segments
+   .map((segment, idx) => {
+     // Captured fenced code blocks appear at odd indices and start with ```
+     if (idx % 2 === 1 && segment.startsWith('```')) {
+       return segment;
+     }
+     return segment.replace(commentRe, '');
+   })
+   .join('');
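The split-based suggestion above only recognizes fences that are closed with a matching triple backtick. A line-by-line scanner is a different technique that also covers `~~~` fences and unclosed fences. The sketch below is an alternative under assumptions, not the reviewer's or the author's code: `stripCommentsOutsideFences` is a hypothetical name, and unlike the regex it drops the whole line on which a comment starts or ends, including any text sharing that line.

```javascript
// Hypothetical scanner: remove line-leading HTML comments outside fenced code.
function stripCommentsOutsideFences(body) {
  const out = [];
  let fence = null;       // opening fence marker while inside a fenced block
  let inComment = false;  // true while inside a multi-line HTML comment
  for (const line of body.split('\n')) {
    if (fence) {
      out.push(line);                                   // keep code verbatim
      if (line.trim().startsWith(fence)) fence = null;  // closing fence
      continue;
    }
    if (inComment) {
      if (line.includes('-->')) inComment = false;      // comment ends here
      continue;
    }
    const open = line.match(/^\s*(`{3,}|~{3,})/);
    if (open) {
      fence = open[1];
      out.push(line);
      continue;
    }
    if (/^[ \t]*<!--/.test(line)) {
      if (!line.includes('-->')) inComment = true;      // spans more lines
      continue;
    }
    out.push(line);
  }
  return out.join('\n');
}
```

Because an unclosed fence keeps the scanner in "code" mode until the end of the body, comments after a stray opening fence are preserved rather than stripped, which errs on the side of not deleting author content.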
Summary
The PR template uses HTML comments and checklist sections for author guidance. GitHub's squash-merge copies the full PR body into the commit message, polluting git history with template boilerplate.
This workflow fires once when a PR is opened, extracts only the Summary section content (plus git trailers), and writes the cleaned body back via the GitHub API. The template stays intact in the editor when creating the PR; after submission, the body shows only what matters for the commit message.
The extraction regex was tested against all 11 affected commits in the repo history. It handles: multi-line HTML comments, inline `<!-- -->` in code spans (preserved), `# Summary` and `## Summary` headings, trailer deduplication, and GitHub separator lines.
Test plan
- `<!-- -->` in code spans is not stripped