Collaborator

@ziadhany ziadhany commented Aug 30, 2025

I created an initial script that parses Git commit messages and can be easily integrated with our model. The script takes a Git repository as input, parses all commits, and returns the CVEs along with their corresponding fix commits.
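
As a rough illustration of the grouping step described above (not the script from this PR), the sketch below scans commit messages for CVE identifiers and maps each one to commit URLs. The function name `group_commits_by_cve`, the regex, and the `(commit_hash, message)` input shape are assumptions made for this example.

```python
import re
from collections import defaultdict

# Matches identifiers such as CVE-2024-12797 (case-insensitive).
CVE_PATTERN = re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE)


def group_commits_by_cve(commits, repo_url):
    """Map each CVE ID found in a commit message to its commit URLs.

    `commits` is an iterable of (commit_hash, message) pairs (hypothetical shape).
    """
    base = repo_url.rstrip("/")
    grouped = defaultdict(list)
    for commit_hash, message in commits:
        # A message may mention several CVEs; normalise case and de-duplicate.
        for cve_id in sorted({m.upper() for m in CVE_PATTERN.findall(message)}):
            url = f"{base}/commit/{commit_hash}"
            if url not in grouped[cve_id]:
                grouped[cve_id].append(url)
    return dict(grouped)
```

With GitPython, the input pairs could be produced from `repo.iter_commits()` using `commit.hexsha` and `commit.message`.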

Issues:

results:

Found 192 unique CVEs
{
  "CVE-2025-4575": [
    "https://github.com/openssl/openssl/commit/0eb9acc24febb1f3f01f0320cfba9654cf66b0ac",
    "https://github.com/openssl/openssl/commit/e96d22446e633d117e6c9904cb15b4693e956eaa"
  ],
  "CVE-2024-12797": [
    "https://github.com/openssl/openssl/commit/6ae8e947d8e3f3f03eeb7d9ad993e341791900bc",
    "https://github.com/openssl/openssl/commit/798779d43494549b611233f92652f0da5328fbe7",
    "https://github.com/openssl/openssl/commit/87ebd203feffcf92ad5889df92f90bb0ee10a699",
    "https://github.com/openssl/openssl/commit/738d4f9fdeaad57660dcba50a619fafced3fd5e9"
  ],
  "CVE-2024-13176": [
    "https://github.com/openssl/openssl/commit/2af62e74fb59bc469506bc37eb2990ea408d9467",
    "https://github.com/openssl/openssl/commit/07272b05b04836a762b4baa874958af51d513844",
    "https://github.com/openssl/openssl/commit/fcebf0a79a0a69f63721b66e94b01400a7de332e",
    "https://github.com/openssl/openssl/commit/78f6c35b83713d33b263fb85e3727543463d6fd5",
    "https://github.com/openssl/openssl/commit/77c608f4c8857e63e98e66444e2e761c9627916f",
    "https://github.com/openssl/openssl/commit/4b1cb94a734a7d4ec363ac0a215a25c181e11f65",
    "https://github.com/openssl/openssl/commit/392dcb336405a0c94486aa6655057f59fd3a0902",
    "https://github.com/openssl/openssl/commit/3fc4b112da2e2107a65ae2556fb6137098e08801",
    "https://github.com/openssl/openssl/commit/f15294228451217b5e58e2b7f5ad4c7a42303212",
    "https://github.com/openssl/openssl/commit/7d8a8c20e1370e43b0cad17e47a460a6f8e81a34",
    "https://github.com/openssl/openssl/commit/63c40a66c5dc287485705d06122d3a6e74a6a203",
    "https://github.com/openssl/openssl/commit/c3144e102571517df6c15ccc049fa3660ab3cb0a"
  ],

openssl.json

Add a test for CollectRepoFixCommitPipeline

Signed-off-by: ziad hany <[email protected]>

def clone(self):
    """Clone the repository."""
    self.repo_url = "https://github.com/torvalds/linux"
Collaborator Author

This part should not be static; the repository URL should not be hardcoded.

Member

@keshav-space keshav-space left a comment

Thanks @ziadhany, see some suggestions.

Comment on lines 41 to 51
cmd = [
    "git",
    "clone",
    "--bare",
    "--filter=blob:none",
    "--no-checkout",
    self.repo_url,
    repo_path,
]
subprocess.run(cmd, check=True)
self.repo = Repo(repo_path)
Member

Since we are already using GitPython, let's use it for cloning too.

Suggested change
- cmd = [
-     "git",
-     "clone",
-     "--bare",
-     "--filter=blob:none",
-     "--no-checkout",
-     self.repo_url,
-     repo_path,
- ]
- subprocess.run(cmd, check=True)
- self.repo = Repo(repo_path)
+ self.repo = Repo.clone_from(
+     url=self.repo_url,
+     to_path=repo_path,
+     bare=True,
+     no_checkout=True,
+     multi_options=["--filter=blob:none"]
+ )

self.log("Generating AdvisoryData objects from grouped commits.")
grouped_commits = self.collect_fix_commits()
for vuln_id, commits in grouped_commits.items():
    references = [ReferenceV2(url=f"{self.repo_url}/commit/{cid}") for cid, _ in commits]
Member

Where are we storing the actual fix commit? Just keeping it as a reference is not sufficient, IMO.

Collaborator Author

@ziadhany ziadhany Oct 18, 2025

@keshav-space I was relying on the CollectFixCommitsPipeline pipeline to create the fix commits. The issue is that CommitFixV2 is currently tied to both affected_package and advisory. IMO, storing fix commits as references can still be useful.

summary_lines = [f"- {cid}: {msg}" for cid, msg in commits]
summary = f"Commits fixing {vuln_id}:\n" + "\n".join(summary_lines)
yield AdvisoryData(
    advisory_id=vuln_id,
Member

This will be problematic, since we intend to collect fix commits from multiple repos here. Suppose we get a fix commit for CVE-000-000 in two different repos: we will end up with a conflict while inserting the advisory, because the unique AVID is built from the advisory_id prefixed with the pipeline_id. Fix commits imported from two different Git repos would therefore get the same AVID.

Collaborator Author

@keshav-space I'm not sure what the best solution is, but as I understand it, we could make the pipeline_id dynamic, with one per repository:

  • django_fix_commit
  • django_restframework_fix_commit

For example:

avid: "e.g., django_fix_commit/PYSEC-2020-2233"
avid: "e.g., django_restframework_fix_commit/PYSEC-2020-2233"

This will generate a different avid for each Git repository.
I’m not sure if this completely solves the problem, though.
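
One way the per-repository pipeline_id idea could look is sketched below. The helpers `pipeline_id_for_repo` and `make_avid`, and the rule of slugging the last path component of the repository URL, are assumptions for illustration only; they are not existing functions in this code base.

```python
import re
from urllib.parse import urlparse


def pipeline_id_for_repo(repo_url):
    """Build a per-repository pipeline ID such as 'django_fix_commit' (hypothetical scheme)."""
    path = urlparse(repo_url).path.strip("/")
    repo_name = path.split("/")[-1]
    if repo_name.endswith(".git"):
        repo_name = repo_name[: -len(".git")]
    # Replace every run of non-alphanumeric characters with a single underscore.
    slug = re.sub(r"[^a-z0-9]+", "_", repo_name.lower()).strip("_")
    return f"{slug}_fix_commit"


def make_avid(repo_url, advisory_id):
    """Prefix the advisory ID with the repository-specific pipeline ID."""
    return f"{pipeline_id_for_repo(repo_url)}/{advisory_id}"
```

Under this scheme, two repositories that share a name under different owners (forks, for example) would still collide, so including the owner in the slug may be worth considering.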

def clean_downloads(self):
    """Cleanup any temporary repository data."""
    self.log("Cleaning up local repository resources.")
    if os.path.isdir(self.repo.working_tree_dir):
Member

working_tree_dir is not what we need here: it returns None because we are working on a bare repository (see https://gitpython.readthedocs.io/en/stable/reference.html#git.repo.base.Repo.working_tree_dir).
We should use working_dir instead.

Comment on lines 112 to 113
if os.path.isdir(self.repo.working_tree_dir):
    shutil.rmtree(path=self.repo.working_tree_dir)
Member

Also, let's make sure that we have a local clone before proceeding to remove it.

Suggested change
- if os.path.isdir(self.repo.working_tree_dir):
-     shutil.rmtree(path=self.repo.working_tree_dir)
+ if hasattr(self, 'repo') and self.repo.working_dir:
+     shutil.rmtree(path=self.repo.working_dir)
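
A slightly more defensive variant of the cleanup discussed in this thread might look like the sketch below. It is written as a standalone function taking a `pipeline` argument (an assumption for this example, so it can run without the pipeline class) and additionally checks that the directory still exists before removing it.

```python
import os
import shutil


def clean_downloads(pipeline):
    """Remove the local clone if one exists; return True when something was removed."""
    # The pipeline may never have cloned anything, so `repo` can be missing.
    repo = getattr(pipeline, "repo", None)
    # For a bare repository, GitPython's working_tree_dir is None,
    # while working_dir points at the repository directory itself.
    repo_dir = getattr(repo, "working_dir", None)
    if repo_dir and os.path.isdir(repo_dir):
        shutil.rmtree(repo_dir)
        return True
    return False
```

Returning a boolean is only a convenience for this sketch; the method in the PR could just as well log and return nothing.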

Signed-off-by: ziad hany <[email protected]>