-
-
Notifications
You must be signed in to change notification settings - Fork 237
Add support for parsing Git commit messages #1992
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: ziad hany <[email protected]>
06e8b82
to
976c5ad
Compare
Add a test for CollectRepoFixCommitPipeline Signed-off-by: ziad hany <[email protected]>
Signed-off-by: ziad hany <[email protected]>
|
||
def clone(self): | ||
"""Clone the repository.""" | ||
self.repo_url = "https://github.com/torvalds/linux" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part should not be static
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ziadhany, see some suggestions.
cmd = [ | ||
"git", | ||
"clone", | ||
"--bare", | ||
"--filter=blob:none", | ||
"--no-checkout", | ||
self.repo_url, | ||
repo_path, | ||
] | ||
subprocess.run(cmd, check=True) | ||
self.repo = Repo(repo_path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we already using gitpython let's use the same for cloning too.
cmd = [ | |
"git", | |
"clone", | |
"--bare", | |
"--filter=blob:none", | |
"--no-checkout", | |
self.repo_url, | |
repo_path, | |
] | |
subprocess.run(cmd, check=True) | |
self.repo = Repo(repo_path) | |
self.repo = Repo.clone_from( | |
url=self.repo_url, | |
to_path=repo_path, | |
bare=True, | |
no_checkout=True, | |
multi_options=["--filter=blob:none"] | |
) |
self.log("Generating AdvisoryData objects from grouped commits.") | ||
grouped_commits = self.collect_fix_commits() | ||
for vuln_id, commits in grouped_commits.items(): | ||
references = [ReferenceV2(url=f"{self.repo_url}/commit/{cid}") for cid, _ in commits] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where are we storing proper fix commit? just keeping it in reference is not sufficient IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@keshav-space I was relying on this pipeline CollectFixCommitsPipeline to create fix commits , the issue is that CommitFixV2
is currently tied to both affected_package and advisory. IMO, storing fix commits as references can be useful.
summary_lines = [f"- {cid}: {msg}" for cid, msg in commits] | ||
summary = f"Commits fixing {vuln_id}:\n" + "\n".join(summary_lines) | ||
yield AdvisoryData( | ||
advisory_id=vuln_id, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will be problematic since we intend to collect fixed commits from multiple different repo here. Suppose we get a fix commit for CVE-000-000
in two different repo, we will end up with a conflict while inserting the advisory, as we use advisory_id
prefixed with the pipeline_id
to create unique AVID. In this case we will end up with same AVID for fix commits imported from two different git repos.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@keshav-space Not sure what the best solution for this is, but based on my understanding, we can make the pipeline_id dynamic
django_fix_commit
django_restframework_fix_commit
For example:
avid: "e.g., django_fix_commit/PYSEC-2020-2233"
avid: "e.g., django_restframework_fix_commit/PYSEC-2020-2233"
This will generate a different avid
for each Git repository.
I’m not sure if this completely solves the problem, though.
def clean_downloads(self): | ||
"""Cleanup any temporary repository data.""" | ||
self.log("Cleaning up local repository resources.") | ||
if os.path.isdir(self.repo.working_tree_dir): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
working_tree_dir
is not what we need here this will return None
as we are working on bare repository, see https://gitpython.readthedocs.io/en/stable/reference.html#git.repo.base.Repo.working_tree_dir.
We should instead use working_dir
.
if os.path.isdir(self.repo.working_tree_dir): | ||
shutil.rmtree(path=self.repo.working_tree_dir) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, let's make sure that we have local clone before proceeding to remove.
if os.path.isdir(self.repo.working_tree_dir): | |
shutil.rmtree(path=self.repo.working_tree_dir) | |
if hasattr(self, 'repo') and self.repo.working_dir: | |
shutil.rmtree(path=self.repo.working_dir) |
Signed-off-by: ziad hany <[email protected]>
I created an initial script to parse Git commit messages that can be easily integrated with our model. The script takes a Git repository as input, parses all commits, and returns the CVEs along with their corresponding fixed commits.
Issues:
results:
openssl.json