Fix: Inspector reports should link to CVEs (#6557) #6562

achave11-ucsc · 2024-09-11T18:17:56Z

Connected issues: #6557

Checklist

Author

PR is a draft
Target branch is develop
Name of PR branch matches issues/<GitHub handle of author>/<issue#>-<slug>
On ZenHub, PR is connected to all issues it (partially) resolves
PR description links to connected issues
PR title matches¹ that of a connected issue _{or comment in PR explains why they're different}
PR title references all connected issues
For each connected issue, there is at least one commit whose title references that issue

¹ when the issue title describes a problem, the corresponding PR
title is Fix: followed by the issue title

Author (partiality)

Added p tag to titles of partial commits
This PR is labeled partial _{or completely resolves all connected issues}
This PR partially resolves each of the connected issues _{or does not have the partial label}

Author (chains)

This PR is blocked by previous PR in the chain _{or is not chained to another PR}
The blocking PR is labeled base _{or this PR is not chained to another PR}
This PR is labeled chained _{or is not chained to another PR}

Author (reindex, API changes)

Added r tag to commit title _{or the changes introduced by this PR will not require reindexing of any deployment}
This PR is labeled reindex:dev _{or the changes introduced by it will not require reindexing of dev}
This PR is labeled reindex:anvildev _{or the changes introduced by it will not require reindexing of anvildev}
This PR is labeled reindex:anvilprod _{or the changes introduced by it will not require reindexing of anvilprod}
This PR is labeled reindex:prod _{or the changes introduced by it will not require reindexing of prod}
This PR is labeled reindex:partial and its description documents the specific reindexing procedure for dev, anvildev, anvilprod and prod _{or requires a full reindex or carries none of the labels reindex:dev, reindex:anvildev, reindex:anvilprod and reindex:prod}
This PR and its connected issues are labeled API _{or this PR does not modify a REST API}
Added a (A) tag to commit title for backwards (in)compatible changes _{or this PR does not modify a REST API}
Updated REST API version number in app.py _{or this PR does not modify a REST API}

Author (upgrading deployments)

Ran make docker_images.json and committed the resulting changes _{or this PR does not modify azul_docker_images, or any other variables referenced in the definition of that variable}
Documented upgrading of deployments in UPGRADING.rst _{or this PR does not require upgrading deployments}
Added u tag to commit title _{or this PR does not require upgrading deployments}
This PR is labeled upgrade _{or does not require upgrading deployments}
This PR is labeled deploy:shared _{or does not modify docker_images.json, and does not require deploying the shared component for any other reason}
This PR is labeled deploy:gitlab _{or does not require deploying the gitlab component}
This PR is labeled deploy:runner _{or does not require deploying the runner image}

Author (hotfixes)

Added F tag to main commit title _{or this PR does not include permanent fix for a temporary hotfix}
Reverted the temporary hotfixes for any connected issues _{or none of the stable branches (anvilprod and prod) have temporary hotfixes for any of the issues connected to this PR}

Author (before every review)

Rebased PR branch on develop, squashed old fixups
Ran make requirements_update _{or this PR does not modify requirements*.txt, common.mk, Makefile and Dockerfile}
Added R tag to commit title _{or this PR does not modify requirements*.txt}
This PR is labeled reqs _{or does not modify requirements*.txt}
make integration_test passes in personal deployment _{or this PR does not modify functionality that could affect the IT outcome}

Peer reviewer (after approval)

PR is not a draft
Ticket is in Review requested column
PR is awaiting requested review from system administrator
PR is assigned to only the system administrator

System administrator (after approval)

Actually approved the PR
Labeled connected issues as demo or no demo
Commented on connected issues about demo expectations _{or all connected issues are labeled no demo}
Decided if PR can be labeled no sandbox
A comment to this PR details the completed security design review
PR title is appropriate as title of merge commit
N reviews label is accurate
Moved connected issues to Approved column
PR is assigned to only the operator

Operator (before pushing the merge commit)

System administrator

Background migrations for dev.gitlab are complete _{or this PR is not labeled deploy:gitlab}
Background migrations for anvildev.gitlab are complete _{or this PR is not labeled deploy:gitlab}
PR is assigned to only the operator

Operator (before pushing the merge commit)

Operator (chain shortening)

Changed the target branch of the blocked PR to develop _{or this PR is not labeled base}
Removed the chained label from the blocked PR _{or this PR is not labeled base}
Removed the blocking relationship from the blocked PR _{or this PR is not labeled base}
Removed the base label from this PR _{or this PR is not labeled base}

Operator (after pushing the merge commit)

Operator (reindex)

Operator

Propagated the deploy:shared, deploy:gitlab, deploy:runner, API, reindex:partial, reindex:anvilprod and reindex:prod labels to the next promotion PRs _{or this PR carries none of these labels}
Propagated any specific instructions related to the deploy:shared, deploy:gitlab, deploy:runner, API, reindex:partial, reindex:anvilprod and reindex:prod labels, from the description of this PR to that of the next promotion PRs _{or this PR carries none of these labels}
PR is assigned to no one

Shorthand for review comments

L line is too long
W line wrapping is wrong
Q bad quotes
F other formatting problem

coveralls · 2024-09-11T21:58:13Z

Coverage: 85.398% (remained the same) when pulling e2e5569 on issues/achave11-ucsc/6557-link-CVEs into 74c4894 on develop.

codecov · 2024-09-11T22:04:30Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 85.38%. Comparing base (74c4894) to head (e2e5569).

Additional details and impacted files

@@           Coverage Diff            @@
##           develop    #6562   +/-   ##
========================================
  Coverage    85.38%   85.38%           
========================================
  Files          155      155           
  Lines        20754    20754           
========================================
  Hits         17720    17720           
  Misses        3034     3034

Flag	Coverage Δ
	`85.38% <ø> (ø)`

dsotirho-ucsc · 2024-09-12T16:49:47Z

scripts/export_inspector_findings.py

+        findings_vuln_sorted = {vuln: findings[vuln] for vuln in sorted(findings)}
+        for vulnerability, summaries in sorted(findings_vuln_sorted.items(),


This double sorting doesn't work, since the second sort jumbles up the first sort's results. Also, the current sorting is flawed in that it would sort ['CVE-2024-500', 'CVE-2024-2000', 'CVE-2024-90'] lexicographically (2000, 500, 90) instead of numerically (90, 500, 2000).
Consider that findings_sort() is already doing a secondary sort by returning a tuple (score, vulnerability_name). This could be modified to sort by (score, int(vulnerability_number)) to achieve a secondary sort using the numeric part of the vulnerability.

Here's a proof of concept:

Index: scripts/export_inspector_findings.py
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/scripts/export_inspector_findings.py b/scripts/export_inspector_findings.py
--- a/scripts/export_inspector_findings.py	(revision 255666ceff1d07ea89e2943e57fe6b30e9bfdd48)
+++ b/scripts/export_inspector_findings.py	(date 1726158979848)
@@ -10,6 +10,9 @@
 import json
 import logging
 import sys
+from typing import (
+    Any,
+)
 
 from furl import (
     furl,
@@ -173,13 +176,17 @@
         cols = list(chars) + [a + b for a in chars for b in chars]
         return cols[col - 1]
 
-    def findings_sort(self, item: tuple[str, list[SummaryType]]) -> tuple[int, str]:
+    def findings_sort(self, item: tuple[str, list[SummaryType]]) -> tuple[int, tuple[Any, ...]]:
         score = 0
         weights = {'HIGH': 1, 'CRITICAL': 10}
         for summary in item[1]:
             count = len(summary['resources'])
             score += count * weights.get(summary['severity'], 0)
-        return score, item[0]
+        name_parts = item[0].split('-')
+        if len(name_parts) == 3 and name_parts[0] == 'CVE':
+            return score, (int(name_parts[1]), int(name_parts[2]))
+        else:
+            return score, (item[0],)
 
     def write_to_csv(self,
                      findings: dict[str, list[SummaryType]],
@@ -195,10 +202,11 @@
         lookup = dict(zip(titles, range(len(titles))))
         rows = [titles]
 
-        findings_vuln_sorted = {vuln: findings[vuln] for vuln in sorted(findings)}
-        for vulnerability, summaries in sorted(findings_vuln_sorted.items(),
+        for vulnerability, summaries in sorted(findings.items(),
                                                key=self.findings_sort,
                                                reverse=True):
+            # FIXME: Delete this debug print
+            print(vulnerability, [s['severity'] for s in summaries])
             # A mapping of column index to abbreviated severity value
             column_values = {
                 lookup[key]: summary['severity'][0:1]
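
The ordering problem described in this comment can be reproduced in isolation. The following is a minimal sketch, not the script's code: the helper name `cve_sort_key` and the sample names are hypothetical, and it assumes CVE names of the form `CVE-<year>-<number>`.

```python
def cve_sort_key(name: str) -> tuple[int, int, str]:
    """
    Hypothetical helper: order CVE names numerically by (year, number) and
    fall back to plain string order for any other name.
    """
    parts = name.split('-')
    if len(parts) == 3 and parts[0] == 'CVE' and parts[1].isdigit() and parts[2].isdigit():
        return int(parts[1]), int(parts[2]), ''
    # Names that are not CVEs sort before all CVEs, ordered by name
    return -1, -1, name


names = ['CVE-2024-500', 'CVE-2024-2000', 'CVE-2024-90']

# Plain string comparison goes character by character, so '2000' < '500' < '90'
lexicographic = sorted(names)

# Comparing the numeric parts as integers gives 90 < 500 < 2000
numeric = sorted(names, key=cve_sort_key)

print(lexicographic)
print(numeric)
```

The sketch returns the same tuple shape in both branches, so Python never has to compare an `int` with a `str` when a CVE name and a non-CVE name end up next to each other.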

dsotirho-ucsc

Consider storing the URL in the summary dictionary instead of creating a separate vulnerability_links dictionary.

Since the script groups findings by vulnerability, and there is one URL provided per finding, it is possible that one vulnerability will have more than one unique URL. In this case I think it makes sense to use the most common URL for a given vulnerability rather than the first (or last) URL encountered. The patch below uses this approach.

Index: scripts/export_inspector_findings.py
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/scripts/export_inspector_findings.py b/scripts/export_inspector_findings.py
--- a/scripts/export_inspector_findings.py	(revision 8c8fcc00e7961fd3899e8943e25b9d7a7a66c3a4)
+++ b/scripts/export_inspector_findings.py	(date 1726699545451)
@@ -3,6 +3,7 @@
 vulnerability.
 """
 from collections import (
+    Counter,
     defaultdict,
 )
 import csv
@@ -11,10 +12,6 @@
 import logging
 import sys
 
-from furl import (
-    furl,
-)
-
 from azul.args import (
     AzulArgumentHelpFormatter,
 )
@@ -119,13 +116,11 @@
         if self.args.json:
             self.dump_to_json(findings)
         parsed_findings = defaultdict(list)
-        vulnerability_links = defaultdict(furl)
         for finding in findings:
-            vulnerability, source_url, summary = self.parse_finding(finding)
-            vulnerability_links[vulnerability].url = source_url
+            vulnerability, summary = self.parse_finding(finding)
             parsed_findings[vulnerability].append(summary)
         log.info('Found %i unique vulnerabilities', len(parsed_findings))
-        self.write_to_csv(parsed_findings, vulnerability_links)
+        self.write_to_csv(parsed_findings)
         log.info('Done.')
 
     def dump_to_json(self, findings: JSONs) -> None:
@@ -134,7 +129,7 @@
         with open(output_file_name, 'w') as f:
             json.dump({'findings': findings}, f, default=str, indent=4)
 
-    def parse_finding(self, finding: JSON) -> tuple[str, str, SummaryType]:
+    def parse_finding(self, finding: JSON) -> tuple[str, SummaryType]:
         severity = finding['severity']
         # The vulnerabilityId is usually a substring of the finding title (e.g.
         # "CVE-2023-44487" vs"CVE-2023-44487 - google.golang.org/grpc,
@@ -145,9 +140,9 @@
         assert len(finding['resources']) == 1, finding
         resource = finding['resources'][0]
         resource_type = resource['type']
-        source_url = finding['packageVulnerabilityDetails']['sourceUrl']
         summary = {
             'severity': severity,
+            'source_url': finding['packageVulnerabilityDetails']['sourceUrl'],
             'resource_type': resource_type,
             'resources': set(),
         }
@@ -165,7 +160,7 @@
             self.instances.add(instance)
         else:
             assert False, resource
-        return vulnerability, source_url, summary
+        return vulnerability, summary
 
     def column_alpha(self, col: int) -> str:
         assert col > 0, col
@@ -188,9 +183,7 @@
             finding_name = finding_name.replace(id, padded_id)
         return score, finding_name
 
-    def write_to_csv(self,
-                     findings: dict[str, list[SummaryType]],
-                     vulnerability_links: dict[str, furl]) -> None:
+    def write_to_csv(self, findings: dict[str, list[SummaryType]]) -> None:
         titles = [
             'Vulnerability',
             'Severity',
@@ -214,7 +207,11 @@
             row_num = len(rows) + 1
             col_range = f'C{row_num}:{last_col}{row_num}'
             severity_formula = f'=(COUNTIF({col_range},"C")*10)+(COUNTIF({col_range},"H"))'
-            url = vulnerability_links[vulnerability].url
+            urls = Counter([summary['source_url'] for summary in summaries])
+            if len(urls.keys()) > 1:
+                log.warning('More than one URL found for %s, using most common', vulnerability)
+                log.warning(dict(urls.most_common()))
+            url = urls.most_common(1)[0][0]
             vulnerability_hyperlink = f'=HYPERLINK("{url}","{vulnerability}")'
             row = [vulnerability_hyperlink, severity_formula]
             for column_index in range(len(row), len(titles) + 1):
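
The "most common URL" selection in the patch above relies on `collections.Counter`. Here is a minimal sketch of that step on its own; the sample URLs are made up for illustration and do not come from any Inspector finding.

```python
from collections import Counter

# Hypothetical sample data: one source URL per finding of the same vulnerability
source_urls = [
    'https://example.org/advisory/a',
    'https://example.org/advisory/b',
    'https://example.org/advisory/a',
]

urls = Counter(source_urls)

# More than one distinct key means the findings disagree about the URL
has_conflict = len(urls) > 1

# most_common(1) returns a list holding a single (element, count) pair
url, count = urls.most_common(1)[0]

print(has_conflict)
print(url, count)
```

When two URLs occur equally often, `most_common()` lists them in the order they were first encountered, so a tie falls back to the first URL seen.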

dsotirho-ucsc · 2024-09-18T21:07:43Z

scripts/export_inspector_findings.py

@@ -169,12 +176,21 @@ def column_alpha(self, col: int) -> str:
    def findings_sort(self, item: tuple[str, list[SummaryType]]) -> tuple[int, str]:
        score = 0
        weights = {'HIGH': 1, 'CRITICAL': 10}
-        for summary in item[1]:
+        finding_name, summaries = item


Suggested change

- finding_name, summaries = item
+ vulnerability, summaries = item

dsotirho-ucsc · 2024-09-18T21:10:35Z

scripts/export_inspector_findings.py

+        if finding_name.startswith('CVE-'):
+            # Best secondary-sorting effort on CVE findings, vulnerability names
+            # not prefixed with 'CVE' may reflect an inaccurate secondary order.
+            id = finding_name.rsplit('-', 1)[1]


Suggested change

- id = finding_name.rsplit('-', 1)[1]
+ id = finding_name.split('-')[-1]

dsotirho-ucsc

padded_id can be inlined.

            prefix, _, id = vulnerability.rpartition('-')
            vulnerability = '-'.join([prefix, f'{id:0>6}'])

dsotirho-ucsc · 2024-09-20T16:17:28Z

scripts/export_inspector_findings.py

+            # Best secondary-sorting effort on CVE findings, vulnerability names
+            # not prefixed with 'CVE' may reflect an inaccurate secondary order.
+            id = vulnerability.split('-')[-1]
+            padded_id = '000000'[:abs(6 - len(id))] + id


Suggested change

- padded_id = '000000'[:abs(6 - len(id))] + id
+ padded_id = f'{id:0>6}'

With abs:

>>> for id in ['1234', '12345', '123456', '1234567', '12345678']:
...     print('000000'[:abs(6 - len(id))] + id)
001234
012345
123456
01234567
0012345678

With max:

>>> for id in ['1234', '12345', '123456', '1234567', '12345678']:
...     print('000000'[:max(0, 6 - len(id))] + id)
001234
012345
123456
1234567
12345678

With f-string:

>>> for id in ['1234', '12345', '123456', '1234567', '12345678']:
...     print(f'{id:0>6}')
001234
012345
123456
1234567
12345678

Also, (not needed if you use the f-string approach, but) remember a string can be multiplied by a number:

>>> '0' * 5
'00000'

Good call on the f-string approach.
So far I've only seen five-digit numbers after the year in CVE IDs, which is why I padded to six zeros (one extra for good measure).

dsotirho-ucsc · 2024-09-20T16:26:10Z

scripts/export_inspector_findings.py

+        if vulnerability.startswith('CVE-'):
+            # Best secondary-sorting effort on CVE findings, vulnerability names
+            # not prefixed with 'CVE' may reflect an inaccurate secondary order.
+            id = vulnerability.split('-')[-1]


Suggested change

- id = vulnerability.split('-')[-1]
+ prefix, _, id = vulnerability.rpartition('-')

Reverted to my original approach; I think it's more deliberate in intent.

dsotirho-ucsc · 2024-09-20T16:26:37Z

scripts/export_inspector_findings.py

+            # not prefixed with 'CVE' may reflect an inaccurate secondary order.
+            id = vulnerability.split('-')[-1]
+            padded_id = '000000'[:abs(6 - len(id))] + id
+            vulnerability = vulnerability.replace(id, padded_id)


Suggested change

- vulnerability = vulnerability.replace(id, padded_id)
+ vulnerability = '-'.join([prefix, padded_id])

This doesn't quite work, since it omits the "year" part of the CVE name, which also needs to be considered for the secondary sort.
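
For reference, a minimal sketch comparing the two forms discussed in this thread. It assumes `prefix` comes from `rpartition('-')`, as in the earlier suggestion, in which case the prefix still carries the year. The helper name `padded_name` and the sample names are hypothetical.

```python
def padded_name(vulnerability: str, width: int = 6) -> str:
    """
    Hypothetical helper: zero-pad the part after the last hyphen so that
    string comparison of the result matches numeric order.
    """
    prefix, _, number = vulnerability.rpartition('-')
    return '-'.join([prefix, f'{number:0>{width}}'])


name = 'CVE-2024-2024'
number = name.rpartition('-')[2]

# str.replace() substitutes every occurrence, including the one in the year
replaced = name.replace(number, f'{number:0>6}')

# Joining the prefix and the padded number only touches the trailing part
joined = padded_name(name)

print(replaced)
print(joined)

# The padded form sorts by year first, then by number within the year
ordered = sorted(['CVE-2024-500', 'CVE-2023-2000', 'CVE-2024-90'], key=padded_name)
print(ordered)
```

Under these assumptions the year stays in the sort key, and the `join` form avoids altering the year when it happens to contain the same digits as the trailing number.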

github-actions bot added the orange [process] Done by the Azul team label Sep 11, 2024

achave11-ucsc force-pushed the issues/achave11-ucsc/6557-link-CVEs branch 3 times, most recently from 1a818a8 to 255666c Compare September 11, 2024 21:44

achave11-ucsc requested a review from dsotirho-ucsc September 11, 2024 22:14

achave11-ucsc assigned dsotirho-ucsc Sep 11, 2024

dsotirho-ucsc requested changes Sep 12, 2024

dsotirho-ucsc removed their assignment Sep 12, 2024

achave11-ucsc force-pushed the issues/achave11-ucsc/6557-link-CVEs branch from 255666c to 3481d61 Compare September 17, 2024 20:43

achave11-ucsc requested a review from dsotirho-ucsc September 17, 2024 21:12

achave11-ucsc assigned dsotirho-ucsc and unassigned dsotirho-ucsc Sep 17, 2024

achave11-ucsc force-pushed the issues/achave11-ucsc/6557-link-CVEs branch 3 times, most recently from 4fc72d1 to 8c8fcc0 Compare September 18, 2024 03:11

achave11-ucsc assigned dsotirho-ucsc Sep 18, 2024

dsotirho-ucsc requested changes Sep 18, 2024

dsotirho-ucsc removed their assignment Sep 18, 2024

achave11-ucsc force-pushed the issues/achave11-ucsc/6557-link-CVEs branch from 8c8fcc0 to 7c057cf Compare September 19, 2024 17:10

achave11-ucsc requested a review from dsotirho-ucsc September 19, 2024 17:34

achave11-ucsc assigned dsotirho-ucsc Sep 19, 2024

dsotirho-ucsc requested changes Sep 20, 2024

dsotirho-ucsc removed their assignment Sep 20, 2024

achave11-ucsc added 2 commits September 20, 2024 17:52

Fix: Inspector reports should link to CVEs (#6557)

b22ba9e

fixup! Fix: Inspector reports should link to CVEs (#6557)

e2e5569

achave11-ucsc force-pushed the issues/achave11-ucsc/6557-link-CVEs branch from 7c057cf to e2e5569 Compare September 21, 2024 00:52
