Add a formal semver 2.0.0 version type #371
First crack at adding a formal version type in response to CVEProject#362 (comment). Any others which are agreed upon should be spun up in their own PRs so that conversations in the PRs can be kept on topic. Happy to expand this if people think the full semver spec should be in this repo as well. I went back and forth on that.
I recommend you resubmit the PR with a change in both
It will be best to target JSON schema validation instead of programmatically verifying versions when they are as specific as in this scenario, where clear semver-2.0.0 compliance is being tested. Secondly, we should follow the current schema model and extend it to satisfy this need, instead of adding completely new JSON schema fields like
See the current versions.md document, which has some examples: https://github.com/CVEProject/cve-schema/blob/main/schema/docs/versions.md
The one we don't currently have is the So your example will actually look like
You need to build a JSON schema validator to work with such data, with versionType frozen with enum as |
Thanks for the comment. I can update the JSON in this PR once we get to consensus 👍 With respect to the range fields themselves, after seeing you rewrite my example I think it makes sense to simplify and create new fields so that a parser doesn't need to implement conditional logic based on the combination of fields present. I think this will make for simpler and more maintainable code long term. Maybe more people can chime in on this point. As for the regex, it looks like the one you're suggesting is the second of the two provided on semver.org. Albeit with a leading and trailing For documentation's sake here are the two
|
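For readers who want to try the regex discussion above, here is a minimal sketch of checking a version string against the numbered-capture-group expression published on semver.org. The helper name `is_semver_2` is an assumption made for illustration, and the pattern was transcribed by hand, so it is worth comparing against semver.org before relying on it.

```python
import re

# Numbered-capture-group pattern as published on semver.org (hand-transcribed).
# re.ASCII keeps \d limited to 0-9, and fullmatch rejects trailing newlines.
SEMVER_2_PATTERN = re.compile(
    r"^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"
    r"(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)"
    r"(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?"
    r"(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$",
    re.ASCII,
)


def is_semver_2(version):
    """Return True when the whole string is a valid SemVer 2.0.0 version."""
    return SEMVER_2_PATTERN.fullmatch(version) is not None


for candidate in ("1.2.3-alpha", "2.3.4+build17", "1.2", "01.2.3", "2.*"):
    print(candidate, is_semver_2(candidate))
```

The same pattern could be placed in a JSON schema `pattern` keyword, which is the validation route suggested earlier in the thread; the Python wrapper is only a convenient way to experiment with it.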
…for the expressions of "everything under X" or "everything over Y"
Had a thought hit me about one sided ranges, so I added two more examples
Which allow someone to express the idea of |
…-02-20. The status conversation will happen another day
@sei-vsarvepalli where does the |
The field |
Gotcha. Then I guess the difference between the two approaches in schema terms is to add a

I've written a pretty simple parser in Python for my proposal. It assumes perfect (validated) data and that the data is semver-2.0.0, but I think it gets the point across on the simplicity of parsing. Feel free to play around with it by changing the specific parameters in the test. I think I covered all the cases, and it can probably be simplified further.

    import json

    test_json_string = """
    {
        "versionType": "semver-2.0.0",
        "status": "affected",
        "exclusiveLowerBound": "1.2.3-alpha",
        "inclusiveUpperBound": "2.3.4+build17"
    }
    """

    def parse_decoded_json(entry):
        # A singleton version needs no range handling at all.
        if entry.get("exactly"):
            return f'= {entry.get("exactly")}'
        if entry.get("inclusiveLowerBound"):
            lower = ">= " + entry.get("inclusiveLowerBound")
        elif entry.get("exclusiveLowerBound"):
            lower = "> " + entry.get("exclusiveLowerBound")
        else:
            lower = ""
        if entry.get("inclusiveUpperBound"):
            upper = "<= " + entry.get("inclusiveUpperBound")
        elif entry.get("exclusiveUpperBound"):
            upper = "< " + entry.get("exclusiveUpperBound")
        else:
            upper = ""
        # Join only the bounds that are present so one-sided ranges print cleanly.
        return ", ".join(bound for bound in (lower, upper) if bound)

    the_json = json.loads(test_json_string)
    print(parse_decoded_json(the_json))

I initially had

    lower = ">= " + entry.get("inclusiveLowerBound") if entry.get("inclusiveLowerBound") else "> " + entry.get("exclusiveLowerBound")
    upper = "<= " + entry.get("inclusiveUpperBound") if entry.get("inclusiveUpperBound") else "< " + entry.get("exclusiveUpperBound")

However, that doesn't handle one-sided ranges, and I wanted to get some code up before today's QWG meeting. I also haven't had time to make a complete comparison parser, but translating the section

    if entry.get("exactly"):
        return f'= {entry.get("exactly")}'

results in something that needs to look like

    # "version" only means an exact version when no range property accompanies it.
    if entry.get("version") and not (entry.get("lessThan") or entry.get("greaterThan") or entry.get("lessThanOrEqual")):
        return f'= {entry.get("version")}'

as the code needs to be sure that the parameter |
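The comment above mentions that a complete comparison parser was not written. As a rough sketch of what one could look like, the following applies the SemVer 2.0.0 precedence rules (build metadata ignored, pre-releases sorting below the release, numeric identifiers below alphanumeric ones) to the five proposed properties. The function names `precedence_key` and `in_range` are assumptions made for this illustration, and the code assumes input that has already been validated as semver-2.0.0.

```python
def precedence_key(version):
    """Build a sortable key that follows SemVer 2.0.0 precedence."""
    without_build = version.partition("+")[0]  # build metadata never affects order
    core, _, prerelease = without_build.partition("-")
    major, minor, patch = (int(part) for part in core.split("."))
    if not prerelease:
        # A normal release outranks every pre-release of the same core version.
        return (major, minor, patch, 1, ())
    identifiers = tuple(
        # Numeric identifiers compare as integers and sort below alphanumeric ones.
        (0, int(ident), "") if ident.isdigit() else (1, 0, ident)
        for ident in prerelease.split(".")
    )
    return (major, minor, patch, 0, identifiers)


def in_range(entry, observed):
    """Check an observed version against the proposed range properties."""
    key = precedence_key(observed)
    if "exactly" in entry:
        return key == precedence_key(entry["exactly"])
    if "inclusiveLowerBound" in entry and key < precedence_key(entry["inclusiveLowerBound"]):
        return False
    if "exclusiveLowerBound" in entry and key <= precedence_key(entry["exclusiveLowerBound"]):
        return False
    if "inclusiveUpperBound" in entry and key > precedence_key(entry["inclusiveUpperBound"]):
        return False
    if "exclusiveUpperBound" in entry and key >= precedence_key(entry["exclusiveUpperBound"]):
        return False
    return True


entry = {"exclusiveLowerBound": "1.2.3-alpha", "inclusiveUpperBound": "2.3.4+build17"}
for observed in ("1.2.3-alpha", "1.2.3", "2.3.4", "2.3.5"):
    print(observed, in_range(entry, observed))
```

Because each property maps to exactly one comparison, no branch depends on which other properties are present, which is the simplicity argument made for the new field names.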
@sei-vsarvepalli the new properties are in as of commit 62db169, however I'm not sure how to express the valid combinations of parameters for the semver 2.0.0 version type. Do I need to do something like a
Where the first option in the one of is the entire current payload and the other is the semver 2.0.0? Maybe you know a simpler approach? |
If this is valid then still need to ensure version type is set to semver-2.0.0 for these combinations
I let this stew for a bit and I think 046dadd is in the right direction. I think it's possible to only allow those parameter combinations when the version type is semver 2.0.0, but I'm not sure how to encode that yet. |
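To make the open question about valid parameter combinations easier to reason about, here is a procedural sketch of the rules as I read them from this thread: the new properties are only allowed with `versionType` set to `semver-2.0.0`, `exactly` stands alone, and a range has at most one lower and one upper bound. These rules and the `combination_errors` name are assumptions for illustration, not the schema in the PR; the same constraints would ultimately need to be expressed in JSON schema.

```python
LOWER = ("inclusiveLowerBound", "exclusiveLowerBound")
UPPER = ("inclusiveUpperBound", "exclusiveUpperBound")


def combination_errors(entry):
    """Return a list of problems with the property combination (empty if fine)."""
    errors = []
    lowers = [name for name in LOWER if name in entry]
    uppers = [name for name in UPPER if name in entry]
    has_exactly = "exactly" in entry

    if entry.get("versionType") != "semver-2.0.0":
        # Assumed rule: the new properties are reserved for the new version type.
        if has_exactly or lowers or uppers:
            errors.append("new range properties require versionType semver-2.0.0")
        return errors

    if has_exactly and (lowers or uppers):
        errors.append("exactly cannot be combined with a bound")
    if not has_exactly and not lowers and not uppers:
        errors.append("need exactly or at least one bound")
    if len(lowers) > 1:
        errors.append("only one lower bound is allowed")
    if len(uppers) > 1:
        errors.append("only one upper bound is allowed")
    return errors


print(combination_errors({
    "versionType": "semver-2.0.0",
    "status": "affected",
    "inclusiveLowerBound": "1.0.0",
    "exclusiveUpperBound": "2.0.0",
}))
```

Writing the rules out this way also gives a list of cases that a schema-level encoding would need to accept or reject.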
@sei-vsarvepalli Ok, so I'm trying to run the tests locally and it seems I need to rebuild However that file doesn't seem to reference the CVE schema file that I've been making edits to, so I'm a little confused how this all works for local testing. Am I missing something basic here? Am I editing the wrong file? |
What tests are you running? It looks like the starting point of your repo is Your JSON file is also mangled: line 323 is missing a comma. When I run the tests against your branch I get this error
|
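For syntax problems like the missing comma mentioned above, a quick local check can point at the offending line before any schema tooling runs. The sketch below uses Python's standard `json` module; the function name `locate_json_error` and the sample text are made up for illustration and are not part of the repository's test suite.

```python
import json


def locate_json_error(text):
    """Return (line, column, message) for the first syntax error, or None."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return err.lineno, err.colno, err.msg
    return None


# Hypothetical snippet with a comma missing at the end of the second line.
broken = '{\n  "versionType": "semver-2.0.0"\n  "status": "affected"\n}'
print(locate_json_error(broken))
```

Note that the reported position is where the parser gave up, which for a missing comma is usually the start of the following property rather than the end of the line that lacks the comma.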
Thanks for pointing out the comma. Added that in. I'm trying to run the node validation suite with
Which made me think that the validation is failing to match a case on the versions section and hence looking into |
I'll get to the rest of the trailing commas later today. Thanks for the tool 👍
To address these point by point
|
For 'How would you feel about a construction where lower bounds always use lessThan/lessThanOrEqual and upper bounds always use greaterThan/greaterThanOrEqual?': I am opposed to this for the 5.x version series of the CVE Record Format. Consumers today can use the Writing a reference implementation of different behavior is not sufficient. If we change the behavior at some future point (in favor of greaterThanOrEqual or other new properties), then we need a communication plan that can effectively reach consumers, and a substantial period of time for consumers to adapt their use cases to new business logic. This would typically be announced as a substantial update, one with breaking changes for I believe the correct approach to SemVer is along the lines of what I originally suggested in 2023 at #263 - that:
I tried to extract every string from every current CVE Record that is intended to be a SemVer version number, and then I compared them to the SemVer 1 regular expression and to the SemVer 2.0.0 regular expression. The result was that 1.4% (see below) of these version numbers were valid for SemVer 1 but not valid for SemVer 2.0.0. A reasonable assessment is that SemVer 2.0.0 is sufficient for the community's needs, and should be what "semver" means in the CVE Record Format going forward. We should not be considering a complex and controversial semver-2.0.0 proposal to address a 1.4% case.
|
There was some discussion in the QWG today about a schema in which there was always a
With this schema, it is valid to write:
(i.e., no |
@ElectricNroff can we dig in on a concern I may not be fully understanding? Let's use the json you just posted
I see this as an encoding of the expression
Give me a bit more time to parse your longer post above. |
Yes, I want the Just having
if you ignore |
I think these two sentences are at odds with each other (at least as I read them). If the meaning of |
For your larger post, I think we touched on most of those points in the QWG meeting yesterday and noted that the concern is less about the small diff from semver 1 to semver 2 and more about the general inconsistency. Also curious where you found the semver 1 regex. If you think there's a point in there I'm glossing over that I shouldn't be or that wasn't addressed synchronously please call it out. For the new construction what do you think about this. We keep.
and in the case that
and I'll propose two special cases which would otherwise be invalid ranges so that we can capture
and
with the understanding that
and
This gives us all of our normal mathematical range tools aside from a non-inclusive lower bound With this construction the |
I don't think that typical SemVer implementations would consider it invalid to check whether an observed version number is less than 0.0.0. Consequently, they have no innate knowledge that a range that ends in 0.0.0 is invalid. They would just do the math as defined by the SemVer specification, and conclude that any observed version is simply not inside the range. Therefore, this is a breaking change because all such code would need to be changed. Here is one example of SemVer comparison within an Open Source vulnerability scanning product: |
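Independently of any particular product, the concern can be illustrated with a small sketch of a hypothetical consumer that evaluates the existing `version`, `lessThan` and `lessThanOrEqual` properties with plain ordering and no special handling of 0.0.0. The function names are invented for this example, and only plain MAJOR.MINOR.PATCH strings are handled.

```python
def core_tuple(version):
    # Simplification: only plain MAJOR.MINOR.PATCH strings are handled here.
    return tuple(int(part) for part in version.split("."))


def naive_in_range(entry, observed):
    """Evaluate a range the way a consumer without special-casing would."""
    start = core_tuple(entry["version"])
    seen = core_tuple(observed)
    if "lessThan" in entry:
        return start <= seen < core_tuple(entry["lessThan"])
    if "lessThanOrEqual" in entry:
        return start <= seen <= core_tuple(entry["lessThanOrEqual"])
    return seen == start


# The proposed special case: an upper bound of 0.0.0 meant to signal "unbounded".
special_case = {"version": "1.0.0", "lessThan": "0.0.0", "status": "affected"}
print(naive_in_range(special_case, "5.0.0"))  # False: the range looks empty
```

A consumer written like this would silently report 5.0.0 as unaffected, so it would need a code change to honor the special meaning of 0.0.0.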
Indeed it is atypical. It was designed to meet your requirement of the
They have no innate knowledge of anything today. Please see: #362
We have yet to define what is and is not a breaking change 👍. Please see #418 |
At the request of the QWG meeting today here are the other two constructions. For ease of readability I'll be breaking these into two posts. The first which was first introduced here e637776 and which was designed to be completely new so that existing parsers would be the least likely to misinterpret the new data. Discussed back around this comment #371 (comment) Five new properties are introduced. Which would allow the construction of The singleton
Two sided ranges
One sided ranges
The design is primarily for machines, but I think the wording choice also makes it easy for an uninitiated human with a basic mathematics education to understand the raw data in a pinch. The use of completely new properties is to avoid any interpretation conflict with existing parsers. The choice of breaking out |
The second choice which was introduced in a72e5b8 to "avoid bloat" by request Two new properties are introduced. Which are then used in expressions as
Two sided ranges
Two sided ranges with exclusive lower bounds were not implemented and it's unclear how to cleanly implement them with the restrictions that were imposed on this implementation. One could consider something like
and omit One sided ranges
In retrospect I view this construction as something of a halfway house and as such is my least favorable option. The third construction here #371 (comment) |
So in the interests of having current state at the end of this very long comment trail, what does that mean for the path forward? |
@andrewpollock That's a question for the QWG chairs @david-waltermire, @ccoffin, @MrMegaZone and potentially the board. I see Dave thumbs up'd the original construction |
How would I know which specific versions are between 2.0.0 and 2.5.7 using this approach? { |
They would be the versions |
Got it. So the software producer will know which versions are in the set, but how would a consumer know which versions are in the set? |
They would check for whichever version(s) of interest are relevant. Edit: If the question you're asking is more along the lines of
|
Got it, so if they are running version 2.3.4-beta this would be in the set because it's between 2.0.0 and 2.5.7 |
Right. By the rules of semver
after this PR merges at the very least. I have no idea if this is even on their radar though, to be honest. If there's a tooling related question |
@rjb4standards this is distinct from NIST's NVD search API. The Record Format (what we're discussing here), is managed by the CVE project. NVD, maintained by NIST, is a downstream consumer of CVE data. So even if/when this proposal for a new version type is added to CVE, it'll be up to NVD what to do about it. |
@alilleybrinker Thanks for clarifying. Does this mean the CVE Foundation will have a searchable API that supports the semver version range? For example, show me all the CVEs for ACME 2.3.4-beta? |
@rjb4standards the CVE Foundation is different from the CVE Project. As for what the CVE Project would do for improving search, I recommend talking to the Automation Working Group and/or the Consumer Working Group. You can find out more about the groups here: https://www.cve.org/ProgramOrganization/WorkingGroups |
@alilleybrinker thanks for clarifying. The CVE space is getting very confusing with the looming funding deadline coming fast. |
In today's QWG meeting, the group talked through the open options in detail, producing 7 possible options, which we've narrowed down to 5. We agreed that @darakian will produce a brief write-up of the 5 options, and then I will put together a ranked choice voting poll for those options, which will be distributed to QWG members. @ccoffin will also talk to the AWG to figure out what timeline may be doable for a 6.0.0 release, to inform the answer to the question of whether this change should be in a major or minor version. |
I'm not sure I understand why this PR is considered breaking in the light of the purl PRs being merged and scheduled for 5.2.0. My understanding leaving the qwg meeting today was that this PR added optional values to a required element and that the group considered such a change breaking. Is my understanding accurate? |
I'm out of office tomorrow but I can write up an answer for the reasoning on why I think this is breaking on Monday. |
Okay, here is my breakdown of why I think all three formulations of the proposal are breaking. This comes after discussion within the QWG, and is particularly influenced by @ElectricNroff's points on this topic. I'll break down my analysis by option.

Note that all three proposals include the introduction of a new version type, called The While, in theory, the QWG could move forward with a proposal which only adopts the new The

On a more general point, the analyses below are complicated by the fact that today, the CVE Record Format has no standardized versioning rules. In the future, with the adoption of SemVer for the Record Format itself, this kind of analysis will hopefully become much easier and clearer. Given the lack of agreed-upon stability standards, I am going to have to be conservative in considering how changes will impact stability expectations for CVE data providers (CNAs, ADPs, the Secretariat) and for CVE data consumers (vulnerability managers, third-party tool vendors, etc.).

Option 1: Five new fields

See here: #371 (comment)

This option proposes the introduction of five new fields into
While these would be optional new fields within the Consider for example, the following

    [
        {
            "versionType": "semver-2.0.0",
            "inclusiveLowerBound": "1.0.0",
            "exclusiveUpperBound": "2.0.0",
            "status": "affected"
        }
    ]

Semantically, this array is equivalent to the following, with the existing non-validated and non-spec-compliant

    [
        {
            "versionType": "semver",
            "version": "1.0.0",
            "lessThan": "2.*",
            "status": "affected"
        }
    ]

However, the construction with the newly-introduced fields would not be usable by an existing CVE consumer. All existing CVE consumers who handle the Given that CVE consumers would need to update their

Note: This distinction, that fields can be optional for CNAs to use, but may be semantically mandatory for CVE consumers to handle, is the central point that makes Options 1 and 2 breaking. It's also why the recently-merged support for Package URLs is not breaking. The new

Option 2: Two new fields

See here: #371 (comment)

This option proposes the introduction of two new fields into
While these are, like Option 1, optional new fields, they are also breaking for CVE consumers. Take the following example

    [
        {
            "versionType": "semver-2.0.0",
            "greaterThanOrEqual": "1.0.0",
            "lessThan": "2.0.0",
            "status": "affected"
        }
    ]

This encodes the same range encoded in the example given for Option 1. Same as in that case, because of the missing

Option 3: No new fields

See here: #371 (comment)

This option does not propose the introduction of new fields into In this case, while no new fields are introduced, two new semantic overloads are proposed to enable addressing range types that would otherwise be impossible to express.

First, to encode a range that is exclusive at its start and unbounded at its end:

    [
        {
            "versionType": "semver-2.0.0",
            "version": "1.0.0",
            "lessThan": "0.0.0",
            "status": "affected"
        }
    ]

Note that this uses

Second, to encode a range that is inclusive at its start and unbounded at its end:

    [
        {
            "versionType": "semver-2.0.0",
            "version": "1.0.0",
            "lessThanOrEqual": "0.0.0",
            "status": "affected"
        }
    ]

This uses

While this is a creative design which avoids the problems associated with the introduction of technically optional but semantically mandatory fields, it introduces new problems with the interpretation of existing fields. By introducing a new implicit meaning associated with the use of facially nonsensical values for

Final Note

I wish these changes were not breaking. I want the CVE Record Format to be able to do what is proposed here: provide stronger data validation at submission-time, to make life easier and data more actionable for CVE consumers. That said, CVE has an enormous ecosystem of consumers, and breakage should not be done lightly, and it should not be done in a minor version. Even though the Record Format's 5.0.0 series has not officially adopted Semantic Versioning, I believe shipping a breaking change in 5.3.0 would violate CVE stakeholders' expectations of stability, and would be a mistake. |
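The "optional for CNAs, semantically mandatory for consumers" distinction can be sketched by listing which properties a consumer limited to today's range fields would never read. This is a simplified illustration: the set of known fields, the `unread_fields` helper, and the idea that a consumer ignores unknown properties are all assumptions for this example, not a description of any real tool.

```python
# Assumed set of range-related properties a present-day consumer understands.
KNOWN_RANGE_FIELDS = {"version", "lessThan", "lessThanOrEqual"}
# Properties that carry no range information in this simplified model.
NON_RANGE_FIELDS = {"versionType", "status"}


def unread_fields(entry):
    """List properties that carry range data the assumed consumer would skip."""
    return sorted(set(entry) - KNOWN_RANGE_FIELDS - NON_RANGE_FIELDS)


option_1 = {"versionType": "semver-2.0.0", "inclusiveLowerBound": "1.0.0",
            "exclusiveUpperBound": "2.0.0", "status": "affected"}
option_2 = {"versionType": "semver-2.0.0", "greaterThanOrEqual": "1.0.0",
            "lessThan": "2.0.0", "status": "affected"}
option_3 = {"versionType": "semver-2.0.0", "version": "1.0.0",
            "lessThan": "0.0.0", "status": "affected"}

for name, entry in (("Option 1", option_1), ("Option 2", option_2), ("Option 3", option_3)):
    print(name, unread_fields(entry), "version" in entry)
```

In this model Options 1 and 2 leave range data in properties the consumer never reads and omit `version`, while Option 3 leaves nothing unread but changes what the existing values mean.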
I still disagree that this is breaking and I think the core of my disagreement is captured by
And in the three sub sections where it is asserted that consumers would need to update their parsing logic
As I read this, the former asserts that CVE consumers must be ready to parse anything, and the latter three assert that asking CVE consumers to parse one specific thing is asking too much. This is contradictory.

I want to step back too and reassert that I don't really care about when (or if) the cve project wants to formalize semantic versioning. The point of this PR was to construct a minimum viable, meaningful improvement that I thought would be uncontroversial and to actually attempt the improvement process rather than to simply squawk about it. I do feel that there has been broad support for this change in details or at least in spirit, and yet there has been a continual moving of goal posts on me, from the use of an asterisk, to parameter names, to an assertion of

What I do care about is that the cve project adopts a sane approach to data structures. I want the cve project to be introspective and self critical with an eye toward continual, iterative, non-ocean boiling improvement. I want the project to care about how it operates and how it affects the world. Speaking as a recipient of this project's work, I want the project to own its outcomes and to improve.

That all said, I also want to express that I have great respect for you as an individual, Andy, and just to be explicit, the comments above are not directed at you the person, but at you the mitre. I want to be mindful of the position you (the person) are in, that the legacy of this project is not yours, and that you've put in herculean effort to help push this project forward.

I am skeptical that a 6.0.0 will not be derailed in similar ways as with this PR, but given the undefined/underdefined nature of the current rule set it's clearly not worth continuing with the minor/major logic exercises. If we can define what a |
First crack at adding a formal version type in response to #362 (comment) Any others which are agreed upon should be spun up in their own PRs so that conversations in the PRs can be kept on topic
Happy to expand this if people think the full semver spec should be in this repo as well. I went back and forth on that.
Another thought is that maybe this should be a retroactive definition of the semver type. That would likely be breaking for some of the current records though.
The goal here is to have strict validation provided by cve services