Description
In purescript/registry#76 (comment) we figured that using SemVer's "prerelease segment" for Trustees to publish versions won't work, as the way the spec orders them is not the one we'd like.
I'll report here the discussion from the thread, as a starting point for further discussion:
@hdgarrood: I'm not sure using the semver prerelease segment will work so well for this. Firstly, prerelease versions are considered less than non-prerelease versions according to semver, so v1.0.0 compares greater than v1.0.0-r1. Secondly, doesn't this prevent package authors from using the prerelease segment for their own purposes? I wouldn't really mind if we reserved the prerelease segment for our own use, but in that case we would be diverging from semver (if we don't allow authors to use all of the components of semver, we aren't really using semver) so I think we ought to be more upfront about that if that's what we go with.
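To make the ordering concern concrete, here is a minimal Python sketch of SemVer precedence. The names `parse`, `precedence_key` and `_identifier_key` are hypothetical helpers written for this illustration, the optional leading `v` is accepted only because the thread writes versions that way, and the regex is looser than the full SemVer grammar.

```python
import re

# Loose SemVer pattern (an assumption for this sketch, not the full grammar).
SEMVER = re.compile(
    r"v?([0-9]+)\.([0-9]+)\.([0-9]+)"
    r"(?:-([0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*))?"
    r"(?:\+([0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*))?"
)


def parse(version):
    """Split a version string into (core, prerelease identifiers, build)."""
    m = SEMVER.fullmatch(version)
    if m is None:
        raise ValueError(f"not a SemVer version: {version!r}")
    major, minor, patch, pre, build = m.groups()
    core = (int(major), int(minor), int(patch))
    return core, (pre.split(".") if pre else []), build


def _identifier_key(ident):
    # Numeric identifiers compare numerically and rank below alphanumeric ones.
    return (0, int(ident), "") if ident.isdigit() else (1, 0, ident)


def precedence_key(version):
    """Sort key following SemVer precedence; build metadata is ignored."""
    core, pre, _build = parse(version)
    # A version without a prerelease ranks above any prerelease of the same core.
    return (core, 0 if pre else 1, [_identifier_key(i) for i in pre])


print(precedence_key("v1.0.0") > precedence_key("v1.0.0-r1"))  # True
```

Under this ordering a revision published as `v1.0.0-r1` would sort before `v1.0.0`, which is the opposite of what a revision scheme needs.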
@f-f: Great points. I wouldn't like us to diverge from SemVer, but other than that I have no strong opinion on how to do this.
Do you have a concrete idea that we could write down here?
@hdgarrood: I just looked over the semver spec again and it does say that prerelease identifiers and build metadata are optional, so actually I take that back: we are well within our rights to only accept what the spec describes as "normal" versions into the registry (i.e. those which don't have prerelease identifiers or build metadata), and since it's quite rare to use prerelease identifiers and build metadata in practice (at least for libraries), I think it may even be a good idea to reject versions which use them. I also think it would be nice to set the registry architecture up so that it's impossible for revisions to affect anything other than the package metadata. What Hackage do is handle revisions of metadata for a version separately from the version tarball itself, so that there's only ever one tarball for a version, and revisions of the metadata are stored separately. It then becomes the job of the registry client to fetch the most recent version of the metadata alongside the package tarball. That approach sounds sensible to me, and I think it suggests that metadata revisions are separate from package versions. So maybe this is just a question for the package index?
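As a rough illustration of this proposal, the Python sketch below accepts only "normal" versions and keeps metadata revisions apart from a single immutable tarball per version. `Registry`, `Release`, `publish`, `revise` and `fetch` are hypothetical names invented for this example; they are not the registry's or Hackage's actual API.

```python
import re
from dataclasses import dataclass, field

# "Normal" versions only: MAJOR.MINOR.PATCH, no prerelease or build metadata.
NORMAL_VERSION = re.compile(r"(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)")


@dataclass
class Release:
    tarball_sha256: str                            # fixed at publish time
    revisions: list = field(default_factory=list)  # metadata history, oldest first


class Registry:
    def __init__(self):
        self.releases = {}

    def publish(self, version, tarball_sha256, metadata):
        if NORMAL_VERSION.fullmatch(version) is None:
            raise ValueError(f"only normal versions are accepted: {version!r}")
        if version in self.releases:
            raise ValueError(f"{version} is already published")
        self.releases[version] = Release(tarball_sha256, [dict(metadata)])

    def revise(self, version, metadata):
        # A revision can only ever touch metadata, never the tarball.
        self.releases[version].revisions.append(dict(metadata))

    def fetch(self, version):
        # Clients get the single tarball hash plus the most recent metadata.
        release = self.releases[version]
        return release.tarball_sha256, release.revisions[-1]
```

In this model a Trustee revision is a call to `revise`, so the version number itself never changes and no extra SemVer segment is needed.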
@f-f: I am not quite comfortable with Hackage's approach of storing package sources and metadata revisions separately, as I think it has a couple of problems:
- there is only one tarball for a version, which means that "hashing the package sources" doesn't guarantee integrity and security by itself, as you'd need to fetch a metadata file as well, and you'd need to guarantee integrity (i.e. hash it and distribute the hashes) either of this file or of the whole bundle. In the latter case, one then also needs to define how to compose the files to compute the hash. Preserving integrity of the metadata as well is necessary because things like "changing version bounds" could introduce malicious code if not supervised, and not guaranteeing integrity for them would mean that people would get them in their build believing that the version hasn't changed (because the hash did not), rendering builds nondeterministic and insecure.
- doing this would mean that the registry-index repo would become a source of truth, which means that we'd either have to maintain these two repos in sync (e.g. the moment we add metadata in there we also need to sync the list of versions here, etc) or unify them. But would storing metadata in here mean that packages wouldn't version it? If metadata is not stored alongside the package sources, then there would be no way to use a package without going through the Registry (and I can see some people not wanting to do that because of security or other company concerns), as they'd have to figure out how the package is defining dependencies, bounds, etc. If we store the metadata alongside the sources instead, then we have the question of "which metadata is correct, the one here or the one in the registry?". This last reason alone is why the current design ditches this aspect (i.e. storing manifests in here and optionally in packages) from the previous draft.
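The integrity point in the first bullet can be sketched in Python as follows. This is only one possible way to "compose the files to compute the hash": the canonical JSON encoding and the hash-of-hashes layout are assumptions made for the example, and `bundle_hash` and `canonical_metadata` are hypothetical names.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def canonical_metadata(metadata: dict) -> bytes:
    # Assumed canonical form: sorted keys, no insignificant whitespace.
    return json.dumps(metadata, sort_keys=True, separators=(",", ":")).encode("utf-8")


def bundle_hash(tarball: bytes, metadata: dict) -> str:
    # Hash the two fixed-length digests together so the composition is unambiguous.
    parts = sha256_hex(tarball) + "\n" + sha256_hex(canonical_metadata(metadata))
    return sha256_hex(parts.encode("ascii"))


tarball = b"package sources"
original = {"dependencies": {"prelude": ">=4.0.0 <5.0.0"}}
revised = {"dependencies": {"prelude": ">=4.0.0 <6.0.0"}}

# The tarball hash alone cannot see the metadata revision...
print(sha256_hex(tarball) == sha256_hex(tarball))                        # True
# ...while a hash over the whole bundle changes with it.
print(bundle_hash(tarball, original) == bundle_hash(tarball, revised))   # False
```

A client that pins only the tarball hash would accept the revised bounds without noticing, which is the nondeterminism described above.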
About prerelease identifiers: some packages have been using the pre-release segment (e.g. Halogen or aff) and I think it's good to allow them to, as it has an important role in package versioning.
I also read again through the SemVer spec, and found an issue which might actually help us here: it looks like it considers two versions with everything equal but the build metadata as equal in the sorting, i.e. it ignores build metadata when sorting. This is a problem for us because we need to get a stable sorting of releases (to figure out the last version of a package), so we would either have to:
- disallow releases with build metadata altogether
- or reserve the build metadata segment only for Trustees to cut new revisions (so going back to that idea from the previous version of the draft), and expand the sorting so that it would consider versions with build metadata as "later" than those without. The ordering of different build metadata segments should be no problem, as the spec already defines how to sort prerelease segments, and their grammar is the same, so we could just reuse that.
This behaviour seems to be what apt does as well - distros patch upstream packages and add build metadata so that the package manager picks the patched versions as newer - and I think it works very well there. Note that both of the above options mean that we'll slightly diverge from SemVer, but I'd consider this quite fine, since it's basically just getting rid of undefined behaviour.
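The second option can be sketched in Python as an extended sort key: SemVer precedence first, then build metadata as a tie-breaker, compared with the same identifier rules the spec defines for prerelease segments. This is an illustration under those assumptions, not a definition of the registry's ordering; `extended_key` and the other names are hypothetical.

```python
import re

# Loose SemVer pattern (an assumption for this sketch, not the full grammar).
SEMVER = re.compile(
    r"([0-9]+)\.([0-9]+)\.([0-9]+)"
    r"(?:-([0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*))?"
    r"(?:\+([0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*))?"
)


def _identifier_key(ident):
    # Numeric identifiers rank below alphanumeric ones; the raw text breaks
    # ties such as "1" vs "001", which build metadata is allowed to contain.
    return (0, int(ident), ident) if ident.isdigit() else (1, 0, ident)


def extended_key(version):
    m = SEMVER.fullmatch(version)
    if m is None:
        raise ValueError(f"not a SemVer version: {version!r}")
    major, minor, patch, pre, build = m.groups()
    pre_ids = pre.split(".") if pre else []
    build_ids = build.split(".") if build else []
    return (
        (int(major), int(minor), int(patch)),
        0 if pre_ids else 1,                     # release ranks above its prereleases
        [_identifier_key(i) for i in pre_ids],
        1 if build_ids else 0,                   # revisions rank above the plain version
        [_identifier_key(i) for i in build_ids],
    )


versions = ["1.0.0+r2", "1.0.0-rc.1", "1.0.0", "1.0.0+r1", "1.0.1"]
print(sorted(versions, key=extended_key))
# ['1.0.0-rc.1', '1.0.0', '1.0.0+r1', '1.0.0+r2', '1.0.1']
```

With this key, `1.0.0+r1` sorts after `1.0.0` and before `1.0.1`, so "latest version" resolves to the newest revision of a release.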
@hdgarrood:
- Aren't we intending to provide access to individual package manifest files without downloading a whole package tarball anyway, via the package index? Ensuring the integrity of those package manifest files is already a problem we'll need to deal with, surely?
- In that case, could we store package manifest files in the storage backend alongside the package tarballs and call that the source of truth for them? Then, the package index would always be derived from the storage backend, and would not be a source of truth?
- I think disallowing build metadata makes sense. We could also require that each package may only upload one version with the same major, minor, patch, and prerelease components. For example, if you've already uploaded 1.0.0+abc, then I think the registry should reject a subsequent upload of 1.0.0+def. I think defining an ordering for the build metadata would be a more serious violation of the semver spec, because the spec specifically says that you mustn't do that. If we go against this, I think it is likely to cause funny behaviour in clients which implement the semver spec accurately: for example, you couldn't have versions 1.0.0+abc and 1.0.0+def in a Set together. The only way I can make sense of the build metadata ordering requirement from the semver spec is if package registries should be refusing package uploads which differ from an existing version only in the build metadata component, i.e. uploading two different versions with the same major, minor, patch, and prerelease components should be disallowed.
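The Set example can be reproduced with a small Python sketch, assuming a client library that treats two versions as equal whenever they differ only in build metadata. `Version` and `accept_upload` are hypothetical names for this illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Version:
    major: int
    minor: int
    patch: int
    prerelease: tuple = ()
    # compare=False keeps build metadata out of both __eq__ and __hash__.
    build: str = field(default="", compare=False)


def accept_upload(published: set, candidate: Version) -> bool:
    """Reject uploads that differ from a published version only in build metadata."""
    if candidate in published:
        return False
    published.add(candidate)
    return True


abc = Version(1, 0, 0, build="abc")
def_ = Version(1, 0, 0, build="def")

print(len({abc, def_}))             # 1: the two versions collapse into one entry

published = set()
print(accept_upload(published, abc))   # True
print(accept_upload(published, def_))  # False
```

The same equality that makes the two versions collide in a set is what lets the registry detect and refuse the second upload.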
@f-f: How would Trustees publish revisions if we disallow build metadata? And why would sorting by that be a "serious violation"? SemVer doesn't say "you shouldn't do that because it's bad", it just says "we don't do that in SemVer".
Registry clients are supposed to implement a spec that we define here. If we say "it's SemVer plus ordering by build metadata", then that is the spec.
@hdgarrood: By treating revisions as a separate thing from package versions? It’s a more serious violation in my mind because it’s not just filling a gap in the spec, it’s going against something the spec explicitly says. The versioning libraries that exist aren’t “semver plus build metadata,” they’re just semver.