What is a 'package' ? #359
-
Duplicate of #257?
-
@JimFuller-RedHat I would not want to wake up the spec lawyer in you, as this feels intimidating 😊 ! ... now to me, a PURL is a URL to locate, and also to identify. So, what's a package? I like to think of the package "type" or ecosystem as defining a package "protocol", i.e., a number of conventions, formats, network protocols, and specs that help us name, locate, fetch, reuse, and interact with code packaged for this purpose by that ecosystem. As an example for Python, this encapsulates a large number of things that collectively form the "pypi" "protocol", eventually documented in multiple PEPs, PyPA documents, down to unwritten tribal knowledge:
So here "pypi" means all this, and a package in this context would be:
The same reasoning would extend to other types.
And is a github or gitlab repo a package?
And of course, because all abstractions eventually leak, we have an escape hatch with a "generic" type and a few other attempts, but these are really hatches and best used sparingly. I hope this helps. We could and should draft some intro blurb for the spec, as the id vs. locate and the "what's a package" question are things that come up often.
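To make the "type selects the protocol" idea concrete, here is a minimal sketch of splitting a purl into its components (type, namespace, name, version, qualifiers, subpath). The `Purl` class and `parse_purl` function are hypothetical names for illustration, not an existing library API, and the sketch skips the type-specific normalization and full percent-decoding rules a real parser would need.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional
from urllib.parse import unquote


@dataclass
class Purl:
    """Hypothetical container for purl components (illustration only)."""
    type: str
    namespace: Optional[str]
    name: str
    version: Optional[str] = None
    qualifiers: Dict[str, str] = field(default_factory=dict)
    subpath: Optional[str] = None


def parse_purl(purl: str) -> Purl:
    """Simplified split of pkg:type/namespace/name@version?qualifiers#subpath."""
    scheme, sep, rest = purl.partition(":")
    if not sep or scheme.lower() != "pkg":
        raise ValueError("expected a 'pkg:' scheme")
    rest = rest.lstrip("/")  # tolerate the 'pkg://' spelling

    # Peel off the optional parts from the right-hand side first.
    rest, _, subpath = rest.partition("#")
    rest, _, query = rest.partition("?")
    qualifiers = {}
    for pair in filter(None, query.split("&")):
        key, _, value = pair.partition("=")
        qualifiers[key.lower()] = unquote(value)

    # The type is the first path segment; it names the ecosystem "protocol".
    type_, sep, rest = rest.partition("/")
    if not sep or not type_:
        raise ValueError("missing type or name")

    # Assumption: the last '@' separates the version. A literal '@' in a
    # name must be percent-encoded, otherwise this sketch misreads it.
    path, at, version = rest.rpartition("@")
    if not at:
        path, version = rest, ""

    namespace, _, name = path.rpartition("/")
    if not name:
        raise ValueError("missing name")
    return Purl(
        type=type_.lower(),
        namespace=unquote(namespace) or None,
        name=unquote(name),
        version=unquote(version) or None,
        qualifiers=qualifiers,
        subpath=subpath.strip("/") or None,
    )


print(parse_purl("pkg:pypi/django@1.11.1"))
```

Note that nothing here says how to fetch the package: the parser only recovers coordinates, and everything about what those coordinates mean is left to the conventions of the ecosystem named by the type.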
-
Ah sorry not trying to instill anxiety ;) ... pURL has done a fine job avoiding spec churn (and I certainly do not want to encourage any with 'rabbit hole' conversations on esoterica) ... I do think it is worthwhile to come up with a stable definition of what a pURL describes. A pURL is used to identify a software package (with the 'boundary' of software package as the subject of this discussion) - I am not aware of any parsers, libraries or apps directly dereferencing the following:
Parsing most pURLs with trurl fails because it works only with hierarchical URLs: ... though if we add the notorious '//' (ignore this, trurl prob needs to add a flag for this!!):
another more valid example:
I use trurl as the gold standard for what a valid URL is - once you start digging into URLs they become hideously complex (looking at you WHATWG) ... ex. pURL pushes the complexity of UTF-8 and IRI aside in a Faustian exchange for rules on percent encoding. So using trurl now is not exactly right, but when we get an IANA registry entry and we investigate semantics we might find that some of the current rules of pURL invalidate it being a true URL (in the URL valid sense).
Why the pedantry? Because it may have implications for developers implementing tooling that uses, parses, or generates pURLs. To be usable as a locator, the pkg: scheme probably needs to have some defined protocol semantics ... though we would hope a pURL, armed with knowledge of type (and implied protocol), identifies a software package's 'coordinates' well enough to dereference it from somewhere, in reality (and in the wild) we have a mixture of incomplete and concrete pURLs, for example:
What do any of the above 'mean'? How would a consumer know when a pURL is referring to a single 'bag o bits'? A URL can of course represent a list of resources or a single resource - such ambiguity in the spec can be a good thing but might be worth explicitly defining (like saying it's up to the package ecosystem to define its meaning). For example - the above pURLs might be equivalent and imply a notion of aliases (the world of HTTP offers 'food for thought' with its HTTP redirect semantics, URL shorteners and more) ... I do not think the pURL spec has to define aliasing but we should try to draw a boundary around what a pURL describes to contain it. In this instance it might be useful to explicitly state in the spec what we want it to mean (ex. a single software package, a set of software packages, both, etc ...).
The spec defines that all pURLs MUST use the pkg: scheme (which needs an IANA registry entry) and then mentions there being no special schemes or other schemes (those bullet points seem superfluous and could be removed). The definition of the pkg: scheme implies that most semantics are contextually dependent on pURL types ... though that document does not define whether any qualifiers are required/optional.
The definition of a software package as an entity that participates in a system with an algebraic set of operations on data defining a package protocol which is mainly concerned with distribution (rewrote your succinct definition!) seems reasonable though it implies:
If software package identity is a 'side effect' of the software component's participation in a package ecosystem then I (maybe wrongly) interpret this to mean that a software package does not yet 'exist' until it is a member within an ecosystem. In the past, we may have opted to identify a software component based on build characteristics, for example I could build curl with:
and we might propagate these as qualifiers in some identity strategy ... there is still ambiguity in that at runtime exactly which code paths get 'lit up' is heavily dependent on execution env context. In the maven world this can be even more complicated, e.g. reviewing what is running in the JVM at runtime vs. what is defined at build vs. install time. The point is that this kind of information (we hope) is indirectly introspected via the relationship of a pURL to its package ecosystem, though identifying a concrete component at runtime seems out of scope (as one does not consult a package manager to tell you about the runtime identity of a software component).
It seems like we cannot reliably mint a pURL for anything outside that definition - how do we identify software before it is formalised in a package distribution system ... or how do we identify software that has been deprecated/expired and is out of distribution? There is software that has no package management but still needs identity... perhaps we should explicitly state such things are out of scope for pURL?
In the wild, we ab/use the generic pURL type - which invalidates the package ecosystem definition of a software package ... if we define a pURL as identifying a thing's existence within a package ecosystem, and then define an 'escape hatch' which allows for defining everything ... not sure about that. So how do we identify software:
Summary: if we use the following definition:
Then we might want to address some of the points I raise above. I am also hopeful we could do better than the generic type 'band aid'. If pURL as locator is a goal we need to tighten up type definitions, though I am not so convinced that is as important as its usage as a unique identifier. Untangling what (and if) pURL describes at build, package/distribute and runtime is a challenge.
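As a small illustration of the hierarchical-URL point raised earlier with trurl, the sketch below uses Python's `urllib.parse.urlsplit` instead (a swapped-in generic URL parser, not trurl, so it only approximates that behaviour). The `describe` helper is a hypothetical name, and the sample pURL is illustrative rather than one of the examples from this thread.

```python
from urllib.parse import urlsplit


def describe(url: str) -> dict:
    """Report how a generic URL parser splits scheme, authority and path."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "authority": parts.netloc,
        "path": parts.path,
    }


# Without '//' there is no authority: everything after 'pkg:' is one path.
opaque = describe("pkg:pypi/django@1.11.1")

# With '//' the package type is read as the authority (a host name),
# which is not what the type is meant to convey.
hierarchical = describe("pkg://pypi/django@1.11.1")

print(opaque)
print(hierarchical)
```

The two spellings carry the same coordinates for a pURL consumer, yet a generic parser assigns them different structure, which is one reason tooling tends to need pURL-specific parsing rather than reusing a stock URL library.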
-
Nowhere in the spec (or README) do we explicitly define what a package is ... of course we might infer:
If this is enough, perhaps we define the term in the spec?
I see the sense of defining package management as a boundary that makes things tractable, though it might be useful to provide advice on how to handle things outside of package management. Interested in others' thoughts...