[August]: Useful rev-spec parsing and an abstraction for remotes #541
Byron announced in Progress Update
This month felt a little slow at times, and getting into the 'groove' of handling git remotes took me longer than I had hoped. Let's see if feelings and facts line up.
rev-spec resolution with perfect disambiguation
The parser emits callbacks to a delegate which can maintain all the state required to handle typical revision specification parsing, namely:
Disambiguation
Disambiguation is an interesting topic because I didn't even notice it at first; after all, it's a rarely used feature, at least in 'typical' repositories. Thus I shouldn't have been surprised when I realized that git does a lot to ensure that disambiguation happens and is helpful, culminating in a nice listing of ambiguous objects should it fail.
For gitoxide to handle this at least as well, git-odb had to learn how to provide ambiguous objects to the caller instead of just baking general information into the return value. With that in place and test scenarios lifted from the git test suite (how else would one generate commits with a 000000 prefix?), it was possible to make the entire resolution engine aware of ambiguities from the ground up. And the algorithm is really simple: run the entire rev-spec against all candidates, and if only one candidate remains, there is no ambiguity. This is why gitoxide is able to resolve rev-specs that are ambiguous for git, probably because git has a lot of 'special cases' for that instead of having disambiguation built in like gitoxide does.
Ranges
Ranges are interesting because a single rev-spec can actually consist of up to two specifications, combined into a range. Building range support into the parser turned out to be a good idea, as it made additional information useful for… disambiguation… easy to obtain and a natural part of the process. In other words, if a range is present, we know we are looking for commit objects.
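The disambiguation rule described above (run the whole spec against every candidate, and accept the result only if exactly one survives), together with the commit hint implied by a range, can be illustrated with a small Rust sketch. This is not gitoxide's implementation: the toy Odb, the Candidates type and its from_prefix, nth_parent and resolve methods are hypothetical, only ^n parent navigation is modeled, and the kind hint is simplified to an exact kind match (real git also treats annotated tags pointing to commits as commit-ish).

```rust
use std::collections::HashMap;

/// Object kinds, reduced to what this sketch needs.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Kind {
    Commit,
    Blob,
}

/// A toy object database: hex id -> (kind, parent ids). Purely illustrative.
struct Odb {
    objects: HashMap<&'static str, (Kind, Vec<&'static str>)>,
}

/// Hypothetical resolver state: every object matching a short prefix is kept
/// as a candidate, and each navigation step is applied to all of them.
struct Candidates<'a> {
    odb: &'a Odb,
    ids: Vec<&'static str>,
}

impl<'a> Candidates<'a> {
    /// Collect all objects whose id starts with `prefix`, optionally keeping only
    /// those of the hinted kind (e.g. `Kind::Commit` when a range is present).
    fn from_prefix(odb: &'a Odb, prefix: &str, hint: Option<Kind>) -> Self {
        let mut ids: Vec<&'static str> = odb
            .objects
            .iter()
            .filter(|(id, (kind, _))| id.starts_with(prefix) && hint.map_or(true, |h| h == *kind))
            .map(|(id, _)| *id)
            .collect();
        ids.sort();
        Candidates { odb, ids }
    }

    /// Apply `^n` (n-th parent, 1-based) to every candidate. Candidates for which
    /// the step fails, such as blobs or commits lacking that parent, are dropped.
    fn nth_parent(mut self, n: usize) -> Self {
        let odb = self.odb;
        let mut next: Vec<&'static str> = self
            .ids
            .iter()
            .filter_map(|id| {
                let (kind, parents) = &odb.objects[id];
                if *kind != Kind::Commit {
                    return None;
                }
                parents.get(n.checked_sub(1)?).copied()
            })
            .collect();
        // Candidates that converge on the same object count as one.
        next.sort();
        next.dedup();
        self.ids = next;
        self
    }

    /// Exactly one survivor means the spec is unambiguous; otherwise
    /// all remaining candidates are returned so they can be listed.
    fn resolve(self) -> Result<&'static str, Vec<&'static str>> {
        if self.ids.len() == 1 {
            Ok(self.ids[0])
        } else {
            Err(self.ids)
        }
    }
}
```

With a commit and a blob that share the prefix 000, the bare prefix stays ambiguous, while 000^ resolves because the blob cannot have a parent, and a commit hint (as implied by a range) narrows the candidates before any navigation happens.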
Helpful Errors
Error handling is important, and errors naturally split into parser errors, which are somewhat lower level, and errors that happen during resolution. The goal here was not to lose any error information, so a list of errors is maintained and chained together before being presented to the user. This was the only way I saw to present them easily, even though a typical application will show them as a chain of errors that caused each other, which is not actually correct. However, thanks to special-casing …disambiguation…, it was possible to produce nice and helpful errors nonetheless, which are typically far better than what git produces (no surprise there).
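A minimal sketch of the "list of errors presented as a chain" idea, using only std::error::Error: a flat list of independent errors is folded into a linked chain so that any reporter walking source() prints all of them. The Chained type and the chain and render helpers are hypothetical names for this illustration, not gitoxide's API. As described above, a reporter will print these entries as if each one caused the next, even though they are merely siblings.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical error node: one message plus the next error in the list.
#[derive(Debug)]
struct Chained {
    message: String,
    next: Option<Box<Chained>>,
}

impl fmt::Display for Chained {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.message)
    }
}

impl Error for Chained {
    /// Expose the next list entry as the "source", which is how
    /// error reporters discover further errors.
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        self.next.as_deref().map(|e| e as &(dyn Error + 'static))
    }
}

/// Fold a flat list of independent error messages into a chain,
/// keeping the first message at the top. Returns None for an empty list.
fn chain(messages: Vec<String>) -> Option<Chained> {
    messages
        .into_iter()
        .rev()
        .fold(None, |next: Option<Chained>, message| {
            Some(Chained { message, next: next.map(Box::new) })
        })
}

/// Walk `source()` the way typical CLI reporters do and collect each line.
fn render(err: &dyn Error) -> Vec<String> {
    let mut lines = vec![err.to_string()];
    let mut current = err.source();
    while let Some(e) = current {
        lines.push(e.to_string());
        current = e.source();
    }
    lines
}
```

The trade-off is visible in render: nothing is lost, but the structure (siblings vs. causes) is not expressible through source() alone.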
Testing
With something as complex as rev-parsing, it was clear that git needed to be consulted to create a baseline. This baseline is pre-produced and cached on disk, with the results being parsed from a simple text file, which makes them available to the tests almost instantly without having to execute git again.
That way I could make sure that gitoxide agrees with all results that git produced, except where gitoxide does better.
Regex support
Erm, what? Yes, rev-specs can use regexes to search commit messages. gitoxide can do that too, using the exemplary regex crate if it is compiled in (it's optional), or a plain sub-string search otherwise. Thanks to this, we can actually compare the performance of both implementations, as these searches can be …exhaustive. More on that in a bit.
Completeness
Nearly all of the various transformations are implemented, with a few postponed until the information they require is clearer to me. Nonetheless, I believe it's already possible to resolve 95% of all rev-specs correctly.
All in all, it's already something I use as a git rev-parse replacement.
gix rev parse command
To make rev-parsing more accessible, I put it into gix rev parse, which can also --explain what the spec does (a feature I found very helpful) or --cat-file right away, because more often than not you want to plug rev-parse into cat-file; at least that's what I end up doing.
What's notable is its performance: when constructing worst-case scenarios with its regex support, one can quickly see that in some cases gitoxide finishes twice as fast. This is due to it traversing commits about 10% faster than git, combined with regex being much faster than whatever git is using. Mentioning performance here is nothing more than a side note, though, as most of the time, with typical rev-specs, it won't matter at all.
The flattening of gix repo
With gix gaining more and more commands that benefit from git-repository, it seemed only reasonable to make these commands more accessible as well by removing the repo sub-command and moving all commands that need a repository to the top level. This also means that all commands that were there previously have now been placed under the no-repo|free sub-command. These of course still have their uses, but will generally be more niche than what's now on the top level. Definitely a win for ergonomics.
complete ref-spec parsing
Note the f instead of the v, as gitoxide can now parse reference specifications as they are used in git remotes. Working on this was interesting as I realized how little I knew, and how ambiguous and flexible these specs are. This didn't affect the difficulty of parsing them, but it made it harder to understand all the valid states one had to consider when interpreting them. It didn't help that ref-specs for fetching aren't the same as those for pushing.
Many git-baseline tests later, I managed to boil all valid states down into an Instruction enum, along with all the information I could gather as accompanying documentation, as not everything seemed feasible or worth putting into the type system. As of now, this Instruction type is not yet used for interpretation, but I am optimistic that it's suitable for implementing ref-spec matching correctly.
The Remote abstraction with ref listing, also in 'async'
Probably it was this topic that made me feel I was moving slowly this month. Part of it was definitely interesting, as I could finally learn how remotes work, and how that relates to the remote configuration of branches in interplay with remotes themselves.
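As a rough illustration of how a branch's configuration ties it to a remote, here is a simplified Rust sketch over a flattened key-value view of git configuration. The Config alias and both functions are hypothetical and not gitoxide's API. The sketch only models branch.<name>.remote with an origin fallback; git's additional rules (for example for pushing, via branch.<name>.pushRemote and remote.pushDefault) are deliberately left out.

```rust
use std::collections::HashMap;

/// Hypothetical flattened configuration: `section.subsection.key` -> value.
/// Real git configuration is richer (multi-valued keys, case rules, includes).
type Config = HashMap<String, String>;

/// Pick the remote to fetch from for whatever HEAD points to.
/// Simplified assumption: honour `branch.<name>.remote` for a checked-out
/// branch, otherwise fall back to the conventional `origin`.
fn remote_name_for_head(config: &Config, head: Option<&str>) -> String {
    head.and_then(|full_name| full_name.strip_prefix("refs/heads/"))
        .and_then(|branch| config.get(&format!("branch.{branch}.remote")))
        .cloned()
        .unwrap_or_else(|| "origin".to_string())
}

/// Look up the URL configured for a named remote, if any.
fn remote_url<'a>(config: &'a Config, remote: &str) -> Option<&'a str> {
    config.get(&format!("remote.{remote}.url")).map(String::as_str)
}
```

A detached HEAD (None here) or a branch without tracking configuration both end up at origin in this simplified model.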
gitoxide now makes it easy to get remotes based on the current HEAD, as it implements the logic that git itself uses.
gix remote refs
Listing and finding remotes was quite straightforward to implement, but what was hard(er), at least in my head, was making them do something useful. After all, what one wants is to connect to a URL and start communicating with the remote side. The easiest interaction in the book is listing remote references, a capability that was present via gix free remote ref-list <url> and is now available via gix remote refs.
What made this harder was the requirement of supporting both blocking and async APIs, which makes some sense for any kind of networked IO. It's not about performance at all, but about possibilities. Thanks to gitoxide's async support, it's easy and straightforward to support frameworks like quinn, for example to get QUIC support for connections with custom servers, as quinn happens to be async-only.
git-config improvements
What's cool is that in order to replace the previous free command (gix free remote ref-list) with the new one, I had to implement a way to set the transport protocol version. A first naive approach was to just add a -p flag, but it felt wrong in light of Remote::connect(…) reading the protocol.version configuration variable. Wouldn't it be nice if one could set it on the command line?
This required gitoxide to allow changing the configuration in memory via Repository::config_snapshot_mut(), which was trivial to implement thanks to the advanced state of git-config, but git had a surprise for me in the shape of a rule I wasn't really aware of: booleans without an = sign are considered true, and those with an equal sign but an empty value are considered false. Of course I baked this into the test for the feature that applies key-value pairs from the command line to the configuration, just to see that the test would not succeed: booleans worked a little differently.
Fortunately, after an hour of debugging, I was relieved to find exactly the right place to make booleans work while keeping the API consistent. The logic employed by git-config now feels right(er), and the last special case I was aware of, related to how it stores keys and their values, is much better understood.
Community
A slightly troubled starship
starship, a project with ~28k stars and probably a lot of installations, has used gitoxide since v1.10, and with such a massive install base it's no surprise that not everything worked perfectly. Something worth highlighting is that this leads to gitoxide being executed each time a developer presses enter in their terminal, so the number of different configurations, machines and git repositories the code runs in must be enormous! But I digress :D.
The worst issue was a hang that would always happen if starship was executed on a machine with fewer than 6 usable cores, like, any laptop with 4 cores and hyper-threading disabled. It's probably even worse than it sounds. The cause was ultimately boiled down to this issue, which was entirely in the dependency chain of gitoxide and once again flags jwalk in combination with rayon as troublesome. jwalk is used for parallel file system traversal, and it hangs due to an interaction with rayon if it runs within a par_iter(), something that starship happens to do. The overarching issue is that gitoxide can now become a hazard for any application that happens to run it within an outer par_iter(), and for now the only fix is for gitoxide to allow disabling parallel iteration in jwalk using a feature toggle. I truly hope this can be fixed at the root, though, as I am sure this will bite again in the future.
Index V2 and V3 writing
Sidney has contributed index writing, a vital step when cloning repositories with a work-tree. It turned out to be a little harder than anticipated, but by now it robustly writes indices in V2 or V3 format along with the tree extension, the extension that matters most for quickly generating a tree from an index. Thank you!
git_date::Time formatting
Thanks to Svetlin, there is now git_date::Time::format(…), which allows using custom formats of the time crate along with many of the predefined formats available in git log --date <format>. It's a tiny but powerful API, and as always, there was much more work than anticipated; two additional standard formats are still to be implemented.
docs.rs crates with feature labels
Thanks to a contribution by Poliorcetics, the next time gitoxide gets released there will be labels indicating the feature toggle that needs to be enabled for modules or methods to be available. Great work there; there is a ton I don't know about the Rust ecosystem, and I am always happy to learn some new tricks :).
CI upgrades and performance improvements
Also thanks to Poliorcetics, the CI workflows have been updated to the latest versions of the actions they use, and that motivated me to try to improve their worst-case performance. It feels good to see CI get better, even though I'd love to make it faster, too :D.
Rust Foundation sponsorship: cargo shallow clones update
I want to keep this section in all sponsor updates from now until the end of the year.
All work on remotes is directly motivated by cargo, as cargo uses anonymous remotes to perform its operations. gitoxide supports these too now, and I aim my work towards achieving bare clones of repositories so the crates.io index can be cloned using gitoxide. This shouldn't take too long, as there are only about 5 weeks left to integrate it into cargo. Fortunately I have found a good channel to communicate with the cargo team (Zulip), which allowed us to clarify the next step, so all that's left is to get the actual (and a lot of) work done.
It's all very exciting and I can't wait to open the first PR in cargo; in September I should have a lot more to write about that.
Cheers,
Sebastian
PS: The latest timesheets can be found here.