@jonathantanmy2
Collaborator

Currently, this PR just has a demonstration of a bug. Once I've confirmed my understanding of the situation, I'll update this PR to contain a fix.

@Byron
Collaborator

Byron commented Nov 26, 2025

Thanks @jonathantanmy2, this is incredibly observant! There are probably many bugs and not enough tests, so this definitely is a step in the right direction.

In any case, @mtsgrd might be able to comment on it, maybe showing a path forward as well.

On another note, something I think this needs (at some point) is a better way to test it and, of course, a port to the new non-legacy data structures. Ideally, this becomes plumbing that doesn't know about workspaces at all, which should also improve its testability.

The `line_shift` value is used when combining diff hunks from two or
more stacks together, and has an important invariant: at any point
where a diff hunk from another stack could be inserted, the cumulative
`line_shift` value must be the net lines (lines added less lines
removed) of all the diff hunks prior to that point.

This is so that the combiner knows how to shift the diff hunks. For
example, suppose the combiner needs to combine two stacks; the second
stack has a diff that adds a line at line 100. Suppose the combined
effect of all diff hunks in the first stack prior to line 100 is a net
reduction of 42 lines: the total `line_shift` value of all those diff
hunks must thus be -42, so that the combiner knows that the change that
the second stack has must be added at line 58 instead of line 100.
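
As a rough illustration of the example above, here is a minimal sketch
of how a combiner could use the cumulative `line_shift`. The `Hunk`
struct, its fields, and both function names are hypothetical and are
not taken from but-hunk-dependency; the sketch also assumes that hunk
positions and the queried line share one coordinate space.

```rust
/// Hypothetical, simplified hunk: only the fields this sketch needs.
#[derive(Debug, Clone, Copy)]
struct Hunk {
    /// First line the hunk touches (1-based).
    start: u32,
    /// Number of lines the hunk covers.
    lines: u32,
    /// Net lines contributed: lines added less lines removed.
    line_shift: i32,
}

/// Sum the `line_shift` of every hunk that ends at or before `line`.
fn cumulative_shift(hunks: &[Hunk], line: u32) -> i32 {
    hunks
        .iter()
        .filter(|h| h.start + h.lines <= line)
        .map(|h| h.line_shift)
        .sum()
}

/// Where a change at `line` in another stack lands once the hunks of
/// this stack have been taken into account.
fn shifted_line(hunks: &[Hunk], line: u32) -> i64 {
    i64::from(line) + i64::from(cumulative_shift(hunks, line))
}
```

With hunks whose shifts before line 100 sum to -42, calling
`shifted_line(&hunks, 100)` gives 58, matching the example.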

Point A: Note that this invariant only needs to apply at any point
where a diff hunk from another stack could be inserted. If, in a
stack, there are two hunks adjacent to each other, no diff hunk may be
inserted between them, so as long as their total `line_shift` value is
correct, they may have any `line_shift` value they want. (In fact, it is
sometimes not possible to determine what each `line_shift` value should
be.)
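
Point A can be checked with a small sketch that records the cumulative
`line_shift` only at gaps, that is, after each run of adjacent hunks.
The `Hunk` struct and `shifts_at_gaps` are hypothetical names, and
"adjacent" is assumed here to mean that one hunk ends exactly where
the next one starts.

```rust
/// Hypothetical, simplified hunk: only the fields this sketch needs.
#[derive(Debug, Clone, Copy)]
struct Hunk {
    start: u32,
    lines: u32,
    line_shift: i32,
}

/// Cumulative `line_shift` at every point where a hunk from another
/// stack could be inserted: after each hunk that does not touch the
/// next one. `hunks` is assumed to be sorted by `start`.
fn shifts_at_gaps(hunks: &[Hunk]) -> Vec<i32> {
    let mut gaps = Vec::new();
    let mut total = 0;
    for (i, hunk) in hunks.iter().enumerate() {
        total += hunk.line_shift;
        let end = hunk.start + hunk.lines;
        // Nothing can be inserted between two adjacent hunks.
        let touches_next = hunks.get(i + 1).map_or(false, |next| next.start == end);
        if !touches_next {
            gaps.push(total);
        }
    }
    gaps
}
```

Two adjacent hunks carrying shifts of (+2, -5) or of (-3, 0) produce
the same result at every gap, because only their sum is observable.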

This invariant is not met by the current algorithm, so I replaced the
calculation with one that meets it. The main principles are:

 - If applying a hunk causes other hunks to completely disappear, the
   incoming hunk must bear responsibility for the `line_shift` values of
   the hunks that disappear by adding their values to itself.

 - If a hunk splits into two (only possible when applying a hunk in the
   middle of an existing hunk), the two resulting hunks must split the
   original `line_shift` value between them. Due to Point A above, the
   exact proportion does not matter (the two resulting hunks sandwich
   the hunk that split them, and all three are adjacent to each other),
   so I have chosen to distribute the `line_shift` value based on their
   sizes.

 - If applying a hunk causes another hunk to be reduced in size, but
   not completely disappear, the exact distribution of `line_shift`
   values in between these two hunks does not really matter, since they
   are adjacent (and thus Point A applies). But I have chosen to take
   some `line_shift` from the reduced-size hunk to give to the incoming
   hunk, analogous to how a completely disappearing hunk cedes all its
   `line_shift` to the incoming hunk, to make the `line_shift` values
   more reasonable (well, reasonable to me, at least).
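
The first two principles can be sketched as follows. This is an
illustration only, not the code in this commit: `absorb` and
`split_shift` are hypothetical names, and rounding the proportional
share toward zero is an assumption. The third principle is left out
because the amount of `line_shift` that is transferred is not
specified above.

```rust
/// Principle 1: an incoming hunk that makes other hunks disappear
/// takes over their `line_shift` values by adding them to its own.
fn absorb(incoming_shift: i32, swallowed_shifts: &[i32]) -> i32 {
    incoming_shift + swallowed_shifts.iter().sum::<i32>()
}

/// Principle 2: a hunk that is split in two distributes its
/// `line_shift` by the sizes of the two resulting hunks. The second
/// part receives the remainder, so the total is always preserved.
fn split_shift(line_shift: i32, first_lines: u32, second_lines: u32) -> (i32, i32) {
    let total = i64::from(first_lines) + i64::from(second_lines);
    if total == 0 {
        // Two zero-line parts: keep the whole value on the first one.
        return (line_shift, 0);
    }
    // The share is never larger in magnitude than `line_shift`, so the
    // cast back to i32 cannot overflow.
    let first = (i64::from(line_shift) * i64::from(first_lines) / total) as i32;
    (first, line_shift - first)
}
```

Whatever rounding is chosen, the two parts returned by `split_shift`
always add up to the original value, which is all that Point A
requires.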

There is some code duplication due to how the diff hunks of an
individual stack are combined. I thought of rewriting the combining
algorithm before writing this commit (to reduce or eliminate the
code duplication needed), but there were some inconsistencies in how
zero-line hunks were handled, so I thought it best to correct the
`line_shift` issue before making further changes.

@jonathantanmy2 changed the title from "Demonstration of line_shift bug in but-hunk-dependency" to "Fix line_shift bug in but-hunk-dependency" on Nov 27, 2025

@jonathantanmy2
Collaborator Author

This PR now contains the fix. PTAL. See the commit message for a more complete description of the problem and the fix.

@Byron
Collaborator

Byron commented Nov 27, 2025

I thought I could skip the review if @mtsgrd is going to take a much more proficient look anyway, but I just wanted to welcome our first "Git level" commit message, which appears in full above.

Thank you @jonathantanmy2, this really is something I'd like to copy. And maybe one day GitButler will also be the tool that helps to unearth such messages when people are wondering why the code is the way it is (-> Git archaeology).

@krlvi requested a review from @mtsgrd on November 27, 2025 21:59

@krlvi
Member

krlvi commented Nov 27, 2025

(adding Mattias as a reviewer since he authored the original implementation and may remember certain nuances about the functionality)

Labels

rust Pull requests that update Rust code