Skip to content

Join and PartialJoin traits#20

Open
nothingmuch wants to merge 6 commits into
mainfrom
lattice-partial
Open

Join and PartialJoin traits#20
nothingmuch wants to merge 6 commits into
mainfrom
lattice-partial

Conversation

@nothingmuch

@nothingmuch nothingmuch commented May 26, 2026

Copy link
Copy Markdown
Collaborator

Introduces Join and PartialJoin, traits for combining values that respect commutativity, associativity, and idempotence.

This allows collections of values to be merged, tracking conflicts between them.

Join::join is total (infallible), PartialJoin::try_join returns a JoinResult. JoinResult implements Join. PartialJoin::wrap wraps self in Ok injecting or "lifting" it into the JoinResult space, which is essentially a completion of join by way of a formal product: the Conflict struct just accumulates arguments to join when one does not exist in the partial definition.

Lifting to the result type is more convenient since a total join can be used to reduce items in any order and obtain the least upper bound of all of the inputs, which in this case may be a conflict.

@nothingmuch nothingmuch force-pushed the lattice-partial branch 2 times, most recently from 43b2c19 to e295fb3 Compare May 26, 2026 17:14
@nothingmuch nothingmuch marked this pull request as ready for review May 26, 2026 20:44
@nothingmuch nothingmuch changed the title Lattice partial Join and PartialJoin traits May 26, 2026

@bc1cindy bc1cindy left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, utACK

accumulating conflicts into a set instead of picking a value arbitrarily is the right call

@arminsabouri arminsabouri left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PartialJoin and Join definitions looked sane. Conflicts are still tripping me up.
What is the flow? if there is a conflict between two PSBTs and a third needs to be partially joined is it:

  1. Join all three in one operation and deal with the conflicts?
  2. Join the first two, resolve conflicts / or rejects. If Ok() then join the third?

The latter makes more sense to me. Curious to know your thoughts

Comment thread src/lattice/partial.rs Outdated
Comment thread src/lattice/partial.rs Outdated

@nothingmuch nothingmuch left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Image

for the record, i'm requesting changes on my own pull request.

Comment thread src/lattice/partial.rs Outdated
@nothingmuch nothingmuch force-pushed the lattice-partial branch 3 times, most recently from dc777b0 to d8edb43 Compare May 27, 2026 23:10
@nothingmuch nothingmuch requested a review from arminsabouri May 27, 2026 23:10
@nothingmuch

Copy link
Copy Markdown
Collaborator Author

Re-ack required, IntoIterator &Conflict implementation was a brain fart, not sure why my brain expected it to magically implement .iter() on the target type (but now that that exists, its implementation is in terms of .iter())

I also reordered some of the impl blocks, conslidated two of them by changing my mind for the 40th time about whether Conflict should require V : PartialJoin or not, added the giant rationale wall of text, and moved the provenance/ordering discussion to try_join.

@arminsabouri arminsabouri left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re-ACK

Had a question about the use of wrap() in a unit test. Shouldn't be a blocker.

Comment thread src/lattice/partial.rs
Comment thread src/lattice/partial.rs
@nothingmuch nothingmuch force-pushed the lattice-partial branch 2 times, most recently from 89ed490 to 2a8d621 Compare May 27, 2026 23:56
Comment thread crates/concurrent-psbt/src/lattice/partial.rs
@nothingmuch nothingmuch force-pushed the lattice-partial branch 2 times, most recently from b050ba6 to 1345567 Compare May 28, 2026 00:54

@bc1cindy bc1cindy left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re-ACK

nit: 5575384 (commit message) n-ary join paragraph ends mid-sentence?

#### n-ary join

The final option considered was defining join not as a binary operation
but n-ary. This is no different than 

@nothingmuch nothingmuch force-pushed the lattice-partial branch 2 times, most recently from 25357ea to 9477920 Compare May 29, 2026 13:18
@nothingmuch nothingmuch force-pushed the lattice-partial branch 2 times, most recently from aac45b5 to c792f9a Compare May 29, 2026 16:29
Use cargo-llvm-cov nextest --no-tests=warn instead of the exit-status
fallback that silently swallowed build failures.
This is conditional on coverage_nightly cfg_attr (set by cargo llvm-cov
on nightly toolchains), this cfg_attr was allowed in nrrq
Join: infallible binary merge (semilattice operation). Implementors
must satisfy idempotent, commutative, and associative laws.

JoinMut: in-place variant; blanket impl provides Join automatically.

assert_join_laws! is a crate-internal macro (gated behind prop-tests
feature, scoped via #[macro_use]). Given an arbitrary strategy, it
generates three proptests verifying the semilattice laws for any
Join + Clone + PartialEq + Debug type. Used by downstream commits
to validate collection and domain-type Join implementations.
PartialJoin<V>::try_join returns JoinResult<V>, either Ok(v), where v
the least upper bound, or a Err(Conflict<V>) where the conflict contains
values that could not be joined.

Conflict<V> is a multiset with set-equality semantics (order-independent).
It implements JoinMut to merge conflict sets (union of distinct values).

JoinResult<V> itself implements Join:

  (Ok(a), Ok(b))   => a.try_join(b)      (delegate to PartialJoin on V)
  (Ok(v), Err(c))  => Err(c ∪ {v})       (absorb into conflict)
  (Err(c), Ok(v))  => Err(c ∪ {v})       (absorb into conflict)
  (Err(a), Err(b)) => Err(a ∪ b)         (join conflict sets)

Containers or product types that wrap their fields in JoinResult<V> can
implement Join recursively to compute the field-by-field merge without
early exit.

assert_partial_join_laws! is a crate-internal macro (gated behind
prop-tests feature, scoped via #[macro_use]). Given clean-value and
result strategies, it generates try_join law tests, JoinResult law tests
(via assert_join_laws!), and a wrap roundtrip test.

## Rationale for Conflict as flat, order preserving & order insensitive

Conflicts represent a formal completion of the partial semilattice `V :
PartialJoin`. If `a` and `b` are conflicts, commutativity requires that
`a.join(b) == b.join(a)`, and idempotence requires that `a.join(a) ==
a`. This is somewhat at odds with keeping track of where each
conflicting value originated from.

The main purpose of preserving the order is to allow provenance to be
tracked.

If `a` and `b` are both conflict free, and `let c = a.join(b)`, then
`c.conflicted_field.len() == 2` and the first value is from `a` whereas
the second is from `b`, which makes reporting this as an error with
clear diagnostics easier, without requiring the provenance be tracked by
some kind of surrogate ID.

This does not generalize to n > 2, because if
`a.join(b).join(c).some_conflicted_field.len() == 2`, the values could
originate from `(a, b)`, `(b, c)`, or `(a, c)`.

This compromise keeps the interface and implementation simple, allows
provenance to be tracked as long as it's done one pair of values at a
time in a straightforward way, but imposes no additional burdens on
users that do not care about provenance (for instance if computing
something like `vs.reduce(|a, b| a.join(b))`)

### Alternatives considered

Several alternative approaches were tried, of which the compromise of
making `Conflict` just a thin wrapper around Vec seemed the best.

#### HashSet or BTreeSet based

This alternative is very close to what is implemented. The differences
are that with a Vec, the order is preserved, the implementation of
equality and `join` has quadratic complexity. We expect `n` to be very
small so this shouldn't make a difference in practice.

Using a lookup based set requires `V : Hash` or `V : Ord` which the
current `Conflict` does not require (unfortunately adding it later would
be semver breaking, as would be changing the return value from `iter()`
or the associated `IntoIter` type of the `IntoIterator` impl).

#### Recursive data type

The following definition could in principle shadow the `join` structure:

```rs
enum Conflict<V> {
    Value(V),
    Pair([Box<Conflict>; 2])
}
```

In this case, if `a.join(b).join(c.join(d))` has 4 conflicting values,
they would take the form `Pair([ Pair([ Value(x), Value(y) ]), Pair([
Value(z), Value(w) ]) ])`, which is arguably more informative.
Unfortunately this is still imprecise because if `a.join(b).join(c)` has
a binary conflict `Pair([ Value(x), Value(y) ])`, it's still ambiguous
in the same way.

In order for this approach to be workable it has to shadow the syntax
tree of the join operation for the Ok branch too, in which case this
entire abstraction kind of only computing the transpose, going from e.g.
a list of structs with values, to a struct of lists of values, but not
reducing any of the complexity unless there are no conflicts anywhere.

The purpose of these abstractions is to take the problem of merging two
or more compound values into the a series of simpler problems, merging
two or more elements of a simpler type. Tracking provenance with perfect
fidelity means that if there is any conflict the structure is not
simplified at all.

#### n-ary join

The final option considered was defining join not as a binary operation
but n-ary. This is no different than the other options in terms of
expressive power, it could be just a simple transpose step followed by
some kind of collapse-compatible-values-to-LUB step. Presumably this
could be impl on `SomeNewType(Vec<Psbt>)`, with a transpose operation to
a `VecPsbt` (analog of `ResultPsbt`) which internally contains `Vec<>`
wrappers (instead of `JoinResult` wrappers) for each field. Then this
transposed structure would be joined field-wise, attempting to collapse
the n items to their LUB.

The current implementation can kinda represent this operation by mapping
*everything* to a conflict, joining and then reducing the conflicts, at
the cost of no longer having the special meaning of a unary conflict
representing a global invariant violation.

### Associated error type or generics

The above alternatives imply a "one size fits all" approach. However,
PartialJoin could have an Error type, where `JoinResult<V> = Result<V,
<V as PartialJoin>::Error>`.

Ostensibly this would allow some choice, but with associated types the
choice is fixed per implementation of the trait and so would not afford
users the choice of whether to opt out of provenance tracking for
simpler errors or opt in and deal with the added complexity.

Making the error type fully generic would make that possible with even
more complexity and syntactic overhead. However, no generality would be
gained for this additional complexity.

Thinking of Conflict<V> as just "deferred arguments for a join" (i.e. a
formal product), any arbitrary merge operation can be expressed by just
taking those arguments.

More formally, Conflict<V> is the free semilattice (sets under union)
over V. Since every semilattice is a quotient of the free semilattice,
so there is no operation that can be expressed by setting Error to some
type that merges V's according to some rules (e.g. taking the max of
integers) that can't be expressed by simply processing the conflict
after the fact.

### Conclusion

For these reasons, making Conflict a thin wrapper around `Vec` seems
like the best compromise: has the same expressive power but results in a
simpler interface than all the alternatives, and makes provenance
tracking possible and even relatively straightforward without forcing it
on all users.
@nothingmuch

nothingmuch commented May 30, 2026

Copy link
Copy Markdown
Collaborator Author

the more i think about it, the n-ary join approach might still be on the table. it has some nice properties.

api sketch

a struct Join<V>(Vec<V>) would be the core, it's basically Conflict.

at the bottom there will be a Value type, which implements the PartialJoin trait .

Vec<Constructor<Modifiable>> could then be wrapped by literally wrapping it in a Join.

next there would be a Join::transpose method, which converts a Join<V<U>> into a JoinV<U>, analogous to the ResultInput, ResultUnorderedPsbt, ResultConstructor etc types we have now.

the leaf level wrapping for this variant is to wrap in a unary Join struct, instead of wrapping in an Ok(x). these are then concatenated together, preserving indexes.

this would retain full provenance in an n-ary situation, independently of the conflicted status. missing values could be represented as Join(vec![]).

then Join::reduce() would actually perform the join operation, reducing it to wrapping the values, computing the join of that in terms of try_join.

this is where JoinResult and Conflict would enter the picture. every Join struct is either reduced to a single least upper bound value, or becomes a Conflict, which no longer has to implement Join to make sense, and therefore can be primarily concerned with tracking provenance.

should it be the case that Join(vec![a, b]).try_into::<V>() == Join(vec![a, b]).transpose().reduce()?

downsides

this adds extra steps but it seems like those steps could be ellided with a convenience API.

this sort of forces all of the implementations to be a bit more complex, but since they've already mostly been reduced to a single macro for product types and simple recursive ones for the collection types, this doesn't seem like a major downside.

not as direct an expression of order theoretical basis for this library, but a different structure whose correctness validity hinges on that basis (in the reduce() step PartialJoin could provide a pairwise implementation that still obeys the lattice laws)

@arminsabouri arminsabouri left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re-ACK

re: n-ary vs binary join. I think n-ary makes more sense and removes the need for Conflict to impl join. It just becomes a means of reporting conflicts. How do you want to proceed?

@Mshehu5 Mshehu5 left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK LGTM

@nothingmuch

Copy link
Copy Markdown
Collaborator Author

re: n-ary vs binary join. I think n-ary makes more sense and removes the need for Conflict to impl join. It just becomes a means of reporting conflicts. How do you want to proceed?

what n-ary can simplify is the recursive descent into the values, i.e. go from container types (collections & product types) to just primitive values that can conflict or not.

but when faced with a list of vluees and they are in conflict, it's very hard to convey what exactlyu the conflict is in a simple way. for n values there may be several subsets that are incompatible with each other, and actually making use of the fact that it's n-ary not binary requires sorting through them, or requires the eerror types to convey this.

we could reuse the Conflict value, and just put an Rc in every slot that refers to all of the conflicting slots, etc... but IMO this is a nightmare for what i expect is the most common usecase which is dealing with things pairwise (in the various interactive protocols, you deal with one message at a time and update the state if it works out or reject the message if it doesn't, leaving the state unmodified).

n-ary makes a bit more sense in the case where, in a cli, you deal with multiple files and there's no clear precedence order between them. a pairwise reduction might not give the complete information about everything one particular transaction conflicts with without trying to apply join but it's not clear that that is going to be more useful than just reporting the first error discovered.

i'm not entirely convinced this can't be improved or that binary is the way to go but it seems like the more conservative choice all things considered and we can always go back and replace it, mathematically the two models are entirely equivalent if we ignore the provenance question so it's not like one is more correct than the other.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants