
feat(whatsapp): bump whatsapp-rust 0.5 -> 0.6 with LID-native addressing #1144

Open
juanlotito wants to merge 2 commits into moltis-org:main from juanlotito:feat/bump-whatsapp-rust-943

@juanlotito

Summary

Bumps the whatsapp-rust family 0.5 → 0.6 and pins it to the merge commit of
oxidezap/whatsapp-rust#943 (DM LID-migration gate) via [patch.crates-io].

Why. whatsapp-rust 0.5 predates LID addressing. In practice that meant:

  • Inbound: after WhatsApp migrated a peer's device registry PN → LID,
    Signal sessions desynced — Bad Mac / No session loops and PDO recovery
    floods that only a re-pair fixed.
  • Outbound: DMs to @lid chats needed a local LID → PN rewrite hack in
    outbound.rs to be deliverable at all.

0.6 brings unified LID/PN addressing and migrates Signal sessions on LID
discovery, fixing both natively — the local rewrite hack is removed in this PR.
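"Migrates Signal sessions on LID discovery" is the piece that cures the Bad Mac / No session loops. The fix works like this: once the client learns that a peer's PN maps to a LID, the session it already holds under the PN address has to become reachable under the LID address too. Otherwise inbound traffic from the LID finds no session. The sketch below shows that idea with a toy store keyed by plain address strings. `SessionStore` and `migrate_on_lid_discovery` are hypothetical names, not whatsapp-rust API. A real Signal store keys by protocol address (user plus device id). Whether upstream moves or copies the PN entry is also an assumption here; this sketch moves it.

```rust
use std::collections::HashMap;

/// Hypothetical session store keyed by the peer's address string,
/// e.g. "15551234567@s.whatsapp.net" (PN) or "123@lid" (LID).
struct SessionStore {
    sessions: HashMap<String, Vec<u8>>, // address -> opaque serialized session
}

impl SessionStore {
    fn new() -> Self {
        SessionStore { sessions: HashMap::new() }
    }

    fn insert(&mut self, addr: &str, session: Vec<u8>) {
        self.sessions.insert(addr.to_string(), session);
    }

    fn lookup(&self, addr: &str) -> Option<&Vec<u8>> {
        self.sessions.get(addr)
    }

    /// Called when the peer's PN -> LID mapping is learned. Moves the existing
    /// PN-keyed session under the LID key so inbound messages from the LID
    /// address find it instead of failing with "No session".
    /// Returns true if a session was moved.
    fn migrate_on_lid_discovery(&mut self, pn: &str, lid: &str) -> bool {
        if self.sessions.contains_key(lid) {
            // A LID session already exists; keep it and leave the PN entry alone.
            return false;
        }
        match self.sessions.remove(pn) {
            Some(session) => {
                self.sessions.insert(lid.to_string(), session);
                true
            }
            None => false,
        }
    }
}
```

Without the migration step, a lookup under the LID misses even though the client holds a perfectly good session under the PN. That miss is the desync described above.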

Why the rev pin. 0.6.0 alone regressed delivery for accounts the server
has not migrated to LID 1:1 (no lid_one_on_one_migration_enabled ab
prop): their LID-addressed DMs are silently 400-nacked (send() still returns
Ok). I hit this live, reported it in oxidezap/whatsapp-rust#941, and the fix
merged as oxidezap/whatsapp-rust#943 — the client now gates DM wire addressing
on the account's migration state automatically. No release includes it yet, so
the family is pinned to the #943 merge commit; the patch section documents that
it should collapse into a plain version bump on the next whatsapp-rust release.
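The gate added by oxidezap/whatsapp-rust#943 can be pictured as a small decision function over the account's migration state. This is a minimal sketch under stated assumptions: JIDs are plain strings, the `lid_one_on_one_migration_enabled` AB prop has already been reduced to a boolean, and `dm_wire_address` and `WireAddr` are hypothetical names. Upstream's actual logic may differ in detail.

```rust
/// Destination form used on the wire for a 1:1 message (hypothetical type).
#[derive(Debug, PartialEq)]
enum WireAddr<'a> {
    Lid(&'a str),
    Pn(&'a str),
}

/// Hypothetical sketch of the DM addressing gate:
/// - account is LID 1:1-migrated: address the peer by LID;
/// - account is not migrated but the peer's PN is known: address the PN,
///   because the server 400-nacks LID-addressed DMs from such accounts;
/// - account is not migrated and no PN is known: return None so the caller
///   resolves the mapping first instead of sending a DM that is silently dropped.
fn dm_wire_address<'a>(
    dest_lid: &'a str,
    mapped_pn: Option<&'a str>,
    account_lid_migrated: bool,
) -> Option<WireAddr<'a>> {
    if account_lid_migrated {
        return Some(WireAddr::Lid(dest_lid));
    }
    mapped_pn.map(WireAddr::Pn)
}
```

Returning `None` in the last case is a design choice of this sketch. It surfaces the "cannot address safely" state to the caller rather than reproducing the silent failure, where `send()` returns `Ok` but nothing is delivered.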

Store/pairing work in this PR:

  • Persisted Device records migrate in place through a 0.5-layout shim, so
    existing WhatsApp pairings survive the upgrade (postcard is not
    self-describing and 0.6 appended trailing fields — decoding old records with
    the new layout otherwise fails and forces a re-pair).
  • Stores reworked for the 0.6 traits: typed [u8; 32] identities, Bytes
    sessions/prekeys, per-device sender-key tracking (replaces SKDM recipient
    lists + forget marks), delete_devices.
  • DeviceListRecord persists as JSON because its raw_id field uses
    skip_serializing_if, which postcard cannot round-trip; device lists are
    only a cache and are re-fetched on decode mismatch.
  • send_message returns SendResult, upload takes UploadOptions,
    DevicePropsOverride builder, Arc<Event> handlers, Jid.user is
    CompactString.
  • Toolchain nightly bump (0.6 uses if-let guards; the newer nightly also fixes
    a rustc query-depth ICE with matrix-sdk).
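The first bullet's reasoning can be shown with a toy positional codec. Postcard writes struct fields back to back with no names or lengths. A record written by the 0.5 layout therefore runs out of bytes when the 0.6 layout tries to read its appended trailing fields, and the shim has to decode with the old layout and upgrade. The sketch below is std-only and uses little-endian `u32` fields in place of real postcard encoding. The fields of `LegacyDevice05` and `Device06` (including `new_field`) are hypothetical stand-ins, not the real `Device` layout.

```rust
// Toy stand-in for a positional, non-self-describing format like postcard:
// fields are written back to back, so the decoder must know the exact layout.

#[derive(Debug)]
enum DecodeError {
    UnexpectedEnd,
    TrailingBytes,
}

struct Reader<'a> {
    buf: &'a [u8],
}

impl<'a> Reader<'a> {
    fn u32(&mut self) -> Result<u32, DecodeError> {
        if self.buf.len() < 4 {
            return Err(DecodeError::UnexpectedEnd);
        }
        let buf: &'a [u8] = self.buf;
        let (head, rest) = buf.split_at(4);
        self.buf = rest;
        Ok(u32::from_le_bytes([head[0], head[1], head[2], head[3]]))
    }

    fn finish(&self) -> Result<(), DecodeError> {
        if self.buf.is_empty() { Ok(()) } else { Err(DecodeError::TrailingBytes) }
    }
}

/// What the 0.5 layout wrote (hypothetical fields).
#[derive(Debug, Clone, PartialEq)]
struct LegacyDevice05 {
    id: u32,
    registration_id: u32,
}

/// The 0.6 layout: same fields plus one appended at the end (hypothetical).
#[derive(Debug, Clone, PartialEq)]
struct Device06 {
    id: u32,
    registration_id: u32,
    new_field: u32,
}

impl From<LegacyDevice05> for Device06 {
    fn from(old: LegacyDevice05) -> Self {
        // Appended fields have no 0.5 value, so they start at a default.
        Device06 { id: old.id, registration_id: old.registration_id, new_field: 0 }
    }
}

fn encode05(d: &LegacyDevice05) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(&d.id.to_le_bytes());
    out.extend_from_slice(&d.registration_id.to_le_bytes());
    out
}

fn encode06(d: &Device06) -> Vec<u8> {
    let mut out = Vec::new();
    for v in [d.id, d.registration_id, d.new_field].iter() {
        out.extend_from_slice(&v.to_le_bytes());
    }
    out
}

fn decode05(bytes: &[u8]) -> Result<LegacyDevice05, DecodeError> {
    let mut r = Reader { buf: bytes };
    let d = LegacyDevice05 { id: r.u32()?, registration_id: r.u32()? };
    r.finish()?;
    Ok(d)
}

fn decode06(bytes: &[u8]) -> Result<Device06, DecodeError> {
    let mut r = Reader { buf: bytes };
    let d = Device06 { id: r.u32()?, registration_id: r.u32()?, new_field: r.u32()? };
    r.finish()?;
    Ok(d)
}

/// Try the current layout first, then the 0.5 shim. Returns the record and
/// whether it was migrated, so the caller can write it back with encode06.
fn load_device(bytes: &[u8]) -> Result<(Device06, bool), DecodeError> {
    match decode06(bytes) {
        Ok(d) => Ok((d, false)),
        Err(_) => decode05(bytes).map(|old| (Device06::from(old), true)),
    }
}
```

Writing the upgraded record back after the first successful legacy decode is what makes the migration "in place". Every later read then takes the fast 0.6 path.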

Validation

Completed

  • cargo fmt --all -- --check
  • cargo clippy -Z unstable-options --workspace --all-targets -- -D warnings
    (default features)
  • cargo clippy -Z unstable-options -p moltis-whatsapp --all-features --all-targets -- -D warnings
  • cargo nextest run --workspace (default features)
  • Live validation on a real companion-device pairing (home-server deploy):
    the pairing from the 0.5-era store survived the upgrade in place, inbound
    and outbound DMs were verified, and 3/3 test DMs to a LID-mapped peer were
    delivered with Delivered receipts and zero nacks, on an account the server
    has not LID-migrated, i.e. the exact population oxidezap/whatsapp-rust#943 protects.

Remaining

  • just lint / just test with --all-features — my local environment
    has no CUDA toolchain, so the cuda feature builds are left to CI
  • ./scripts/local-validate.sh <PR> once the PR number exists

Manual QA

  1. Run a gateway with an existing WhatsApp pairing created under 0.5 — it must
    connect without re-pairing (the shim migrates the persisted Device record
    on first read).
  2. Send a DM from the paired account to a contact whose chat is LID-addressed;
    confirm a Delivered receipt arrives and no ack error=400 appears with
    RUST_LOG=whatsapp_rust=debug.
  3. Have the contact reply; confirm the inbound message decrypts (no Bad Mac
    / No session).
  4. Restart the gateway; confirm reconnect + no store decode warnings.

whatsapp-rust 0.5 predates LID addressing: inbound Signal sessions desynced
after WhatsApp migrated a peer's device registry PN -> LID (Bad Mac /
No session loops, PDO recovery floods that only a re-pair fixed), and
outbound DMs to @lid chats needed a local LID -> PN rewrite hack to be
deliverable. 0.6 brings unified LID/PN addressing and migrates Signal
sessions on LID discovery, fixing both natively; the rewrite hack is gone.

0.6.0 alone regressed delivery for accounts the server has not migrated to
LID 1:1 (no lid_one_on_one_migration_enabled ab prop): their LID-addressed
DMs are silently 400-nacked while send() still returns Ok. Reported as
oxidezap/whatsapp-rust#941 and fixed upstream in oxidezap/whatsapp-rust#943
(DM wire addressing gated on the account's migration state), so the family
is pinned via [patch.crates-io] to the oxidezap/whatsapp-rust#943 merge
commit until a release includes it.

- Bump the whatsapp-rust family (wacore, wacore-binary, waproto,
  transports) 0.5 -> 0.6; pin nightly-2026-06-20 (0.6 uses if-let guards;
  the newer nightly also fixes a rustc query-depth ICE with matrix-sdk).
- Drop the outbound LID->PN rewrite (to_deliverable_jid): 0.6 addresses
  LID destinations natively and get_phone_number_from_lid is gone.
- Adapt to the 0.6 API: send_message returns SendResult, upload takes
  UploadOptions, DevicePropsOverride builder, Arc<Event> handlers,
  Jid.user is CompactString.
- Rework stores for the 0.6 traits: typed [u8; 32] identities, Bytes for
  sessions/prekeys, per-device sender-key tracking replaces SKDM
  recipients + forget marks, delete_devices.
- Migrate persisted Device records in place through a 0.5-layout shim so
  existing WhatsApp pairings survive the upgrade (postcard is not
  self-describing; 0.6 appended trailing fields). Device lists are only a
  cache, so they are re-fetched on decode mismatch. DeviceListRecord now
  persists as JSON because its raw_id field uses skip_serializing_if, which
  postcard cannot round-trip.
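Why `skip_serializing_if` breaks a positional format is easiest to see with a toy encoder. The serde attribute drops the field from the output entirely. A positional decoder has no field names, so it still expects the field at its slot and reads the next field's bytes in its place. The std-only sketch below loosely mirrors postcard's one-byte `Option` tag (0 = None, 1 = Some) and a one-byte length prefix. The record shape (`raw_id: Option<u8>`, `devices: Vec<u8>`) is a hypothetical simplification of `DeviceListRecord`.

```rust
/// Hypothetical simplification of DeviceListRecord.
#[derive(Debug, PartialEq)]
struct DeviceListRecord {
    raw_id: Option<u8>,
    devices: Vec<u8>,
}

/// Positional encoder mimicking `skip_serializing_if = "Option::is_none"`:
/// a None raw_id is omitted entirely instead of writing a 0 tag.
fn encode_skipping(r: &DeviceListRecord) -> Vec<u8> {
    let mut out = Vec::new();
    if let Some(id) = r.raw_id {
        out.push(1); // Some tag
        out.push(id);
    }
    out.push(r.devices.len() as u8); // length prefix (toy: one byte)
    out.extend_from_slice(&r.devices);
    out
}

fn take(buf: &[u8], pos: &mut usize) -> Result<u8, String> {
    let b = *buf.get(*pos).ok_or("unexpected end of input")?;
    *pos += 1;
    Ok(b)
}

/// Positional decoder: with no field names, it always expects the Option tag first.
fn decode_positional(buf: &[u8]) -> Result<DeviceListRecord, String> {
    let mut pos = 0;
    let raw_id = match take(buf, &mut pos)? {
        0 => None,
        1 => Some(take(buf, &mut pos)?),
        t => return Err(format!("invalid Option tag {}", t)),
    };
    let n = take(buf, &mut pos)? as usize;
    let mut devices = Vec::with_capacity(n);
    for _ in 0..n {
        devices.push(take(buf, &mut pos)?);
    }
    if pos != buf.len() {
        return Err("trailing bytes".to_string());
    }
    Ok(DeviceListRecord { raw_id, devices })
}
```

With `raw_id: None`, the length byte lands where the decoder expects the Option tag, so the round trip fails. A self-describing format such as JSON carries field names, and serde treats a missing `Option` field as `None`, which is why JSON round-trips the same record.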

Validated live on a companion-device pairing created under 0.5: the store
migrated in place (no re-pair), inbound and outbound DMs verified, and
3/3 DMs to a LID-mapped peer on an unmigrated account delivered with
Delivered receipts and zero nacks.
@greptile-apps

greptile-apps Bot commented Jul 2, 2026


Greptile Summary

This PR bumps whatsapp-rust from 0.5 to 0.6 and pins it to the oxidezap/whatsapp-rust#943 merge commit. The result brings native unified LID/PN addressing, which fixes Signal session desyncs, plus the #943 gate against silent 400 nacks on LID-addressed DMs. The local LID→PN rewrite hack in outbound.rs is removed.

  • Store migration: LegacyDevice05 shim decodes old 0.5 postcard records and upgrades them in place; DeviceListRecord is persisted as JSON (not postcard) because its raw_id field uses skip_serializing_if; both paths include warn/debug logging from the previous review cycle.
  • Trait rework: SledStore and MemoryStore implement the 0.6 trait surface — typed [u8; 32] identities, Bytes sessions/prekeys, per-device sender-key tracking, delete_devices, SendResult, and UploadOptions; comprehensive round-trip tests cover every new code path.
  • Toolchain bump: nightly-2025-12-27 → nightly-2026-06-20 to support 0.6's if-let guards and fix a rustc query-depth ICE with matrix-sdk.
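The "per-device sender-key tracking" in the trait rework replaces SKDM recipient lists plus forget marks with a single per-(group, device) record of who already holds the current sender key. The std-only sketch below shows that bookkeeping shape. `SenderKeyTracker` and its methods are hypothetical names, not the 0.6 store trait, and device JIDs are plain strings.

```rust
use std::collections::HashSet;

/// Hypothetical per-device sender-key tracker: one entry per
/// (group JID, device JID) pair that already holds our current sender key.
#[derive(Default)]
struct SenderKeyTracker {
    has_key: HashSet<(String, String)>,
}

impl SenderKeyTracker {
    /// Devices among `members` that still need a sender-key distribution
    /// message (SKDM) for `group`, in the order given.
    fn pending(&self, group: &str, members: &[&str]) -> Vec<String> {
        members
            .iter()
            .filter(|d| !self.has_key.contains(&(group.to_string(), d.to_string())))
            .map(|d| d.to_string())
            .collect()
    }

    /// Record that the SKDM reached `devices` (call only after a successful send).
    fn mark_sent(&mut self, group: &str, devices: &[String]) {
        for d in devices {
            self.has_key.insert((group.to_string(), d.clone()));
        }
    }

    /// Forget a device in every group, e.g. when it leaves a user's device
    /// list, so it receives the key again if it comes back.
    fn delete_device(&mut self, device: &str) {
        self.has_key.retain(|(_, d)| d != device);
    }

    /// Forget every device of a group, e.g. after rotating its sender key.
    fn reset_group(&mut self, group: &str) {
        self.has_key.retain(|(g, _)| g != group);
    }
}
```

Keying by device rather than by user means a newly linked companion device shows up as pending on its own, without any separate "forget" mark.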

Confidence Score: 5/5

This PR is safe to merge — the store migration shim is correct, the LID-native addressing path has been live-validated, and all previous review feedback was incorporated.

The core migration path (LegacyDevice05 shim, JSON-persisted DeviceListRecord, dual decode with warn logging) is implemented correctly and backed by targeted unit tests. The outbound LID→PN hack removal is clean.

connection.rs warrants a follow-up: it is not yet confirmed that dropping the 0.6 Bot future guarantees a sled flush before mark_done(). No other files require special attention.

Important Files Changed

  • crates/whatsapp/src/sled_store.rs: Major rewrite implementing the 0.6 store traits; LegacyDevice05 migration shim with in-place upgrade; DeviceListRecord stored as JSON; warn/debug logging on both decode paths. Well-tested, and previous review comments are fully addressed.
  • crates/whatsapp/src/connection.rs: Bot lifecycle migrated to the 0.6 select-on-cancel pattern; OnceCell state-ref wiring for the event handler. The pre-existing concern about whether dropping the bot future guarantees a sled flush before mark_done() remains unaddressed in code.
  • crates/whatsapp/src/outbound.rs: LID→PN rewrite hack removed; send_message now uses SendResult.message_id for self-chat loop detection; upload uses UploadOptions via Default::default(). Tests confirm JID resolution passes LID destinations through correctly.
  • crates/whatsapp/src/handlers.rs: Updated for Arc<Event> handlers, CompactString Jid.user, and LID-aware owner detection using both PN and LID JIDs. Self-chat detection logic is sound and well covered by tests.
  • crates/whatsapp/src/memory_store.rs: Reworked for the 0.6 traits with typed identities, Bytes sessions, per-device sender-key tracking, and all new store methods. Comprehensive tests added.
  • Cargo.toml: Adds a [patch.crates-io] block pinning the entire whatsapp-rust family to git rev 96686ea; bumps version constraints from 0.5 to 0.6; adds the serde-big-array workspace dep. The patch comment explains the removal strategy.
  • rust-toolchain.toml: Toolchain bumped from nightly-2025-12-27 to nightly-2026-06-20 to support the if-let guards used by 0.6 and to resolve a rustc ICE.
  • crates/whatsapp/src/state.rs: AccountState.send_message updated to use SendResult; watermark logic unchanged. ShutdownState and sent-ID tracking unaffected.
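The review mentions that `SendResult.message_id` now feeds self-chat loop detection and that sent-ID tracking is unchanged. The usual shape of such tracking is a bounded set of recently sent IDs. When the self-chat echoes one of them back, the handler recognises its own message and does not reply to it. This is a minimal std-only sketch; `SentIds` and the capacity policy are hypothetical, not the code in state.rs.

```rust
use std::collections::{HashSet, VecDeque};

/// Hypothetical bounded record of message IDs this client sent
/// (e.g. taken from SendResult.message_id).
struct SentIds {
    cap: usize,
    order: VecDeque<String>, // insertion order, oldest first
    ids: HashSet<String>,    // fast membership checks
}

impl SentIds {
    fn new(cap: usize) -> Self {
        assert!(cap > 0, "capacity must be positive");
        SentIds { cap, order: VecDeque::new(), ids: HashSet::new() }
    }

    /// Remember an outgoing message ID, evicting the oldest once over capacity.
    fn record(&mut self, id: &str) {
        if self.ids.insert(id.to_string()) {
            self.order.push_back(id.to_string());
            if self.order.len() > self.cap {
                if let Some(oldest) = self.order.pop_front() {
                    self.ids.remove(&oldest);
                }
            }
        }
    }

    /// True if an inbound self-chat message is one we sent ourselves.
    fn is_own_echo(&self, id: &str) -> bool {
        self.ids.contains(id)
    }
}
```

The bound keeps memory flat on long-running gateways. The trade-off is that an echo arriving after `cap` newer sends would no longer be recognised.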

Reviews (2): Last reviewed commit: "fix(whatsapp): log device-store decode f..."

Review feedback on moltis-org#1144: the sled store discarded the primary decode
error before trying the 0.5 legacy shim — when both decodes fail, the
surfaced error hid the original cause — and undecodable device-list
records were evicted with no signal, so operators could not tell "never
existed" from "evicted on decode failure".

The legacy device fallback now logs the in-place migration at info and a
double decode failure at warn with both errors; device-list eviction logs
at debug (expected across the layout change, usync repopulates it).
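The review fix amounts to two things. First, keep the primary decode error instead of discarding it. Second, log at a level that distinguishes an expected migration from a real failure. The generic sketch below shows that pattern. `decode_with_fallback`, `Loaded`, and `BothFailed` are hypothetical names. `eprintln!` stands in for the `tracing` info/warn macros the crate presumably uses, so the std-only block compiles on its own.

```rust
use std::fmt::Display;

/// Which layout produced the value (hypothetical type).
#[derive(Debug, PartialEq)]
enum Loaded<T> {
    Current(T),
    Migrated(T),
}

/// Both decode errors, so the original cause is not hidden (hypothetical type).
#[derive(Debug, PartialEq)]
struct BothFailed<E> {
    primary: E,
    legacy: E,
}

/// Try the current layout, then a legacy layout that converts into it.
fn decode_with_fallback<T, L, E>(
    bytes: &[u8],
    primary: impl Fn(&[u8]) -> Result<T, E>,
    legacy: impl Fn(&[u8]) -> Result<L, E>,
) -> Result<Loaded<T>, BothFailed<E>>
where
    L: Into<T>,
    E: Display,
{
    // Keep the primary error instead of dropping it before the fallback.
    let primary_err = match primary(bytes) {
        Ok(v) => return Ok(Loaded::Current(v)),
        Err(e) => e,
    };
    match legacy(bytes) {
        Ok(old) => {
            // Expected once per record across the layout change: info level.
            eprintln!("INFO migrating legacy record in place");
            Ok(Loaded::Migrated(old.into()))
        }
        Err(legacy_err) => {
            // Unexpected: warn with both causes.
            eprintln!("WARN decode failed; primary: {}, legacy: {}", primary_err, legacy_err);
            Err(BothFailed { primary: primary_err, legacy: legacy_err })
        }
    }
}
```

Returning both errors lets the caller surface the original cause. This addresses the review point that the legacy attempt's error used to mask the primary decode failure.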
juanlotito marked this pull request as ready for review on July 2, 2026 19:09
