GregoryComer (Member) commented Oct 11, 2025

Summary

We're seeing crashes on Android when running XNNPACK-delegated models. I tracked it down to a bug in the alignment calculation for weight cache memory. The calculation casts the void* to a (signed) intptr_t, so an address in the upper half of the address space becomes negative. The modulo then returns a negative value, which advances the address too far and leads to an out-of-bounds access. The original code:

void* maybe_aligned_space = data_container.data();
void* aligned_space = (void*)((intptr_t)maybe_aligned_space + 64 -
                              (intptr_t)maybe_aligned_space % 64);

Walking through the numbers I captured in #14831:

  • The raw (unaligned) address of the data buffer is 0xb40000763d4bfa90.
  • The target alignment is 64 bytes.
  • Casting the address to intptr_t gives -5476376639047992688.
    • Mod 64 is -48.
    • The total offset applied is 64 - (-48) = 112.
  • Since the allocation size is N + 64, advancing the start by 112 means the N-byte aligned region extends 48 bytes past the end of the allocation. (With an unsigned cast, mod 64 would be 16 and the offset 48, which fits.)
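The arithmetic above can be reproduced in isolation. The following is a minimal sketch, not the code from this PR: `signed_offset` and `unsigned_offset` are hypothetical helper names, the address is the one captured in #14831, and a 64-bit target is assumed so that `std::int64_t` stands in for `intptr_t`.

```cpp
#include <cassert>
#include <cstdint>

// Offset as the original code computed it: the address is reinterpreted as a
// signed integer first, so for a negative value the remainder is also
// negative (C++ integer division truncates toward zero).
inline std::int64_t signed_offset(std::uint64_t address) {
  const auto as_signed = static_cast<std::int64_t>(address);
  return 64 - as_signed % 64;
}

// The same formula with the address kept unsigned: the remainder is always
// in [0, 63], so the offset is always in [1, 64].
inline std::uint64_t unsigned_offset(std::uint64_t address) {
  return 64 - address % 64;
}

// Checks the walkthrough numbers for the captured address.
inline void check_walkthrough() {
  const std::uint64_t address = 0xb40000763d4bfa90ULL;
  assert(static_cast<std::int64_t>(address) % 64 == -48);
  assert(signed_offset(address) == 112);   // overshoots the N + 64 allocation
  assert(unsigned_offset(address) == 48);  // stays within the 64 spare bytes
}
```

Calling `check_walkthrough()` in a build with assertions enabled confirms the -48 remainder and the 112-byte offset described in the list.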

To resolve this, I replaced the alignment code with a call to std::align. Casting to uintptr_t also resolves it, but using the standard implementation seems less error-prone.
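For illustration, a std::align-based version might look roughly like the sketch below. This is not the exact change in the PR: `align_within` is a hypothetical helper, and the buffer type is assumed to be a `std::vector<std::uint8_t>` standing in for `data_container`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical helper: returns a pointer inside `buffer` that is aligned to
// `alignment` and has at least `size` usable bytes after it, or nullptr if
// the buffer is too small. `alignment` must be a power of two.
inline void* align_within(std::vector<std::uint8_t>& buffer,
                          std::size_t alignment,
                          std::size_t size) {
  void* ptr = buffer.data();
  std::size_t space = buffer.size();
  // std::align adjusts `ptr` and `space` in place and performs the bounds
  // check itself, so no signed or unsigned pointer arithmetic is needed here.
  return std::align(alignment, size, ptr, space);
}
```

Unlike the hand-written formula, std::align returns nullptr when the aligned region would not fit, so an undersized buffer surfaces as a checkable failure rather than an out-of-bounds write.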

Test plan

I've validated that the repro in #14831 does not crash with this change.

pytorch-bot bot commented Oct 11, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/15039

Note: Links to docs will display an error until the docs builds have been completed.

❗ 2 Active SEVs

There are 2 currently active SEVs. If your PR is affected, please view them below:

❌ 6 New Failures, 1 Unrelated Failure

As of commit 15097e3 with merge base 019c8da:

NEW FAILURES - The following jobs have failed:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-cla bot added the CLA Signed label Oct 11, 2025
GregoryComer force-pushed the fix-weight-cache-align branch from d3e49c7 to 15097e3 October 11, 2025 22:38
@GregoryComer
Member Author

CI failures are due to running on a fork or broken trunk.

GregoryComer marked this pull request as ready for review October 11, 2025 23:49
GregoryComer added the release notes: none label Oct 12, 2025