@supervacuus
Collaborator

@supervacuus supervacuus commented Nov 25, 2025

📜 Description

As discussed last week with @markushi, the process will be to make minimal atomic changes in each PR and merge directly to main rather than accumulating on an uber feature branch. This allows for easier review/feedback/corrections, and we can already test a subset of the entire feature "in the field".

The first minimal change is a basic tombstone integration:

  • exposes internal option to enable/disable (disabled by default)
  • the integration will only ever be enabled if the runtime system is at least Android 12
  • the current setup suggests an operation where users will have to disable NDK support (or only depend on sentry-android-core) or get two reports for the same crash
  • only retrieves the most current tombstone (the current implementation is not entirely correct: we should either remove any remaining ApplicationExitInfo entries with REASON_CRASH_NATIVE or report them too, including the option for the latter; I left this out for minimal interface exposure in the first step, but either variant is easy to add to this PR or later)
  • adds a basic tombstone parser/decoder and accompanying snapshot test
  • introduces the following dependencies:
    • runtime: protobuf-javalite (the entire feature adds ca. 75 KiB to the Android sample release APK)
    • build: the protobuf plugin and the protoc compiler to automate protocol updates
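
The "report them too" variant from the list above could be pictured roughly as follows: keep the timestamp of the last reported tombstone as a marker and only pick up native-crash exit records that are newer. This is a plain-Java sketch, not the PR's code; `ExitRecord`, `TombstoneSelector`, and their fields are hypothetical stand-ins for `ApplicationExitInfo` and the integration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for android.app.ApplicationExitInfo; not the real API.
final class ExitRecord {
  enum Reason { CRASH_NATIVE, ANR, OTHER }

  final Reason reason;
  final long timestampMs;

  ExitRecord(Reason reason, long timestampMs) {
    this.reason = reason;
    this.timestampMs = timestampMs;
  }
}

// Hypothetical helper; the names are assumptions for illustration only.
final class TombstoneSelector {
  private TombstoneSelector() {}

  // Returns native-crash records strictly newer than the marker, oldest first.
  static List<ExitRecord> unreported(List<ExitRecord> history, long lastReportedMs) {
    List<ExitRecord> result = new ArrayList<>();
    for (ExitRecord r : history) {
      if (r.reason == ExitRecord.Reason.CRASH_NATIVE && r.timestampMs > lastReportedMs) {
        result.add(r);
      }
    }
    result.sort(Comparator.comparingLong((ExitRecord r) -> r.timestampMs));
    return result;
  }

  // Marker value to persist after reporting: the newest reported timestamp,
  // or the old marker if nothing was reported.
  static long nextMarker(List<ExitRecord> reported, long lastReportedMs) {
    long marker = lastReportedMs;
    for (ExitRecord r : reported) {
      marker = Math.max(marker, r.timestampMs);
    }
    return marker;
  }
}
```

After reporting, the result of `nextMarker` would be persisted so that the same entries are skipped on the next start.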

Open Issues:

  • While the protobuf runtime is relatively small, there is still the possibility of conflicting with client-side protobuf versions (major versions often introduce quite severe breakage, but I haven't tested this yet, only reviewed change logs).
  • finding clarity on how to proceed with older tombstones (ignore them or handle them similarly to ANRv2)
  • decision if this minimal setup already makes sense to release as an internal API
  • add ManifestMetadataReader to configure conveniently? (or not since the corresponding options are only internal?)
  • No changes yet to the UI (quite a few aspects would make these stack traces more readable; from my POV this is currently out of scope, but I am writing things down)

💡 Motivation and Context

First sensible release step for #3295

Part of https://linear.app/getsentry/project/tombstone-support-android-0024cb6e3ffd/

💚 How did you test it?

  • Added a basic parser snapshot test for a serialized tombstone protobuf.
  • Manual testing.

📝 Checklist

  • I added GH Issue ID & Linear ID
  • I added tests to verify the changes.
  • No new PII added or SDK only sends newly added PII if sendDefaultPII is enabled.
  • I updated the docs if needed.
  • I updated the wizard if needed.
  • Review from the native team if needed.
  • No breaking change or entry added to the changelog.
  • No breaking change for hybrid SDKs or communicated to hybrid SDKs.

🔮 Next steps

  • Decide if we want to go forward with client-side tombstone processing. This decision should happen with this PR.
    • If no, consider sending a tombstone as an attachment, similar to how we attach minidumps and let ingestion/processing deal with decoding (I can prototype this in parallel, since I think we can do most of it in relay, as a first step)
    • If yes, decide on adding the protobuf runtime library as a dependency to sentry-android-core (or shade/relocate, or implement our own decoder, given that this is a stable format which only requires a subset of protobuf).
      • Depending on feedback, adaptation to this minimal change (primarily handling older tombstones).
      • Otherwise, integrate an EventProcessor that merges crash events from sentry-native with tombstones.
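
For the "implement our own decoder" option above: the protobuf wire format is built on base-128 varints and field keys of the form `(field_number << 3) | wire_type`, so a decoder for a subset stays small. The following is a minimal sketch under that assumption, not the parser from this PR; `MiniProtoReader` and its methods are hypothetical names, and only varint (wire type 0) and length-delimited (wire type 2) values are handled.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical minimal reader for a subset of the protobuf wire format.
final class MiniProtoReader {
  private final byte[] buf;
  private int pos;

  MiniProtoReader(byte[] buf) {
    this.buf = buf;
  }

  boolean hasRemaining() {
    return pos < buf.length;
  }

  // Reads a base-128 varint: 7 payload bits per byte, least significant
  // group first; a set high bit means another byte follows.
  long readVarint() {
    long result = 0;
    for (int shift = 0; shift < 64; shift += 7) {
      if (pos >= buf.length) {
        throw new IllegalStateException("truncated varint");
      }
      byte b = buf[pos++];
      result |= (long) (b & 0x7F) << shift;
      if ((b & 0x80) == 0) {
        return result;
      }
    }
    throw new IllegalStateException("varint too long");
  }

  // Reads a length-delimited payload (wire type 2) as a UTF-8 string.
  String readString() {
    int len = (int) readVarint();
    if (len < 0 || len > buf.length - pos) {
      throw new IllegalStateException("truncated field");
    }
    String s = new String(buf, pos, len, StandardCharsets.UTF_8);
    pos += len;
    return s;
  }

  // A field key packs the field number and the wire type.
  static int fieldNumber(long key) {
    return (int) (key >>> 3);
  }

  static int wireType(long key) {
    return (int) (key & 0x7);
  }
}
```

A full replacement for protobuf-javalite would also need the fixed-width wire types and nested messages, which this sketch leaves out.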

#skip-changelog (for now)

@supervacuus
Collaborator Author

supervacuus commented Nov 26, 2025

@markushi, I just realized that I cannot omit a separate "tombstone" marker: even if I report all events, I need it to avoid reporting them repeatedly. This was clear to me in principle, but besides being a must for this PR already, the marker (unlike the ANR marker) must also be aligned with crashedLastRun.

I wonder if it would make the most sense to do it similarly to ANR by using a TombstoneHint to ensure the marker timestamp is written at the correct life-cycle stage of the event, but to introduce a new interface analogous to AbnormalExit called CrashExit, so that the hint can take a different route (since it isn't truly an abnormal exit and should affect similar paths to the Native SDK crash marker).

The biggest issue with that approach is that the crashedLastRun is handled entirely in EnvelopeCache rather than AndroidEnvelopeCache, whereas ApplicationExitInfo hint/marker handling happens in the latter.

The PR still makes sense for a first review from you (since, if the general direction makes sense and you have todos not in my list, I can also add a test for the integration itself). Still, I would appreciate a short sync on how to align these execution paths (maybe I don't need to align tombstones with crashedLastRun and can abuse AbnormalExit for the same outcome, though it feels like a significant shortcut, even for minimal internal exposure, which is not at all what I aimed for).

I can convert the PR back to a draft if you prefer.

Member

@markushi markushi left a comment

Left a few comments - great work so far!

… S, since that is the earliest version where `REASON_CRASH_NATIVE` provides tombstones via the `traceInputStream`)
…rker

This currently does not work:

While we now can optionally enable reporting of "historical" tombstones, by making the `TombstoneHint` `Backfillable` it will automatically be enriched by the `ANRv2EventProcessor` which is currently the only `BackfillingEventProcessor`.

The `ANRv2EventProcessor` is partially written in a way that is potentially generic for events with `Backfillable` hints, but other parts enrich events as if they were always ANRs. Up to now that was true, but with tombstones the assumption breaks.

Next Steps:

* There is considerable duplication between the ANRv2Integration and TombstoneIntegration
…tProcessor

This handles all events with a Backfillable hint, but adds a HintEnricher interface to allow hint-specific enrichment (as for ANRs) before and after the generic backfilling.
@NotNull SentryEvent event, @NotNull Backfillable hint, @NotNull Object rawHint);
}

private final class AnrHintEnricher implements HintEnricher {
Member

@markushi markushi Dec 3, 2025

That sounds like a pretty good solution to me, nice!
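
The shape discussed in this thread can be sketched with simplified stand-in types: a generic processor handles every `Backfillable` hint, and an optional `HintEnricher` runs hint-specific steps before and after the generic backfilling. Everything below is an assumption for illustration (plain maps instead of `SentryEvent`, placeholder keys, a hypothetical `BackfillingProcessor`); the real signatures in the PR may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Simplified stand-ins for the SDK's marker interfaces.
interface Backfillable {}

interface AbnormalExit extends Backfillable {}

interface NativeCrashExit extends Backfillable {}

// Hint-specific hooks that run around the generic backfilling step.
interface HintEnricher {
  boolean supports(Object hint);

  void applyPreEnrichment(Map<String, Object> event, Object hint);

  void applyPostEnrichment(Map<String, Object> event, Object hint);
}

// ANR-specific enrichment; triggers on AbnormalExit hints only.
final class AnrHintEnricher implements HintEnricher {
  @Override
  public boolean supports(Object hint) {
    return hint instanceof AbnormalExit;
  }

  @Override
  public void applyPreEnrichment(Map<String, Object> event, Object hint) {
    event.put("anr.pre", Boolean.TRUE); // placeholder for ANR-only setup
  }

  @Override
  public void applyPostEnrichment(Map<String, Object> event, Object hint) {
    event.put("anr.post", Boolean.TRUE); // placeholder for ANR-only fields
  }
}

// Hypothetical generic processor for all Backfillable hints.
final class BackfillingProcessor {
  private final List<HintEnricher> enrichers = new ArrayList<>();

  BackfillingProcessor register(HintEnricher enricher) {
    enrichers.add(enricher);
    return this;
  }

  Map<String, Object> process(Map<String, Object> event, Object hint) {
    if (!(hint instanceof Backfillable)) {
      return event; // not ours to backfill
    }
    HintEnricher match = null;
    for (HintEnricher enricher : enrichers) {
      if (enricher.supports(hint)) {
        match = enricher;
        break;
      }
    }
    if (match != null) {
      match.applyPreEnrichment(event, hint);
    }
    event.put("backfilled", Boolean.TRUE); // placeholder for the generic step
    if (match != null) {
      match.applyPostEnrichment(event, hint);
    }
    return event;
  }
}
```

With this split, a tombstone hint still goes through the generic backfilling but receives none of the ANR-specific fields unless a dedicated enricher is registered for it.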

Comment on lines +772 to 782
if (mainThread == null) {
  // if there's no main thread in the event threads, we just create a dummy thread so the
  // exception is properly created as well, but without stacktrace
  mainThread = new SentryThread();
  mainThread.setStacktrace(new SentryStackTrace());
}
event.setExceptions(
    sentryExceptionFactory.getSentryExceptionsFromThread(mainThread, mechanism, anr));
}
}
// endregion
}

Bug: app.inForeground is missing for TombstoneHint events due to AnrHintEnricher not supporting NativeCrashExit hints.
Severity: HIGH | Confidence: High

🔍 Detailed Analysis

The app.inForeground field is no longer set for TombstoneHint (native crash) events. The new code moved inForeground logic to AnrHintEnricher.setAppForeground(), which only supports AbnormalExit (ANR) hints. As a result, when ApplicationExitInfoEventProcessor.process() handles a TombstoneHint, no enricher is found, and setAppForeground() is never invoked. This is a functional regression, causing missing context data for native crash events that was previously available.

💡 Suggested Fix

Extend AnrHintEnricher or create a dedicated enricher (e.g., TombstoneHintEnricher) to support NativeCrashExit hints and set the app.inForeground field for TombstoneHint events.

Location:
sentry-android-core/src/main/java/io/sentry/android/core/ApplicationExitInfoEventProcessor.java#L665-L782

Collaborator Author

That is 100% intentional from my side, at least for now.


final SentryStackTrace stacktrace = createStackTrace(threadEntryValue);
thread.setStacktrace(stacktrace);
if (tombstone.getTid() == threadEntryValue.getId()) {

Bug: Inconsistent thread ID source may misidentify crashed thread

The SentryThread.id is set from threadEntry.getKey() (the map key), but the crashed thread identification on line 77 compares tombstone.getTid() with threadEntryValue.getId() (the Thread's internal id field). If these values ever differ, the crashed thread would not be correctly identified. For consistency and correctness, the comparison at line 77 should use threadEntry.getKey() instead of threadEntryValue.getId(), since that's the source used to set the thread ID.

Collaborator Author

That is also intentional, because the key and the ID should be equivalent.
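
The invariant relied on here can be written down as a small check. This is a sketch only: `ThreadEntry` is a hypothetical stand-in for the generated protobuf thread message, and `CrashedThreadCheck` is not part of the PR. As long as every map key equals the id stored in its value, comparing the tombstone's tid against either one selects the same thread.

```java
import java.util.Map;

// Hypothetical stand-in for the generated protobuf thread message.
final class ThreadEntry {
  private final int id;

  ThreadEntry(int id) {
    this.id = id;
  }

  int getId() {
    return id;
  }
}

// Hypothetical helper, for illustration only.
final class CrashedThreadCheck {
  private CrashedThreadCheck() {}

  // True if every map key equals the id stored in its value.
  static boolean keysMatchIds(Map<Integer, ThreadEntry> threads) {
    for (Map.Entry<Integer, ThreadEntry> entry : threads.entrySet()) {
      if (entry.getKey() != entry.getValue().getId()) {
        return false;
      }
    }
    return true;
  }

  // Returns the key of the entry whose value id matches the crashed tid,
  // or null if no thread matches.
  static Integer crashedKey(Map<Integer, ThreadEntry> threads, int crashedTid) {
    for (Map.Entry<Integer, ThreadEntry> entry : threads.entrySet()) {
      if (entry.getValue().getId() == crashedTid) {
        return entry.getKey();
      }
    }
    return null;
  }
}
```

A check like `keysMatchIds` could live in a test if the equivalence ever needs guarding.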

@supervacuus
Collaborator Author

supervacuus commented Dec 9, 2025

@markushi and @romtsn, I think I also found an okay solution to reduce duplication between the tests for the two AEI integrations. That means the first three "milestones" required for a release (without Native SDK event enrichment/merging) are finished. Please let me know if something is missing from your POV or not yet up to par, and how you would like to proceed.

Btw, multiple TODO entries in the PR don't necessarily highlight a vital change still open, but do signal a decision we probably should have made in this PR. Please understand them as review guidance for open questions/decisions I still have.

Comment on lines +143 to +149
// TODO: if we align this with ANRv2 this would be overwritten in a BackfillingEventProcessor as
// `ApplicationExitInfo` not sure what the right call is. `ApplicationExitInfo` is
// certainly correct. But `signalhandler` isn't wrong either, since all native crashes
// retrieved via `REASON_CRASH_NATIVE` will be signals. I am not sure what the side-effect
// in ingestion/processing will be if we change the mechanism, but initially i wanted to
// stay close to the Native SDK.
mechanism.setType("signalhandler");
Collaborator Author

This TODO asks whether to use the mechanism type signalhandler (as we do in the Native SDK) or to change it. If we decide on the latter, should we do it in this PR now? What is the trade-off (what do we gain from ApplicationExitInfo as a mechanism type vs. what do we lose when dropping signalhandler)?

Member

I think signalhandler is the best choice here, as it also makes a potential upgrade path easier (e.g. existing sentry.io dashboard queries will continue to work when switching from sentry-native to ApplicationExitInfo)

Comment on lines +685 to +688
// TODO: not sure about this. all tests are written with AbnormalExit, but enrichment changes
// are ANR-specific. I called it AnrHintEnricher because that is what it does, but it
// actually triggers on AbnormalExit. Let me know what makes most sense.
return hint instanceof AbnormalExit;
Collaborator Author

This is primarily about naming things: the class is called AnrHintEnricher (i.e., specific for ANR), but we discriminate based on AbnormalExit (which is arguably broader).

But it is also about "correctness": right now, the tests related to the event processor all use AbnormalExit when constructing test events.

Member

I'd keep it as-is for compatibility reasons; maybe some Javadoc on AbnormalExit could help make it clearer.

Comment on lines +133 to +135
// TODO: this should probably check whether the Native SDK integration is currently enabled or
// remove the marker file if it isn't. Otherwise, applications that disable the Native SDK
// will report a crash for the last run forever.
Collaborator Author

This is unrelated to this PR, but it can serve as a trigger to fix it: if the native SDK was once enabled and then disabled, do we have a cleanup process elsewhere that would remove the old native marker? If the marker's lifecycle is managed only by the Native SDK, we would report crashedLastRun == true forever.

Member

We do some cleanup within sentry core here, regardless of whether the native SDK is initialized.
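
A consume-once marker is one way to avoid reporting `crashedLastRun == true` forever: whoever reads the marker at startup also deletes it, independent of whether the writer is still enabled. The sketch below only illustrates that idea; `CrashMarker` and the marker path are hypothetical, and this is not the cleanup code referred to above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper; illustrates a marker file that is consumed at most once.
final class CrashMarker {
  private CrashMarker() {}

  // Called by whatever detects the crash: leaves an empty marker file behind.
  static void write(Path marker) {
    try {
      Files.write(marker, new byte[0]);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Called once at startup: returns true if a marker from the previous run
  // existed, and deletes it so later starts do not see it again.
  static boolean consume(Path marker) {
    try {
      return Files.deleteIfExists(marker);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Because `consume` deletes the file in the same step that reads it, a stale marker left behind by a since-disabled Native SDK would be reported once and then disappear.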

if (HintUtils.hasType(hint, AbnormalExit.class)) {
tryEndPreviousSession(hint);
}
// TODO: adapt tryEndPreviousSession(hint); to CrashExit.class
Collaborator Author

likely relevant for this PR

}
}

// TODO: maybe set crashedLastRun for tombstone too?
Collaborator Author

likely relevant for this PR

Comment on lines +71 to +72
* adjusted before they are symbolicated. TODO: should we make this an enum or is a string value
* fine?
Collaborator Author

Since this is part of the public API, it is also a sensible decision to make now.
