JesusRojass
Contributor

WIP!!! - Fix app start trace outliers from network delays (#10733)

Discussion

Fixes the _app_start outliers reported in #10733. (Still a draft and a work in progress; testing is ongoing, but results look good so far.)

User Statements:

  • Very long _app_start durations (up to 1000+ seconds) are reported in the 90th to 95th percentile of data in the Firebase console.

  • The issue seems to come from background tasks that kick off the activity, which does not end until the first run loop completes successfully.

  • Some reports mention a long _app_start metric when app launch is interrupted (by locking the device or receiving a phone call).

What does this fix? (My reproduction ideas for why this is happening):

Case 1 - Spotty network right at cold launch

  • Force quit the app to ensure a cold start.
  • Enable a throttled network profile (e.g. Network Link Conditioner on macOS/iOS): high latency (800–1500 ms or more), low bandwidth, 1–5% packet loss.
  • Launch the app.
  • Do not interact for some time; just let the app do its thing.
  • Record Perf logs and note whether the _app_start duration is far above the norm.

Case 2 - Targeted failures for early endpoints (e.g. your app depends on many endpoints to launch and one of them is down)

  • Force quit the app to ensure a cold start.
  • Run the app through a proxy (I personally used Charles) and simulate a DNS failure, an HTTP 5xx response, or a timeout for one startup endpoint at a time.
  • Launch the app and let it do its thing.
  • Observe whether _app_start stays open until the failed/slow call resolves or times out.
  • Switch the targeted endpoint to show that the issue generalizes beyond a single service.

Case 3 - Background launch before foreground

  • Force quit the app to ensure a cold start.
  • Trigger a background launch (silent push via simulator tools or device test service).
  • Wait for a bit then open the app.
  • Check if the duration of _app_start approximates the time spent before you foregrounded.

Case 4 - Saturate GCD workers to limit the available thread pool

  • In my case I tested by just creating tasks that are held and then released after some amount of time.
  • When launching the app, the UI may appear, but anything that needs .userInitiated workers, such as parsing, network callbacks, or image decoding, will be very delayed.
  • The _app_start should inflate.
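
Why held workers delay later work can be sketched with a toy model of a fixed-size worker pool. This is only an illustration in plain C: the function name `pool_start_time` and the fixed-size pool are assumptions made here, and real GCD manages its thread pool dynamically, so actual delays will differ.

```c
#include <math.h>
#include <stddef.h>

/* Toy model (hypothetical, not GCD): worker_free_at[i] is the time, in
 * seconds, at which worker i finishes the task it is currently holding.
 * Returns the earliest time a task submitted at submit_time can start. */
double pool_start_time(const double *worker_free_at, size_t n, double submit_time) {
  if (n == 0) {
    return HUGE_VAL; /* No workers at all: the task never starts. */
  }
  /* Find the worker that becomes free first. */
  double earliest_free = worker_free_at[0];
  for (size_t i = 1; i < n; i++) {
    if (worker_free_at[i] < earliest_free) {
      earliest_free = worker_free_at[i];
    }
  }
  /* The task cannot start before it is submitted. */
  return submit_time > earliest_free ? submit_time : earliest_free;
}
```

In this model, if every worker is held until t = 10 s, a task submitted at t = 0.5 s cannot start before t = 10 s, which is the kind of startup delay this case tries to provoke.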

Testing

  • Testing is ongoing (work in progress), but all unit tests pass locally.
Screenshot 2025-10-10 at 2 58 42 p m

API Changes

  • No API Changes

@JesusRojass
Contributor Author

@visumickey @eBlender This is a draft pull request while I test.

Contributor

@gemini-code-assist bot left a comment

Code Review

This pull request aims to fix outliers in app start traces caused by network delays or other interruptions by introducing a 'reasonable' timeout (30 seconds) after which the app start trace is cancelled. The changes include replacing a static flag with an instance property for better state management and modifying the trace completion logic.

My review focuses on improving code clarity and simplifying the asynchronous execution. I've suggested removing the uncertain-looking "???" from a comment and simplifying a nested dispatch_async call that adds unnecessary complexity. The core logic for cancelling long-running app start traces appears sound and should address the reported issue.
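
The stop-or-cancel decision described above can be sketched as a small pure function. This is a minimal sketch in plain C, not the SDK's code: the two thresholds come from the diff under review, while the enum and function names are hypothetical.

```c
#include <stdbool.h>

/* Thresholds in seconds, mirroring the constants in the diff. */
static const double kMaxValidDuration = 60.0 * 60.0; /* gAppStartMaxValidDuration */
static const double kReasonableDuration = 30.0;      /* gAppStartReasonableValidDuration */

/* Hypothetical names used only for this illustration. */
typedef enum { APP_START_STOP, APP_START_CANCEL } AppStartAction;

/* Decide whether the app start trace is recorded (stop) or discarded (cancel). */
AppStartAction app_start_action(double elapsed, bool enabled, bool prewarmed) {
  bool should_complete = (elapsed < kMaxValidDuration) && enabled && !prewarmed;
  if (should_complete && elapsed < kReasonableDuration) {
    return APP_START_STOP;
  }
  return APP_START_CANCEL;
}
```

For example, `app_start_action(2.5, true, false)` yields `APP_START_STOP`, while an elapsed time of 1000 seconds yields `APP_START_CANCEL`. Because 30 seconds is below 60 minutes, the 30-second threshold is the one that effectively decides the outcome.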

Comment on lines +278 to 314
    // Use dispatch_async with a higher priority queue to reduce interference from network
    // operations This ensures trace completion isn't delayed by main queue congestion from network
    // calls
    __weak typeof(self) weakSelf = self;
    dispatch_async(dispatch_get_global_queue(QOS_CLASS_USER_INITIATED, 0), ^{
      dispatch_async(dispatch_get_main_queue(), ^{
        __strong typeof(weakSelf) strongSelf = weakSelf;
        if (!strongSelf || !strongSelf.appStartTrace) {
          return;
        }

        NSTimeInterval startTimeSinceEpoch = [strongSelf.appStartTrace startTimeSinceEpoch];
        NSTimeInterval currentTimeSinceEpoch = [[NSDate date] timeIntervalSince1970];
        NSTimeInterval elapsed = currentTimeSinceEpoch - startTimeSinceEpoch;

        // The below check accounts for multiple scenarios:
        // 1. App started in background and comes to foreground later
        // 2. App launched but immediately backgrounded
        // 3. Network delays during startup inflating metrics
        BOOL shouldCompleteTrace = (elapsed < gAppStartMaxValidDuration) &&
                                   [strongSelf isAppStartEnabled] &&
                                   ![strongSelf isApplicationPreWarmed];

        // Additional safety: cancel if elapsed time is unreasonably long for app start
        if (shouldCompleteTrace && elapsed < gAppStartReasonableValidDuration) {
          [strongSelf.appStartTrace stop];
        } else {
          [strongSelf.appStartTrace cancel];
          if (elapsed >= gAppStartReasonableValidDuration) {
            // Log for debugging network related delays
            NSLog(
                @"Firebase Performance: App start trace cancelled due to excessive duration: %.2fs",
                elapsed);
          }
        }
      });
    });

medium

The nested dispatch_async calls seem overly complex for deferring execution. The outer dispatch to a global user-initiated queue, only to immediately dispatch back to the main queue, doesn't prevent delays if the main queue is congested. The block will still have to wait for its turn on the main thread.

A single dispatch_async to the main queue is sufficient to defer the execution to the next run loop cycle. This simplifies the code and removes the misleading comment about avoiding main queue congestion.

The use of weakSelf/strongSelf is a good improvement for safety over the original implementation.

    // Defer stopping the trace to the next run loop cycle. This is to ensure that the app is
    // fully interactive.
    __weak typeof(self) weakSelf = self;
    dispatch_async(dispatch_get_main_queue(), ^{
      __strong typeof(weakSelf) strongSelf = weakSelf;
      if (!strongSelf || !strongSelf.appStartTrace) {
        return;
      }

      NSTimeInterval startTimeSinceEpoch = [strongSelf.appStartTrace startTimeSinceEpoch];
      NSTimeInterval currentTimeSinceEpoch = [[NSDate date] timeIntervalSince1970];
      NSTimeInterval elapsed = currentTimeSinceEpoch - startTimeSinceEpoch;

      // The below check accounts for multiple scenarios:
      // 1. App started in background and comes to foreground later
      // 2. App launched but immediately backgrounded
      // 3. Network delays during startup inflating metrics
      BOOL shouldCompleteTrace = (elapsed < gAppStartMaxValidDuration) &&
                                 [strongSelf isAppStartEnabled] &&
                                 ![strongSelf isApplicationPreWarmed];

      // Additional safety: cancel if elapsed time is unreasonably long for app start
      if (shouldCompleteTrace && elapsed < gAppStartReasonableValidDuration) {
        [strongSelf.appStartTrace stop];
      } else {
        [strongSelf.appStartTrace cancel];
        if (elapsed >= gAppStartReasonableValidDuration) {
          // Log for debugging network related delays
          NSLog(
              @"Firebase Performance: App start trace cancelled due to excessive duration: %.2fs",
              elapsed);
        }
      }
    });
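
The point about main queue congestion can be illustrated with a toy model of a serial FIFO queue. This is a sketch in plain C under simplifying assumptions, not GCD itself: all function names are hypothetical, and `hop_latency` stands for whatever time the block spends on the global queue before it is re-enqueued on the main queue.

```c
/* Toy model (hypothetical): a serial queue is busy until busy_until; a block
 * enqueued at enqueue_time starts at whichever of the two is later. */
double serial_queue_start_time(double busy_until, double enqueue_time) {
  return enqueue_time > busy_until ? enqueue_time : busy_until;
}

/* Single dispatch: the block is enqueued on the main queue at time now. */
double start_time_direct(double main_busy_until, double now) {
  return serial_queue_start_time(main_busy_until, now);
}

/* Nested dispatch: the block reaches the main queue only after hop_latency
 * (>= 0) spent on the global queue, then waits like any other block. */
double start_time_via_hop(double main_busy_until, double now, double hop_latency) {
  return serial_queue_start_time(main_busy_until, now + hop_latency);
}
```

In this model the hop can never make the block start earlier than the direct dispatch: with the main queue busy until t = 5 s, both variants start at t = 5 s at best, and the nested variant starts later whenever the hop itself takes longer than the remaining congestion.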

static NSDate *doubleDispatchTime = nil;
static NSDate *applicationDidFinishLaunchTime = nil;
static NSTimeInterval gAppStartMaxValidDuration = 60 * 60; // 60 minutes.
static NSTimeInterval gAppStartReasonableValidDuration = 30.0; // 30 seconds, reasonable app start time???

low

The ??? in the comment suggests uncertainty. Please remove it for a more polished and professional code appearance.

Suggested change
- static NSTimeInterval gAppStartReasonableValidDuration = 30.0; // 30 seconds, reasonable app start time???
+ static NSTimeInterval gAppStartReasonableValidDuration = 30.0; // 30 seconds, reasonable app start time.

Contributor Author

@JesusRojass Oct 10, 2025

Will fix. I may need input from @visumickey on whether this is an ideal, or at least acceptable, duration (hence the ???).
