Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RUM-8042 Replace Upload Quality with Batch Blocked telemetry #2230

Draft
wants to merge 2 commits into
base: develop
Choose a base branch
from

Conversation

maxep
Copy link
Member

@maxep maxep commented Mar 12, 2025

What and why?

Specifications of Upload Quality attributes in Session Ended metric telemetry are not suitable for graphing.
Instead, I propose adding a new Batch metric event to report batches blocked during upload. Currently, we have the Batch Deleted telemetry metric, from which we can compute the count of batches deleted due to intake-code-202 (successful upload). We can add a Batch Blocked metric to report when a batch is blocked due to local blockers (low power, offline, etc.) or network failures. This will provide visibility into why batches are not uploaded and allow us to compute the ratio of successful uploads versus failures. This new telemetry event should be sampled at the same rate as Batch Deleted: 1.5%.

The complex rse.upload_quality attribute can then be replaced by a simple rse.upload_cycle.<logs|rum|*> that track upload cycle per track. In this new definition, an Upload Cycle will include multiple batch upload.

How?

Similar to BatchDeletedMetric, the BatchBlockedMetric will report reasons for blocking batch upload, or failures to upload (network or http)

Review checklist

  • Feature or bugfix MUST have appropriate tests (unit, integration)
  • Make sure each commit and the PR mention the Issue number or JIRA reference
  • Add CHANGELOG entry for user facing changes
  • Add Objective-C interface for public APIs (see our guidelines [internal]) and run make api-surface)

@maxep maxep force-pushed the maxep/RUM-8042/batch-blocked-telemetry branch from 229b5b9 to 09a608a Compare March 13, 2025 09:11
@datadog-datadog-prod-us1
Copy link

datadog-datadog-prod-us1 bot commented Mar 13, 2025

Datadog Report

Branch report: maxep/RUM-8042/batch-blocked-telemetry
Commit report: 564152b
Test service: dd-sdk-ios

✅ 0 Failed, 1033 Passed, 2853 Skipped, 1m 16.08s Total duration (1m 31.33s time saved)

@maxep maxep force-pushed the maxep/RUM-8042/batch-blocked-telemetry branch from 09a608a to f51cb4c Compare March 13, 2025 09:36
@maxep maxep marked this pull request as ready for review March 13, 2025 09:37
@maxep maxep requested review from a team as code owners March 13, 2025 09:37
mariedm
mariedm previously approved these changes Mar 13, 2025
Copy link
Member

@mariedm mariedm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGMT!

simaoseica-dd
simaoseica-dd previously approved these changes Mar 13, 2025
Copy link
Contributor

@simaoseica-dd simaoseica-dd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👌

Copy link
Contributor

@barboraplasovska barboraplasovska left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🏅

/// Each track reports its own upload quality metrics.
let uploadQuality: [String: UploadQuality]
/// Information about the upload cycles during the session.
/// The upload cycles is splitting between upload track name.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

\nit /// The upload cycles are splitting between upload track name.

@maxep maxep dismissed stale reviews from barboraplasovska, simaoseica-dd, and mariedm via 5204914 March 14, 2025 16:59
@maxep maxep force-pushed the maxep/RUM-8042/batch-blocked-telemetry branch 3 times, most recently from aa1b8d9 to 31ae520 Compare March 17, 2025 11:22
static let sampleRate: Float = 1.5 // 1.5%
/// The key for uploader's current delay.
static let uploaderDelayKey = "uploader_delay.current"
/// The key for count of bacthes being blocked.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: The key for the number of batches being blocked.

Copy link
Member

@mariedm mariedm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple of small questions, otherwise LGMT!

DD.logger.debug("⏳ (\(self.featureName)) Uploading batches...")
self.backgroundTaskCoordinator?.beginBackgroundTask()
self.uploadFile(from: files.reversed(), context: context)
sendUploadCycleMetric()
} else {
let batchLabel = files?.isEmpty == false ? "YES" : (isSystemReady ? "NO" : "NOT CHECKED")
let batchLabel = files.isEmpty ? "NO" : "YES"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I imagine the answer is no, but does not sending "NOT CHECKED" anymore have any consequences on our metrics?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No it doesn't, it was just for console printing. The drawback of this change is that we will check for batches even if the upload is blocked, but the impact is minimal as we only list the files.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I read this correctly, we use to send sendUploadQualityMetric both in cases of success and failure. Now we send sendUploadCycleMetric before every upload, and track failures with sendBatchBlockedMetric. So we don't explicitly send a telemetry in case of success. I imagine we can still graph the success rate, but is this intentional?

Comment on lines +76 to +78
let files = fileReader.readFiles(limit: maxBatchesPerUpload)

if !files.isEmpty && isSystemReady {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we try to asses if this change might have a negative performance impact ?
I don't think it would because contentsOfDirectory (urderlying Filesystem method used) only get URLs, and not the content of each file, but maybe it's worth checking using maxBatchesPerUpload files everytime ?
Since we only use files to get their count for the telemetry, it would be bad if this also had a performance impact.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The files with their InputStream will be needed for the self.uploadFile(from: files.reversed(), context: context)

@maxep maxep force-pushed the maxep/RUM-8042/batch-blocked-telemetry branch from 31ae520 to 564152b Compare March 24, 2025 14:43
@maxep maxep marked this pull request as draft March 26, 2025 17:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants