fix(telemetry): drop undocumented ext.os.release / ext.app.initTs #642

Merged: timenick merged 1 commit into main from zhiwang/fix-telemetry-invalid-event-format on May 15, 2026
@timenick

Closes #635.

Diagnosis

The OneCollector backend rejected every batch with
{"acc":0,"efi":{"InvalidEventFormat":"all"}}. A live probe against
the production ingest narrowed the cause to two fields in the
envelope that are not part of the documented CS 4.0 extension
slots — either field on its own is enough to flip the whole batch
from acc:1 to InvalidEventFormat: all:

| envelope field | CS 4.0 status |
| --- | --- |
| ext.os.release | not documented (ext.os keys: name, ver, bootId) |
| ext.app.initTs | not documented (ext.app keys: name, ver, id, iid, expId, userId, sesId, env, asId, locale, tz) |

ext.app.initTs is rejected for every value type (float, int, ISO
8601 string), so the problem is that the field name is unknown to
the schema, not a type mismatch.
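
The diagnosis above amounts to an allowlist check on the ext slots. A minimal sketch of that check follows, assuming the envelope is a plain dict; the helper name `find_undocumented_ext_keys`, the `DOCUMENTED_EXT_KEYS` constant, and the sample values are hypothetical and not part of this PR. Only the key lists for ext.os and ext.app come from the description above.

```python
# Documented CS 4.0 keys for the two slots named in this PR.
# Other slots (for example ext.device) are not enumerated here.
DOCUMENTED_EXT_KEYS = {
    "os": {"name", "ver", "bootId"},
    "app": {"name", "ver", "id", "iid", "expId", "userId", "sesId",
            "env", "asId", "locale", "tz"},
}


def find_undocumented_ext_keys(envelope: dict) -> list[str]:
    """Return dotted paths such as 'ext.os.release' for keys outside the allowlist.

    Hypothetical helper: slots missing from DOCUMENTED_EXT_KEYS are skipped
    because their documented keys are not listed in this PR.
    """
    offending = []
    for slot, fields in envelope.get("ext", {}).items():
        allowed = DOCUMENTED_EXT_KEYS.get(slot)
        if allowed is None:
            continue
        for key in fields:
            if key not in allowed:
                offending.append(f"ext.{slot}.{key}")
    return sorted(offending)


# Illustrative envelope carrying both offending fields (values are made up).
bad_envelope = {
    "name": "Heartbeat",
    "ext": {
        "os": {"name": "Linux", "ver": "6.8", "release": "6.8.0-40-generic"},
        "app": {"name": "demo", "ver": "1.0.0", "initTs": 1715758380.0},
    },
}
print(find_undocumented_ext_keys(bad_envelope))
```

A check like this only looks at key names, which matches the observation that the value type of initTs makes no difference.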

The previous acc:1 validation in #456 happened to use a minimal
envelope (no ext), so the bad fields never reached the wire. The
existing unit tests asserted the envelope shape but never validated
it against the live backend, so the regression slipped through.

Fix

  • _resource_to_ext() no longer maps os.release → ext.os.release
    or initTs → ext.app.initTs.
  • _build_resource() no longer puts those attributes in the
    Resource at all; the _init_ts instance attribute is dropped too
    (it was only read by the now-removed mapping).
  • The "translates resource to ext" unit test is updated to assert
    the trimmed envelope, and a new test,
    test_export_omits_undocumented_cs40_ext_fields, fails fast if
    anyone re-introduces either mapping.
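
The diff itself is not shown on this page, so the following is only a rough sketch of what the trimmed mapping and the regression test could look like. `resource_to_ext` is a hypothetical stand-in for `_resource_to_ext()`, the resource attribute names (`os.name`, `service.name`, and so on) are assumptions, and the test body is a guess that reuses the test name from the list above.

```python
def resource_to_ext(resource: dict) -> dict:
    """Hypothetical stand-in for _resource_to_ext(): map attributes to ext slots."""
    # Assumed attribute names; only documented ext.os / ext.app keys are targets.
    # os.release and initTs are deliberately absent from this mapping.
    mapping = {
        "os.name": ("os", "name"),
        "os.version": ("os", "ver"),
        "service.name": ("app", "name"),
        "service.version": ("app", "ver"),
    }
    ext: dict = {}
    for attr, (slot, key) in mapping.items():
        if attr in resource:
            ext.setdefault(slot, {})[key] = resource[attr]
    return ext


def test_export_omits_undocumented_cs40_ext_fields():
    # Even if a caller still supplies the old attributes,
    # they must not reach the envelope.
    resource = {
        "os.name": "Linux",
        "os.version": "6.8",
        "os.release": "6.8.0-40-generic",
        "service.name": "demo",
        "service.version": "1.0.0",
        "initTs": 1715758380.0,
    }
    ext = resource_to_ext(resource)
    assert "release" not in ext.get("os", {})
    assert "initTs" not in ext.get("app", {})
```

Driving the mapping from an explicit table means an unknown attribute is dropped by default, so a new field has to be added on purpose before it can reach the wire.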

Probe log (production ingest, masked)

[01 minimal Heartbeat (control)]                              status=200  body={"acc":1}
[02 Heartbeat + full ext (initTs FLOAT)]                      status=400  body={"acc":0,"efi":{"InvalidEventFormat":"all"}}
[03 Heartbeat + full ext (initTs STRING)]                     status=400  body={"acc":0,"efi":{"InvalidEventFormat":"all"}}
[07 Heartbeat + only ext.app.initTs as FLOAT]                 status=400  body={"acc":0,"efi":{"InvalidEventFormat":"all"}}
[14 ext.device only {localId, authId, deviceClass}]           status=200  body={"acc":1}
[15 ext.os only {name, ver, release}]                         status=400  body={"acc":0,"efi":{"InvalidEventFormat":"all"}}
[16 full ext WITHOUT initTs but WITH release]                 status=400  body={"acc":0,"efi":{"InvalidEventFormat":"all"}}
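
Each probe line pairs an HTTP status with a JSON body. A small sketch of how such a response might be classified follows, assuming the body always parses as a JSON object with an optional `acc` count and an optional `efi` object, as in the log above. The function name `summarize_ingest_response` is hypothetical, and how `efi` is read here is inferred from these samples only.

```python
import json


def summarize_ingest_response(status: int, body: str) -> dict:
    """Classify one ingest response (hypothetical helper, shape inferred from the log)."""
    payload = json.loads(body)
    rejections = payload.get("efi", {})
    return {
        "status": status,
        "accepted": payload.get("acc", 0),
        "rejections": rejections,
        # Treat a probe as passing only on HTTP 200 with no rejection entries.
        "ok": status == 200 and not rejections,
    }


control = summarize_ingest_response(200, '{"acc":1}')
rejected = summarize_ingest_response(
    400, '{"acc":0,"efi":{"InvalidEventFormat":"all"}}'
)
print(control["ok"], rejected["ok"], rejected["rejections"])
```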

OneCollector rejected every batch with
`{"acc":0,"efi":{"InvalidEventFormat":"all"}}` because the envelope
carried two fields that aren't part of the documented CS 4.0
extension slots:

  - ext.os.release   (the documented ext.os keys are name, ver, bootId)
  - ext.app.initTs   (the documented ext.app keys are name, ver, id,
                      iid, expId, userId, sesId, env, asId, locale, tz)

A live probe against the production ingest confirmed the diagnosis:
removing both fields flips a 400 back to acc:1, while either field on
its own is enough to fail the whole batch.

`_resource_to_ext()` no longer maps these attributes, and
`_build_resource()` no longer puts them in the Resource. The
`_init_ts` instance attribute is dropped too (unused after the
mapping removal). Unit tests are updated to assert the trimmed
envelope, plus a regression test that fails fast if anyone
re-introduces either mapping.

Closes #635
@timenick timenick requested a review from a team as a code owner May 15, 2026 07:33
@timenick timenick enabled auto-merge (squash) May 15, 2026 07:41
@timenick timenick merged commit ba484c5 into main May 15, 2026
9 checks passed
@timenick timenick deleted the zhiwang/fix-telemetry-invalid-event-format branch May 15, 2026 07:45

Linked issue (#635): telemetry backend returned 400