[observer] Optimize#50395
Draft
Eokye wants to merge 4 commits intoq-branch-observerfrom
Draft
Conversation
Reduce per-log heap allocations by storing log content as string instead of []byte throughout the observer pipeline. LogView.GetContent() now returns string. The single []byte→string conversion happens once at the pipeline boundary (messageLogView adapter in logssource/impl/pipeline.go). Downstream extractors share the immutable string with zero copies, eliminating 2-3 body-sized allocations per log that previously occurred in LogMetricsExtractor, LogPatternExtractor, and ConnectionErrorExtractor. Also gates engine.accumulatedTelemetry behind an enableAccumulatedTelemetry flag (true only in testbench mode). In live mode this slice grew without bound and nothing read it.
Contributor
Gitlab CI Configuration Changes
|
| Removed | Modified | Added | Renamed |
|---|---|---|---|
| 0 | 361 | 0 | 0 |
Updated: .gitlab/distribution.yml
Changes Summary
| Removed | Modified | Added | Renamed |
|---|---|---|---|
| 0 | 0 | 2 | 0 |
ℹ️ Diff available in the job log.
Contributor
Go Package Import DifferencesBaseline: e5b320d
|
Contributor
Static quality checks❌ Please find below the results from static quality gates Error
Gate failure full details
Static quality gates prevent the PR to merge! Successful checksInfo
On-wire sizes (compressed)
|
Replace the newTelemetryGauge → ObserverTelemetry → handleTelemetry path with a direct callback for processing-time telemetry on the hot path. Live observer captures the telemetry.Gauge reference in a closure and sets engine.onProcessingTime; the engine calls it directly instead of constructing ObserverTelemetry objects, returning them across the function boundary, and unwrapping them in handleTelemetry. Testbench leaves the callback nil, so the existing ObserverTelemetry accumulation path is completely unchanged. Also caches "detector:<name>" tag strings per component (rebuilt on SetDetectors/SetExtractors/SetCorrelators) to avoid per-log concat. Eliminates per-log: 3× metricObs heap allocs, 3× copyTags slice allocs, 3× detectorNameFromTags scans, logTelemetry slice growth, and handleTelemetry iteration + map lookups for timing entries.
Replace stdlib fnv.New64a() + h.Write([]byte(...)) with inline
fnv64aString/fnv64aMix/fnv64aMixUint64 that operate on strings
directly — zero alloc on the hot path.
Replace fmt.Sprintf("log.pattern.%x.count", ...) with
strconv.FormatUint + string concat.
Also: fail-fast panic guard on processingTimeGauge lookup from Batch 2
review, and stdlib-equivalence tests for the inline fnv functions.
Remove the redundant s.seriesIDs map[string]SeriesRef — the numeric ref now lives directly on seriesStats. All 5 call sites already had the *seriesStats in hand; they now read stats.ref instead of doing a second map lookup. Also remove the dead seriesStats.internalKey field (set but never read). Saves one full map[string] worth of overhead (~2 MiB at 50k series) and simplifies the code.
2 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Batched optimizations to reduce observer CPU/memory overhead, targeting SMP bounds compliance.
SMP results: 881 MiB → 631 MiB (-28%) on observer_logs_anomaly_stress. Bound: 370 MiB.
Batch 1: String content + gate accumulated telemetry
Results compared to last optimization
Batch 2: Direct gauge.Set for per-component processing time
telemetry.Gaugereference in a closure and setsengine.onProcessingTime; the engine calls it directly instead of constructingObserverTelemetryobjects, returning them across the function boundary, and unwrapping them inhandleTelemetry.ObserverTelemetryaccumulation fires as before."detector:<name>"tag strings per component, rebuilt onSetDetectors/SetExtractors/SetCorrelators.metricObsheap allocs, 3×copyTagsslice allocs, 3×detectorNameFromTagsscans,logTelemetryslice growth, andhandleTelemetryiteration + map lookups for timing entries.Batch 3: Inline fnv64a hashers + kill fmt.Sprintf
fnv64aString/fnv64aMix/fnv64aMixUint64replacefnv.New64a()+h.Write([]byte(...)). Same FNV-1a algorithm, operates on strings directly — zero alloc. Stdlib-equivalence tests included.strconv.FormatUint+ string concat replacesfmt.Sprintf("log.pattern.%x.count", ...). Avoids the reflect-based formatter.patternCountMetricName(per log),tagGroupByKeyHash(per log),globalClusterHash(per cluster create).processingTimeGaugemap lookup (Batch 2 review finding).Batch 4: Merge seriesIDs map into seriesStats.ref
s.seriesIDs map[string]SeriesRef— the numeric ref now lives directly onseriesStats.ref. All 5 call sites already had*seriesStatsin hand; they now readstats.refinstead of a second map lookup.seriesStats.internalKeyfield (set but never read).map[string]worth of overhead (~2 MiB at 50k series) and simplifies storage code.