fix(alerts): resolve fingerprint dedup causing alert history loss#25

Open
KKamJi98 wants to merge 2 commits into main from fix/alert-fingerprint-dedup

KKamJi98 (Contributor) commented Mar 2, 2026

Summary

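  • Problem: When an alert re-fires with the same labels after being resolved, Alertmanager generates the same fingerprint, so the ON CONFLICT (alert_id) UPSERT overwrites the existing record. Alert history is lost and analysis results are overwritten.
  • Fix: Separate alert_id into an ALR-{uuid} format and keep fingerprint as the grouping key. Only an alert that has the same fingerprint and is still firing is UPDATEd; everything else is INSERTed as a new record.
  • Additional fixes: Use an atomic COALESCE query in SaveAlert to prevent a race condition, fix the resolved_at WHERE clause mismatch, and fix labels, severity, and other fields not being refreshed on ON CONFLICT.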
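
The update-or-insert rule in the Fix item can be sketched in memory as follows. This is only an illustration, not the code in this PR: the Alert struct, memStore, and newAlertID are hypothetical names, the SaveAlert signature is assumed, and a mutex stands in for the atomic database query.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"sync"
)

// Alert is a hypothetical, trimmed-down record; the real schema is not shown in this PR.
type Alert struct {
	AlertID     string // unique per occurrence, e.g. "ALR-1a2b3c4d"
	Fingerprint string // Alertmanager fingerprint, reused when the same labels re-fire
	Status      string // "firing" or "resolved"
	Severity    string
}

// memStore is an in-memory stand-in for the alerts table.
type memStore struct {
	mu     sync.Mutex
	alerts []*Alert
}

// newAlertID returns "ALR-" plus 8 random hex characters,
// approximating the ALR-{uuid[:8]} format without a UUID library.
func newAlertID() string {
	b := make([]byte, 4)
	if _, err := rand.Read(b); err != nil {
		panic(err)
	}
	return "ALR-" + hex.EncodeToString(b)
}

// SaveAlert updates the record that is currently firing for this fingerprint,
// if there is one. Otherwise it inserts a new record with a fresh alert ID,
// so a resolved occurrence is never overwritten by a later re-fire.
func (s *memStore) SaveAlert(fingerprint, status, severity string) string {
	s.mu.Lock()
	defer s.mu.Unlock()

	for _, a := range s.alerts {
		if a.Fingerprint == fingerprint && a.Status == "firing" {
			a.Status = status
			a.Severity = severity
			return a.AlertID
		}
	}

	a := &Alert{
		AlertID:     newAlertID(),
		Fingerprint: fingerprint,
		Status:      status,
		Severity:    severity,
	}
	s.alerts = append(s.alerts, a)
	return a.AlertID
}
```

Calling SaveAlert twice with the same fingerprint while it is firing returns the same ID. Saving a resolved status and then firing again returns a different ID and leaves the resolved record in place, which matches the scenarios in the test plan.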

Test plan

  • go test ./... all PASS (including 15 new tests)
  • New firing → confirm ALR-xxx is created
  • Repeated firing with the same fingerprint → existing ALR-xxx is UPDATEd
  • Re-firing after resolved → confirm a new ALR-yyy is created
  • Flapping detection works correctly

Alertmanager reuses fingerprints for re-fired alerts, causing resolved-then-re-fired
alerts to overwrite existing records. Separate alert_id (ALR-UUID) from fingerprint
(grouping key) so each occurrence gets its own record.

- alert_id now uses ALR-{uuid[:8]} format, fingerprint kept as grouping key
- Atomic COALESCE subquery in SaveAlert to prevent TOCTOU race conditions
- Partial unique index ensures one firing alert per fingerprint
- Fix resolved_at never being set (WHERE clause mismatch)
- Extract alertStore/alertSlacker/alertAnalyzer interfaces for testability
- Add 15 unit tests covering dedup scenarios
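
For readers who want to see roughly what the database side of these changes could look like, here is a sketch of the three statements as Go constants. It assumes a PostgreSQL-style dialect and a hypothetical alerts table; the table, column, index, and constant names are all illustrative and are not taken from this PR's diff.

```go
package main

// All table, column, and index names below are assumptions made for illustration.

// createFiringIndexSQL defines a partial unique index: at most one row per
// fingerprint may have status = 'firing'. Resolved rows are not constrained,
// so any number of past occurrences can share a fingerprint.
const createFiringIndexSQL = `
CREATE UNIQUE INDEX IF NOT EXISTS alerts_one_firing_per_fingerprint
    ON alerts (fingerprint)
    WHERE status = 'firing';`

// saveAlertSQL picks the alert_id inside the statement itself. COALESCE reuses
// the ID of the row that is currently firing for the fingerprint ($2), or falls
// back to the freshly generated ID ($1). Reusing an existing ID triggers the
// ON CONFLICT branch, which refreshes the mutable columns.
const saveAlertSQL = `
INSERT INTO alerts (alert_id, fingerprint, status, severity, labels, fired_at)
VALUES (
    COALESCE(
        (SELECT alert_id FROM alerts
          WHERE fingerprint = $2 AND status = 'firing'
          LIMIT 1),
        $1
    ),
    $2, $3, $4, $5, $6
)
ON CONFLICT (alert_id) DO UPDATE
    SET status   = EXCLUDED.status,
        severity = EXCLUDED.severity,
        labels   = EXCLUDED.labels
RETURNING alert_id;`

// resolveAlertSQL filters on the same fingerprint + firing condition used by
// saveAlertSQL, so resolved_at is set on the occurrence that is actually open.
const resolveAlertSQL = `
UPDATE alerts
   SET status = 'resolved', resolved_at = $2
 WHERE fingerprint = $1 AND status = 'firing';`
```

In this sketch the single-statement lookup removes the separate read-then-write step. The partial unique index is the backstop: if two concurrent inserts for the same fingerprint both find no firing row, the second one fails with a unique violation instead of creating a duplicate firing record.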

KKamJi98 force-pushed the fix/alert-fingerprint-dedup branch from 8e54219 to 5cf2e21 on March 2, 2026 16:23