3 files changed +28 -0 lines changed

 ## master
 * [ENHANCEMENT] Add bigger tenants and configure default compactor tenant shards
+* [ENHANCEMENT] Add alert `CortexCompactorWriteVisitMarkerIsFailing` to monitor compactors
 
 ## 1.17.1 / 2024-10-23
 * [CHANGE] Use cortex v1.17.1

           ||| % $._config,
           },
         },
+        {
+          // Alert if compactors are not able to update the visit marker.
+          alert: 'CortexCompactorWriteVisitMarkerIsFailing',
+          'for': '2h',
+          expr: |||
+            sum(increase(cortex_compactor_block_visit_marker_write_failed{job=~".+/%(compactor)s"}[2h])) > 0
+          ||| % $._config.job_names,
+          labels: {
+            severity: 'critical',
+          },
+          annotations: {
+            message: |||
+              Cortex compactors are not able to update the visit marker. Check the compactor logs to find the cause.
+            |||,
+          },
+        },
       ],
     },
   ],

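The `expr` field above relies on Jsonnet's `%` string formatting to fill in the compactor job name. A minimal standalone sketch of that substitution follows; the `job_names` value here is a hypothetical stand-in for `$._config.job_names`, whose real contents come from the mixin configuration and are not shown in this change.

```jsonnet
// Hypothetical stand-in for $._config.job_names (assumed value, not taken from the mixin).
local job_names = { compactor: 'compactor' };

{
  // The text block is formatted with the job_names object, so
  // %(compactor)s is replaced by job_names.compactor.
  expr: |||
    sum(increase(cortex_compactor_block_visit_marker_write_failed{job=~".+/%(compactor)s"}[2h])) > 0
  ||| % job_names,
}
```

With this assumed value, the label matcher in the evaluated string becomes `job=~".+/compactor"`, which matches the compactor job in any namespace.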
@@ -379,6 +379,17 @@ How to **investigate**:
 - Ensure ingesters are successfully shipping blocks to the storage
 - Look for any error in the compactor logs
 
+### CortexCompactorWriteVisitMarkerIsFailing
+
+This alert only applies to compactors when shuffle sharding is enabled.
+It fires if compactors keep failing to update the visit marker, counted across all tenants.
+The visit marker is a very small JSON file, so updating it should not normally fail.
+
+How to **investigate**:
+- Check the compactor logs; they should show the exact reason for the failure
+- If you see `context canceled` or any other timeout in the logs,
+  consider increasing `-compactor.compaction-visit-marker-timeout` and `-compactor.compaction-visit-marker-file-update-interval`.
+
 ### CortexCompactorHasNotSuccessfullyRunCompaction
 
 This alert fires if the compactor is not able to successfully compact all discovered compactable blocks (across all tenants).
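
For the `context canceled` case in the `CortexCompactorWriteVisitMarkerIsFailing` entry, the two flags could be raised through a Jsonnet override along the following lines. This is only a sketch: it assumes the deployment exposes compactor CLI flags through a `compactor_args` field, and the durations are illustrative placeholders rather than recommended or default values.

```jsonnet
{
  // Assumption: compactor flags are set via a hidden `compactor_args` object
  // whose keys are flag names without the leading dash.
  compactor_args+:: {
    // Placeholder durations; pick values based on what the compactor logs show.
    'compactor.compaction-visit-marker-timeout': '10m',
    'compactor.compaction-visit-marker-file-update-interval': '2m',
  },
}
```

If the deployment passes flags another way (for example directly on the command line), the same two flag names from the runbook apply.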