Record end-to-end indexing duration in S3 file notification source #5811

Open · wants to merge 1 commit into main

Conversation

tontinton (Contributor):

No description provided.

@tontinton tontinton marked this pull request as draft June 19, 2025 19:59
@tontinton tontinton force-pushed the queue-source-duration-metric branch from 90dca71 to 70e9318 Compare June 19, 2025 20:02
@tontinton tontinton marked this pull request as ready for review June 19, 2025 20:02
rdettai (Collaborator) left a comment:

Thanks for the contribution. It's very specific, but I think it can be valuable to any user. Could you also try running the coverage tests (which include the SQS tests) on your fork repo? It seems I can't trigger them from this repo.

Comment on lines +447 to +454

```rust
/// This function returns `index_name-source_name` or projects it to `__any__` if per-index metrics are disabled.
pub fn source_label(index_name: &str, source_name: &str) -> String {
    if is_per_index_metrics_enabled() {
        format!("{index_name}-{source_name}")
    } else {
        "__any__".to_string()
    }
}
```
Having the source and the index in separate labels is better in all ways.

Comment on lines +111 to +112

```rust
// 15 seconds up to 3 minutes
linear_buckets(15.0, 15.0, 12).unwrap(),
```
Is 3 min really enough in general? I think you might be interested in seeing much higher values when the system starts to behave badly. But 12 buckets is already a lot, so I would switch to an exponential scale instead.
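For illustration, an exponential ladder with the same 12-bucket budget could span 15 seconds up to roughly an hour. The `prometheus` crate provides an equivalent `exponential_buckets(start, factor, count)` helper; this dependency-free sketch just shows the arithmetic, and the factor 1.65 is an illustrative choice, not a value from the PR:

```rust
/// Sketch of an exponential bucket ladder with the same 12-bucket budget
/// as the linear version above. With start = 15.0 and factor = 1.65, the
/// last bucket boundary lands near 3700s (~1 hour) instead of 180s.
pub fn exponential_buckets(start: f64, factor: f64, count: usize) -> Vec<f64> {
    (0..count).map(|i| start * factor.powi(i as i32)).collect()
}
```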

```rust
queue_source_index_duration_seconds: Lazy::new(|| {
    new_histogram_vec(
        "queue_source_index_duration_seconds",
        "Number of seconds it took since the message was generated until it was sent \
```
Suggested change:

```diff
-        "Number of seconds it took since the message was generated until it was sent \
+        "Duration (seconds) between the queue message event time (parsed from its content if available) and its acknowledgment"
```

```diff
-    /// associated ack_id
-    awaiting_commit: BTreeMap<PartitionId, String>,
+    /// associated ack_id and optional creation timestamp
+    awaiting_commit: BTreeMap<PartitionId, (String, Option<OffsetDateTime>)>,
```
We could refactor `(String, Option<OffsetDateTime>)` to

```rust
pub struct AwaitingCommitMessage {
    ack_id: String,
    source_event_time_opt: Option<OffsetDateTime>,
}
```

It would help readability I think.
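The suggested struct can be sketched end to end with standard-library types only. Here `SystemTime` stands in for `time::OffsetDateTime`, and `PartitionId` is a placeholder alias; both are assumptions of this sketch, not quickwit's actual types:

```rust
use std::collections::BTreeMap;
use std::time::SystemTime;

// Placeholder for quickwit's PartitionId type (assumption of this sketch).
type PartitionId = u64;

/// Named struct replacing the `(String, Option<OffsetDateTime>)` tuple;
/// `SystemTime` stands in for `time::OffsetDateTime` to stay dependency-free.
pub struct AwaitingCommitMessage {
    pub ack_id: String,
    pub source_event_time_opt: Option<SystemTime>,
}

/// Sketch of `mark_completed`: removing a partition yields the whole message,
/// so call sites read named fields instead of tuple positions.
pub fn mark_completed(
    awaiting_commit: &mut BTreeMap<PartitionId, AwaitingCommitMessage>,
    partition_id: PartitionId,
) -> Option<AwaitingCommitMessage> {
    awaiting_commit.remove(&partition_id)
}
```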

@@ -122,6 +128,7 @@ impl PreProcessedPayload {

```rust
pub struct PreProcessedMessage {
    pub metadata: MessageMetadata,
    pub payload: PreProcessedPayload,
    pub timestamp_opt: Option<OffsetDateTime>,
```
Suggested change:

```diff
-    pub timestamp_opt: Option<OffsetDateTime>,
+    pub source_event_time_opt: Option<OffsetDateTime>,
```

```rust
let completed_opt = self.local_state.mark_completed(partition_id);
if let Some((ack_id, timestamp_opt)) = completed_opt {
    if let Some(timestamp) = timestamp_opt {
        let duration = OffsetDateTime::now_utc() - timestamp;
```
This can panic if something is off with the event time. It's unlikely, but it's unfortunate to panic because of a metric. Refactor this as:

```rust
fn record_index_duration_metric(
    index_id: &str,
    source_id: &str,
    source_event_time_opt: Option<OffsetDateTime>,
) {
    let Some(source_event_time) = source_event_time_opt else {
        return;
    };
    let now = OffsetDateTime::now_utc();
    if now < source_event_time {
        error!("source event time is in the future, skipping duration metric");
        return;
    }
    let duration = now - source_event_time;
    let index_label = index_label(index_id);
    let source_label = source_label(source_id);
    crate::metrics::INDEXER_METRICS
        .queue_source_index_duration_seconds
        .with_label_values([index_label, source_label])
        .observe(duration.as_seconds_f64());
}
```
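The same non-panicking pattern can be demonstrated dependency-free with `SystemTime`, whose `duration_since` returns `Err` when the event time is ahead of the clock; this is a sketch with `SystemTime` standing in for `OffsetDateTime`, not the PR's actual code:

```rust
use std::time::SystemTime;

/// Computes the indexing duration in seconds, or `None` when there is no
/// event time or clock skew puts the event time in the future. Returning
/// `None` lets the caller skip the observation instead of panicking.
pub fn index_duration_secs(
    source_event_time_opt: Option<SystemTime>,
    now: SystemTime,
) -> Option<f64> {
    let source_event_time = source_event_time_opt?;
    match now.duration_since(source_event_time) {
        Ok(duration) => Some(duration.as_secs_f64()),
        // Event time is in the future: don't record a bogus metric.
        Err(_) => None,
    }
}
```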

rdettai (Collaborator) commented Jun 23, 2025:

Did you check out the SQS ApproximateAgeOfOldestMessage metric? I'm not sure what you want to monitor exactly, but usually it's what you would use to monitor that queues are processed through in a timely manner.
