Shard query splitting: Split queries by stream shards #715

matyax · 2024-08-22T16:25:32Z

Query splitting by stream shards.

Query splitting was introduced in Loki to optimize querying over long intervals and high volumes of data: it divides a big request into smaller sub-requests, then combines and displays the results as they arrive.

This approach, inspired by the time-based query splitting, takes advantage of the internal __stream_shard__ label, which represents how data is spread across different sources that can be queried individually.

The main entry point of this module is runShardSplitQuery(), which prepares the query for execution and passes it to splitQueriesByStreamShard() to begin the querying loop.

splitQueriesByStreamShard() has the following structure:

Creates and returns an Observable to which the UI will subscribe
Requests the stream_shard values of the selected service:
- If there are no shard values, it falls back to the standard querying approach of the data source in runNonSplitRequest()
- If there are shards:
  - It groups the shard requests in an array of arrays of shard numbers in groupShardRequests()
  - It begins the querying loop with runNextRequest()
runNextRequest() will send a query using the nth (cycle) shard group, and has the following internal structure:
- adjustTargetsFromResponseState() will filter out log query targets that have already received the requested maxLines
- interpolateShardingSelector() will update the stream selector with the current shard numbers
- After query execution:
  - If the response is successful:
    - It will add new data to the response with combineResponses()
    - nextRequest() will use the current cycle and the total groups to determine the next request or complete execution with done()
  - If the response is unsuccessful:
    - If there are retry attempts left, it will retry the current cycle; otherwise it continues with the next cycle
    - If the returned error is "maximum series reached", it will not retry
Once all request groups have been executed, it completes with done()
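
As a rough illustration of the decisions in the loop above, the following sketch models them as a pure function. All names here (nextStep, Outcome, Step, isMaxSeriesError) are hypothetical and are not taken from shardQuerySplitting.ts. It also assumes that a "maximum series reached" error moves on to the next cycle, which the description does not state explicitly.

```typescript
// Hypothetical model of the loop decisions; not the code from this PR.
const MAX_RETRIES = 3; // the description mentions up to 3 retries

type Outcome = { ok: true } | { ok: false; error: string };

type Step =
  | { action: 'run'; cycle: number; retries: number }
  | { action: 'done' };

// Assumption: the error is identified by its message text.
function isMaxSeriesError(error: string): boolean {
  return error.toLowerCase().includes('maximum series reached');
}

function nextStep(cycle: number, retries: number, totalGroups: number, outcome: Outcome): Step {
  // Retry the same cycle while attempts remain and the error is retryable.
  if (!outcome.ok && retries < MAX_RETRIES && !isMaxSeriesError(outcome.error)) {
    return { action: 'run', cycle, retries: retries + 1 };
  }
  // Success, exhausted retries, or (by assumption) a non-retryable error:
  // move to the next shard group, or finish when none are left.
  const nextCycle = cycle + 1;
  return nextCycle < totalGroups
    ? { action: 'run', cycle: nextCycle, retries: 0 }
    : { action: 'done' };
}
```

With two shard groups, a success on cycle 0 yields a run of cycle 1, and a success on cycle 1 yields done.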

Key features

Shards are retrieved with a /values call that includes the selected service.
Starts with low-volume shards, interleaved with higher-volume shards.
Groups shards based on the current time interval and available shards.
Sends queries grouping shards in a selector __stream_shard__=~"shardn1|shardn2|...|shardn".
For short intervals (< 24 hours), groups shards in queries. For longer intervals, queries individual shards.
Recombines responses without assuming any particular order, including overlapping values.
Retries failed requests up to 3 times, with exponential backoff (waiting time of 1500 ms * 2^failed attempts). Does not retry if the response is "maximum series reached".
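
Two of the features above, the grouped shard selector and the backoff delay, can be sketched as follows. The helper names buildShardSelector and retryDelayMs are hypothetical and do not come from the PR. The sketch assumes shard values are numbers and that the failed-attempt count starts at 0, neither of which the description confirms.

```typescript
// Hypothetical helpers illustrating the selector and backoff described above.

// Builds the regex matcher for one group of shards.
function buildShardSelector(shards: number[]): string {
  return `__stream_shard__=~"${shards.join('|')}"`;
}

// Waiting time before a retry: 1500 ms * 2^failedAttempts.
// Assumption: failedAttempts starts at 0 for the first retry.
function retryDelayMs(failedAttempts: number): number {
  const BASE_DELAY_MS = 1500;
  return BASE_DELAY_MS * 2 ** failedAttempts;
}

console.log(buildShardSelector([1, 2, 3])); // __stream_shard__=~"1|2|3"
console.log([0, 1, 2].map(retryDelayMs)); // [ 1500, 3000, 6000 ]
```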

Important

Requires the exploreLogsShardSplitting feature flag to be enabled.

Closes #690

gtk-grafana · 2024-08-23T11:21:54Z

should this be in draft for now?

matyax · 2024-08-23T11:23:26Z

100%. Just wanted the build for publishing.

matyax · 2024-09-03T15:45:14Z

@grafana/observability-logs If you want to start the review while I work on the coverage, that'd be great. I don't plan to add any significant change to the code for now.

matyax · 2024-09-03T20:30:51Z

Coverage added 😤

.github/workflows/ci.yml

project-words.txt

matyax · 2024-09-04T17:39:15Z

src/services/combineResponses.test.ts

@@ -0,0 +1,1403 @@
+import { DataFrame, DataFrameType, DataQueryResponse, Field, FieldType, QueryResultMetaStat } from '@grafana/data';


See https://github.com/grafana/grafana/blob/main/packages/grafana-o11y-ds-frontend/src/combineResponses.test.ts

matyax · 2024-09-04T17:39:44Z

src/services/combineResponses.ts

@@ -0,0 +1,281 @@
+import {


See https://github.com/grafana/grafana/blob/main/packages/grafana-o11y-ds-frontend/src/combineResponses.ts

matyax · 2024-09-04T17:43:16Z

src/services/shardQuerySplitting.test.ts

@@ -0,0 +1,194 @@
+import { of } from 'rxjs';


See https://github.com/grafana/grafana/blob/main/public/app/plugins/datasource/loki/querySplitting.test.ts

matyax · 2024-09-04T17:44:26Z

src/services/shardQuerySplitting.ts

@@ -0,0 +1,264 @@
+import { Observable, Subscriber, Subscription } from 'rxjs';


See: https://github.com/grafana/grafana/blob/main/public/app/plugins/datasource/loki/querySplitting.ts

gtk-grafana · 2024-09-10T19:32:17Z

Can we prevent "no data" from showing up (and keep the panel in the loading state) when it's streaming but the shards we've queried so far don't have data?

matyax · 2024-09-10T19:41:21Z

Definitely. What I wanted to try was to create the initial frame with fields with empty values, and see if that gets rid of that behavior. I did a quick test, but it will require an update to the merging code and some extra complexity, so I'd like to follow that up independently. And/or check the viz to see what it is that triggers the display of No Data.

gtk-grafana · 2024-09-10T19:44:49Z

From what I remember, the streaming state (LoadingState.Streaming) expects dataframes. I think setting the initial state as LoadingState.Loading fixes 90% of this without any other refactor.

matyax · 2024-09-10T19:54:27Z

Fantastic, I'll give it a try. The reason why we used streaming with splitting was so that the panel updates the viz as new data appears, but we can start with loading and change to streaming easily.

gtk-grafana · 2024-09-10T20:00:17Z

Yeah I think it should be "loading" state, until we get a non-empty response. We shouldn't update the "series" prop until we get a non-empty response, or the request is done.

But easier said than done :P If it's too much work we can punt on it as this is under a ff, but this is the only real issue I'm seeing right now in my testing.

A good way to see how big of a UX change this is, is by enabling auto-refresh on this branch vs main

matyax marked this pull request as ready for review August 22, 2024 16:39

matyax requested a review from a team as a code owner August 22, 2024 16:39

matyax force-pushed the matyax/shard-splitting branch 5 times, most recently from 4706de5 to 3a5df48, August 29, 2024 11:09

gtk-grafana reviewed Sep 4, 2024

.github/workflows/ci.yml (outdated)

gtk-grafana reviewed Sep 4, 2024

project-words.txt (outdated)

matyax changed the title from "Shard splitting: WIP" to "Shard query splitting: Split queries by stream shards" Sep 4, 2024

matyax requested review from a team and gtk-grafana September 4, 2024 15:24

matyax commented Sep 4, 2024

matyax force-pushed the matyax/shard-splitting branch from 69da7ea to c0df8ca, September 4, 2024 23:44

matyax mentioned this pull request Sep 5, 2024

Shard query splitting: Test in CI #747

Closed

matyax force-pushed the matyax/shard-splitting branch 2 times, most recently from e14bb19 to 6c34669, September 10, 2024 19:23

gtk-grafana mentioned this pull request Sep 10, 2024

chore: enable sharding in docker containers, add shards to labels in generator #753

Closed

matyax added 23 commits September 11, 2024 13:24

feat(querying): exclude log queries from sharding

47b5501

fix(getShardRequests): debug group generation

0066811

feat(shardQuerySplitting): integrate feature flag

deb533d

feat(shardQuerySplitting): fix falling back to regular query

f1bc31b

chore: rename functions

d786308

feat(jest): add observable matchers

9c3977b

test(shardQuerySplitting): add base test setup

23a8170

test(shardQuerySplitting): add more tests

8b4d8eb

Revert "chore(ci): simplify"

896b64f

This reverts commit c6c2cd0.

chore: spelling

563767a

chore(shardQuerySplitting): add docs

94c3a6e

chore: simplify

1d32c98

chore: fix test title

1f51768

fix(logql): improve selector for single shard

37bc0e8

feat(shardQuerySpitting): try new sharding approach

65cb27e

feat(shardQuerySpitting): improve shard volume spreading

f457123

chore: spelling

de87626

feat(shardQuerySplitting): start with the highest numbers

0705076

fix(combineResponses): small optimization

284af9f

feat(shardQuerySplitting): implement incremental backoff

b07278c

chore: words

dd4ba97

fix(shardQuerySplitting): skip avg from sharding

ef38488

fix(shardQuerySplitting): start with loading state Loading and switch to Streaming

bbe917c

matyax force-pushed the matyax/shard-splitting branch from ed238fb to bbe917c, September 11, 2024 16:24

matyax enabled auto-merge (squash) September 11, 2024 16:25

matyax added 4 commits September 11, 2024 15:47

chore: update e2e

d968b23

chore: update e2e tests

bcbf11c

chore: disable exploreLogsShardSplitting in ci

333f7d5

chore: revert test update

9a91c2d

matyax merged commit c9eacbd into main Sep 11, 2024
4 checks passed
