
feat: implement unlimited pagination for Kafka message listing#2226

Open
c-julin wants to merge 3 commits into master from jc/infinite-pagination-backend

Conversation

c-julin (Contributor) commented Feb 17, 2026

Summary

Adds cursor-based pagination support to the Kafka message listing API. When page_size is set, the backend enters pagination mode: it fetches messages in pages, returns opaque page tokens for resumption, and ignores max_results. This lets clients paginate through all messages in a topic without a hard upper limit.

  • Implement PageToken encoding/decoding with per-partition offsets and direction tracking
  • Add pagination-aware message fetching that respects PageSize and generates next_page_token in the stream completion message
  • Reject filter usage in pagination mode (filters are incompatible with cursor-based pagination)
  • Extract collectPartitionStates helper to reduce cyclomatic complexity in the listing logic
  • Fix integration tests to use PageSize to correctly trigger pagination mode
  • Update proto comments to reflect that page_size controls pagination mode, not max_results
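For illustration, a client that wants every message in a topic would loop on the returned token until the completion message carries an empty next_page_token. The sketch below uses hypothetical stand-in types, not the actual proto-generated API:

package main

import "fmt"

// listPage is a hypothetical stand-in for what a client extracts from one
// ListMessages call in pagination mode: the page's messages plus the token
// carried by the stream completion message.
type listPage struct {
    Messages      []string
    NextPageToken string
}

// fetchAll drains a topic by repeatedly calling fetchPage, which stands in for
// issuing a ListMessages request with page_size set and the previous page_token.
func fetchAll(fetchPage func(pageToken string) (listPage, error)) ([]string, error) {
    var all []string
    token := ""
    for {
        page, err := fetchPage(token)
        if err != nil {
            return nil, err
        }
        all = append(all, page.Messages...)
        if page.NextPageToken == "" {
            return all, nil // no more pages
        }
        token = page.NextPageToken
    }
}

func main() {
    // Fake backend: three pages, keyed by the token that requests them.
    pages := map[string]listPage{
        "":   {Messages: []string{"m1", "m2"}, NextPageToken: "t1"},
        "t1": {Messages: []string{"m3", "m4"}, NextPageToken: "t2"},
        "t2": {Messages: []string{"m5"}, NextPageToken: ""},
    }
    all, _ := fetchAll(func(tok string) (listPage, error) { return pages[tok], nil })
    fmt.Println(all) // [m1 m2 m3 m4 m5]
}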

Commit messages:

Add stateless cursor-based pagination for ListMessages:
- Add page token with per-partition cursor tracking and round-robin
  distribution across partitions
- Wire pagination fields through proto, API, and service layers
- Handle ascending/descending direction with compaction adjustment
- Cap total messages (pageSize * partitions) at 100K to prevent OOM
- Add integration tests for pagination scenarios

Move the first-pass partition loop from calculateConsumeRequestsWithPageToken
into a dedicated collectPartitionStates method, bringing cyclomatic complexity
under the gocyclo limit of 30.

Replace MaxResults: -1 with PageSize: 50 in pagination tests so they
correctly enter pagination mode. Update proto comments to reflect that
PageSize (not MaxResults=-1) enables pagination mode.
@github-actions

The latest Buf updates on your PR. Results from workflow Buf CI / validate (pull_request).

Build: ✅ passed | Format: ✅ passed | Lint: ✅ passed | Breaking: ✅ passed | Updated (UTC): Feb 17, 2026, 4:11 PM

c-julin (Author) commented Feb 17, 2026

@claude review

claude bot commented Feb 17, 2026

Claude finished @c-julin's task in 2m 35s.


Code Review

  • Read repository guidelines (CLAUDE.md)
  • Review core pagination implementation (page_token.go)
  • Review message listing changes (list_messages.go)
  • Review service integration (service.go)
  • Review test coverage
  • Review proto/API changes
  • Provide comprehensive feedback

Overall Assessment

This is a well-implemented feature that adds cursor-based pagination support to Kafka message listing. The code demonstrates solid engineering practices with comprehensive test coverage, good error handling, and clear separation of concerns. Below are specific findings and suggestions for improvement.

Strengths

1. Excellent Test Coverage

  • Comprehensive unit tests for PageToken including encode/decode, validation, edge cases (page_token_test.go:19-371)
  • Integration tests covering multiple scenarios: first page, multiple pages, filter rejection, legacy mode compatibility (handle_topic_messages_integration_test.go:1118-1273)
  • Boundary condition testing for both ascending and descending pagination

2. Security and Validation

  • Page token includes strong validation with topic name verification (page_token.go:82-140)
  • Proper limits enforced: page size 1-500, total messages capped at 100k (page_token.go:104-112)
  • Rejects filter usage in pagination mode to prevent incompatible operations (list_messages.go:231-233)

3. Stateless Design

  • Opaque page tokens encode all necessary state (partition offsets, watermarks, direction)
  • URL-safe base64 encoding suitable for API usage (page_token.go:53); a round-trip sketch follows below
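As a rough sketch of how such an opaque cursor round-trips, the snippet below serializes a token struct to JSON and wraps it in URL-safe base64. The field names and JSON tags only approximate what the review cites; they are not the exact source:

package main

import (
    "encoding/base64"
    "encoding/json"
    "fmt"
)

// Approximate shapes only; the real PageToken and PartitionCursor live in page_token.go.
type partitionCursor struct {
    ID         int32 `json:"id"`
    NextOffset int64 `json:"no"`
}

type pageToken struct {
    TopicName  string            `json:"t"`
    Direction  string            `json:"d"`
    PageSize   int32             `json:"ps"`
    Partitions []partitionCursor `json:"p"`
}

// encodePageToken produces an opaque, URL-safe string a client can echo back unchanged.
func encodePageToken(pt pageToken) (string, error) {
    raw, err := json.Marshal(pt)
    if err != nil {
        return "", err
    }
    return base64.URLEncoding.EncodeToString(raw), nil
}

// decodePageToken reverses encodePageToken; validation would happen after this step.
func decodePageToken(s string) (pageToken, error) {
    var pt pageToken
    raw, err := base64.URLEncoding.DecodeString(s)
    if err != nil {
        return pt, err
    }
    return pt, json.Unmarshal(raw, &pt)
}

func main() {
    tok, _ := encodePageToken(pageToken{
        TopicName: "orders", Direction: "desc", PageSize: 50,
        Partitions: []partitionCursor{{ID: 0, NextOffset: 120}, {ID: 1, NextOffset: 98}},
    })
    back, _ := decodePageToken(tok)
    fmt.Println(tok)
    fmt.Printf("%+v\n", back)
}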

4. Backwards Compatibility

  • Legacy mode preserved when PageSize is not set (list_messages.go:263-268)
  • Integration test explicitly verifies legacy mode still works (handle_topic_messages_integration_test.go:1242-1272)

Issues and Suggestions

Critical Issues

1. Inconsistent Legacy-Mode Direction Default

Location: list_messages.go:276-285

direction := DirectionAscending // Legacy mode is ascending
// Determine the max message count for this request
// In pagination mode, use PageSize; otherwise use MessageCount (maxResults)
maxMessageCount := listReq.MessageCount
if listReq.PageSize > 0 && token != nil {
    // Pagination mode: use direction from token and limit to PageSize
    direction = token.Direction
    maxMessageCount = listReq.PageSize
}

Issue: The direction variable is set to DirectionAscending for legacy mode, but legacy mode actually defaults to descending in other parts of the code. The comment says "Legacy mode is ascending" but this contradicts the behavior in calculateConsumeRequests where StartOffsetRecent (default) consumes backwards (descending).

Impact: This could cause inconsistent message ordering between legacy and pagination modes.

Recommendation: Clarify the actual legacy mode direction and ensure consistency. If legacy mode should be ascending, update calculateConsumeRequests accordingly. If it should be descending, update this initialization.
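One possible shape for the fix, sketched under the assumption that legacy mode is meant to stay descending (matching StartOffsetRecent). The helper name and the constant values are hypothetical, not code from this PR:

// Assumed values; the real DirectionAscending/DirectionDescending constants live in list_messages.go.
const (
    directionAscending  = "asc"
    directionDescending = "desc"
)

// resolveDirectionAndCount is a hypothetical helper that makes the default explicit:
// pagination mode trusts the token, legacy mode keeps the descending behavior that
// calculateConsumeRequests applies for StartOffsetRecent.
func resolveDirectionAndCount(pageSize, messageCount int, tokenDirection string) (string, int) {
    if pageSize > 0 && tokenDirection != "" {
        return tokenDirection, pageSize
    }
    return directionDescending, messageCount
}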

2. Missing Partition Compaction Handling in Descending Mode

Location: page_token.go:142-212 and list_messages.go:571-580

Issue: The code handles compaction adjustments only for ascending mode (list_messages.go:573-580), but descending mode could also encounter compaction where the NextOffset points to a deleted offset below the new low watermark.

Current code:

if token.Direction == DirectionAscending && cursor.NextOffset < updatedLowWaterMark {
    cursor.NextOffset = updatedLowWaterMark
}

Missing case: In descending mode, if compaction raises the low watermark above cursor.NextOffset, the code does not adjust the cursor; it proceeds to the exhaustion check, where the partition may be incorrectly marked as exhausted.

Recommendation: Add compaction handling for descending mode:

// Handle compaction for both directions
if cursor.NextOffset < updatedLowWaterMark {
    s.logger.DebugContext(ctx, "adjusting nextOffset due to compaction",
        slog.String("topic", token.TopicName),
        slog.Int("partition", int(cursor.ID)),
        slog.Int64("old_next_offset", cursor.NextOffset),
        slog.Int64("new_low_watermark", updatedLowWaterMark))
    cursor.NextOffset = updatedLowWaterMark
}

High Priority Issues

3. Inconsistent Empty Topic Handling

Location: page_token.go:225-230 in CreateInitialPageToken

Issue: The test expects NextOffset = 0 for empty topics (page_token_test.go:225-230), but the actual calculation for descending mode is:

nextOffset = end.Offset - 1  // 0 - 1 = -1
if nextOffset < start.Offset {
    nextOffset = start.Offset  // clamps to 0
}

This works correctly, but the logic is subtle and relies on the clamping. For ascending mode starting from an empty partition (LWM=HWM=0), NextOffset=0 would immediately be marked exhausted since nextOffset >= highWaterMark.

Recommendation: Add explicit handling for empty partitions to make the logic clearer and more maintainable:

var nextOffset int64
if start.Offset == end.Offset {
    // Empty partition: no messages available
    nextOffset = start.Offset
} else if direction == DirectionDescending {
    // For descending order, start from high water mark - 1
    nextOffset = end.Offset - 1
    if nextOffset < start.Offset {
        nextOffset = start.Offset
    }
} else {
    // For ascending order, start from low water mark
    nextOffset = start.Offset
}

4. No Page Token Version/Schema Evolution

Location: page_token.go:21-30

Issue: The PageToken structure lacks a version field for schema evolution. If you need to add fields or change the structure in the future, existing tokens in users' applications will fail to decode or behave incorrectly.

Recommendation: Add a version field:

type PageToken struct {
    Version        int               `json:"v"`  // Schema version (default 1)
    TopicName      string            `json:"t"`
    // ... rest of fields
}

And validate it:

func (pt *PageToken) Validate() error {
    if pt.Version == 0 {
        pt.Version = 1 // Default to v1 for backwards compat
    }
    if pt.Version != 1 {
        return fmt.Errorf("unsupported page token version: %d", pt.Version)
    }
    // ... rest of validation
}

5. Misleading Limit Check in Total Messages Calculation

Location: page_token.go:108-112

const maxTotalMessages = 100_000
if int64(pt.PageSize)*int64(pt.PartitionCount) > maxTotalMessages {
    return fmt.Errorf("page size %d with %d partitions exceeds max total messages (%d)",
        pt.PageSize, pt.PartitionCount, maxTotalMessages)
}

Issue: The int64 multiplication itself is safe, so there is no actual overflow; the problem is that the error message is misleading. The check rejects requests where PageSize * PartitionCount exceeds the cap, but the actual fetch is limited to PageSize messages distributed across partitions via round-robin.

Recommendation: Clarify the error message or reconsider if this check is necessary:

// This check prevents unreasonably large page sizes when combined with many partitions
// The actual fetch will distribute PageSize messages across partitions
if int64(pt.PageSize)*int64(pt.PartitionCount) > maxTotalMessages {
    return fmt.Errorf("page size %d with %d partitions would request %d total messages, exceeding limit of %d",
        pt.PageSize, pt.PartitionCount, int64(pt.PageSize)*int64(pt.PartitionCount), maxTotalMessages)
}

Medium Priority Issues

6. Unclear Proto Comment About Pagination Mode Trigger

Location: list_messages.proto:40-44

Comment says:

"Number of messages to fetch per page. When set, pagination mode is enabled and max_results is ignored. Defaults to 50 if not set."

Issue: The comment says "Defaults to 50 if not set," but the field is optional (int32 page_size = 14) and the code only enables pagination mode when listReq.PageSize > 0. If the field actually defaulted to 50, pagination mode would always be enabled.

Recommendation: Clarify the comment:

int32 page_size = 14 [(buf.validate.field).int32 = {
  gte: 1
  lte: 500
}]; // Number of messages per page. When > 0, pagination mode is enabled and max_results is ignored. When 0 or unset, legacy mode is used.

7. Missing Context Cancellation Check in Round-Robin Loop

Location: list_messages.go:682-709

Issue: The round-robin distribution loop could theoretically run for a long time with many partitions and large page sizes, but it doesn't check for context cancellation.

Recommendation: Add periodic cancellation checks:

for remainingMessages > 0 && yieldingPartitions > 0 {
    // Check for cancellation periodically
    select {
    case <-ctx.Done():
        return nil, "", false, ctx.Err()
    default:
    }
    
    progressMade := false
    // ... rest of loop
}

8. Potential Memory Inefficiency in Partition States

Location: list_messages.go:654-657

Issue: The code creates a new slice for nextToken.Partitions with full capacity even though some partitions might be exhausted and not included:

nextToken := &PageToken{
    // ...
    Partitions:     make([]PartitionCursor, 0, len(token.Partitions)),
    // ...
}

Recommendation: Pre-allocate based on the number of partitions still active after the exhaustion check, or simply start with zero capacity and let the slice grow:

nextToken := &PageToken{
    TopicName:      token.TopicName,
    PartitionCount: token.PartitionCount,
    Partitions:     make([]PartitionCursor, 0), // Use 0 capacity, let it grow naturally
    Direction:      token.Direction,
    PageSize:       token.PageSize,
}

Low Priority / Nice-to-Have

9. Documentation: Clarify NextOffset Semantics

Location: page_token.go:32-38

Issue: The PartitionCursor struct has good comments, but the semantics of NextOffset in relation to direction could be clearer.

Current:

NextOffset    int64 `json:"no"` // Next offset to read from

Suggestion:

// NextOffset is the next offset to consume:
// - Descending: the highest offset to read (inclusive), next page reads from NextOffset-PageSize
// - Ascending: the lowest offset to read (inclusive), next page reads from NextOffset+PageSize
// - May be -1 when descending mode exhausts a partition (read back to offset 0)
NextOffset    int64 `json:"no"`

10. Integration Test Coverage: Missing Edge Cases

Location: handle_topic_messages_integration_test.go:1118-1273

Missing test scenarios:

  1. Partition count changes between pages (should error with "partition count changed")
  2. Topic gets compacted between pages (water marks change)
  3. Ascending order pagination (only descending is tested)
  4. Single partition topic pagination
  5. Empty topic pagination

Recommendation: Add integration tests for these edge cases to ensure robustness.
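A table-driven skeleton covering the simpler of these cases (ascending order, single partition, empty topic) might look like the sketch below. The setup and assertions are left as comments because the existing harness in handle_topic_messages_integration_test.go provides the Kafka fixtures; nothing here is code from this PR:

func TestListMessagesPaginationEdgeCases(t *testing.T) {
    cases := []struct {
        name       string
        partitions int
        seed       int // records to produce before paginating
        descending bool
    }{
        {name: "ascending order across pages", partitions: 3, seed: 120, descending: false},
        {name: "single partition topic", partitions: 1, seed: 40, descending: true},
        {name: "empty topic yields no next_page_token", partitions: 3, seed: 0, descending: true},
    }
    for _, tc := range cases {
        t.Run(tc.name, func(t *testing.T) {
            // 1. Create a topic with tc.partitions partitions and produce tc.seed records.
            // 2. Page through with PageSize=50 in the chosen direction, following next_page_token.
            // 3. Assert the union of pages equals the produced records with no duplicates or gaps,
            //    and that the final completion message carries an empty next_page_token.
        })
    }
}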

11. Code Duplication in Exhaustion Checks

Location: Multiple places check partition exhaustion with similar logic

The exhaustion logic appears in:

  • PageToken.HasMore() (page_token.go:215-231)
  • PageToken.IsExhausted() (page_token.go:233-245)
  • collectPartitionStates() (list_messages.go:583-591)
  • calculateConsumeRequestsWithPageToken() (list_messages.go:756-761)

Recommendation: Consider extracting to a single helper method to reduce duplication and ensure consistency:

func (cursor *PartitionCursor) IsExhausted(direction string) bool {
    if direction == DirectionDescending {
        return cursor.NextOffset < cursor.LowWaterMark
    }
    return cursor.NextOffset >= cursor.HighWaterMark
}

12. Nolint Directives Could Be Reduced

Location: list_messages.go:147,323,631

Issue: Multiple nolint directives for cyclomatic complexity. While the functions are indeed complex, the new collectPartitionStates helper (list_messages.go:542-626) successfully reduced complexity for that portion.

Recommendation: Consider further extraction of logical blocks into helper functions to naturally reduce complexity without needing lint exceptions.

Minor Observations

  1. Good use of constants: DirectionDescending and DirectionAscending are well-defined constants (list_messages.go:44-47)

  2. Comprehensive logging: Debug and warning logs throughout pagination logic aid in troubleshooting (e.g., list_messages.go:574-578, 594-598)

  3. Proper resource handling: Defer statements ensure client cleanup (list_messages.go:824)

  4. Cyclomatic complexity extraction: The refactoring to extract collectPartitionStates is a good example of managing complexity (mentioned in PR description)

Summary

This PR implements a solid foundation for unlimited pagination. The critical issues around direction consistency and compaction handling should be addressed before merging. The high and medium priority issues around empty-topic handling, token versioning, error messages, and context cancellation would further improve robustness. The low priority suggestions would enhance maintainability and test coverage but aren't blockers.

The overall architecture is sound, and the implementation demonstrates good engineering practices with strong test coverage and clear separation of concerns.

