@kbatuigas

This PR adds missing endpoints and properties from https://github.com/redpanda-data/redpanda/blob/dev/src/v/redpanda/admin/api-doc/shadow_indexing.json, including /v1/cloud_storage/automated_recovery which needs to be referenced in redpanda-data/docs#1513.

New Tiered Storage Endpoints:

  • Added /v1/cloud_storage/automated_recovery endpoints for initializing and checking the status of cluster recovery.
  • Introduced /v1/cloud_storage/cache/trim endpoint to allow trimming of the local tiered storage cache with optional parameters for objects and bytes.
  • Added /v1/cloud_storage/anomalies/{namespace}/{topic}/{partition} endpoint to retrieve cloud storage anomalies for a specific partition.
  • Added /v1/cloud_storage/unsafe_reset_metadata_from_cloud/{namespace}/{topic}/{partition} endpoint to reset partition metadata from the cloud, with safety checks to prevent data loss.
  • Added /v1/cloud_storage/reset_scrubbing_metadata/{namespace}/{topic}/{partition} endpoint to reset scrubbing metadata and anomalies for a partition.
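For readers unfamiliar with the spec layout, here is a rough sketch of how the cache trim endpoint listed above might appear in admin/admin.yaml. The `post` method, the `in: query` placement, the `operationId`, and the response wording are assumptions made for illustration; they are not copied from the PR diff.

```yaml
# Illustrative sketch only; method, parameter location, and wording are assumed.
/v1/cloud_storage/cache/trim:
  post:
    tags:
      - Tiered Storage
    summary: Trim the local tiered storage cache
    operationId: cloud_storage_cache_trim   # hypothetical name
    parameters:
      # Both parameters are optional; their exact semantics are not stated
      # in this PR description, so no descriptions are guessed here.
      - name: objects
        in: query
        required: false
        schema:
          type: integer
      - name: bytes
        in: query
        required: false
        schema:
          type: integer
    responses:
      "200":
        description: Cache trim was requested
        content: {}
```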

Schema Enhancements for Cloud Storage Metadata:

  • Expanded partition metadata with new fields: stm_region_size_bytes, archive_size_bytes, stm_region_segment_count, stm_region_start_offset, and improved descriptions for segment counts.
  • Added segment_meta schema to describe uploaded segment metadata, including offsets, timestamps, size, compaction, and terms.
  • Introduced metadata_anomaly and cloud_storage_partition_anomalies schemas to provide detailed reporting of detected anomalies, missing manifests, missing segments, and anomaly discard counts.

Cluster Recovery Status Reporting:

  • Added cluster_recovery_status schema to report the state and error (if any) of cluster recovery operations.

Minor Changes and Cleanups:

  • Removed the unused x-codegen-request-body-name: body from an existing endpoint definition.

@kbatuigas kbatuigas requested a review from a team as a code owner December 12, 2025 01:03
@coderabbitai
coderabbitai bot commented Dec 12, 2025

📝 Walkthrough

The change updates the Admin API specification to introduce cloud storage recovery and cache management functionality. It adds 6 new endpoints for automated recovery status monitoring, cache trimming, and metadata management operations. The specification also introduces 4 new public schemas (cluster_recovery_status, segment_meta, metadata_anomaly, cloud_storage_partition_anomalies) and extends existing schemas including partition_cloud_storage_status and topic_recovery_status with additional fields for STM region tracking and storage metrics.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

  • Verify new schema property definitions align with expected types and constraints
  • Confirm endpoint path parameters and request/response bodies are correctly mapped to their respective schemas
  • Validate consistency across related schema extensions, particularly in partition_cloud_storage_status and the new anomaly-related schemas
  • Check that all new endpoint definitions follow the existing OpenAPI specification conventions

Suggested reviewers

  • paulohtb6

Pre-merge checks and finishing touches

✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and accurately summarizes the main change: adding missing cloud storage endpoints to the Admin API specification. |
| Description check | ✅ Passed | The description comprehensively details the new endpoints, schema enhancements, and changes made to the Admin API documentation, directly relating to the changeset. |
| Linked Issues check | ✅ Passed | The PR fully implements the primary requirement from DOC-1872 by adding the /v1/cloud_storage/automated_recovery endpoints and extends scope with additional related endpoints and schema enhancements. |
| Out of Scope Changes check | ✅ Passed | All changes are within scope: new cloud storage endpoints, schema definitions, and metadata enhancements align with the linked issue's objective to document the automated_recovery endpoint and related cloud storage APIs. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
@github-actions
github-actions bot commented Dec 12, 2025

ℹ️ API content change detected:

No structural change, nothing to display.

Preview documentation

Powered by Bump.sh

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (5)
admin/admin.yaml (5)

2331-2332: Consider returning a typed response body instead of empty content: {} for “sync_local_state” 200.
Right now 200 has no schema, which makes it hard for docs/clients to know what to expect (even a minimal { status: ... }).


2332-2362: /v1/cloud_storage/automated_recovery: request body is an unstructured empty object.
If this endpoint truly takes no input, consider removing requestBody entirely (or define explicit properties if there are known knobs), to reduce ambiguity in generated clients.


2363-2386: /v1/cloud_storage/cache/trim: clarify parameter semantics and bounds (bytes/objects).
Right now bytes/objects are plain integers with no minimum and no note on what happens if both are set (AND vs precedence). Adding that detail avoids misinterpretation.


5099-5114: metadata_anomaly.previous_segment: avoid type: object with allOf.
Some OpenAPI tooling rejects/ignores type when allOf is present. Prefer just nullable: true + allOf: [$ref...] (or use oneOf if it can be either segment_meta or null).


5115-5161: cloud_storage_partition_anomalies: consider defining item schemas for missing_* arrays.
missing_spillover_manifests / missing_segments are array<string>; if these are paths/keys, consider naming/format hints (e.g., description, example, or format) to improve docs usability.
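One way the last nitpick could be addressed is sketched below. The property names and the `array<string>` shape come from the comment above; the description wording is an assumption about what the arrays hold and would need to be confirmed against the implementation.

```yaml
# Sketch only: description text is assumed, not taken from the PR.
cloud_storage_partition_anomalies:
  type: object
  properties:
    missing_spillover_manifests:
      type: array
      description: Spillover manifests expected in cloud storage but not found (assumed meaning)
      items:
        type: string
    missing_segments:
      type: array
      description: Segments expected in cloud storage but not found (assumed meaning)
      items:
        type: string
```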

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between eecf277 and 97166b8.

📒 Files selected for processing (1)
  • admin/admin.yaml (4 hunks)

Comment on lines +2417 to +2448
    post:
      tags:
        - Tiered Storage
      summary: Reset metadata from cloud
      description: Resets the manifest to the one in cloud storage, updating all replicas with the given manifest. The request is refused if applying the change would cause data loss and the force query parameter is unspecified or false.
      operationId: unsafe_reset_metadata_from_cloud
      parameters:
        - name: namespace
          in: path
          required: true
          schema:
            type: string
        - name: topic
          in: path
          required: true
          schema:
            type: string
        - name: partition
          in: path
          required: true
          schema:
            type: integer
        - name: force
          in: query
          required: false
          schema:
            type: boolean
      responses:
        200:
          description: Partition metadata is reset
          content: {}
  /v1/cloud_storage/reset_scrubbing_metadata/{namespace}/{topic}/{partition}:
⚠️ Potential issue | 🟠 Major

unsafe_reset_metadata_from_cloud: document refusal/error responses and make safety explicit.
The description says the request “is refused” when force is unspecified/false, but only a 200 response is documented. Consider adding explicit non-2xx responses (e.g., 409/422 with an error schema) and a stronger warning in summary/description since this is a destructive/high-risk operation.

🤖 Prompt for AI Agents
In admin/admin.yaml around lines 2417 to 2448, the
unsafe_reset_metadata_from_cloud operation only documents a 200 response but its
description states the request is “refused” when force is unspecified/false;
update the OpenAPI operation to explicitly document non-2xx error responses
(e.g., 409 Conflict or 422 Unprocessable Entity) that represent refusal or
potential data-loss errors and reference a standard error response schema (or
create one under components/schemas). Also strengthen the summary/description
with an explicit warning about destructive behavior and clarify that callers
must set force=true to override, and include examples or error messages in the
new response entries to make failure cases clear for clients.
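
A minimal sketch of what the suggested change could look like for the `responses` section of this operation. The 409 status code is one of the two options floated above, and `error_body` is a hypothetical schema name; neither is confirmed by the PR or by the Redpanda implementation.

```yaml
# Sketch only: the 409 status code and the error_body schema name are assumptions.
responses:
  "200":
    description: Partition metadata is reset
    content: {}
  "409":
    description: Request refused because applying the cloud manifest would cause data loss and force was unspecified or false
    content:
      application/json:
        schema:
          $ref: '#/components/schemas/error_body'   # hypothetical schema
```

Quoting the status codes as strings keeps the keys valid for stricter OpenAPI tooling; the existing file uses unquoted `200`, so either style would need to match the rest of admin.yaml.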

Comment on lines +5004 to 5030
        stm_region_size_bytes:
          type: integer
          description: Total size of the addressable segments in the STM region of the log
        archive_size_bytes:
          type: integer
          description: Total size of the archive region of the log
        local_log_size_bytes:
          type: integer
          description: Total size of the addressable local log for the partition
        stm_region_segment_count:
          type: integer
          description: Number of segments in the STM region of the cloud log
        cloud_log_segment_count:
          type: integer
          description: Number of segments in the STM region of the cloud log
        local_log_segment_count:
          type: integer
          description: Number of segments in the local log
        cloud_log_start_offset:
          type: integer
          nullable: true
          description: The first Kafka offset accessible from the cloud (inclusive)
        stm_region_start_offset:
          type: integer
          nullable: true
          description: The first Kafka offset accessible from the cloud in the STM region (inclusive)
        cloud_log_last_offset:
⚠️ Potential issue | 🟡 Minor

Fix cloud_log_segment_count description (currently duplicates STM-region wording).
With stm_region_segment_count added, cloud_log_segment_count’s description reading “Number of segments in the STM region…” is confusing/likely incorrect. Suggest aligning it to “Number of segments in the addressable cloud log” (or whatever the intended meaning is).

🤖 Prompt for AI Agents
In admin/admin.yaml around lines 5004 to 5030, the description for
cloud_log_segment_count incorrectly duplicates STM-region wording; update that
description to reflect the cloud log (e.g., "Number of segments in the
addressable cloud log") so it aligns with stm_region_segment_count and other
cloud_* fields, keeping wording consistent with the surrounding fields.
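
Applied to the two fields in question, the fix might look like the fragment below. The `cloud_log_segment_count` wording is the reviewer's suggestion from the comment above; whether it matches the field's intended meaning is not confirmed in this thread.

```yaml
stm_region_segment_count:
  type: integer
  description: Number of segments in the STM region of the cloud log
cloud_log_segment_count:
  type: integer
  # Suggested wording from the review; intended meaning is unconfirmed.
  description: Number of segments in the addressable cloud log
```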
