feat: support metrics scrape #2344

Open: wants to merge 8 commits into base: master

Contributor

@jiuker jiuker commented Oct 18, 2024

feat: support metrics scrape
Fixes #2327.
Previously, only /minio/v2/metrics/cluster could be scraped.
Now several metrics paths can be configured, for example:

prometheusOperatorScrapeMetricsPath:
- /minio/v2/metrics/cluster
- /minio/metrics/v3/api
- /minio/v2/metrics/bucket

The default is /minio/v2/metrics/cluster.
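
As a rough illustration of the behaviour described above, the following sketch builds one scrape config per configured path and falls back to the default path when the list is empty. The `ScrapeConfig` type here is a simplified stand-in, and `buildScrapeConfigs` and the job-name scheme are hypothetical names chosen for the example, not the operator's actual code.

```go
package main

import "fmt"

// ScrapeConfig is a simplified, hypothetical stand-in for the operator's
// scrape config type; only the fields needed for this sketch are included.
type ScrapeConfig struct {
	JobName     string
	MetricsPath string
}

// defaultMetricsPath is the path used when no path is configured,
// as stated in the PR description.
const defaultMetricsPath = "/minio/v2/metrics/cluster"

// buildScrapeConfigs returns one scrape config per configured path.
// An empty or nil list falls back to the default path.
// The job-name format is an assumption made for this example.
func buildScrapeConfigs(tenant string, paths []string) []ScrapeConfig {
	if len(paths) == 0 {
		paths = []string{defaultMetricsPath}
	}
	configs := make([]ScrapeConfig, 0, len(paths))
	for i, p := range paths {
		configs = append(configs, ScrapeConfig{
			JobName:     fmt.Sprintf("%s-minio-job-%d", tenant, i),
			MetricsPath: p,
		})
	}
	return configs
}

func main() {
	paths := []string{"/minio/v2/metrics/cluster", "/minio/v2/metrics/bucket"}
	for _, c := range buildScrapeConfigs("tenant-a", paths) {
		fmt.Println(c.JobName, c.MetricsPath)
	}
}
```

Keeping the fallback inside the builder means tenants that never set the field keep the previous single-path behaviour.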

@harshavardhana
Member

We should add a way to scrape v3 metrics as well

@jiuker jiuker requested a review from shtripat October 18, 2024 07:43
@jiuker jiuker self-assigned this Oct 21, 2024
@cesnietor
Contributor

@jiuker Since this is a new feature, is there a way to add tests for it, such as an integration test? Thanks.

@pjuarezd pjuarezd force-pushed the feat-support-metrics-scrope branch from b38e1b6 to 982f5d4 Compare December 9, 2024 19:51
pjuarezd previously approved these changes Dec 9, 2024
@jiuker
Contributor Author

jiuker commented Feb 5, 2025

Will take a look

@jiuker jiuker force-pushed the feat-support-metrics-scrope branch from 40012a7 to 732b95c Compare February 6, 2025 03:52
@allanrogerr
Contributor

@jiuker PTAL - pkg:golang/golang.org/x/[email protected] vulnerability

}
}
if !hasScrapeConfig {
exceptedScrapeConfigs = append(exceptedScrapeConfigs, promCfg.ScrapeConfigs...)
Contributor

The only way to remove scrapeConfigs is to set prometheusOperator to false, then re-initialize the array. This does not seem correct.

Contributor Author

@jiuker jiuker Mar 11, 2025

Yes. With prometheusOperator: true and prometheusOperatorScrapeMetricsPath: [], the default /minio/v2/metrics/cluster is used, as before.
I think we should stay compatible with users who never set this field. Any advice, @allanrogerr?

Contributor

I guess you mean expectedScrapeConfigs instead of exceptedScrapeConfigs? Or did you mean acceptedScrapeConfigs?

}

for index, scrape := range t.Spec.PrometheusOperatorScrapeMetricsPath {
promConfig.ScrapeConfigs = append(promConfig.ScrapeConfigs, ScrapeConfig{
Contributor

Consider the following tenant YAML section. Note that v3 cluster metrics are scraped using /minio/v3/metrics/cluster/ (see https://github.com/minio/minio/blob/master/docs/metrics/v3.md):

  prometheusOperator: true
  prometheusOperatorScrapeMetricsPath:
  - /minio/v2/metrics/bucket
  - /minio/v2/metrics/cluster
  - /minio/v2/metrics/node
  - /minio/v2/metrics/resource
  - /minio/v3/metrics/cluster/

Scraping the v3 metrics fails with "server returned HTTP status 403 Forbidden". Please check this.

Contributor Author

@jiuker jiuker Mar 11, 2025

Yes. The path should be of the form /minio/metrics/v3/*, and it can be set through this string slice field.
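
To make the difference between the two path layouts concrete, here is a minimal sketch of a check that distinguishes them. `metricsVersion` is a hypothetical helper, not part of the operator or of MinIO, and it only recognizes the two prefixes quoted in this thread.

```go
package main

import (
	"fmt"
	"strings"
)

// metricsVersion reports which metrics API a path appears to target.
// Hypothetical helper: it only knows the two prefixes mentioned in this
// thread, /minio/v2/metrics/ and /minio/metrics/v3/.
func metricsVersion(path string) (string, error) {
	switch {
	case strings.HasPrefix(path, "/minio/v2/metrics/"):
		return "v2", nil
	case strings.HasPrefix(path, "/minio/metrics/v3/"):
		return "v3", nil
	default:
		return "", fmt.Errorf("unrecognized metrics path %q", path)
	}
}

func main() {
	paths := []string{
		"/minio/v2/metrics/cluster",
		"/minio/metrics/v3/api",
		"/minio/v3/metrics/cluster/", // v2-style layout with a v3 segment
	}
	for _, p := range paths {
		version, err := metricsVersion(p)
		if err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println(p, "->", version)
	}
}
```

A check like this could reject a mistyped path when the tenant spec is processed instead of surfacing later as a failed scrape; whether the operator should validate paths at all is a design question for the PR.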

@jiuker jiuker requested a review from allanrogerr March 11, 2025 07:36
Contributor

@ramondeklein ramondeklein left a comment

I'm no expert on Prometheus metrics and MinIO, but I think this code doesn't work correctly. Please make these changes to AIStor operator and implement proper integration tests.

// The name of the Prometheus instance to scrape metrics from.
//
// +optional
PrometheusOperatorScrapeMetricsPath []string `json:"prometheusOperatorScrapeMetricsPath,omitempty"`
Contributor

If you can specify multiple values, then this should probably be:

Suggested change
- PrometheusOperatorScrapeMetricsPath []string `json:"prometheusOperatorScrapeMetricsPath,omitempty"`
+ PrometheusOperatorScrapeMetricsPaths []string `json:"prometheusOperatorScrapeMetricsPaths,omitempty"`

@@ -278,6 +278,12 @@ type TenantSpec struct {
PrometheusOperator bool `json:"prometheusOperator,omitempty"`
// *Optional* +
//
// The name of the Prometheus instance to scrape metrics from.
//
// +optional
Contributor

Can we add some information about the default (or fallback) value?

} else {
for i := range scrapeConfigs {
if scrapeConfigs[i].JobName != exceptedScrapeConfigs[i].JobName ||
scrapeConfigs[i].MetricsPath != exceptedScrapeConfigs[i].MetricsPath ||
Contributor

Please add a comment explaining that generating the scrape configuration always creates a new bearer token, so the token must be excluded from this check; otherwise the comparison would never be equal.

How do we make sure the bearer token is updated when the tenant is assigned other credentials (a different access/secret key)? That results in a different bearer token but changes nothing else, so this check will decide that the scrape configuration hasn't changed, yet Prometheus won't be able to access MinIO with the old token.
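
The comparison being discussed can be sketched as follows. The `ScrapeConfig` type is a simplified stand-in and `sameIgnoringToken` is a hypothetical name; the sketch only shows the idea of comparing job name and metrics path while skipping the bearer token, which differs on every generation.

```go
package main

import "fmt"

// ScrapeConfig is a simplified, hypothetical stand-in for the operator's type.
type ScrapeConfig struct {
	JobName     string
	MetricsPath string
	BearerToken string
}

// sameIgnoringToken reports whether both lists contain the same job names
// and metrics paths in the same order. BearerToken is deliberately not
// compared, because a freshly generated token never equals the stored one.
func sameIgnoringToken(current, expected []ScrapeConfig) bool {
	if len(current) != len(expected) {
		return false
	}
	for i := range current {
		if current[i].JobName != expected[i].JobName ||
			current[i].MetricsPath != expected[i].MetricsPath {
			return false
		}
	}
	return true
}

func main() {
	stored := []ScrapeConfig{{JobName: "job-0", MetricsPath: "/minio/v2/metrics/cluster", BearerToken: "old"}}
	fresh := []ScrapeConfig{{JobName: "job-0", MetricsPath: "/minio/v2/metrics/cluster", BearerToken: "new"}}
	fmt.Println(sameIgnoringToken(stored, fresh)) // tokens differ, everything else matches
}
```

On its own this comparison cannot detect rotated credentials, which is the second concern raised above; a separate check on the token is needed for that.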

@@ -161,20 +163,34 @@ func (c *Controller) checkAndCreatePrometheusAddlConfig(ctx context.Context, ten
return err
}
} else {
- var scrapeConfigs []configmaps.ScrapeConfig
+ var scrapeConfigs, exceptedScrapeConfigs []configmaps.ScrapeConfig
Contributor

Can we change line 146 to include the namespace/tenant that is getting the Prometheus scrape configuration?


jiuker added 2 commits March 12, 2025 13:16
refactor
UT
@jiuker jiuker requested a review from ramondeklein March 12, 2025 05:25
ut
// GetAccessKeyFromBearerToken parses the BearerToken with secretKey to extract accessKey
func GetAccessKeyFromBearerToken(bearerToken string, secretKey string) (string, error) {
claims := &jwt.StandardClaims{}
token, err := jwt.ParseWithClaims(bearerToken, claims, func(token *jwt.Token) (interface{}, error) {
Contributor

Please address linter issue:

Suggested change
- token, err := jwt.ParseWithClaims(bearerToken, claims, func(token *jwt.Token) (interface{}, error) {
+ token, err := jwt.ParseWithClaims(bearerToken, claims, func(_ *jwt.Token) (interface{}, error) {

@@ -224,27 +247,20 @@ func (c *Controller) deletePrometheusAddlConfig(ctx context.Context, tenant *min
return err
}

- var scrapeConfigs []configmaps.ScrapeConfig
+ var scrapeConfigs, exceptedScrapeConfigs []configmaps.ScrapeConfig
Contributor

I think it's a typo:

Suggested change
- var scrapeConfigs, exceptedScrapeConfigs []configmaps.ScrapeConfig
+ var scrapeConfigs, expectedScrapeConfigs []configmaps.ScrapeConfig

Contributor Author

should be fine now

}
accKey, err := miniov2.GetAccessKeyFromBearerToken(scrapeConfigs[i].BearerToken, secretKey)
if err != nil {
klog.Errorf("Failed to get access key from bearer token: %v", err)
Contributor

When secretKey changes, the existing bearer token can no longer be verified with the new secret key, so the function returns an error. That error is expected and shouldn't be logged, but it should set updateScrapeConfig.
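
A minimal sketch of the decision logic suggested here, under stated assumptions: `needsTokenRefresh` and `accessKeyParser` are hypothetical names, and `fakeParse` is a toy stand-in for `GetAccessKeyFromBearerToken` so that the example runs without a JWT library. It does not reflect the final code in the PR.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// accessKeyParser has the same shape as GetAccessKeyFromBearerToken in the
// diff: it returns the access key embedded in a token, or an error when the
// token cannot be verified with the given secret key.
type accessKeyParser func(bearerToken, secretKey string) (string, error)

// needsTokenRefresh reports whether the stored bearer token should be
// regenerated. A parse error is treated as an expected signal that the
// secret key changed, not as something to log.
func needsTokenRefresh(parse accessKeyParser, bearerToken, accessKey, secretKey string) bool {
	got, err := parse(bearerToken, secretKey)
	if err != nil {
		return true // token was signed with a different secret key
	}
	return got != accessKey // access key changed while the secret stayed the same
}

// fakeParse is a toy parser for this example only. Tokens have the form
// "accessKey|secretKey" and verify only when the secret matches.
func fakeParse(bearerToken, secretKey string) (string, error) {
	accessKey, signedWith, ok := strings.Cut(bearerToken, "|")
	if !ok || signedWith != secretKey {
		return "", errors.New("signature is invalid")
	}
	return accessKey, nil
}

func main() {
	fmt.Println(needsTokenRefresh(fakeParse, "minio|old-secret", "minio", "new-secret"))
	fmt.Println(needsTokenRefresh(fakeParse, "minio|secret", "minio", "secret"))
}
```

In the controller, the boolean result would feed the updateScrapeConfig flag mentioned above.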

Contributor Author

ignored it now

apply suggestion
@jiuker jiuker requested a review from ramondeklein March 14, 2025 01:27
Successfully merging this pull request may close these issues.

Feature: expose more Prometheus metrics than just /minio/v2/metrics/cluster
7 participants