[stacked 2/5] metrics: add de-facto standard collectors. #404

Open. Wants to merge 7 commits into base: main
29 changes: 27 additions & 2 deletions config/crd/bases/config.nri_balloonspolicies.yaml
@@ -320,12 +320,37 @@ spec:
 to expose Prometheus metrics among other things.
 example: :8891
 type: string
+metrics:
+default:
+enabled:
+- policy
+- buildinfo
+description: Metrics defines which metrics to collect.
+properties:
+enabled:
+description: Enabled enables collection for metrics matched
+by glob patterns.
+example:
+- '*'
+items:
+type: string
+type: array
+polled:
+description: Polled forces polled collection for metrics matched
+by glob patterns.
+example:
+- computationally-expensive-metrics
+items:
+type: string
+type: array
+type: object
 prometheusExport:
 description: PrometheusExport enables exporting /metrics for Prometheus.
 type: boolean
 reportPeriod:
-description: ReportPeriod is the interval between reporting aggregated
-metrics.
+default: 30s
+description: ReportPeriod is the interval between collecting
+polled metrics.
 format: duration
 type: string
 samplingRatePerMillion:
29 changes: 27 additions & 2 deletions config/crd/bases/config.nri_templatepolicies.yaml
@@ -92,12 +92,37 @@ spec:
 to expose Prometheus metrics among other things.
 example: :8891
 type: string
+metrics:
+default:
+enabled:
+- policy
+- buildinfo
+description: Metrics defines which metrics to collect.
+properties:
+enabled:
+description: Enabled enables collection for metrics matched
+by glob patterns.
+example:
+- '*'
+items:
+type: string
+type: array
+polled:
+description: Polled forces polled collection for metrics matched
+by glob patterns.
+example:
+- computationally-expensive-metrics
+items:
+type: string
+type: array
+type: object
 prometheusExport:
 description: PrometheusExport enables exporting /metrics for Prometheus.
 type: boolean
 reportPeriod:
-description: ReportPeriod is the interval between reporting aggregated
-metrics.
+default: 30s
+description: ReportPeriod is the interval between collecting
+polled metrics.
 format: duration
 type: string
 samplingRatePerMillion:
29 changes: 27 additions & 2 deletions config/crd/bases/config.nri_topologyawarepolicies.yaml
@@ -119,12 +119,37 @@ spec:
 to expose Prometheus metrics among other things.
 example: :8891
 type: string
+metrics:
+default:
+enabled:
+- policy
+- buildinfo
+description: Metrics defines which metrics to collect.
+properties:
+enabled:
+description: Enabled enables collection for metrics matched
+by glob patterns.
+example:
+- '*'
+items:
+type: string
+type: array
+polled:
+description: Polled forces polled collection for metrics matched
+by glob patterns.
+example:
+- computationally-expensive-metrics
+items:
+type: string
+type: array
+type: object
 prometheusExport:
 description: PrometheusExport enables exporting /metrics for Prometheus.
 type: boolean
 reportPeriod:
-description: ReportPeriod is the interval between reporting aggregated
-metrics.
+default: 30s
+description: ReportPeriod is the interval between collecting
+polled metrics.
 format: duration
 type: string
 samplingRatePerMillion:
29 changes: 27 additions & 2 deletions deployment/helm/balloons/crds/config.nri_balloonspolicies.yaml
@@ -320,12 +320,37 @@ spec:
 to expose Prometheus metrics among other things.
 example: :8891
 type: string
+metrics:
+default:
+enabled:
+- policy
+- buildinfo
+description: Metrics defines which metrics to collect.
+properties:
+enabled:
+description: Enabled enables collection for metrics matched
+by glob patterns.
+example:
+- '*'
+items:
+type: string
+type: array
+polled:
+description: Polled forces polled collection for metrics matched
+by glob patterns.
+example:
+- computationally-expensive-metrics
+items:
+type: string
+type: array
+type: object
 prometheusExport:
 description: PrometheusExport enables exporting /metrics for Prometheus.
 type: boolean
 reportPeriod:
-description: ReportPeriod is the interval between reporting aggregated
-metrics.
+default: 30s
+description: ReportPeriod is the interval between collecting
+polled metrics.
 format: duration
 type: string
 samplingRatePerMillion:
29 changes: 27 additions & 2 deletions deployment/helm/template/crds/config.nri_templatepolicies.yaml
@@ -92,12 +92,37 @@ spec:
 to expose Prometheus metrics among other things.
 example: :8891
 type: string
+metrics:
+default:
+enabled:
+- policy
+- buildinfo
+description: Metrics defines which metrics to collect.
+properties:
+enabled:
+description: Enabled enables collection for metrics matched
+by glob patterns.
+example:
+- '*'
+items:
+type: string
+type: array
+polled:
+description: Polled forces polled collection for metrics matched
+by glob patterns.
+example:
+- computationally-expensive-metrics
+items:
+type: string
+type: array
+type: object
 prometheusExport:
 description: PrometheusExport enables exporting /metrics for Prometheus.
 type: boolean
 reportPeriod:
-description: ReportPeriod is the interval between reporting aggregated
-metrics.
+default: 30s
+description: ReportPeriod is the interval between collecting
+polled metrics.
 format: duration
 type: string
 samplingRatePerMillion:
Original file line number Diff line number Diff line change
@@ -119,12 +119,37 @@ spec:
 to expose Prometheus metrics among other things.
 example: :8891
 type: string
+metrics:
+default:
+enabled:
+- policy
+- buildinfo
+description: Metrics defines which metrics to collect.
+properties:
+enabled:
+description: Enabled enables collection for metrics matched
+by glob patterns.
+example:
+- '*'
+items:
+type: string
+type: array
+polled:
+description: Polled forces polled collection for metrics matched
+by glob patterns.
+example:
+- computationally-expensive-metrics
+items:
+type: string
+type: array
+type: object
 prometheusExport:
 description: PrometheusExport enables exporting /metrics for Prometheus.
 type: boolean
 reportPeriod:
-description: ReportPeriod is the interval between reporting aggregated
-metrics.
+default: 30s
+description: ReportPeriod is the interval between collecting
+polled metrics.
 format: duration
 type: string
 samplingRatePerMillion:
10 changes: 5 additions & 5 deletions docs/resource-policy/developers-guide/architecture.md
@@ -162,11 +162,11 @@ for post-policy enforcement of decisions.

 ### [Metrics Collector](tree:/pkg/metrics/)

-The metrics collector gathers a set of runtime metrics about the containers
-running on the node. NRI-RP can be configured to periodically evaluate this
-collected data to determine how optimal the current assignment of container
-resources is and to attempt a rebalancing/reallocation if it is deemed
-both possible and necessary.
+The metrics collector gathers a set of runtime metrics about system resources,
+containers running on the node, and policy-specific resource assignments and
+exposes these as Prometheus metrics. This data can be externally evaluated and
+used to trigger rebalancing of resources if the NRI-RP policy implementation
+provides a (policy-specific) external interface for this.

 ### [Policy Implementations](tree:/cmd/plugins)

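The updated paragraph above says the collected data is exposed as Prometheus metrics for external evaluation. As a rough illustration of the consuming side, the Go sketch below parses Prometheus text exposition output into a map. The metric name `example_cpus`, its `group` label, and the helper `parseSamples` are hypothetical and not part of this PR. The parser is deliberately simplified and assumes samples carry no trailing timestamp.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseSamples reads Prometheus text exposition format and returns a map
// from each series (metric name plus label set, as written) to its value.
// Simplification (assumption): sample lines have no trailing timestamp.
func parseSamples(text string) (map[string]float64, error) {
	samples := make(map[string]float64)
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip blank lines and HELP/TYPE comments
		}
		// The value is the last space-separated field on the line.
		i := strings.LastIndexByte(line, ' ')
		if i < 0 {
			return nil, fmt.Errorf("malformed sample line: %q", line)
		}
		v, err := strconv.ParseFloat(line[i+1:], 64)
		if err != nil {
			return nil, fmt.Errorf("bad value in %q: %w", line, err)
		}
		samples[strings.TrimSpace(line[:i])] = v
	}
	return samples, sc.Err()
}

func main() {
	// Hypothetical scrape output; real metric names depend on the policy.
	const scraped = `# HELP example_cpus Hypothetical gauge.
# TYPE example_cpus gauge
example_cpus{group="a"} 4
example_cpus{group="b"} 2
`
	samples, err := parseSamples(scraped)
	if err != nil {
		panic(err)
	}
	fmt.Println(samples[`example_cpus{group="a"}`])
}
```

An external evaluator could feed the body of an HTTP GET on the configured endpoint's `/metrics` path into a function like this before deciding whether to act.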
9 changes: 6 additions & 3 deletions docs/resource-policy/policy/balloons.md
@@ -264,13 +264,13 @@ Balloons policy parameters:
 - `prometheusExport`: if set to True, balloons with their CPUs
 and assigned containers are readable through `/metrics` from the
 httpEndpoint.
-- `reportPeriod`: `/metrics` aggregation interval.
+- `reportPeriod`: `/metrics` aggregation interval for polled metrics.

 ### Example

 Example configuration that runs all pods in balloons of 1-4
 CPUs. Instrumentation enables reading CPUs and containers in balloons
-from `http://localhost:8891/metrics`.
+from `http://$localhost_or_pod_IP:8891/metrics`.

 ```yaml
 apiVersion: config.nri/v1alpha1
@@ -413,9 +413,12 @@ nri-resource-policy global config:
 instrumentation:
 # The balloons policy exports containers running in each balloon,
 # and cpusets of balloons. Accessible in command line:
-# curl --silent http://localhost:8891/metrics
+# curl --silent http://$localhost_or_pod_IP:8891/metrics
 HTTPEndpoint: :8891
 PrometheusExport: true
+metrics:
+enabled: # use '*' instead for all available metrics
+- policy
 logger:
 Debug: policy
 ```
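The `metrics.enabled` and `metrics.polled` settings in this change are lists of glob patterns matched against metric names. A minimal Go sketch of such matching follows, assuming `path.Match` glob syntax. The PR's actual matcher is not visible in this diff, and `matchesAny`, `classify`, the metric name `cgroups`, and the pattern `computationally-expensive-*` are hypothetical.

```go
package main

import (
	"fmt"
	"path"
)

// matchesAny reports whether name matches at least one glob pattern.
// Malformed patterns are treated as non-matching.
// Assumption: path.Match syntax; the real implementation may differ.
func matchesAny(name string, patterns []string) bool {
	for _, p := range patterns {
		if ok, err := path.Match(p, name); err == nil && ok {
			return true
		}
	}
	return false
}

// classify evaluates one metric name against the two pattern lists
// independently and returns whether each list matched.
func classify(name string, enabled, polled []string) (isEnabled, isPolled bool) {
	return matchesAny(name, enabled), matchesAny(name, polled)
}

func main() {
	enabled := []string{"policy", "buildinfo"}        // CRD defaults in this PR
	polled := []string{"computationally-expensive-*"} // hypothetical pattern
	for _, name := range []string{"policy", "buildinfo", "cgroups"} {
		e, p := classify(name, enabled, polled)
		fmt.Printf("%s enabled=%t polled=%t\n", name, e, p)
	}
}
```

Replacing the `enabled` list with `[]string{"*"}` would match every name in this sketch, mirroring the `'*'` example in the CRD.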
8 changes: 1 addition & 7 deletions go.mod
@@ -3,7 +3,6 @@ module github.com/containers/nri-plugins
 go 1.22

 require (
-contrib.go.opencensus.io/exporter/prometheus v0.4.2
 github.com/containerd/nri v0.6.0
 github.com/containerd/otelttrpc v0.0.0-20240305015340-ea5083fda723
 github.com/containerd/ttrpc v1.2.3-0.20231030150553-baadfd8e7956
@@ -17,10 +16,8 @@ require (
 github.com/pelletier/go-toml/v2 v2.1.0
 github.com/prometheus/client_golang v1.17.0
 github.com/prometheus/client_model v0.5.0
-github.com/prometheus/common v0.44.0
 github.com/sirupsen/logrus v1.9.3
 github.com/stretchr/testify v1.8.4
-go.opencensus.io v0.24.0
 go.opentelemetry.io/otel v1.19.0
 go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc v1.19.0
 go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp v1.19.0
@@ -44,8 +41,6 @@ require (
 github.com/davecgh/go-spew v1.1.1 // indirect
 github.com/emicklei/go-restful/v3 v3.9.0 // indirect
 github.com/evanphx/json-patch v5.6.0+incompatible // indirect
-github.com/go-kit/log v0.2.1 // indirect
-github.com/go-logfmt/logfmt v0.6.0 // indirect
 github.com/go-logr/logr v1.4.1 // indirect
 github.com/go-logr/stdr v1.2.2 // indirect
 github.com/go-openapi/jsonpointer v0.19.6 // indirect
@@ -54,7 +49,6 @@ require (
 github.com/go-task/slim-sprig v0.0.0-20230315185526-52ccab3ef572 // indirect
 github.com/godbus/dbus/v5 v5.0.4 // indirect
 github.com/gogo/protobuf v1.3.2 // indirect
-github.com/golang/groupcache v0.0.0-20210331224755-41bb18bfe9da // indirect
 github.com/golang/protobuf v1.5.3 // indirect
 github.com/google/gnostic-models v0.6.8 // indirect
 github.com/google/go-cmp v0.6.0 // indirect
@@ -73,8 +67,8 @@ require (
 github.com/opencontainers/runtime-spec v1.1.0 // indirect
 github.com/pkg/errors v0.9.1 // indirect
 github.com/pmezard/go-difflib v1.0.0 // indirect
+github.com/prometheus/common v0.44.0 // indirect
 github.com/prometheus/procfs v0.12.0 // indirect
-github.com/prometheus/statsd_exporter v0.24.0 // indirect
 github.com/spf13/pflag v1.0.5 // indirect
 go.opentelemetry.io/otel/exporters/otlp/otlptrace v1.19.0 // indirect
 go.opentelemetry.io/otel/metric v1.19.0 // indirect