chore: add test and docs for metrics #764
Merged: 4 commits, Jun 23, 2025
35 changes: 35 additions & 0 deletions docs/modules/trino/pages/usage-guide/monitoring.adoc
@@ -3,3 +3,38 @@

The managed Trino instances are automatically configured to export Prometheus metrics.
See xref:operators:monitoring.adoc[] for more details.

== Metrics

Trino automatically exposes built-in Prometheus metrics on coordinators and workers. The metrics are available on the `http` (`8080/metrics`) or
`https` (`8443/metrics`) port, depending on the TLS settings.

The following `ServiceMonitor` example demonstrates how the metrics could be scraped using the https://prometheus-operator.dev/[Prometheus Operator].

[source,yaml]
----
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: scrape-label
spec:
  endpoints:
  - port: https # or http
    scheme: https # or http
    path: /metrics
    basicAuth: # <1>
      username:
        name: trino-user-secret
        key: username
      password:
        name: trino-user-secret
        key: password
  jobLabel: app.kubernetes.io/instance
  namespaceSelector:
    any: true
  selector:
    matchLabels:
      prometheus.io/scrape: "true"
----

<1> Add user information if Trino is configured to use authentication.
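
Before wiring up a `ServiceMonitor`, it can help to query the endpoint by hand. The following is a minimal Python sketch, not part of this change, of how the URL and basic-auth request described above might be built. The helper names (`metrics_url`, `build_request`, `fetch_metrics`) are hypothetical, and the `insecure` flag assumes a test cluster with self-signed certificates.

```python
import base64
import ssl
import urllib.request


def metrics_url(host, tls_enabled):
    """Return the native metrics URL: port 8443 with TLS, port 8080 without."""
    scheme, port = ("https", 8443) if tls_enabled else ("http", 8080)
    return f"{scheme}://{host}:{port}/metrics"


def build_request(url, username=None, password=None):
    """Create a request, adding a basic-auth header only if credentials are given."""
    request = urllib.request.Request(url)
    if username is not None and password is not None:
        token = base64.b64encode(f"{username}:{password}".encode()).decode()
        request.add_header("Authorization", f"Basic {token}")
    return request


def fetch_metrics(url, username=None, password=None, insecure=False):
    """Fetch the metrics text. `insecure=True` skips certificate checks (test clusters only)."""
    context = ssl.create_default_context()
    if insecure:
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    request = build_request(url, username, password)
    with urllib.request.urlopen(request, timeout=5, context=context) as response:
        return response.read().decode("utf-8")
```

Only `fetch_metrics` touches the network. The other two helpers are pure functions, so they can be checked without a running cluster.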
85 changes: 85 additions & 0 deletions tests/templates/kuttl/commons/check-metrics.py
@@ -0,0 +1,85 @@
#!/usr/bin/env python3
import argparse
import requests
import time


def print_request_error_and_sleep(message, err, retry_count):
    print("[" + str(retry_count) + "] " + message, err)
    time.sleep(5)


def try_get(url):
    retries = 3
    for i in range(retries):
        try:
            if "coordinator" in url:
                r = requests.get(
                    url,
                    timeout=5,
                    headers={"x-trino-user": "admin"},
                    auth=("admin", "admin"),
                    verify=False,
                )
            else:
                r = requests.get(
                    url, timeout=5, headers={"x-trino-user": "admin"}, verify=False
                )
            r.raise_for_status()
            return r
        except requests.exceptions.HTTPError as errh:
            print_request_error_and_sleep("Http Error: ", errh, i)
        except requests.exceptions.ConnectionError as errc:
            print_request_error_and_sleep("Error Connecting: ", errc, i)
        except requests.exceptions.Timeout as errt:
            print_request_error_and_sleep("Timeout Error: ", errt, i)
        except requests.exceptions.RequestException as err:
            print_request_error_and_sleep("Error: ", err, i)

    exit(-1)


def check_monitoring(hosts):
    for host in hosts:
        # test for the jmx exporter metrics
        url = "http://" + host + ":8081/metrics"
        response = try_get(url)

        if not response.ok:
            print("Error for [" + url + "]: could not access monitoring")
            exit(-1)

        # test for the native metrics
        url = "https://" + host + ":8443/metrics"
        response = try_get(url)

        if response.ok:
            # arbitrary metric was chosen to test if metrics are present in the response
            if "io_airlift_node_name_NodeInfo_StartTime" in response.text:
                continue
            else:
                print("Error for [" + url + "]: missing metrics")
                exit(-1)
        else:
            print("Error for [" + url + "]: could not access monitoring")
            exit(-1)


if __name__ == "__main__":
    all_args = argparse.ArgumentParser(description="Test Trino metrics.")
    all_args.add_argument(
        "-n", "--namespace", help="The namespace to run in", required=True
    )
    args = vars(all_args.parse_args())
    namespace = args["namespace"]

    host_coordinator_0 = f"trino-coordinator-default-0.trino-coordinator-default.{namespace}.svc.cluster.local"
    host_worker_0 = (
        f"trino-worker-default-0.trino-worker-default.{namespace}.svc.cluster.local"
    )

    hosts = [host_coordinator_0, host_worker_0]

    check_monitoring(hosts)

    print("Test check-metrics.py succeeded!")
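
The script tests for the metric with a substring match on the response body, which would also match the name inside a `# HELP` comment or as a prefix of a longer metric name. A stricter check could look something like the sketch below. It is not part of this change; the `metric_names` helper is hypothetical and assumes the response uses the Prometheus text exposition format.

```python
def metric_names(exposition_text):
    """Collect metric names from sample lines, skipping comments and blank lines."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # A sample line is either `name{labels} value` or `name value`,
        # so the name ends at the first `{` or at the first whitespace.
        name = line.split("{", 1)[0].split(None, 1)[0]
        names.add(name)
    return names


if __name__ == "__main__":
    # Illustrative input only, not real Trino output.
    sample = (
        "# TYPE io_airlift_node_name_NodeInfo_StartTime gauge\n"
        "io_airlift_node_name_NodeInfo_StartTime 1.75e9\n"
    )
    print("io_airlift_node_name_NodeInfo_StartTime" in metric_names(sample))
```

With this helper, the condition in `check_monitoring` could test membership in `metric_names(response.text)` instead of searching the raw text.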
1 change: 1 addition & 0 deletions tests/templates/kuttl/smoke/21-assert.yaml
@@ -6,3 +6,4 @@ commands:
- script: kubectl exec -n $NAMESPACE trino-test-helper-0 -- python /tmp/check-active-workers.py -u admin -p admin -n $NAMESPACE -w 1
- script: kubectl exec -n $NAMESPACE trino-test-helper-0 -- python /tmp/check-opa.py -n $NAMESPACE
- script: kubectl exec -n $NAMESPACE trino-test-helper-0 -- python /tmp/check-s3.py -n $NAMESPACE
- script: kubectl exec -n $NAMESPACE trino-test-helper-0 -- python /tmp/check-metrics.py -n $NAMESPACE
1 change: 1 addition & 0 deletions tests/templates/kuttl/smoke/21-copy-scripts.yaml
@@ -5,3 +5,4 @@ commands:
- script: kubectl cp -n $NAMESPACE ./check-active-workers.py trino-test-helper-0:/tmp || true
- script: kubectl cp -n $NAMESPACE ./check-opa.py trino-test-helper-0:/tmp || true
- script: kubectl cp -n $NAMESPACE ./check-s3.py trino-test-helper-0:/tmp || true
- script: kubectl cp -n $NAMESPACE ../../../../templates/kuttl/commons/check-metrics.py trino-test-helper-0:/tmp || true
1 change: 1 addition & 0 deletions tests/templates/kuttl/smoke/31-assert.yaml
@@ -6,3 +6,4 @@ commands:
- script: kubectl exec -n $NAMESPACE trino-test-helper-0 -- python /tmp/check-active-workers.py -u admin -p admin -n $NAMESPACE -w 2
- script: kubectl exec -n $NAMESPACE trino-test-helper-0 -- python /tmp/check-opa.py -n $NAMESPACE
- script: kubectl exec -n $NAMESPACE trino-test-helper-0 -- python /tmp/check-s3.py -n $NAMESPACE
- script: kubectl exec -n $NAMESPACE trino-test-helper-0 -- python /tmp/check-metrics.py -n $NAMESPACE
1 change: 1 addition & 0 deletions tests/templates/kuttl/smoke_aws/21-assert.yaml
@@ -6,3 +6,4 @@ commands:
- script: kubectl exec -n $NAMESPACE trino-test-helper-0 -- python /tmp/check-active-workers.py -u admin -p admin -n $NAMESPACE -w 1
- script: kubectl exec -n $NAMESPACE trino-test-helper-0 -- python /tmp/check-opa.py -n $NAMESPACE
- script: kubectl exec -n $NAMESPACE trino-test-helper-0 -- python /tmp/check-s3.py -n $NAMESPACE
- script: kubectl exec -n $NAMESPACE trino-test-helper-0 -- python /tmp/check-metrics.py -n $NAMESPACE
1 change: 1 addition & 0 deletions tests/templates/kuttl/smoke_aws/21-copy-scripts.yaml
@@ -5,3 +5,4 @@ commands:
- script: kubectl cp -n $NAMESPACE ./check-active-workers.py trino-test-helper-0:/tmp || true
- script: kubectl cp -n $NAMESPACE ./check-opa.py trino-test-helper-0:/tmp || true
- script: kubectl cp -n $NAMESPACE ./check-s3.py trino-test-helper-0:/tmp || true
- script: kubectl cp -n $NAMESPACE ../../../../templates/kuttl/commons/check-metrics.py trino-test-helper-0:/tmp || true