Merged
1 change: 1 addition & 0 deletions .gitignore
@@ -0,0 +1 @@
.idea
26 changes: 22 additions & 4 deletions README.md
@@ -1,7 +1,8 @@
# Temporal Grafana Dashboards
# Temporal Dashboards

This repository contains community-driven [Grafana](https://grafana.com/docs/grafana/latest/dashboards/) and [DataDog](https://docs.datadoghq.com/dashboards/) dashboards that can be used for monitoring
[Temporal Cloud](https://temporal.io/cloud), [Temporal Server](https://github.com/temporalio/temporal), and [Temporal SDK](https://docs.temporal.io/develop) metrics.

This repository contains community-driven Grafana [dashboards](https://grafana.com/docs/grafana/latest/dashboards/) that can be used for monitoring
Temporal Server and SDK metrics.

We welcome contributions to existing as well as new dashboards that can help the community.

@@ -10,10 +11,27 @@ We welcome contributions to existing as well as new dashboards that can help the

## Directory structure

* [`server/`](server): Dashboards for Temporal Server metrics
* [`cloud/`](cloud): Dashboards for Temporal Cloud metrics.
* [`server/`](server): Dashboards for Temporal Server metrics.
* [`sdk/`](sdk): Dashboards for Temporal SDK metrics.
* [`misc/`](misc): Server metrics dashboards that have not been fully tested yet or need improvements

## Setup
* [Temporal Cloud](https://docs.temporal.io/cloud/metrics/)
* [Temporal Server](https://docs.temporal.io/self-hosted-guide/monitoring)
* _Temporal SDK_
  * [Golang](https://docs.temporal.io/develop/go/observability)
  * [Java](https://docs.temporal.io/develop/java/observability)
  * [Python](https://docs.temporal.io/develop/python/observability)
  * [TypeScript](https://docs.temporal.io/develop/typescript/observability)
  * [.NET](https://docs.temporal.io/develop/dotnet/observability)
  * [PHP](https://docs.temporal.io/develop/php/observability)

## Available metrics
* [Temporal Cloud metrics](https://docs.temporal.io/production-deployment/cloud/metrics/reference)
* [Temporal Server metrics](https://docs.temporal.io/references/cluster-metrics)
* [Temporal SDK metrics](https://docs.temporal.io/references/sdk-metrics)

## Usage

Our [default helm chart](https://github.com/temporalio/helm-charts) installs Grafana and will provision the dashboards from this repo automatically. If you would like to try these dashboards on your own Grafana instance you can import them. Unfortunately Grafana does not allow importing by URL aside from those hosted on the Grafana website, so the JSON of the dashboard needs to be copy/pasted into your Grafana instance. To do this:
13 changes: 13 additions & 0 deletions cloud/README.md
@@ -0,0 +1,13 @@
# Temporal Cloud Dashboards

## Setup
* [Temporal Cloud](https://docs.temporal.io/cloud/metrics/)

## Available metrics
* [Temporal Cloud metrics](https://docs.temporal.io/production-deployment/cloud/metrics/reference)

## Dashboards
* **Grafana** [here](temporal_cloud.json)
* **DataDog** integration details and Dashboard access are found [here](https://docs.datadoghq.com/integrations/temporal-cloud/).
* Related [Blog post](https://temporal.io/blog/temporal-cloud-metrics-in-datadog).

22 changes: 18 additions & 4 deletions sdk/README.md
@@ -1,10 +1,24 @@
# Temporal Grafana SDK Dashboards
# Temporal SDK Dashboards

This repository contains the following dashboards:
## Setup

* [Golang](https://docs.temporal.io/develop/go/observability)
* [Java](https://docs.temporal.io/develop/java/observability)
* [Python](https://docs.temporal.io/develop/python/observability)
* [TypeScript](https://docs.temporal.io/develop/typescript/observability)
* [.NET](https://docs.temporal.io/develop/dotnet/observability)
* [PHP](https://docs.temporal.io/develop/php/observability)

## Available metrics
[Temporal SDK metrics](https://docs.temporal.io/references/sdk-metrics)

## Grafana dashboards
- [temporal-go-java-sdks-tally.json](temporal-go-java-sdks-tally.json) for [Go](https://github.com/temporalio/sdk-go) and [Java](https://github.com/temporalio/sdk-java) SDKs using Uber Tally to emit metrics.

- [temporal-go-sdk-otel.json](temporal-go-sdk-otel.json) for [Go](https://github.com/temporalio/sdk-go) SDK using OpenTelemetry to emit metrics.

- [temporal-core-sdks-otel.json](temporal-core-sdks-otel.json) for [Core](https://github.com/temporalio/sdk-core) based SDKs. In Core based SDKs, metrics of the type Histogram
are measured in milliseconds by default, so the dashboard is configured accordingly to display them in milliseconds.

## DataDog dashboards
DataDog dashboards and related configuration are [here](datadog).
22 changes: 22 additions & 0 deletions sdk/datadog/README.md
@@ -0,0 +1,22 @@
# Temporal DataDog SDK Dashboards

The [Dashboard](temporal_sdk_dashboard.json) here works with the DataDog collector and [this openmetrics configuration](openmetrics.h_conf.yaml).

### Prerequisites

1. Some means of pushing metrics to DataDog. Typically this is an installed [DataDog Agent](https://docs.datadoghq.com/getting_started/agent/).
2. Advanced `Percentile` configuration for Distribution metrics.
   1. Which metrics? Any metric listed [here](https://docs.temporal.io/references/sdk-metrics) that is a `Histogram` and for which you want to report percentiles (p95/p99).

### Put it all together
1. If using the DataDog Agent:
   a. See the [OpenMetrics integration docs](https://docs.datadoghq.com/integrations/openmetrics/) for details about configuring OpenMetrics collection in your DataDog Agent.
   b. Drop this [conf.yaml](openmetrics.h_conf.yaml) into your Agent's `openmetrics.d` config path.
2. Import the [Dashboard](temporal_sdk_dashboard.json).
3. Be sure to enable the `Percentile` configuration for the relevant metrics.

> **Example for configuring `temporal_request_latency`**:
> * Visit https://app.datadoghq.com/metric/summary?metric=temporal_request_latency
> * Set `Advanced > Percentiles > Configure > ON`
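
To decide which metrics need the `Percentile` setting, it can help to list the histograms a worker actually exposes. The following is a minimal sketch, not part of this repository: it assumes the worker serves Prometheus-style text at the endpoint used in the OpenMetrics configuration (`http://localhost:9464/metrics`) and that histograms are declared with `# TYPE <name> histogram` lines. The function names and the `SAMPLE` payload are hypothetical, and the names DataDog finally reports may differ from the scraped ones.

```python
import re
from urllib.request import urlopen

# Matches metadata lines such as "# TYPE temporal_request_latency histogram".
TYPE_LINE = re.compile(r"^#\s*TYPE\s+(\S+)\s+(\S+)\s*$")


def histogram_metric_names(exposition_text, prefix="temporal_"):
    """Return the sorted names of metrics declared as histograms in a scrape."""
    names = set()
    for line in exposition_text.splitlines():
        match = TYPE_LINE.match(line.strip())
        if match and match.group(2) == "histogram" and match.group(1).startswith(prefix):
            names.add(match.group(1))
    return sorted(names)


def fetch_histogram_metric_names(url="http://localhost:9464/metrics"):
    """Scrape the endpoint (assumed to be reachable) and list its histograms."""
    with urlopen(url, timeout=10) as response:
        return histogram_metric_names(response.read().decode("utf-8"))


# Hypothetical scrape output, used only to illustrate the parsing.
SAMPLE = """\
# TYPE temporal_request_latency histogram
temporal_request_latency_bucket{le="100"} 3
temporal_request_latency_bucket{le="+Inf"} 4
temporal_request_latency_sum 250
temporal_request_latency_count 4
# TYPE temporal_request counter
temporal_request_total 4
"""

if __name__ == "__main__":
    print(histogram_metric_names(SAMPLE))
```

Each name returned is a candidate for the `Advanced > Percentiles` setting described above; counters and gauges are skipped because percentiles only apply to distributions.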


52 changes: 52 additions & 0 deletions sdk/datadog/openmetrics.h_conf.yaml
@@ -0,0 +1,52 @@
## OpenMetrics Configuration - CONVERT BUCKETS TO DISTRIBUTIONS

init_config:

instances:
  - openmetrics_endpoint: "http://localhost:9464/metrics"
    tags:
      - "service:temporal-worker"
      - "env:production"

    metrics:
      - "temporal_*"

    # KEY: Convert histogram buckets to distributions
    histogram_buckets_as_distributions: true  # Convert to distributions
    collect_histogram_buckets: true

    # Send buckets as distributions instead of individual metrics
    send_distribution_buckets: true  # Send as distributions
    send_distribution_counts_as_monotonic: true  # For proper count handling
    send_histograms_buckets: false

    # Include all labels including "le" for bucket conversion
    include_labels:
      - "le"  # NEEDED for bucket conversion
      - "namespace"
      - "operation"
      - "workflow_type"
      - "activity_type"
      - "task_queue"
      - "worker_type"
      - "service_name"
      - "error_type"
      - "poller_type"
      - "failure_reason"

    exclude_labels:
      - "job"
      - "instance"
      - "__name__"

    timeout: 30
    min_collection_interval: 15

# This will create distribution metrics that you can then enable percentiles for
# in the Datadog Metrics Summary page:
# temporal_request_latency (distribution)
# temporal_workflow_endtoend_latency (distribution)
# etc.
# Example: You want to enable the `p95` percentile for `temporal_request_latency`:
# 1. Visit https://app.datadoghq.com/metric/summary?metric=temporal_request_latency
# 2. Set `Advanced > Percentiles > Configure > ON`