diff --git a/contrib/dbx_ingestion_monitoring/CHANGELOG.txt b/contrib/dbx_ingestion_monitoring/CHANGELOG.txt new file mode 100644 index 0000000..dfde5c2 --- /dev/null +++ b/contrib/dbx_ingestion_monitoring/CHANGELOG.txt @@ -0,0 +1,25 @@ +DATABRICKS INGESTION MONITORING DABS +==================================== + +0.3.4 + +- Add support for pipeline discovery using pipeline tags +- Enhance AI/BI dashboards to support pipeline selection using tags + +0.3.3 + +- Add support for monitoring expectation check results + - Extend `table_events_metrics` with a new column `num_expectation_dropped_records` that contains the number of rows dropped by expectations + - Add a table `table_events_expectation_checks` which contains the number of rows that passed or failed specific expectation checks +- Update the generic SDP dashboard to expose metrics/visualizations about expectation failures. +- Bugfixes in the Datadog sink + +0.3.2 + +- All monitoring ETL pipelines are now configured to write their event logs to the monitoring schema so that the monitoring pipelines can also be monitored. For example, the CDC Monitoring ETL pipeline will write its event log into `{monitoring_catalog}.{monitoring_schema}.cdc_connector_monitoring_etl_event_log` and the Generic SDP monitoring ETL pipeline will write its event log into `{monitoring_catalog}.{monitoring_schema}.generic_sdp_monitoring_etl_event_log`. +- Added a fix for an issue that would cause the Monitoring ETL pipelines to periodically get stuck on `flow_targets` update. + + +0.3.1 + +- Fix an issue with pipelines execution time graph across DABs diff --git a/contrib/dbx_ingestion_monitoring/COMMON_CONFIGURATION.md b/contrib/dbx_ingestion_monitoring/COMMON_CONFIGURATION.md new file mode 100644 index 0000000..49022e6 --- /dev/null +++ b/contrib/dbx_ingestion_monitoring/COMMON_CONFIGURATION.md @@ -0,0 +1,137 @@ +# Common Configuration Guide + +This document describes common configuration parameters shared among monitoring DABs (Databricks Asset Bundles). + +Configuration is done through variables in a DAB deployment target. + + +## Required: Specify Monitoring Catalog and Schema + +Configure `monitoring_catalog` and `monitoring_schema` to specify where the monitoring tables will be created. The catalog must already exist, but the schema will be created automatically if it doesn't exist. + + +## Required: Specify Pipelines to Monitor + +Configuring which pipelines to monitor involves two steps: +1. Choose the method to extract pipeline event logs +2. 
Identify which pipelines to monitor + +### Event Log Extraction Methods + +There are two methods to extract a pipeline's event logs: + +**Ingesting (Preferred)** +- Extracts event logs directly from a Delta table where the pipeline writes its logs +- Available for pipelines configured with the `event_log` field ([see documentation](https://docs.databricks.com/api/workspace/pipelines/update#event_log)) +- Any UC-enabled pipeline using `catalog` and `schema` fields can be configured to store its event log in a Delta table +- Lower cost and better performance than importing + +**Importing (Alternative)** +- First imports the pipeline's event log into a Delta table, then extracts from there +- More expensive operation compared to ingesting +- Use only for UC pipelines that use the legacy `catalog`/`target` configuration style +- Requires configuring dedicated import jobs (see ["Optional: Configure Event Log Import Job(s)"](#optional-configure-event-log-import-jobs)) + +### Pipeline Identification Methods + +For both ingested and imported event logs, you can identify pipelines using: + +**1. Direct Pipeline IDs** +- Use `directly_monitored_pipeline_ids` for ingested event logs +- Use `imported_pipeline_ids` for imported event logs +- Format: Comma-separated list of pipeline IDs + +**2. Pipeline Tags** +- Use `directly_monitored_pipeline_tags` for ingested event logs +- Use `imported_pipeline_tags` for imported event logs +- Format: Semi-colon-separated lists of comma-separated `tag[:value]` pairs + - **Semicolons (`;`)** = OR logic - pipelines matching ANY list will be selected + - **Commas (`,`)** = AND logic - pipelines matching ALL tags in the list will be selected + - `tag` without a value is equivalent to `tag:` (empty value) + +**Example:** +``` +directly_monitored_pipeline_tags: "tier:T0;team:data,tier:T1" +``` +This selects pipelines with either: +- Tag `tier:T0`, OR +- Tags `team:data` AND `tier:T1` + +**Combining Methods:** +All pipeline identification methods can be used together. Pipelines matching any criteria will be included. + +> **Performance Tip:** For workspaces with hundreds or thousands of pipelines, enable pipeline tags indexing to significantly speed up tag-based discovery. See ["Optional: Configure Pipelines Tags Indexing Job"](#optional-configure-pipelines-tags-indexing-job) for more information. + + +## Optional: Monitoring ETL Pipeline Configuration + +**Schedule Configuration:** +- Customize the monitoring ETL pipeline schedule using the `monitoring_etl_cron_schedule` variable +- Default: Runs hourly +- Trade-off: Higher frequency increases data freshness but also increases DBU costs + +For additional configuration options, refer to the `variables` section in the `databricks.yml` file for the DAB containing the monitoring ETL pipeline. + + +## Optional: Configure Event Log Import Job(s) + +> **Note:** Only needed if you're using the "Importing" event log extraction method. + +**Basic Configuration:** +1. Set `import_event_log_schedule_state` to `UNPAUSED` + - Default schedule: Hourly (configurable via `import_event_log_cron_schedule`) + +2. 
Configure the `imported_event_log_tables` variable in the monitoring ETL pipeline + - Specify the table name(s) where imported logs are stored + - You can reference `${var.imported_event_logs_table_name}` + - Multiple tables can be specified as a comma-separated list + +**Handling Pipeline Ownership:** +- If monitored pipelines have a different owner than the DAB owner: + - Edit `resources/import_event_logs.job.yml` + - Uncomment the `run_as` principal lines + - Specify the appropriate principal + +- If multiple sets of pipelines have different owners: + - Duplicate the job definition in `resources/import_event_logs.job.yml` + - Give each job a unique name + - Configure the `run_as` principal for each job as needed + - All jobs can share the same target table (`imported_event_logs_table_name`) + +See [vars/import_event_logs.vars.yml](vars/import_event_logs.vars.yml) for detailed configuration variable descriptions. + + +## Optional: Configure Pipelines Tags Indexing Job + +> **When to use:** For large-scale deployments with hundreds or thousands of pipelines using tag-based identification. + +**Why indexing matters:** +Tag-based pipeline discovery requires fetching metadata for every pipeline via the Databricks API on each event log import and monitoring ETL execution. For large deployments, this can be slow and expensive. The tags index caches this information to significantly improve performance. + +**Configuration Steps:** + +1. **Enable the index:** + - Set `pipeline_tags_index_enabled` to `true` + +2. **Enable the index refresh job:** + - Set `pipeline_tags_index_schedule_state` to `UNPAUSED` + - This job periodically refreshes the index to keep it up-to-date + +3. **Optional: Customize refresh schedule** + - Configure `pipeline_tags_index_cron_schedule` (default: daily) + - If you change the schedule, consider adjusting `pipeline_tags_index_max_age_hours` (default: 48 hours) + - When the index is older than the max age threshold, the system falls back to API-based discovery + +See [vars/pipeline_tags_index.vars.yml](vars/pipeline_tags_index.vars.yml) for detailed configuration variable descriptions. + +> **Notes:** +1. The system gracefully falls back to API-based discovery if the index is disabled, unavailable, or stale. +2. If a recently created or tagged pipeline is missing from the monitoring ETL output, this can be due to the staleness of the index. Run the corresponding `Build *** pipeline tags index` job to refresh the index and re-run the monitoring ETL pipeline. + + +## Optional: Configure Third-Party Monitoring Integration + +You can export monitoring data to third-party monitoring platforms such as Datadog, Splunk, New Relic, or Azure Monitor. + +See [README-third-party-monitoring.md](README-third-party-monitoring.md) for detailed configuration instructions. + diff --git a/contrib/dbx_ingestion_monitoring/NOTICE b/contrib/dbx_ingestion_monitoring/NOTICE new file mode 100644 index 0000000..55405a3 --- /dev/null +++ b/contrib/dbx_ingestion_monitoring/NOTICE @@ -0,0 +1,17 @@ +Copyright (2025) Databricks, Inc. + +This Software includes software developed at Databricks (https://www.databricks.com/) and its use is subject to the included LICENSE file. 
+ +__________ +This Software contains code from the following open source projects, licensed under the Apache 2.0 license (https://www.apache.org/licenses/LICENSE-2.0): + +requests - https://pypi.org/project/requests/ +Copyright 2019 Kenneth Reitz + +tenacity - https://pypi.org/project/tenacity/ +Copyright Julien Danjou + +pyspark - https://pypi.org/project/pyspark/ +Copyright 2014 and onwards The Apache Software Foundation. + + diff --git a/contrib/dbx_ingestion_monitoring/README-third-party-monitoring.md b/contrib/dbx_ingestion_monitoring/README-third-party-monitoring.md new file mode 100644 index 0000000..59fe241 --- /dev/null +++ b/contrib/dbx_ingestion_monitoring/README-third-party-monitoring.md @@ -0,0 +1,592 @@ +# Third-Party Monitoring Integration + +This document describes the configuration and architecture for integrating Databricks ingestion pipelines with +third-party observability platforms. + +## Table of Contents +- [Overview](#overview) +- [Telemetry Data](#telemetry-data) + - [Logs](#logs) + - [Metrics](#metrics) + - [Events](#events) +- [Setup](#setup) + - [Storing Secrets](#storing-secrets) + - [Datadog](#datadog) + - [New Relic](#new-relic) + - [Azure Monitor](#azure-monitor) + - [Splunk Observability Cloud](#splunk-observability-cloud) + - [Advanced Configuration](#advanced-configuration) + - [Endpoint Overrides](#endpoint-overrides) + - [HTTP Client Tuning](#http-client-tuning) +- [Important Considerations](#important-considerations) +- [Developer Docs](#developer-docs) + - [Code Architecture](#code-architecture) + - [Conversion Layer](#conversion-layer) + - [Part 1: Inference - Extract Telemetry Data](#part-1-inference---extract-telemetry-data) + - [Part 2: Destination Conversion - Transform to Platform Format](#part-2-destination-conversion---transform-to-platform-format) + - [Transmission Layer](#transmission-layer) + - [Flow](#flow) + - [Adding New Telemetry](#adding-new-telemetry) + +## Overview +Telemetry sinks for the monitoring ETL pipeline, enables real-time observability of +Databricks ingestion pipelines through structured streaming based ingestion to third-party +observability platforms. + +The sinks automatically capture and forward pipeline execution data including errors, +performance metrics, and status events to provide comprehensive monitoring and alerting +capabilities for data ingestion workflows. + +## Telemetry Data +The pipeline sends the following telemetry data to the observability platforms: + +### Logs +- Error logs capturing pipeline failures, exceptions, and error states +- Tagged with `pipeline_id`, `pipeline_name`, `pipeline_run_id`, `table_name`, `flow_name` +- Includes `pipeline_run_link` and detailed error information as attributes + +### Metrics +**Table Throughput Metrics:** +- `dlt.table.throughput.upserted_rows`: Count of upserted rows per table +- `dlt.table.throughput.deleted_rows`: Count of deleted rows per table +- `dlt.table.throughput.output_rows`: Total output rows per table + +**Pipeline Performance Metrics:** +- `pipeline.run.starting_seconds`: Time from queue to start +- `pipeline.run.waiting_for_resources_seconds`: Resource allocation wait time +- `pipeline.run.initialization_seconds`: Pipeline initialization duration +- `pipeline.run.running_seconds`: Actual execution time +- `pipeline.run.total_seconds`: End-to-end pipeline duration + +### Events +- Pipeline status change events (RUNNING, COMPLETED, FAILED, etc.) 
+- Includes `pipeline_link`, `pipeline_run_link`, completion status, timing, error messages + +## Setup + +Make sure Databricks has allowlisted the required outbound endpoints for the chosen observability platform. + +### Storing Secrets + +For API keys and other sensitive configuration, it is recommended to store them in Databricks secrets. +Instead of providing individual configuration keys in Spark config, create a Databricks secret scope containing all +required parameters and reference it using the 'secrets_scope' parameter. The secret key names in the scope must +exactly match the Spark configuration parameter names (e.g., secret key 'api_key' corresponds to Spark parameter +'api_key'). When both Spark config and secrets are provided, secrets will be automatically merged during execution and +will take precedence over Spark configuration values. + +See: https://docs.databricks.com/aws/en/security/secrets/ + +### Datadog + +**Configuration Parameters:** +- **destination** (str): Must be set to `datadog` to enable Datadog telemetry sink. +- **api_key** (str): Datadog API key for authentication. Must have permissions to send metrics, logs, and events. + Recommended to store in Databricks secrets for security. Please ensure the key has the necessary + scopes and permissions to ingest logs, metrics, and events. + Refer: https://docs.datadoghq.com/account_management/api-app-keys/ +- **host_name** (str): Datadog site for the account. Example: 'datadoghq.com', 'datadoghq.eu', 'us5.datadoghq.com', 'ap2.datadoghq.com'. + Refer: https://docs.datadoghq.com/api/latest/using-the-api/ + +**Setup Steps:** + +1. Create a Databricks secret scope and store the Datadog API key: +```bash +databricks secrets create-scope datadog-secrets +databricks secrets put-secret datadog-secrets api_key --string-value "" +``` + +2. Update the pipeline configuration file `resources/monitoring_etl.pipeline.yml` to reference the variables: +```yaml + resources: + pipelines: + cdc_connector_monitoring_etl: + name: "Monitoring ETL for CDC Connector Pipelines" + libraries: + - glob: + include: ../monitoring_etl/** + - glob: + include: ../../third_party_sinks/** + serverless: ${var.serverless_monitoring_pipeline_enabled} + development: true + catalog: ${var.monitoring_catalog} + schema: ${resources.schemas.monitoring_schema.name} + root_path: ${workspace.file_path}/cdc_connector_monitoring_dab/monitoring_etl + configuration: + monitoring_catalog: ${var.monitoring_catalog} + monitoring_schema: ${resources.schemas.monitoring_schema.name} + directly_monitored_pipeline_ids: ${var.directly_monitored_pipeline_ids} + imported_event_log_tables: ${var.imported_event_log_tables} + + # Third-party monitoring configuration + destination: ${var.third_party_destination} + secrets_scope: ${var.third_party_secrets_scope} + host_name: ${var.third_party_host_name} +``` + +3. Define third-party monitoring variables in `databricks.yml`: +```yaml +targets: + dev: + default: true + mode: development + variables: + # ... other variables ... + + # Third-party monitoring configuration. Replace with actual values. + third_party_destination: "datadog" + third_party_host_name: "us5.datadoghq.com" + third_party_secrets_scope: "datadog-secrets" +``` +### New Relic + +**Configuration Parameters:** +- **destination** (str): Must be set to `newrelic` to enable New Relic telemetry sink. +- **api_key** (str): New Relic API key for authentication. Must have permissions to send metrics, logs, and events. 
+ Recommended to store in Databricks secrets for security. + Refer: https://docs.newrelic.com/docs/apis/intro-apis/new-relic-api-keys/ +- **host_name** (str): New Relic site for the account. Example: 'newrelic.com', 'eu.newrelic.com'. + Refer: https://docs.newrelic.com/docs/apis/ingest-apis/introduction-new-relic-ingest-apis/ +- **account_id** (str): New Relic account ID. Required for auto-generating events endpoint. Can be found in the New Relic UI under Account Settings. + Refer: https://docs.newrelic.com/docs/accounts/accounts-billing/account-structure/account-id/ + +**Setup Steps:** + +1. Create a Databricks secret scope and store the New Relic API key: +```bash +databricks secrets create-scope newrelic-secrets +databricks secrets put-secret newrelic-secrets api_key --string-value "" +``` + +2. Update the pipeline configuration file `resources/monitoring_etl.pipeline.yml` to reference the variables: +```yaml + resources: + pipelines: + cdc_connector_monitoring_etl: + name: "Monitoring ETL for CDC Connector Pipelines" + libraries: + - glob: + include: ../monitoring_etl/** + - glob: + include: ../../third_party_sinks/** + serverless: ${var.serverless_monitoring_pipeline_enabled} + development: true + catalog: ${var.monitoring_catalog} + schema: ${resources.schemas.monitoring_schema.name} + root_path: ${workspace.file_path}/cdc_connector_monitoring_dab/monitoring_etl + configuration: + monitoring_catalog: ${var.monitoring_catalog} + monitoring_schema: ${resources.schemas.monitoring_schema.name} + directly_monitored_pipeline_ids: ${var.directly_monitored_pipeline_ids} + imported_event_log_tables: ${var.imported_event_log_tables} + + # Third-party monitoring configuration + destination: ${var.third_party_destination} + secrets_scope: ${var.third_party_secrets_scope} + host_name: ${var.third_party_host_name} + account_id: ${var.third_party_account_id} +``` + +3. Define third-party monitoring variables in `databricks.yml`: +```yaml +targets: + dev: + default: true + mode: development + variables: + # ... other variables ... + + # Third-party monitoring configuration. Replace with actual values. + third_party_destination: "newrelic" + third_party_host_name: "newrelic.com" + third_party_secrets_scope: "newrelic-secrets" + third_party_account_id: "" +``` + +### Azure Monitor + +Supports sending telemetry to Azure Monitor using the Data Collection Rule (DCR) API. All telemetry data will be sent +as logs to custom tables in a Log Analytics Workspace. + +**Configuration Parameters:** +- **destination**: Must be set to `azuremonitor` to enable Azure Monitor telemetry sink. +- **host_name**: Data Collection Endpoint (DCE) hostname for the Azure Monitor workspace. + Format: `..ingest.monitor.azure.com` + Example: `my-dce-abc123.eastus-1.ingest.monitor.azure.com` + This is used to auto-generate the DCE endpoint URL. +- **azure_client_id**: Azure service principal client ID for authentication. + Recommended to store in Databricks secrets for security. +- **azure_client_secret**: Azure service principal client secret for authentication. + Recommended to store in Databricks secrets for security. +- **azure_dcr_immutable_id**: Immutable ID of the Data Collection Rule containing all three telemetry streams (metrics, logs, events). + Example: `dcr-1234567890abcdef1234567890abcdef` +- **azure_tenant_id**: Azure AD tenant ID. Required for authorization endpoint. 
+ Example: `12345678-1234-1234-1234-123456789012` + +**Azure Infrastructure Setup:** + +Before we begin exporting telemetry to Azure Monitor, we need to set up the following Azure resources: +- **Service Principal**: Provides authentication credentials for secure API access to Azure Monitor +- **Log Analytics Workspace**: Central data repository where telemetry data is stored and queried +- **Data Collection Endpoint (DCE)**: Regional/Global ingestion endpoint for receiving telemetry data +- **Data Collection Rule (DCR)**: Defines data transformation rules and routing to destination tables + +The data will be stored in the following custom tables: +- **Metrics**: `DatabricksMetrics_CL` - Pipeline performance metrics and table throughput data +- **Logs**: `DatabricksLogs_CL` - Error logs and failure information +- **Events**: `DatabricksEvents_CL` - Pipeline status change events + +**Setup Steps:** + +1. **[Optional] Create a resource group** (if you don't have one): +```bash +az group create --name databricks-monitoring-rg --location "East US" +``` + +2. **[Optional] Create a Log Analytics Workspace** (if you don't have one): +```bash +az monitor log-analytics workspace create \ + --resource-group databricks-monitoring-rg \ + --workspace-name databricks-monitoring-workspace \ + --location "East US" +``` + +3. **[Optional] Create a service principal and grant it access to the workspace** (if you don't have one): +```bash +# Create service principal +az ad sp create-for-rbac --name databricks-monitoring-sp --skip-assignment + +# Assign Log Analytics Contributor role to the service principal +az role assignment create \ + --assignee "" \ + --role "Log Analytics Contributor" \ + --scope "" +``` +Note down the `appId` (client ID), `password` (client secret), and `tenant` values from the service principal creation output. + +4. **Run the setup script** to create Data Collection Endpoint (DCE), custom tables, and Data Collection Rule (DCR): +```bash +./third_party_sinks/azure_setup.sh \ + --resource-group databricks-monitoring-rg \ + --location "" +``` +The script creates: +- Public data collection endpoint: `databricks-monitoring-dce` +- Custom tables: `DatabricksMetrics_CL`, `DatabricksLogs_CL`, `DatabricksEvents_CL` +- Data collection rule: `databricks-monitoring-dcr` (configured for all three telemetry streams) + +The script outputs the following values needed for Databricks configuration: +- **Azure Host Name** (DCE hostname) +- **Azure DCR Immutable ID** + +For more customization, see `azure_setup.sh`. + + +**Pipeline Configuration:** + +1. Create a Databricks secret scope and store Azure credentials: +```bash +databricks secrets create-scope azuremonitor-secrets +databricks secrets put-secret azuremonitor-secrets azure_client_id --string-value "" +databricks secrets put-secret azuremonitor-secrets azure_client_secret --string-value "" +``` + +2. 
Update the pipeline configuration file `resources/monitoring_etl.pipeline.yml` to reference the variables: +```yaml + resources: + pipelines: + cdc_connector_monitoring_etl: + name: "Monitoring ETL for CDC Connector Pipelines" + libraries: + - glob: + include: ../monitoring_etl/** + - glob: + include: ../../third_party_sinks/** + serverless: ${var.serverless_monitoring_pipeline_enabled} + development: true + catalog: ${var.monitoring_catalog} + schema: ${resources.schemas.monitoring_schema.name} + root_path: ${workspace.file_path}/cdc_connector_monitoring_dab/monitoring_etl + configuration: + monitoring_catalog: ${var.monitoring_catalog} + monitoring_schema: ${resources.schemas.monitoring_schema.name} + directly_monitored_pipeline_ids: ${var.directly_monitored_pipeline_ids} + imported_event_log_tables: ${var.imported_event_log_tables} + + # Third-party monitoring configuration + destination: ${var.third_party_destination} + secrets_scope: ${var.third_party_secrets_scope} + azure_tenant_id: ${var.azure_tenant_id} + host_name: ${var.third_party_host_name} + azure_dcr_immutable_id: ${var.azure_dcr_immutable_id} +``` + +3. Define third-party monitoring variables in `databricks.yml`: +```yaml +targets: + dev: + default: true + mode: development + variables: + # ... other variables ... + + # Third-party monitoring configuration. Replace with actual values. + third_party_destination: "azuremonitor" + third_party_host_name: "my-dce-abc123.eastus-1.ingest.monitor.azure.com" + third_party_secrets_scope: "azuremonitor-secrets" + azure_tenant_id: "" + azure_dcr_immutable_id: "dcr-1234567890abcdef1234567890abcdef" +``` + +**Important Notes:** +- The OAuth2 access tokens are automatically refreshed when tokens get older than `azure_max_access_token_staleness` (default: 3300 seconds). Adjust this parameter if needed. +- If additional scope/parameters are required for fetching oauth2 tokens, please update the payload in dbx_ingestion_monitoring_pkg/third_party_sinks/azuremonitor_sink.py fetch_access_token() function. + +### Splunk Observability Cloud + +Supports sending telemetry to Splunk Observability Cloud (formerly SignalFx) using the SignalFx Ingest API. Since, +Splunk Observability Cloud does not support native log ingestion, logs are sent as events. + +**Configuration Parameters:** +- **destination**: Must be set to `splunk_observability` to enable Splunk Observability Cloud telemetry sink. +- **host_name**: Splunk Observability Cloud ingest hostname for your realm. + Format: `ingest..signalfx.com` + Example: `ingest.us0.signalfx.com`, `ingest.eu0.signalfx.com` + Refer: https://dev.splunk.com/observability/docs/apibasics/api_list/ +- **splunk_access_token**: Splunk Observability Cloud organization access token for metrics, events, and logs. + Recommended to store in Databricks secrets for security. + Refer: https://help.splunk.com/en/splunk-observability-cloud/administer/authentication-and-security/authentication-tokens/org-access-tokens + +**Setup Steps:** + +1. Create a Databricks secret scope and store Splunk credentials: +```bash +databricks secrets create-scope splunk-secrets +databricks secrets put-secret splunk-secrets splunk_access_token --string-value "" +``` + +2. 
Update the pipeline configuration file `resources/monitoring_etl.pipeline.yml` to reference the variables: + +```yaml + resources: + pipelines: + cdc_connector_monitoring_etl: + name: "Monitoring ETL for CDC Connector Pipelines" + libraries: + - glob: + include: ../monitoring_etl/** + - glob: + include: ../../third_party_sinks/** + serverless: ${var.serverless_monitoring_pipeline_enabled} + development: true + catalog: ${var.monitoring_catalog} + schema: ${resources.schemas.monitoring_schema.name} + root_path: ${workspace.file_path}/cdc_connector_monitoring_dab/monitoring_etl + configuration: + monitoring_catalog: ${var.monitoring_catalog} + monitoring_schema: ${resources.schemas.monitoring_schema.name} + directly_monitored_pipeline_ids: ${var.directly_monitored_pipeline_ids} + imported_event_log_tables: ${var.imported_event_log_tables} + + # Third-party monitoring configuration + destination: ${var.third_party_destination} + host_name: ${var.third_party_host_name} + secrets_scope: ${var.third_party_secrets_scope} +``` + +3. Define third-party monitoring variables in `databricks.yml`: +```yaml +targets: + dev: + default: true + mode: development + variables: + # ... other variables ... + + # Third-party monitoring configuration. Replace with actual values. + third_party_destination: "splunk_observability" + third_party_host_name: "ingest.us0.signalfx.com" + third_party_secrets_scope: "splunk-secrets" +``` + +**Important Notes:** +- All telemetry (metrics, events, and logs) is sent to SignalFx APIs (Splunk Observability Cloud) +- Logs are sent as events to eliminate the need for HEC (HTTP Event Collector) setup +- Only a single access token is required for all telemetry types + +### Advanced Configuration + +The following optional parameters provide fine-grained control over the pipeline: + +#### Endpoint Overrides +By default, API endpoints are automatically constructed from the `host_name` parameter using platform-specific URL patterns. These parameters allow explicit endpoint override for regionalization, compliance, or proxy routing requirements.: + +- **endpoints.metrics** (str): Full URL for the metrics ingestion API endpoint. Overrides the auto-generated endpoint based on `host_name`. +- **endpoints.logs** (str): Full URL for the logs ingestion API endpoint. Overrides the auto-generated endpoint based on `host_name`. +- **endpoints.events** (str): Full URL for the events ingestion API endpoint. Overrides the auto-generated endpoint based on `host_name`. + +#### HTTP Client Tuning + +- **num_rows_per_batch** (int): Controls batching granularity for HTTP requests. Determines the number of telemetry records aggregated before transmission. Lower values reduce memory pressure and payload size but increase request frequency. Higher values improve throughput but risk exceeding API payload limits. + - Default: 100 + - Note: Some observability platforms accept only a single event per API call. In such cases, this parameter is ignored. + +- **max_retry_duration_sec** (int): Maximum time window for exponential backoff retry logic when HTTP requests fail due to transient errors (network issues, rate limiting, temporary service unavailability). The HTTP client will retry failed requests with exponentially increasing delays until this duration is exceeded. + - Default: 300 seconds (5 minutes) + +- **request_timeout_sec** (int): Socket-level timeout for individual HTTP POST requests. Applies to connection establishment, request transmission, and response reception. Does not include retry delays. 
  - Default: 30 seconds
+
+
+## Important Considerations
+
+1. **Payload Size**: The code does not split large payloads. Set the `num_rows_per_batch` parameter to a small value to keep the overall payload size within acceptable limits.
+
+2. **Data Staleness**: APIs might enforce a staleness limit on incoming data. The current code does not enforce such a
+threshold, so the pipeline schedule must be configured to ensure data freshness. If stale data is rejected by the API, the pipeline will retry the request and eventually drop the call if it continues to fail.
+
+3. **Formatting**: The code trims strings that exceed the maximum length allowed by the schema. Date-time values are sent as ISO 8601 formatted strings.
+
+
+# Developer Docs
+
+## Code Architecture
+
+The sink implementation contains the following main components:
+
+- **Streaming Source Tables**: Read from `event_logs_bronze` and `pipeline_runs_status` tables
+- **Conversion Layer**: Two-part transformation process
+  - **Part 1 - Inference**: Extract telemetry data from source rows (logs, events, metrics)
+  - **Part 2 - Destination Conversion**: Transform to platform-specific JSON format with schema validation and batch into HTTP request DataFrames
+- **Transmission Layer**: HTTP client with gzip compression, connection pooling, exponential backoff retry, and failure handling
+- **Observability Platform API**: Final delivery to the configured observability platform endpoints (for example, Datadog or New Relic)
+
+### Conversion Layer
+**Purpose:** Extracts telemetry information from streaming table rows, transforms them into platform-specific payload formats, and batches them for HTTP delivery.
+
+The conversion layer has two distinct parts:
+
+#### Part 1: Inference - Extract Telemetry Data
+The inference part reads streaming table rows and extracts meaningful telemetry data from them. These `convert_row_to_*` functions understand the structure of the source tables (`event_logs_bronze`, `pipeline_runs_status`) and extract the relevant fields:
+
+- `convert_row_to_error_log(row)`: Extracts error information from event logs and converts it to log format
+- `convert_row_to_pipeline_status_event(row)`: Extracts pipeline state changes and converts them to event format
+- `convert_row_to_pipeline_metrics(row)`: Calculates pipeline execution timing metrics from timestamps
+- `convert_row_to_table_metrics(row)`: Extracts table-level throughput metrics from flow progress data
+
+#### Part 2: Destination Conversion - Transform to Platform Format
+The destination conversion part takes the extracted telemetry data and converts it into platform-specific JSON payloads that conform to each observability platform's API specifications.
These converter classes handle the differences between destination formats: + +- `MetricsConverter`: + - `create_metric(metric_name, metric_value, tags, timestamp, additional_attributes)`: Converts extracted metric data into destination-specific JSON format with schema validation + - `create_http_requests_spec(df, num_rows_per_batch, headers, endpoint)`: Batches metrics into HTTP request DataFrame + +- `EventsConverter`: + - `create_event(title, status, tags, timestamp, additional_attributes)`: Converts extracted event data into destination-specific JSON format with schema validation + - `create_http_requests_spec(df, num_rows_per_batch, headers, endpoint)`: Batches events into HTTP request DataFrame + +- `LogsConverter`: + - `create_log(title, status, tags, timestamp, additional_attributes)`: Converts extracted log data into destination-specific JSON format with schema validation + - `create_http_requests_spec(df, num_rows_per_batch, headers, endpoint)`: Batches logs into HTTP request DataFrame + +**Schema Validation:** +All telemetry payloads are validated against predefined JSON schemas (METRICS_SCHEMA, LOGS_SCHEMA, EVENTS_SCHEMA) before transmission. The `enforce_schema()` function: +- Validates field types and required properties +- Trims strings exceeding maxLength constraints +- Converts datetime objects to appropriate formats (ISO 8601 strings or Unix timestamps) +- Note: Schema validation failures will cause the micro-batch to fail, ensuring data quality + +### Transmission Layer +**Purpose:** Provides reliable HTTP delivery with retry mechanism and connection pooling. + +**HTTPClient Class:** +- `post(http_request_specs_df)`: Sends HTTP POST requests for each row in the DataFrame + - Input: DataFrame with columns: endpoint (str), header (JSON str), payloadBytes (binary) + - Compresses payloads using gzip before transmission + - Uses persistent session for connection pooling + - Implements exponential backoff retry logic via tenacity library + - Continues processing on failure (logs error and moves to next request) + +**Retry Configuration:** +- `max_retry_duration_sec`: Maximum time window for retries (default: 300s) +- `request_timeout_sec`: Individual request timeout (default: 30s) +- Exponential backoff: multiplier=1, min=1s, max=10s + +## Flow + +1. **Configuration Validation** (at module load time): + - Checks if `destination` parameter is set to "datadog" or "newrelic" + - Calls `getThirdPartySinkConfigFromSparkConfig()` to build configuration: + - Extracts required parameters: destination, api_key, host_name + - Merges secrets from Databricks secret scope if provided + - Auto-generates endpoints from host_name if not explicitly configured + - Sets default values for optional parameters + +2. **Global Initialization**: + - `initialize_global_config()`: Stores validated config in `_global_config` + - Instantiates converter objects: `_log_converter`, `_events_converter`, `_metrics_converter` + +3. 
**Sink Registration**: + - `register_sink_for_errors()`: Monitors event_logs_bronze for error logs + - `register_sink_for_pipeline_events()`: Monitors pipeline_runs_status for state changes + - `register_sink_for_pipeline_metrics()`: Monitors pipeline_runs_status for execution timing + - `register_sink_for_table_metrics()`: Monitors event_logs_bronze for flow_progress metrics + +Each sink registration creates: +- A `@dlt.foreach_batch_sink` function that processes each micro-batch +- A `@dlt.append_flow` function that defines the streaming source query + +## Adding New Telemetry + +Assume you have a streaming source table/view (`custom_log_source_table` or `custom_log_source_view`) with: +- `my_log_message`: The log message content +- `my_log_tags`: Additional tags (common tags: `pipeline_id`, `pipeline_name`, `pipeline_run_id`, `table_name`) +- `event_timestamp`: The log timestamp + +To send the above data to third party observability platforms, follow these steps: + +1. **Create a converter function** in the Inference Layer: +```python +def convert_row_to_custom_telemetry(row): + params = { + "title": getattr(row, "my_log_message", ""), + "status": "info", + "tags": {"custom_tag": getattr(row, "my_log_tags", "")}, # Add more tags as needed + "timestamp": timestamp_in_unix_milliseconds(row.event_timestamp), + "additional_attributes": {} # add any additional attributes if needed + } + return _log_converter.create_log(**params) +``` +This function extracts the relevant fields and uses the converter to create a log/metric/event in the destination format. + +2. **Create a sink registration function**: +```python +def register_sink_for_custom_telemetry(): + @dlt.foreach_batch_sink(name="send_custom_to_3p_monitoring") + def send_custom_to_3p_monitoring(batch_df, batch_id): + destination_format_udf = udf(convert_row_to_custom_telemetry, StringType()) + telemetry_df = batch_df.withColumn("logs", destination_format_udf(struct("*"))).select("logs").filter(col("logs").isNotNull()) + http_request_spec = _log_converter.create_http_requests_spec( + telemetry_df, + _global_config["num_rows_per_batch"], + get_header(_global_config["api_key"]), # Use the appropriate header function + _global_config["endpoints"]["logs"] + ) + getClient(_global_config).post(http_request_spec) + + @dlt.append_flow(target="send_custom_to_3p_monitoring") + def send_custom_to_sink(): + return spark.sql("SELECT * FROM STREAM(`custom_log_source_table`)") +``` + +This function converts the micro-batch rows to the destination format and sends them using the HTTP client. + +3. **Register the sink** in the initialization section: +```python +if spark.conf.get("destination", None) == "datadog": + initialize_global_config(spark.conf) + register_sink_for_custom_telemetry() # Add this line + register_sink_for_errors() + # ... other registrations +``` diff --git a/contrib/dbx_ingestion_monitoring/README.md b/contrib/dbx_ingestion_monitoring/README.md new file mode 100644 index 0000000..bc3c226 --- /dev/null +++ b/contrib/dbx_ingestion_monitoring/README.md @@ -0,0 +1,48 @@ +# Overview + +This package contains common code and DABs to deploy observability ETL and dashboards for Databricks ingestion projects. The goal is to provide an example and a starting point for building ingestion observability across pipelines and datasets. + +In particular, the package provides: + - Tools to ETL observability data from a variety of sources such as SDP event log, Auto Loader `cloud_file_states`, system tables and other. 
 - Tag-based pipeline discovery: Specify pipelines to monitor using flexible tag expressions with OR-of-ANDs logic (e.g., `"tier:T0;team:data,tier:T1"`) instead of maintaining lists of pipeline IDs
+ - Performance-optimized pipeline discovery: Optional inverted index for efficient tag-based pipeline discovery at scale
+ - Build a collection of observability tables on top of the above data using the medallion architecture.
+ - Provide out-of-the-box AI/BI Dashboards based on the above observability tables
+ - Code and examples to integrate the observability tables with third-party monitoring providers such as Datadog, New Relic, Azure Monitor, and Splunk
+
+The package contains deployable [Databricks Asset Bundles (DABs)](https://docs.databricks.com/aws/en/dev-tools/bundles/) for easy distribution:
+
+- Generic SDP pipelines
+- CDC Connector
+
+Coming soon:
+
+- SDP pipelines with Auto Loader
+- SaaS Connectors
+
+# Prerequisites
+
+- [Databricks Asset Bundles (DABs)](https://docs.databricks.com/aws/en/dev-tools/bundles/)
+- PrPr for forEachBatch sinks in SDP (if using the 3P observability platforms integration)
+
+
+# Artifacts
+
+- Generic SDP DAB (in `generic_sdp_monitoring_dab/`). See [generic_sdp_monitoring_dab/README.md](generic_sdp_monitoring_dab/README.md) for more info.
+- CDC Connectors Monitoring DAB (in `cdc_connector_monitoring_dab/`). See [cdc_connector_monitoring_dab/README.md](cdc_connector_monitoring_dab/README.md) for more info.
+
+
+# Developer information
+
+## Shared top-level directories
+
+- `jobs/` - Shared notebooks to be used in jobs in the individual monitoring DABs.
+- `lib/` - Shared Python code.
+- `resources/` - Shared DAB resource definitions
+- `scripts/` - Helper scripts
+- `README-third-party-monitoring.md` and `third_party_sinks/` - 3P observability integration (Datadog, New Relic, Splunk, Azure Monitor)
+- `vars/` - DAB variables for customizing the shared resources
+
+
+
+
diff --git a/contrib/dbx_ingestion_monitoring/cdc_connector_monitoring_dab/README.md b/contrib/dbx_ingestion_monitoring/cdc_connector_monitoring_dab/README.md
new file mode 100644
index 0000000..9112778
--- /dev/null
+++ b/contrib/dbx_ingestion_monitoring/cdc_connector_monitoring_dab/README.md
@@ -0,0 +1,48 @@
+# Getting started
+
+1. Configure the monitoring ETL
+
+Refer to [COMMON_CONFIGURATION.md](../COMMON_CONFIGURATION.md) for common configuration options.
+
+Additionally, for this DAB, you need to include the following as part of your deployment target configuration:
+
+```
+main_dashboard_template_path: ../cdc_connector_monitoring_dab/dashboards/CDC Connector Monitoring Dashboard Template.lvdash.json
+```
+
+This includes the standard AI/BI dashboard as part of the deployment. If you have created a custom dashboard, replace the path above with the path to it.
+
+2. Deploy the DAB
+3. Generate the monitoring tables (recommended to do this manually the first time; see the CLI sketch after this list):
+   1. Run the `Import CDC Connector event logs` job if you have configured the `imported_pipeline_ids` variable
+   2. Run the `Scheduled runner for Monitoring ETL ... ` job
+4. Once Step 3 succeeds, run the `Post-deploy actions ...` job. This creates a sample monitoring Dashboard and also annotates the monitoring tables with column comments for easier use and exploration.
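+
+For reference, a minimal sketch of steps 2-4 using the Databricks CLI with bundle support is shown below. The target name (`dev`) and the job resource keys are placeholders; check `databricks.yml` and the job definitions under `resources/` for the actual keys in your deployment.
+
+```bash
+# Step 2: deploy the DAB to the chosen target
+databricks bundle deploy -t dev
+
+# Step 3.1 (only if imported_pipeline_ids is configured): import the event logs
+databricks bundle run -t dev import_cdc_connector_event_logs_job
+
+# Step 3.2: run the Monitoring ETL scheduled runner once
+databricks bundle run -t dev monitoring_etl_scheduled_runner_job
+
+# Step 4: create the sample dashboard and annotate the monitoring tables
+databricks bundle run -t dev post_deploy_actions_job
+```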
+ +# Architecture + +``` + |--------------| + | Import event | + |->| logs Job |--| + | |--------------| | +|-------------| | | +| Pipeline | | | ------------- +| without | | |-->( Imported )--| +| Delta event | | | ( Events ) | |-----------------| +| log |--| | ( Delta Tables) | |-->| 3P Observability| +|-------------| | | ------------- | | | Platforms sinks | + | |--------------| | | |-------------| | |-----------------| + | | Import Event |--| |-->| Monitoring |--| + |->| logs Job | | | ETL Pipeline| | |------------| |-----------| + |--------------| | |-------------| |-->| Monitoring |-->| Example | + | | tables | | AI/BI | +------------------- -------------- | |------------| | Dashboard | +| Pipelines with |-------------------> ( Event Log )---| |-----------| +| Delta event log | ( Delta Tables ) +------------------ -------------- +``` + +# Monitoring tables + +See the table and column comments in the target `monitoring_catalog`.`monitoring_schema`. Remember to run the `Post-deploy actions ...` job that populates those comments. + diff --git a/contrib/dbx_ingestion_monitoring/cdc_connector_monitoring_dab/dashboards/CDC Connector Monitoring Dashboard Template.lvdash.json b/contrib/dbx_ingestion_monitoring/cdc_connector_monitoring_dab/dashboards/CDC Connector Monitoring Dashboard Template.lvdash.json new file mode 100644 index 0000000..a97c7a9 --- /dev/null +++ b/contrib/dbx_ingestion_monitoring/cdc_connector_monitoring_dab/dashboards/CDC Connector Monitoring Dashboard Template.lvdash.json @@ -0,0 +1,7424 @@ +{ + "datasets": [ + { + "name": "82c4e3b5", + "displayName": "Pipelines Runs Errors", + "queryLines": [ + "SELECT pipeline_name as `Pipeline Name`,\n", + " pipeline_link as `Pipeline Link`,\n", + " pipeline_run_id as `Pipeline Run Id`,\n", + " ee.pipeline_run_link as `Pipeline Run`,\n", + " ifnull(table_name, '') as `Target Table`,\n", + " create_time as `Pipeline Run Start Time`,\n", + " event_timestamp as `Error Time`,\n", + " error_log_message as `Error Message in Log`,\n", + " error_message as `Error Message`,\n", + " (CASE WHEN error_code is NULL THEN '' WHEN error_code='' THEN '' ELSE error_code END) as `Error Code`,\n", + " error_full as `Error Details`\n", + "FROM events_errors AS ee\n", + " LEFT JOIN monitored_pipelines USING (pipeline_id)\n", + " LEFT JOIN pipeline_runs_status USING (pipeline_id, pipeline_run_id)\n", + "WHERE (:pipeline_id = 'All' OR pipeline_id = :pipeline_id)\n", + " AND (event_timestamp >= :event_range.min AND event_timestamp <= :event_range.max)\n", + " AND ((:pipeline_tag_value = '')\n", + " OR (:pipeline_tag_value = '' and (tags_array IS NULL or array_size(tags_array) = 0))\n", + " OR (array_contains(tags_array, :pipeline_tag_value)))\n", + "ORDER BY event_timestamp DESC \n" + ], + "parameters": [ + { + "displayName": "pipeline_id", + "keyword": "pipeline_id", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "All" + } + ] + } + } + }, + { + "displayName": "event_range", + "keyword": "event_range", + "dataType": "DATETIME", + "complexType": "RANGE", + "defaultSelection": { + "range": { + "dataType": "DATETIME", + "min": { + "value": "now-7d/d" + }, + "max": { + "value": "now/d" + } + } + } + }, + { + "displayName": "pipeline_tag_value", + "keyword": "pipeline_tag_value", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "" + } + ] + } + } + } + ], + "catalog": "chavdarbotev_std", + "schema": 
"mydev_chavdar_botev_dbx_cdc_connector_monitoring" + }, + { + "name": "c91619f8", + "displayName": "Pipelines Metrics Freshness", + "queryLines": [ + "SELECT pipeline_link as `Pipeline`, \n", + " latest_event_timestamp as `Observability data freshness`,\n", + " current_timestamp() as `Check time`,\n", + " (current_timestamp() - latest_event_timestamp) as `Latency at check time`\n", + "FROM (SELECT pipeline_id,\n", + " pipeline_link,\n", + " max(event_timestamp) AS latest_event_timestamp\n", + " FROM event_logs_bronze LEFT JOIN monitored_pipelines USING (pipeline_id)\n", + " WHERE (:pipeline_id = 'All' OR pipeline_id = :pipeline_id)\n", + " AND ((:pipeline_tag_value = '')\n", + " OR (:pipeline_tag_value = '' and (tags_array IS NULL or array_size(tags_array) = 0))\n", + " OR (array_contains(tags_array, :pipeline_tag_value)))\n", + " GROUP BY 1,2)\n", + "ORDER BY latest_event_timestamp" + ], + "parameters": [ + { + "displayName": "pipeline_id", + "keyword": "pipeline_id", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "All" + } + ] + } + } + }, + { + "displayName": "pipeline_tag_value", + "keyword": "pipeline_tag_value", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "" + } + ] + } + } + } + ], + "catalog": "chavdarbotev_std", + "schema": "mydev_chavdar_botev_dbx_cdc_connector_monitoring" + }, + { + "name": "2660d018", + "displayName": "Table Changes", + "queryLines": [ + "SELECT pipeline_name as `Pipeline Name`,\n", + " pipeline_link as `Pipeline Link`,\n", + " pipeline_run_id as `Pipeline Run Id`,\n", + " ee.pipeline_run_link as `Pipeline Run`,\n", + " table_name as `Target Table`,\n", + " event_timestamp as `Event Time`,\n", + " ifnull(num_written_rows, 0) as `Applied Changes`\n", + "FROM events_table_metrics AS ee\n", + " LEFT JOIN monitored_pipelines USING (pipeline_id)\n", + "WHERE \n", + " (:pipeline_id = 'All' or pipeline_id = :pipeline_id)\n", + " AND (event_timestamp >= :event_range.min AND event_timestamp <= :event_range.max)\n", + " AND (:table_name = 'All' OR table_name = :table_name)\n", + " AND table_name is NOT NULL\n", + " AND (flow_type = :flow_type)\n", + " AND (num_written_rows IS NOT NULL)\n", + " AND ((:pipeline_tag_value = '')\n", + " OR (:pipeline_tag_value = '' and (tags_array IS NULL or array_size(tags_array) = 0))\n", + " OR (array_contains(tags_array, :pipeline_tag_value)))\n" + ], + "parameters": [ + { + "displayName": "pipeline_id", + "keyword": "pipeline_id", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "All" + } + ] + } + } + }, + { + "displayName": "event_range", + "keyword": "event_range", + "dataType": "DATETIME", + "complexType": "RANGE", + "defaultSelection": { + "range": { + "dataType": "DATETIME", + "min": { + "value": "now-7d/d" + }, + "max": { + "value": "now/d" + } + } + } + }, + { + "displayName": "table_name", + "keyword": "table_name", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "All" + } + ] + } + } + }, + { + "displayName": "flow_type", + "keyword": "flow_type", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "cdc" + } + ] + } + } + }, + { + "displayName": "pipeline_tag_value", + "keyword": "pipeline_tag_value", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": 
[ + { + "value": "" + } + ] + } + } + } + ], + "catalog": "chavdarbotev_std", + "schema": "mydev_chavdar_botev_dbx_cdc_connector_monitoring" + }, + { + "name": "02dde459", + "displayName": "Pipeline Tags", + "queryLines": [ + "SELECT '' as tag_value\n", + "UNION ALL\n", + "SELECT DISTINCT tag_value\n", + "FROM (SELECT explode(tags_array) as tag_value\n", + " FROM monitored_pipelines)" + ], + "catalog": "chavdarbotev_std", + "schema": "mydev_chavdar_botev_dbx_cdc_connector_monitoring" + }, + { + "name": "6740610b", + "displayName": "Pipeline Ids", + "queryLines": [ + "SELECT pipeline_id\n", + "FROM monitored_pipelines\n", + "WHERE (pipeline_type='ingestion' or pipeline_type='gateway')\n", + " AND ((:tag_value = '')\n", + " OR (:tag_value = '' and (tags_array IS NULL or array_size(tags_array) = 0))\n", + " OR (array_contains(tags_array, :tag_value)))" + ], + "parameters": [ + { + "displayName": "tag_value", + "keyword": "tag_value", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "" + } + ] + } + } + } + ], + "catalog": "chavdarbotev_std", + "schema": "mydev_chavdar_botev_dbx_cdc_connector_monitoring" + }, + { + "name": "95ce6f6c", + "displayName": "Pipeline Loaded CDC Logs", + "queryLines": [ + "SELECT pipeline_name as `Pipeline Name`,\n", + " pipeline_link as `Pipeline Link`,\n", + " pipeline_run_id as `Pipeline Run Id`,\n", + " ee.pipeline_run_link as `Pipeline Run`,\n", + " event_timestamp as `Event Time`,\n", + " ifnull(num_written_rows, 0) as `Loaded Change Events`,\n", + " (num_written_rows IS NOT NULL) as `Has CDC Changes`\n", + "FROM events_table_metrics AS ee\n", + " LEFT JOIN monitored_pipelines USING (pipeline_id)\n", + "WHERE \n", + " (:pipeline_id = 'All' or pipeline_id = :pipeline_id)\n", + " AND (event_timestamp >= :event_range.min AND event_timestamp <= :event_range.max)\n", + " AND flow_type like 'cdc_staging'\n", + " AND ((:pipeline_tag_value = '')\n", + " OR (:pipeline_tag_value = '' and (tags_array IS NULL or array_size(tags_array) = 0))\n", + " OR (array_contains(tags_array, :pipeline_tag_value)))\n" + ], + "parameters": [ + { + "displayName": "pipeline_id", + "keyword": "pipeline_id", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "All" + } + ] + } + } + }, + { + "displayName": "event_range", + "keyword": "event_range", + "dataType": "DATETIME", + "complexType": "RANGE", + "defaultSelection": { + "range": { + "dataType": "DATETIME", + "min": { + "value": "now-7d/d" + }, + "max": { + "value": "now/d" + } + } + } + }, + { + "displayName": "pipeline_tag_value", + "keyword": "pipeline_tag_value", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "" + } + ] + } + } + } + ], + "catalog": "chavdarbotev_std", + "schema": "mydev_chavdar_botev_dbx_cdc_connector_monitoring" + }, + { + "name": "b18cca98", + "displayName": "Table Latencies Outliers", + "queryLines": [ + "SELECT pipeline_name as `Pipeline Name`,\n", + " pipeline_link as `Pipeline Link`,\n", + " pipeline_run_id as `Pipeline Run Id`,\n", + " ee.pipeline_run_link as `Pipeline Run`,\n", + " table_name as `Target Table`,\n", + " event_timestamp as `Event Time`,\n", + " min_event_time as `Oldest Event Time`,\n", + " timestampdiff(SECOND, min_event_time, event_timestamp) AS `Oldest Event Latency`,\n", + " max_event_time as `Newest Event Time`,\n", + " timestampdiff(SECOND, max_event_time, event_timestamp) AS `Newest Event 
Latency`\n", + " \n", + "FROM events_table_metrics AS ee\n", + " LEFT JOIN monitored_pipelines USING (pipeline_id)\n", + "WHERE \n", + " (:pipeline_id = 'All' or pipeline_id = :pipeline_id)\n", + " AND (event_timestamp >= :event_range.min AND event_timestamp <= :event_range.max)\n", + " AND (:table_name = 'All' OR table_name = :table_name)\n", + " AND table_name is NOT NULL\n", + " AND (flow_type = 'cdc')\n", + " AND (max_event_time IS NOT NULL)\n", + " AND timestampdiff(SECOND, min_event_time, event_timestamp) > 2 * timestampdiff(SECOND, max_event_time, event_timestamp)\n", + " AND ((:pipeline_tag_value = '')\n", + " OR (:pipeline_tag_value = '' and (tags_array IS NULL or array_size(tags_array) = 0))\n", + " OR (array_contains(tags_array, :pipeline_tag_value)))\n" + ], + "parameters": [ + { + "displayName": "pipeline_id", + "keyword": "pipeline_id", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "All" + } + ] + } + } + }, + { + "displayName": "event_range", + "keyword": "event_range", + "dataType": "DATETIME", + "complexType": "RANGE", + "defaultSelection": { + "range": { + "dataType": "DATETIME", + "min": { + "value": "now-7d/d" + }, + "max": { + "value": "now/d" + } + } + } + }, + { + "displayName": "table_name", + "keyword": "table_name", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "All" + } + ] + } + } + }, + { + "displayName": "pipeline_tag_value", + "keyword": "pipeline_tag_value", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "" + } + ] + } + } + } + ], + "catalog": "chavdarbotev_std", + "schema": "mydev_chavdar_botev_dbx_cdc_connector_monitoring" + }, + { + "name": "8897fc81", + "displayName": "Table Warnings", + "queryLines": [ + "SELECT pipeline_name as `Pipeline Name`,\n", + " pipeline_link as `Pipeline Link`,\n", + " pipeline_run_id as `Pipeline Run Id`,\n", + " ee.pipeline_run_link as `Pipeline Run`,\n", + " table_name as `Target Table`,\n", + " create_time as `Pipeline Start Time`,\n", + " event_timestamp as `Warning Time`,\n", + " warning_log_message as `Warning Log Message`\n", + "FROM events_warnings AS ee\n", + " LEFT JOIN monitored_pipelines USING (pipeline_id)\n", + " LEFT JOIN pipeline_runs_status USING (pipeline_id, pipeline_run_id)\n", + "WHERE \n", + " (:pipeline_id = 'All' or pipeline_id = :pipeline_id)\n", + " AND (event_timestamp >= :event_range.min AND event_timestamp <= :event_range.max)\n", + " AND (:table_name = 'All' OR table_name = :table_name)\n", + " AND table_name is NOT NULL\n", + " AND ((:pipeline_tag_value = '')\n", + " OR (:pipeline_tag_value = '' and (tags_array IS NULL or array_size(tags_array) = 0))\n", + " OR (array_contains(tags_array, :pipeline_tag_value)))\n", + "ORDER BY event_timestamp DESC" + ], + "parameters": [ + { + "displayName": "pipeline_id", + "keyword": "pipeline_id", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "All" + } + ] + } + } + }, + { + "displayName": "event_range", + "keyword": "event_range", + "dataType": "DATETIME", + "complexType": "RANGE", + "defaultSelection": { + "range": { + "dataType": "DATETIME", + "min": { + "value": "now-7d/d" + }, + "max": { + "value": "now/d" + } + } + } + }, + { + "displayName": "table_name", + "keyword": "table_name", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + 
{ + "value": "All" + } + ] + } + } + }, + { + "displayName": "pipeline_tag_value", + "keyword": "pipeline_tag_value", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "" + } + ] + } + } + } + ], + "catalog": "chavdarbotev_std", + "schema": "mydev_chavdar_botev_dbx_cdc_connector_monitoring" + }, + { + "name": "2faba057", + "displayName": "Pipelines Hourly Error Rate", + "queryLines": [ + "SELECT pipeline_id as `Pipeline Id`,\n", + " pipeline_name as `Pipeline Name`,\n", + " pipeline_link as `Pipeline Link`,\n", + " hour as `Hour`,\n", + " num_errors as `Number of Errors`\n", + "FROM metric_pipeline_error_rate AS ee\n", + " LEFT JOIN monitored_pipelines USING (pipeline_id)\n", + "WHERE (:pipeline_id = 'All' OR pipeline_id = :pipeline_id)\n", + " AND (hour >= :event_range.min AND hour <= :event_range.max)\n", + " AND ((:pipeline_tag_value = '')\n", + " OR (:pipeline_tag_value = '' and (tags_array IS NULL or array_size(tags_array) = 0))\n", + " OR (array_contains(tags_array, :pipeline_tag_value)))\n" + ], + "parameters": [ + { + "displayName": "pipeline_id", + "keyword": "pipeline_id", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "All" + } + ] + } + } + }, + { + "displayName": "event_range", + "keyword": "event_range", + "dataType": "DATETIME", + "complexType": "RANGE", + "defaultSelection": { + "range": { + "dataType": "DATETIME", + "min": { + "value": "now-7d/d" + }, + "max": { + "value": "now/d" + } + } + } + }, + { + "displayName": "pipeline_tag_value", + "keyword": "pipeline_tag_value", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "" + } + ] + } + } + } + ], + "catalog": "chavdarbotev_std", + "schema": "mydev_chavdar_botev_dbx_cdc_connector_monitoring" + }, + { + "name": "5e3eb18f", + "displayName": "Event Times", + "queryLines": [ + "SELECT event_timestamp\n", + "FROM event_logs_bronze\n", + "WHERE (:pipeline_id = 'All' OR pipeline_id = :pipeline_id)" + ], + "parameters": [ + { + "displayName": "pipeline_id", + "keyword": "pipeline_id", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "All" + } + ] + } + } + } + ], + "catalog": "chavdarbotev_std", + "schema": "mydev_chavdar_botev_dbx_cdc_connector_monitoring" + }, + { + "name": "248b74c6", + "displayName": "Table Status", + "queryLines": [ + "SELECT pipeline_name as `Pipeline Name`,\n", + " pipeline_link as `Pipeline Link`,\n", + " table_name as `Target Table`,\n", + " latest_pipeline_run_link as `Latest Run`,\n", + " latest_state_with_color as `Latest State`,\n", + " latest_table_schema as `Latest Schema`,\n", + " current_timestamp() as `Last Data Refresh Time`,\n", + " latest_cdc_changes_time as `Latest CDC Changes Time`,\n", + " timestampdiff(SECOND, latest_cdc_changes_time, current_timestamp()) as `Seconds Since Latest CDC Change`,\n", + " latest_snapshot_changes_time as `Latest Snapshot Changes Time`,\n", + " latest_error_time as `Time of Latest Error`,\n", + " latest_error_code as `Code of Latest Error`,\n", + " latest_error_flow_type as `Flow Type of Latest Error`,\n", + " latest_error_log_message as `Log Message of Latest Error`,\n", + " latest_error_message as `Message of Latest Error`,\n", + " latest_error_pipeline_run_link as `Latest Pipeline Run with Error`,\n", + " updated_at as `Last Updated`,\n", + " timestampdiff(SECOND, updated_at, current_timestamp()) 
as `Seconds Since Latest Status Update`\n", + "FROM table_status AS ts\n", + " LEFT JOIN monitored_pipelines USING (pipeline_id)\n", + "WHERE \n", + " (:pipeline_id = 'All' or pipeline_id = :pipeline_id)\n", + " AND (updated_at >= :event_range.min AND updated_at <= :event_range.max)\n", + " AND (:table_name = 'All' OR table_name = :table_name)\n", + " AND table_name is NOT NULL\n", + " AND ((:pipeline_tag_value = '')\n", + " OR (:pipeline_tag_value = '' and (tags_array IS NULL or array_size(tags_array) = 0))\n", + " OR (array_contains(tags_array, :pipeline_tag_value)))\n", + "ORDER BY latest_state_level DESC,\n", + " updated_at ASC\n" + ], + "parameters": [ + { + "displayName": "table_name", + "keyword": "table_name", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "All" + } + ] + } + } + }, + { + "displayName": "pipeline_id", + "keyword": "pipeline_id", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "All" + } + ] + } + } + }, + { + "displayName": "event_range", + "keyword": "event_range", + "dataType": "DATETIME", + "complexType": "RANGE", + "defaultSelection": { + "range": { + "dataType": "DATETIME", + "min": { + "value": "now-7d/d" + }, + "max": { + "value": "now/d" + } + } + } + }, + { + "displayName": "pipeline_tag_value", + "keyword": "pipeline_tag_value", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "" + } + ] + } + } + } + ], + "catalog": "chavdarbotev_std", + "schema": "mydev_chavdar_botev_dbx_cdc_connector_monitoring" + }, + { + "name": "0ac0bb93", + "displayName": "Selected Pipelines", + "queryLines": [ + "SELECT pipeline_id as `Pipeline Id`,\n", + " pipeline_link as `Pipeline Name`,\n", + " pipeline_type as `Pipeline Type`,\n", + " tags_map as `Pipeline Tags`\n", + "FROM monitored_pipelines\n", + "WHERE (:pipeline_id = 'All' OR pipeline_id = :pipeline_id)\n", + " AND ((:pipeline_tag_value = '')\n", + " OR (:pipeline_tag_value = '' and (tags_array IS NULL or array_size(tags_array) = 0))\n", + " OR (array_contains(tags_array, :pipeline_tag_value)))\n" + ], + "parameters": [ + { + "displayName": "pipeline_id", + "keyword": "pipeline_id", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "All" + } + ] + } + } + }, + { + "displayName": "pipeline_tag_value", + "keyword": "pipeline_tag_value", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "" + } + ] + } + } + } + ], + "catalog": "chavdarbotev_std", + "schema": "mydev_chavdar_botev_dbx_cdc_connector_monitoring" + }, + { + "name": "7103e200", + "displayName": "Table Status Per Pipeline Run", + "queryLines": [ + "SELECT pipeline_name as `Pipeline Name`,\n", + " pipeline_link as `Pipeline Link`,\n", + " table_name as `Target Table`,\n", + " ts.pipeline_run_link as `Pipeline Run`,\n", + " ts.latest_state_with_color as `Table State`,\n", + " ifnull(ts.latest_error_code, '') as `Error Code`,\n", + " ts.latest_error_time as `Error Time`,\n", + " ifnull(ts.latest_error_message, '') as `Error Message`,\n", + " ifnull(ts.latest_error_log_message, '') as `Error Message in Log`,\n", + " table_schema as `Latest Schema`,\n", + " create_time as `Pipeline Run Create Time`,\n", + " end_time as `Pipeline Run End Time`,\n", + " prs.latest_state_with_color as `Pipeline Run State`,\n", + " ts.updated_at as `Last 
Updated`\n", + "FROM table_status_per_pipeline_run AS ts\n", + " LEFT JOIN monitored_pipelines USING (pipeline_id)\n", + " LEFT JOIN pipeline_runs_status AS prs USING (pipeline_id, pipeline_run_id) \n", + "WHERE \n", + " (:pipeline_id = 'All' or pipeline_id = :pipeline_id)\n", + " AND (ts.updated_at >= :event_range.min AND ts.updated_at <= :event_range.max)\n", + " AND (:table_name = 'All' OR table_name = :table_name)\n", + " AND table_name is NOT NULL\n", + " AND ((:pipeline_tag_value = '')\n", + " OR (:pipeline_tag_value = '' and (tags_array IS NULL or array_size(tags_array) = 0))\n", + " OR (array_contains(tags_array, :pipeline_tag_value)))\n", + "ORDER BY create_time DESC,\n", + " ts.updated_at DESC\n" + ], + "parameters": [ + { + "displayName": "pipeline_id", + "keyword": "pipeline_id", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "All" + } + ] + } + } + }, + { + "displayName": "event_range", + "keyword": "event_range", + "dataType": "DATETIME", + "complexType": "RANGE", + "defaultSelection": { + "range": { + "dataType": "DATETIME", + "min": { + "value": "now-7d/d" + }, + "max": { + "value": "now/d" + } + } + } + }, + { + "displayName": "table_name", + "keyword": "table_name", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "All" + } + ] + } + } + }, + { + "displayName": "pipeline_tag_value", + "keyword": "pipeline_tag_value", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "" + } + ] + } + } + } + ], + "catalog": "chavdarbotev_std", + "schema": "mydev_chavdar_botev_dbx_cdc_connector_monitoring" + }, + { + "name": "c0688ffc", + "displayName": "Pipelines Runs Warnings", + "queryLines": [ + "SELECT pipeline_name as `Pipeline Name`,\n", + " pipeline_link as `Pipeline Link`,\n", + " pipeline_run_id as `Pipeline Run Id`,\n", + " ew.pipeline_run_link as `Pipeline Run`,\n", + " create_time as `Pipeline Run Start Time`,\n", + " event_timestamp as `Warning Time`,\n", + " warning_log_message as `Warning Message`\n", + "FROM events_warnings AS ew\n", + " LEFT JOIN monitored_pipelines USING (pipeline_id)\n", + " LEFT JOIN pipeline_runs_status USING (pipeline_id, pipeline_run_id)\n", + "WHERE (:pipeline_id = 'All' OR pipeline_id = :pipeline_id)\n", + " AND (event_timestamp >= :event_range.min AND event_timestamp <= :event_range.max)\n", + " AND ((:pipeline_tag_value = '')\n", + " OR (:pipeline_tag_value = '' and (tags_array IS NULL or array_size(tags_array) = 0))\n", + " OR (array_contains(tags_array, :pipeline_tag_value)))\n" + ], + "parameters": [ + { + "displayName": "pipeline_id", + "keyword": "pipeline_id", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "All" + } + ] + } + } + }, + { + "displayName": "event_range", + "keyword": "event_range", + "dataType": "DATETIME", + "complexType": "RANGE", + "defaultSelection": { + "range": { + "dataType": "DATETIME", + "min": { + "value": "now-7d/d" + }, + "max": { + "value": "now/d" + } + } + } + }, + { + "displayName": "pipeline_tag_value", + "keyword": "pipeline_tag_value", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "" + } + ] + } + } + } + ], + "catalog": "chavdarbotev_std", + "schema": "mydev_chavdar_botev_dbx_cdc_connector_monitoring" + }, + { + "name": "8db826d7", + "displayName": "Table Names", + 
"queryLines": [ + "SELECT DISTINCT table_name\n", + "FROM monitored_tables LEFT JOIN monitored_pipelines USING (pipeline_id)\n", + "WHERE (table_name is not NULL)\n", + " AND (table_name NOT LIKE '%.cdc_staging_table')\n", + " AND (:pipeline_id = 'All' OR pipeline_id = :pipeline_id)\n", + " AND ((:pipeline_tag_value = '')\n", + " OR (:pipeline_tag_value = '' and (tags_array IS NULL or array_size(tags_array) = 0))\n", + " OR (array_contains(tags_array, :pipeline_tag_value)))\n" + ], + "parameters": [ + { + "displayName": "pipeline_id", + "keyword": "pipeline_id", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "All" + } + ] + } + } + }, + { + "displayName": "pipeline_tag_value", + "keyword": "pipeline_tag_value", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "" + } + ] + } + } + } + ], + "catalog": "chavdarbotev_std", + "schema": "mydev_chavdar_botev_dbx_cdc_connector_monitoring" + }, + { + "name": "b18ee129", + "displayName": "Selected Tables", + "queryLines": [ + "SELECT pipeline_link as `Pipeline`,\n", + " table_name as `Target Table`\n", + "FROM monitored_tables LEFT JOIN monitored_pipelines USING (pipeline_id)\n", + "WHERE (table_name is not NULL)\n", + " AND (:pipeline_id = 'All' OR pipeline_id = :pipeline_id)\n", + " AND (:table_name = 'All' OR table_name = :table_name)\n", + " AND ((:pipeline_tag_value = '')\n", + " OR (:pipeline_tag_value = '' and (tags_array IS NULL or array_size(tags_array) = 0))\n", + " OR (array_contains(tags_array, :pipeline_tag_value)))\n" + ], + "parameters": [ + { + "displayName": "pipeline_id", + "keyword": "pipeline_id", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "All" + } + ] + } + } + }, + { + "displayName": "table_name", + "keyword": "table_name", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "All" + } + ] + } + } + }, + { + "displayName": "pipeline_tag_value", + "keyword": "pipeline_tag_value", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "" + } + ] + } + } + } + ], + "catalog": "chavdarbotev_std", + "schema": "mydev_chavdar_botev_dbx_cdc_connector_monitoring" + }, + { + "name": "19d796af", + "displayName": "Pipelines Runs Status", + "queryLines": [ + "SELECT pipeline_link as `Pipeline`,\n", + " pipeline_name as `Pipeline Name`,\n", + " pipeline_run_link as `Pipeline run`,\n", + " create_time as `Create time`,\n", + " end_time as `End time`,\n", + " latest_state as `Status String`,\n", + " latest_state_with_color as `Status`,\n", + " ifnull(latest_error_log_message, '') as `Latest error log message`,\n", + " ifnull(latest_error_message,'') as `Latest error message`,\n", + " ifnull(latest_error_code,'') as `Latest error code`,\n", + " latest_error_full as `Latest full error`,\n", + " timestampdiff(second, create_time, ifnull(end_time, current_timestamp())) AS `Total seconds`,\n", + " timestampdiff(second, create_time, initialization_start_time) AS `Starting seconds`,\n", + " timestampdiff(second, initialization_start_time, running_start_time) AS `Initialization seconds`,\n", + " timestampdiff(second, running_start_time, ifnull(end_time, current_timestamp())) AS `Running seconds`\n", + "FROM pipeline_runs_status LEFT JOIN monitored_pipelines USING (pipeline_id)\n", + "WHERE (:pipeline_id = 'All' OR pipeline_id = 
:pipeline_id)\n", + " AND ((create_time >= :event_range.min AND create_time <= :event_range.max )\n", + " OR (end_time >= :event_range.min AND end_time <= :event_range.max))\n", + " AND ((:pipeline_tag_value = '')\n", + " OR (:pipeline_tag_value = '' and (tags_array IS NULL or array_size(tags_array) = 0))\n", + " OR (array_contains(tags_array, :pipeline_tag_value)))\n", + "ORDER BY create_time DESC" + ], + "parameters": [ + { + "displayName": "pipeline_id", + "keyword": "pipeline_id", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "All" + } + ] + } + } + }, + { + "displayName": "event_range", + "keyword": "event_range", + "dataType": "DATETIME", + "complexType": "RANGE", + "defaultSelection": { + "range": { + "dataType": "DATETIME", + "min": { + "value": "now-7d/d" + }, + "max": { + "value": "now/d" + } + } + } + }, + { + "displayName": "pipeline_tag_value", + "keyword": "pipeline_tag_value", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "" + } + ] + } + } + } + ], + "catalog": "chavdarbotev_std", + "schema": "mydev_chavdar_botev_dbx_cdc_connector_monitoring" + }, + { + "name": "3b8b7d86", + "displayName": "Table Errors", + "queryLines": [ + "SELECT pipeline_name as `Pipeline Name`,\n", + " pipeline_link as `Pipeline Link`,\n", + " pipeline_run_id as `Pipeline Run Id`,\n", + " ee.pipeline_run_link as `Pipeline Run`,\n", + " table_name as `Target Table`,\n", + " create_time as `Pipeline Start Time`,\n", + " event_timestamp as `Error Time`,\n", + " error_log_message as `Error Log Message`,\n", + " error_message as `Error Message`,\n", + " ifnull(error_code, '') as `Error Code`,\n", + " error_full as `Error Details`\n", + "FROM events_errors AS ee\n", + " LEFT JOIN monitored_pipelines USING (pipeline_id)\n", + " LEFT JOIN pipeline_runs_status USING (pipeline_id, pipeline_run_id)\n", + "WHERE \n", + " (:pipeline_id = 'All' or pipeline_id = :pipeline_id)\n", + " AND (event_timestamp >= :event_range.min AND event_timestamp <= :event_range.max)\n", + " AND (:table_name = 'All' OR table_name = :table_name)\n", + " AND table_name is NOT NULL\n", + " AND table_name NOT LIKE '%.cdc_staging_table'\n", + " AND ((:pipeline_tag_value = '')\n", + " OR (:pipeline_tag_value = '' and (tags_array IS NULL or array_size(tags_array) = 0))\n", + " OR (array_contains(tags_array, :pipeline_tag_value)))\n", + "ORDER BY event_timestamp DESC" + ], + "parameters": [ + { + "displayName": "pipeline_id", + "keyword": "pipeline_id", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "All" + } + ] + } + } + }, + { + "displayName": "event_range", + "keyword": "event_range", + "dataType": "DATETIME", + "complexType": "RANGE", + "defaultSelection": { + "range": { + "dataType": "DATETIME", + "min": { + "value": "now-7d/d" + }, + "max": { + "value": "now/d" + } + } + } + }, + { + "displayName": "table_name", + "keyword": "table_name", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "All" + } + ] + } + } + }, + { + "displayName": "pipeline_tag_value", + "keyword": "pipeline_tag_value", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "" + } + ] + } + } + } + ], + "catalog": "chavdarbotev_std", + "schema": "mydev_chavdar_botev_dbx_cdc_connector_monitoring" + }, + { + "name": "e816f2b3", + "displayName": 
"Table Latencies", + "queryLines": [ + "SELECT pipeline_name as `Pipeline Name`,\n", + " pipeline_link as `Pipeline Link`,\n", + " pipeline_run_id as `Pipeline Run Id`,\n", + " ee.pipeline_run_link as `Pipeline Run`,\n", + " table_name as `Target Table`,\n", + " event_timestamp as `Event Time`,\n", + " max_event_time as `Newest Event Time`,\n", + " timestampdiff(SECOND, max_event_time, event_timestamp) AS `Newest Event Latency`\n", + " \n", + "FROM events_table_metrics AS ee\n", + " LEFT JOIN monitored_pipelines USING (pipeline_id)\n", + "WHERE \n", + " (:pipeline_id = 'All' or pipeline_id = :pipeline_id)\n", + " AND (event_timestamp >= :event_range.min AND event_timestamp <= :event_range.max)\n", + " AND (:table_name = 'All' OR table_name = :table_name)\n", + " AND table_name is NOT NULL\n", + " AND (flow_type = 'cdc')\n", + " AND (max_event_time IS NOT NULL)\n", + " AND ((:pipeline_tag_value = '')\n", + " OR (:pipeline_tag_value = '' and (tags_array IS NULL or array_size(tags_array) = 0))\n", + " OR (array_contains(tags_array, :pipeline_tag_value)))\n" + ], + "parameters": [ + { + "displayName": "pipeline_id", + "keyword": "pipeline_id", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "All" + } + ] + } + } + }, + { + "displayName": "event_range", + "keyword": "event_range", + "dataType": "DATETIME", + "complexType": "RANGE", + "defaultSelection": { + "range": { + "dataType": "DATETIME", + "min": { + "value": "now-7d/d" + }, + "max": { + "value": "now/d" + } + } + } + }, + { + "displayName": "table_name", + "keyword": "table_name", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "All" + } + ] + } + } + }, + { + "displayName": "pipeline_tag_value", + "keyword": "pipeline_tag_value", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "" + } + ] + } + } + } + ], + "catalog": "chavdarbotev_std", + "schema": "mydev_chavdar_botev_dbx_cdc_connector_monitoring" + }, + { + "name": "62d03d54", + "displayName": "Pipelines Status", + "queryLines": [ + "SELECT pipeline_link as `Pipeline`,\n", + " pipeline_name as `Pipeline Name`,\n", + " latest_pipeline_run_link as `Latest Pipeline Run`,\n", + " latest_pipeline_run_create_time as `Latest Create time`,\n", + " latest_pipeline_run_end_time as `Latest End time`,\n", + " latest_pipeline_run_state as `Latest Status String`,\n", + " latest_pipeline_run_state_with_color as `Latest Status`,\n", + " ifnull(latest_error_log_message, '') as `Latest error log message`,\n", + " ifnull(latest_error_message,'') as `Latest error message`,\n", + " ifnull(latest_error_code,'') as `Latest error code`,\n", + " ifnull(latest_pipeline_run_num_errors, 0) as `Latest run errors`,\n", + " ifnull(latest_pipeline_run_num_warnings, 0) as `Latest run warnings`,\n", + " timestampdiff(second, latest_pipeline_run_create_time, ifnull(latest_pipeline_run_end_time, current_timestamp())) AS `Latest Total Seconds`,\n", + " updated_at as `Last Updated`\n", + "FROM pipelines_status LEFT JOIN monitored_pipelines USING (pipeline_id)\n", + "WHERE (:pipeline_id = 'All' OR pipeline_id = :pipeline_id)\n", + " AND ((latest_pipeline_run_create_time >= :event_range.min AND latest_pipeline_run_create_time <= :event_range.max )\n", + " OR (latest_pipeline_run_end_time >= :event_range.min AND latest_pipeline_run_end_time <= :event_range.max))\n", + " AND ((:pipeline_tag_value = '')\n", + " OR 
(:pipeline_tag_value = '' and (tags_array IS NULL or array_size(tags_array) = 0))\n", + " OR (array_contains(tags_array, :pipeline_tag_value)))\n", + "ORDER BY latest_pipeline_run_state_level DESC , \n", + " latest_pipeline_run_create_time DESC" + ], + "parameters": [ + { + "displayName": "pipeline_id", + "keyword": "pipeline_id", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "All" + } + ] + } + } + }, + { + "displayName": "event_range", + "keyword": "event_range", + "dataType": "DATETIME", + "complexType": "RANGE", + "defaultSelection": { + "range": { + "dataType": "DATETIME", + "min": { + "value": "now-7d/d" + }, + "max": { + "value": "now/d" + } + } + } + }, + { + "displayName": "pipeline_tag_value", + "keyword": "pipeline_tag_value", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "" + } + ] + } + } + } + ], + "catalog": "chavdarbotev_std", + "schema": "mydev_chavdar_botev_dbx_cdc_connector_monitoring" + } + ], + "pages": [ + { + "name": "cd63ec52", + "displayName": "Filtered", + "layout": [ + { + "widget": { + "name": "df5934b3", + "queries": [ + { + "name": "main_query", + "query": { + "datasetName": "0ac0bb93", + "fields": [ + { + "name": "Pipeline Id", + "expression": "`Pipeline Id`" + }, + { + "name": "Pipeline Name", + "expression": "`Pipeline Name`" + }, + { + "name": "Pipeline Type", + "expression": "`Pipeline Type`" + } + ], + "disaggregated": true + } + } + ], + "spec": { + "version": 1, + "widgetType": "table", + "encodings": { + "columns": [ + { + "fieldName": "Pipeline Id", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 100000, + "title": "Pipeline Id", + "allowSearch": false, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false, + "displayName": "Pipeline Id" + }, + { + "fieldName": "Pipeline Name", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 100001, + "title": "Pipeline Name", + "allowSearch": true, + "alignContent": "left", + "allowHTML": true, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false, + "defaultColumnWidth": 400, + "displayName": "Pipeline Name" + }, + { + "fieldName": "Pipeline Type", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 100002, + "title": "Pipeline Type", + "allowSearch": false, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false, + "displayName": "Pipeline Type" + } + ] + }, + "invisibleColumns": [], + 
"allowHTMLByDefault": false, + "itemsPerPage": 25, + "paginationSize": "default", + "condensed": true, + "withRowNumber": false, + "frame": { + "showTitle": true, + "showDescription": true, + "title": "Selected Pipelines", + "description": "Pipelines selected using the global filters" + } + } + }, + "position": { + "x": 0, + "y": 2, + "width": 6, + "height": 5 + } + }, + { + "widget": { + "name": "665329d1", + "queries": [ + { + "name": "main_query", + "query": { + "datasetName": "b18ee129", + "fields": [ + { + "name": "Pipeline", + "expression": "`Pipeline`" + }, + { + "name": "Target Table", + "expression": "`Target Table`" + } + ], + "disaggregated": true + } + } + ], + "spec": { + "version": 1, + "widgetType": "table", + "encodings": { + "columns": [ + { + "fieldName": "Pipeline", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 100000, + "title": "Pipeline", + "allowSearch": true, + "alignContent": "left", + "allowHTML": true, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false, + "defaultColumnWidth": 400, + "displayName": "Pipeline" + }, + { + "fieldName": "Target Table", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 100001, + "title": "Target Table", + "allowSearch": true, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false, + "displayName": "Target Table" + } + ] + }, + "invisibleColumns": [], + "allowHTMLByDefault": false, + "itemsPerPage": 25, + "paginationSize": "default", + "condensed": true, + "withRowNumber": false, + "frame": { + "showTitle": true, + "showDescription": true, + "title": "Selected Tables", + "description": "Tables selected using the global filters" + } + } + }, + "position": { + "x": 0, + "y": 7, + "width": 6, + "height": 6 + } + }, + { + "widget": { + "name": "1fcb6a9c", + "multilineTextboxSpec": { + "lines": [ + "***Pipelines, pipeline runs and tables selected using the Global filters***" + ] + } + }, + "position": { + "x": 0, + "y": 0, + "width": 6, + "height": 2 + } + }, + { + "widget": { + "name": "5693a89a", + "queries": [ + { + "name": "main_query", + "query": { + "datasetName": "19d796af", + "fields": [ + { + "name": "Pipeline", + "expression": "`Pipeline`" + }, + { + "name": "Pipeline run", + "expression": "`Pipeline run`" + }, + { + "name": "Status", + "expression": "`Status`" + }, + { + "name": "Create time", + "expression": "`Create time`" + }, + { + "name": "End time", + "expression": "`End time`" + }, + { + "name": "Latest error log message", + "expression": "`Latest error log message`" + }, + { + "name": "Latest error message", + "expression": "`Latest error message`" + }, + { + "name": "Total seconds", + "expression": "`Total seconds`" + }, + { + "name": "Starting seconds", + "expression": "`Starting seconds`" + }, + { + "name": "Initialization seconds", + "expression": "`Initialization seconds`" + }, + { + "name": 
"Running seconds", + "expression": "`Running seconds`" + }, + { + "name": "Latest full error", + "expression": "`Latest full error`" + } + ], + "disaggregated": true + } + } + ], + "spec": { + "version": 1, + "widgetType": "table", + "encodings": { + "columns": [ + { + "fieldName": "Pipeline", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 0, + "title": "Pipeline", + "allowSearch": true, + "alignContent": "left", + "allowHTML": true, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false, + "defaultColumnWidth": 350 + }, + { + "fieldName": "Pipeline run", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 1, + "title": "Pipeline run", + "allowSearch": false, + "alignContent": "left", + "allowHTML": true, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false, + "defaultColumnWidth": 350 + }, + { + "fieldName": "Status", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 2, + "title": "Status", + "allowSearch": false, + "alignContent": "left", + "allowHTML": true, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false, + "defaultColumnWidth": 150 + }, + { + "fieldName": "Create time", + "dateTimeFormat": "DD/MM/YYYY HH:mm:ss.SSS", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "datetime", + "displayAs": "datetime", + "visible": true, + "order": 3, + "title": "Create time", + "allowSearch": false, + "alignContent": "right", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "End time", + "dateTimeFormat": "DD/MM/YYYY HH:mm:ss.SSS", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "datetime", + "displayAs": "datetime", + "visible": true, + "order": 4, + "title": "End time", + "allowSearch": false, + "alignContent": "right", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Latest error log message", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + 
"linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 5, + "title": "Latest error log message", + "allowSearch": false, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Latest error message", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 6, + "title": "Latest error message", + "allowSearch": false, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Total seconds", + "numberFormat": "0", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "integer", + "displayAs": "number", + "visible": true, + "order": 7, + "title": "Total seconds", + "allowSearch": false, + "alignContent": "right", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Starting seconds", + "numberFormat": "0", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "integer", + "displayAs": "number", + "visible": true, + "order": 8, + "title": "Starting seconds", + "allowSearch": false, + "alignContent": "right", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Initialization seconds", + "numberFormat": "0", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "integer", + "displayAs": "number", + "visible": true, + "order": 10, + "title": "Initialization seconds", + "allowSearch": false, + "alignContent": "right", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Running seconds", + "numberFormat": "0", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "integer", + "displayAs": "number", + "visible": true, + "order": 11, + "title": "Running seconds", + "allowSearch": false, + "alignContent": "right", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Latest full error", + "booleanValues": [ + 
"false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "complex", + "displayAs": "json", + "visible": true, + "order": 12, + "title": "Latest full error", + "allowSearch": false, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + } + ] + }, + "invisibleColumns": [ + { + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "name": "Pipeline Name", + "type": "string", + "displayAs": "string", + "order": 100001, + "title": "Pipeline Name", + "allowSearch": false, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "name": "Status String", + "type": "string", + "displayAs": "string", + "order": 100005, + "title": "Status String", + "allowSearch": false, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "name": "Latest error code", + "type": "string", + "displayAs": "string", + "order": 100009, + "title": "Latest error code", + "allowSearch": false, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + } + ], + "allowHTMLByDefault": false, + "itemsPerPage": 25, + "paginationSize": "default", + "condensed": true, + "withRowNumber": false, + "frame": { + "showTitle": true, + "showDescription": true, + "title": "Selected pipeline runs", + "description": "Pipeline runs selected using the global filters" + } + } + }, + "position": { + "x": 0, + "y": 13, + "width": 6, + "height": 9 + } + } + ], + "pageType": "PAGE_TYPE_CANVAS" + }, + { + "name": "a44ce042", + "displayName": "Pipelines Health", + "layout": [ + { + "widget": { + "name": "dd476651", + "queries": [ + { + "name": "main_query", + "query": { + "datasetName": "c91619f8", + "fields": [ + { + "name": "Pipeline", + "expression": "`Pipeline`" + }, + { + "name": "Observability data freshness", + "expression": "`Observability data freshness`" + }, + { + "name": "Check time", + "expression": "`Check time`" + }, + { + "name": "Latency at check time", + "expression": "`Latency at check time`" + } + ], + "disaggregated": true + } + } + ], + "spec": { + "version": 1, + "widgetType": "table", + "encodings": { + "columns": [ + { + "fieldName": "Pipeline", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + 
"linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 100000, + "title": "Pipeline", + "allowSearch": true, + "alignContent": "left", + "allowHTML": true, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false, + "defaultColumnWidth": 350, + "displayName": "Pipeline" + }, + { + "fieldName": "Observability data freshness", + "dateTimeFormat": "DD/MM/YYYY HH:mm:ss.SSS", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "datetime", + "displayAs": "datetime", + "visible": true, + "order": 100001, + "title": "Observability data freshness", + "allowSearch": false, + "alignContent": "right", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false, + "displayName": "Observability data freshness" + }, + { + "fieldName": "Check time", + "dateTimeFormat": "DD/MM/YYYY HH:mm:ss.SSS", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "datetime", + "displayAs": "datetime", + "visible": true, + "order": 100002, + "title": "Check time", + "allowSearch": false, + "alignContent": "right", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false, + "displayName": "Check time" + }, + { + "fieldName": "Latency at check time", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 100003, + "title": "Latency at check time", + "allowSearch": false, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false, + "displayName": "Latency at check time" + } + ] + }, + "invisibleColumns": [], + "allowHTMLByDefault": false, + "itemsPerPage": 25, + "paginationSize": "default", + "condensed": true, + "withRowNumber": false, + "frame": { + "showTitle": true, + "showDescription": true, + "title": "Observability Data Freshness", + "description": "Time and latency of the latest processed event by pipeline (oldest to newest)" + } + } + }, + "position": { + "x": 0, + "y": 17, + "width": 6, + "height": 6 + } + }, + { + "widget": { + "name": "6820f9d5", + "queries": [ + { + "name": "main_query", + "query": { + "datasetName": "82c4e3b5", + "fields": [ + { + "name": "Pipeline Link", + "expression": "`Pipeline Link`" + }, + { + "name": "Pipeline Run", + "expression": "`Pipeline Run`" + }, + { + "name": "Target Table", + "expression": "`Target Table`" + }, + { + "name": "Error Time", + "expression": "`Error Time`" + }, + { + "name": "Error Code", + "expression": "`Error Code`" + }, + { + "name": "Error Message in Log", + "expression": "`Error Message in Log`" + }, + { + "name": "Error Message", + "expression": 
"`Error Message`" + }, + { + "name": "Error Details", + "expression": "`Error Details`" + } + ], + "disaggregated": true + } + } + ], + "spec": { + "version": 1, + "widgetType": "table", + "encodings": { + "columns": [ + { + "fieldName": "Pipeline Link", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 1, + "title": "Pipeline Link", + "allowSearch": true, + "alignContent": "left", + "allowHTML": true, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false, + "defaultColumnWidth": 350 + }, + { + "fieldName": "Pipeline Run", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 3, + "title": "Pipeline Run", + "allowSearch": false, + "alignContent": "left", + "allowHTML": true, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false, + "defaultColumnWidth": 300 + }, + { + "fieldName": "Target Table", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 4, + "title": "Target Table", + "allowSearch": true, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Error Time", + "dateTimeFormat": "DD/MM/YYYY HH:mm:ss.SSS", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "datetime", + "displayAs": "datetime", + "visible": true, + "order": 6, + "title": "Error Time", + "allowSearch": false, + "alignContent": "right", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Error Code", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 7, + "title": "Error Code", + "allowSearch": true, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Error Message in Log", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + 
"linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 8, + "title": "Error Message in Log", + "allowSearch": true, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Error Message", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 9, + "title": "Error Message", + "allowSearch": true, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Error Details", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "complex", + "displayAs": "json", + "visible": true, + "order": 10, + "title": "Error Details", + "allowSearch": false, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + } + ] + }, + "invisibleColumns": [ + { + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "name": "Pipeline Name", + "type": "string", + "displayAs": "string", + "order": 0, + "title": "Pipeline Name", + "allowSearch": false, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "name": "Pipeline Run Id", + "type": "string", + "displayAs": "string", + "order": 2, + "title": "Pipeline Run Id", + "allowSearch": false, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "dateTimeFormat": "DD/MM/YYYY HH:mm:ss.SSS", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "name": "Pipeline Run Start Time", + "type": "datetime", + "displayAs": "datetime", + "order": 5, + "title": "Pipeline Run Start Time", + "allowSearch": false, + "alignContent": "right", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + } + ], + "allowHTMLByDefault": false, + "itemsPerPage": 25, + "paginationSize": "default", + "condensed": true, + "withRowNumber": false, + "frame": { + "showTitle": true, + "showDescription": true, + "title": "Pipelines Errors", + "description": "Detected errors in the pipeline runs" 
+ } + } + }, + "position": { + "x": 0, + "y": 6, + "width": 6, + "height": 6 + } + }, + { + "widget": { + "name": "9ac0f134", + "queries": [ + { + "name": "main_query", + "query": { + "datasetName": "c0688ffc", + "fields": [ + { + "name": "Pipeline Link", + "expression": "`Pipeline Link`" + }, + { + "name": "Pipeline Run Id", + "expression": "`Pipeline Run Id`" + }, + { + "name": "Warning Time", + "expression": "`Warning Time`" + }, + { + "name": "Warning Message", + "expression": "`Warning Message`" + } + ], + "disaggregated": true + } + } + ], + "spec": { + "version": 1, + "widgetType": "table", + "encodings": { + "columns": [ + { + "fieldName": "Pipeline Link", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 100001, + "title": "Pipeline Link", + "allowSearch": true, + "alignContent": "left", + "allowHTML": true, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false, + "defaultColumnWidth": 350 + }, + { + "fieldName": "Pipeline Run Id", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 100002, + "title": "Pipeline Run Id", + "allowSearch": false, + "alignContent": "left", + "allowHTML": true, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false, + "defaultColumnWidth": 300 + }, + { + "fieldName": "Warning Time", + "dateTimeFormat": "DD/MM/YYYY HH:mm:ss.SSS", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "datetime", + "displayAs": "datetime", + "visible": true, + "order": 100005, + "title": "Warning Time", + "allowSearch": false, + "alignContent": "right", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Warning Message", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 100006, + "title": "Warning Message", + "allowSearch": true, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + } + ] + }, + "invisibleColumns": [ + { + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "name": "Pipeline Name", + "type": "string", + "displayAs": "string", + "order": 100000, + "title": "Pipeline Name", 
+ "allowSearch": false, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "name": "Pipeline Run", + "type": "string", + "displayAs": "string", + "order": 100003, + "title": "Pipeline Run", + "allowSearch": false, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "dateTimeFormat": "DD/MM/YYYY HH:mm:ss.SSS", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "name": "Pipeline Run Start Time", + "type": "datetime", + "displayAs": "datetime", + "order": 100004, + "title": "Pipeline Run Start Time", + "allowSearch": false, + "alignContent": "right", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + } + ], + "allowHTMLByDefault": false, + "itemsPerPage": 25, + "paginationSize": "default", + "condensed": true, + "withRowNumber": false, + "frame": { + "showTitle": true, + "showDescription": true, + "title": "Pipelines Warnings", + "description": "Detected warnings in the pipeline runs" + } + } + }, + "position": { + "x": 0, + "y": 12, + "width": 6, + "height": 5 + } + }, + { + "widget": { + "name": "5527b0d7", + "queries": [ + { + "name": "main_query", + "query": { + "datasetName": "62d03d54", + "fields": [ + { + "name": "Pipeline", + "expression": "`Pipeline`" + }, + { + "name": "Latest Pipeline Run", + "expression": "`Latest Pipeline Run`" + }, + { + "name": "Latest Status", + "expression": "`Latest Status`" + }, + { + "name": "Latest error code", + "expression": "`Latest error code`" + }, + { + "name": "Latest Create time", + "expression": "`Latest Create time`" + }, + { + "name": "Latest End time", + "expression": "`Latest End time`" + }, + { + "name": "Latest run errors", + "expression": "`Latest run errors`" + }, + { + "name": "Latest run warnings", + "expression": "`Latest run warnings`" + }, + { + "name": "Latest error message", + "expression": "`Latest error message`" + }, + { + "name": "Latest Total Seconds", + "expression": "`Latest Total Seconds`" + }, + { + "name": "Last Updated", + "expression": "`Last Updated`" + } + ], + "disaggregated": true + } + } + ], + "spec": { + "version": 1, + "widgetType": "table", + "encodings": { + "columns": [ + { + "fieldName": "Pipeline", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 0, + "title": "Pipeline", + "allowSearch": true, + "alignContent": "left", + "allowHTML": true, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false, + "defaultColumnWidth": 350 + }, + { + "fieldName": "Latest Pipeline Run", + "booleanValues": [ + "false", + "true" + ], + 
"imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 2, + "title": "Latest Pipeline Run", + "allowSearch": false, + "alignContent": "left", + "allowHTML": true, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false, + "defaultColumnWidth": 300 + }, + { + "fieldName": "Latest Status", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 3, + "title": "Latest Status", + "allowSearch": false, + "alignContent": "left", + "allowHTML": true, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false, + "defaultColumnWidth": 150 + }, + { + "fieldName": "Latest error code", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 4, + "title": "Latest error code", + "allowSearch": false, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Latest Create time", + "dateTimeFormat": "DD/MM/YYYY HH:mm:ss.SSS", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "datetime", + "displayAs": "datetime", + "visible": true, + "order": 5, + "title": "Latest Create time", + "allowSearch": false, + "alignContent": "right", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Latest End time", + "dateTimeFormat": "DD/MM/YYYY HH:mm:ss.SSS", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "datetime", + "displayAs": "datetime", + "visible": true, + "order": 6, + "title": "Latest End time", + "allowSearch": false, + "alignContent": "right", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Latest run errors", + "numberFormat": "0", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "integer", + "displayAs": "number", + "visible": true, + "order": 9, + "title": "Latest run errors", + "allowSearch": false, + "alignContent": "right", + 
"allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Latest run warnings", + "numberFormat": "0", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "integer", + "displayAs": "number", + "visible": true, + "order": 10, + "title": "Latest run warnings", + "allowSearch": false, + "alignContent": "right", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Latest error message", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 11, + "title": "Latest error message", + "allowSearch": false, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Latest Total Seconds", + "numberFormat": "0", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "integer", + "displayAs": "number", + "visible": true, + "order": 12, + "title": "Latest Total Seconds", + "allowSearch": false, + "alignContent": "right", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Last Updated", + "dateTimeFormat": "DD/MM/YYYY HH:mm:ss.SSS", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "datetime", + "displayAs": "datetime", + "visible": true, + "order": 13, + "title": "Last Updated", + "allowSearch": false, + "alignContent": "right", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + } + ] + }, + "invisibleColumns": [ + { + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "name": "Pipeline Name", + "type": "string", + "displayAs": "string", + "order": 1, + "title": "Pipeline Name", + "allowSearch": false, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "name": "Latest Status String", + "type": "string", + "displayAs": 
"string", + "order": 7, + "title": "Latest Status String", + "allowSearch": false, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "name": "Latest error log message", + "type": "string", + "displayAs": "string", + "order": 8, + "title": "Latest error log message", + "allowSearch": false, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + } + ], + "allowHTMLByDefault": false, + "itemsPerPage": 25, + "paginationSize": "default", + "condensed": true, + "withRowNumber": false, + "frame": { + "showTitle": true, + "showDescription": true, + "title": "Latest Runs by Pipeline ", + "description": "The status of the latest run for each pipeline" + } + } + }, + "position": { + "x": 0, + "y": 0, + "width": 6, + "height": 6 + } + } + ], + "pageType": "PAGE_TYPE_CANVAS" + }, + { + "name": "0f6e96ec", + "displayName": "Pipelines Metrics", + "layout": [ + { + "widget": { + "name": "6e698544", + "queries": [ + { + "name": "main_query", + "query": { + "datasetName": "2faba057", + "fields": [ + { + "name": "Pipeline Name", + "expression": "`Pipeline Name`" + }, + { + "name": "hourly(Hour)", + "expression": "DATE_TRUNC(\"HOUR\", `Hour`)" + }, + { + "name": "sum(Number of Errors)", + "expression": "SUM(`Number of Errors`)" + } + ], + "disaggregated": false + } + } + ], + "spec": { + "version": 3, + "widgetType": "line", + "encodings": { + "x": { + "fieldName": "hourly(Hour)", + "scale": { + "type": "temporal" + }, + "displayName": "Hour" + }, + "y": { + "fieldName": "sum(Number of Errors)", + "scale": { + "type": "quantitative" + } + }, + "color": { + "fieldName": "Pipeline Name", + "scale": { + "type": "categorical" + }, + "displayName": "Pipeline Name" + }, + "label": { + "show": false + } + }, + "annotations": [], + "frame": { + "showDescription": true, + "showTitle": true, + "title": "Hourly error rate", + "description": "The number of errors per hour across selected pipelines" + } + } + }, + "position": { + "x": 0, + "y": 2, + "width": 6, + "height": 6 + } + }, + { + "widget": { + "name": "c972afff", + "queries": [ + { + "name": "main_query", + "query": { + "datasetName": "82c4e3b5", + "fields": [ + { + "name": "count(*)", + "expression": "COUNT(`*`)" + }, + { + "name": "Error Code", + "expression": "`Error Code`" + } + ], + "disaggregated": false + } + } + ], + "spec": { + "version": 3, + "widgetType": "pie", + "encodings": { + "angle": { + "fieldName": "count(*)", + "scale": { + "type": "quantitative" + } + }, + "color": { + "fieldName": "Error Code", + "scale": { + "type": "categorical" + } + }, + "label": { + "show": true + } + }, + "frame": { + "showTitle": true, + "showDescription": true, + "title": "Distribution of errors codes", + "description": "Relative share of different types of errors across pipeline runs" + } + } + }, + "position": { + "x": 0, + "y": 8, + "width": 6, + "height": 7 + } + }, + { + "widget": { + "name": "dd54ef0b", + "queries": [ + { + "name": "main_query", + "query": { + "datasetName": "19d796af", + "fields": [ + { + "name": "Status String", + "expression": "`Status String`" + }, + { + "name": "Pipeline Name", + 
"expression": "`Pipeline Name`" + }, + { + "name": "minutely(Create time)", + "expression": "DATE_TRUNC(\"MINUTE\", `Create time`)" + }, + { + "name": "sum(Total seconds)", + "expression": "SUM(`Total seconds`)" + } + ], + "disaggregated": false + } + } + ], + "spec": { + "version": 3, + "widgetType": "bar", + "encodings": { + "x": { + "fieldName": "minutely(Create time)", + "scale": { + "type": "categorical" + } + }, + "y": { + "fieldName": "sum(Total seconds)", + "scale": { + "type": "quantitative" + } + }, + "color": { + "fieldName": "Status String", + "scale": { + "type": "categorical", + "mappings": [ + { + "value": "COMPLETED", + "color": { + "themeColorType": "visualizationColors", + "position": 3 + } + }, + { + "value": "FAILED", + "color": { + "themeColorType": "visualizationColors", + "position": 4 + } + }, + { + "value": "WAITING_FOR_RESOURCES", + "color": { + "themeColorType": "visualizationColors", + "position": 5 + } + } + ] + } + }, + "extra": [ + { + "fieldName": "Pipeline Name" + } + ] + }, + "frame": { + "showTitle": true, + "showDescription": true, + "title": "Total time of pipeline runs", + "description": "Total execution time and final status for recent pipeline runs" + } + } + }, + "position": { + "x": 0, + "y": 25, + "width": 6, + "height": 6 + } + }, + { + "widget": { + "name": "c1f6e5a8", + "multilineTextboxSpec": { + "lines": [ + "# Errors" + ] + } + }, + "position": { + "x": 0, + "y": 0, + "width": 6, + "height": 2 + } + }, + { + "widget": { + "name": "f7a042ed", + "multilineTextboxSpec": { + "lines": [ + "# Pipeline Durations" + ] + } + }, + "position": { + "x": 0, + "y": 23, + "width": 6, + "height": 2 + } + }, + { + "widget": { + "name": "6b06b493", + "queries": [ + { + "name": "main_query", + "query": { + "datasetName": "19d796af", + "fields": [ + { + "name": "Pipeline Name", + "expression": "`Pipeline Name`" + }, + { + "name": "minutely(Create time)", + "expression": "DATE_TRUNC(\"MINUTE\", `Create time`)" + }, + { + "name": "sum(Starting seconds)", + "expression": "SUM(`Starting seconds`)" + } + ], + "disaggregated": false + } + } + ], + "spec": { + "version": 3, + "widgetType": "bar", + "encodings": { + "x": { + "fieldName": "minutely(Create time)", + "scale": { + "type": "categorical" + } + }, + "y": { + "fieldName": "sum(Starting seconds)", + "scale": { + "type": "quantitative" + } + }, + "color": { + "fieldName": "Pipeline Name", + "scale": { + "type": "categorical" + } + }, + "extra": [ + { + "fieldName": "Pipeline Name" + } + ] + }, + "frame": { + "showTitle": true, + "showDescription": true, + "title": "Starting time of pipeline runs", + "description": "Captures the time waiting for resources to start the pipeline" + } + } + }, + "position": { + "x": 0, + "y": 31, + "width": 6, + "height": 6 + } + }, + { + "widget": { + "name": "77547bb3", + "queries": [ + { + "name": "main_query", + "query": { + "datasetName": "19d796af", + "fields": [ + { + "name": "Pipeline Name", + "expression": "`Pipeline Name`" + }, + { + "name": "minutely(Create time)", + "expression": "DATE_TRUNC(\"MINUTE\", `Create time`)" + }, + { + "name": "sum(Initialization seconds)", + "expression": "SUM(`Initialization seconds`)" + } + ], + "disaggregated": false + } + } + ], + "spec": { + "version": 3, + "widgetType": "bar", + "encodings": { + "x": { + "fieldName": "minutely(Create time)", + "scale": { + "type": "categorical" + } + }, + "y": { + "fieldName": "sum(Initialization seconds)", + "scale": { + "type": "quantitative" + } + }, + "color": { + "fieldName": "Pipeline Name", + 
"scale": { + "type": "categorical" + } + }, + "extra": [ + { + "fieldName": "Pipeline Name" + } + ] + }, + "frame": { + "showTitle": true, + "showDescription": true, + "title": "Initialization time of pipeline runs", + "description": "Captures the time intializing the pipeline" + } + } + }, + "position": { + "x": 0, + "y": 37, + "width": 6, + "height": 6 + } + }, + { + "widget": { + "name": "530d0177", + "queries": [ + { + "name": "main_query", + "query": { + "datasetName": "19d796af", + "fields": [ + { + "name": "Pipeline Name", + "expression": "`Pipeline Name`" + }, + { + "name": "minutely(Create time)", + "expression": "DATE_TRUNC(\"MINUTE\", `Create time`)" + }, + { + "name": "sum(Running seconds)", + "expression": "SUM(`Running seconds`)" + } + ], + "disaggregated": false + } + } + ], + "spec": { + "version": 3, + "widgetType": "bar", + "encodings": { + "x": { + "fieldName": "minutely(Create time)", + "scale": { + "type": "categorical" + } + }, + "y": { + "fieldName": "sum(Running seconds)", + "scale": { + "type": "quantitative" + } + }, + "color": { + "fieldName": "Pipeline Name", + "scale": { + "type": "categorical" + } + }, + "extra": [ + { + "fieldName": "Pipeline Name" + } + ] + }, + "frame": { + "showTitle": true, + "showDescription": true, + "title": "Execution time of pipeline runs", + "description": "Captures the time evaluating the data flow graph of the pipeline" + } + } + }, + "position": { + "x": 0, + "y": 43, + "width": 6, + "height": 6 + } + }, + { + "widget": { + "name": "945b778a", + "queries": [ + { + "name": "main_query", + "query": { + "datasetName": "95ce6f6c", + "fields": [ + { + "name": "Pipeline Name", + "expression": "`Pipeline Name`" + }, + { + "name": "hourly(Event Time)", + "expression": "DATE_TRUNC(\"HOUR\", `Event Time`)" + }, + { + "name": "sum(Loaded Change Events)", + "expression": "SUM(`Loaded Change Events`)" + } + ], + "disaggregated": false + } + } + ], + "spec": { + "version": 3, + "widgetType": "line", + "encodings": { + "x": { + "fieldName": "hourly(Event Time)", + "scale": { + "type": "temporal" + } + }, + "y": { + "fieldName": "sum(Loaded Change Events)", + "scale": { + "type": "quantitative" + } + }, + "color": { + "fieldName": "Pipeline Name", + "scale": { + "type": "categorical" + } + } + }, + "frame": { + "showTitle": true, + "showDescription": true, + "title": "Hourly CDC Traffic", + "description": "Number of CDC events per hour across all tables" + } + } + }, + "position": { + "x": 0, + "y": 17, + "width": 6, + "height": 6 + } + }, + { + "widget": { + "name": "bc0b65a1", + "multilineTextboxSpec": { + "lines": [ + "# Throughput and Rates" + ] + } + }, + "position": { + "x": 0, + "y": 15, + "width": 6, + "height": 2 + } + } + ], + "pageType": "PAGE_TYPE_CANVAS" + }, + { + "name": "6701599f", + "displayName": "Tables Health", + "layout": [ + { + "widget": { + "name": "057ead9b", + "queries": [ + { + "name": "main_query", + "query": { + "datasetName": "248b74c6", + "fields": [ + { + "name": "Pipeline Link", + "expression": "`Pipeline Link`" + }, + { + "name": "Target Table", + "expression": "`Target Table`" + }, + { + "name": "Latest Run", + "expression": "`Latest Run`" + }, + { + "name": "Latest State", + "expression": "`Latest State`" + }, + { + "name": "Latest Schema", + "expression": "`Latest Schema`" + }, + { + "name": "Latest CDC Changes Time", + "expression": "`Latest CDC Changes Time`" + }, + { + "name": "Seconds Since Latest CDC Change", + "expression": "`Seconds Since Latest CDC Change`" + }, + { + "name": "Time of Latest Error", + 
"expression": "`Time of Latest Error`" + }, + { + "name": "Code of Latest Error", + "expression": "`Code of Latest Error`" + }, + { + "name": "Flow Type of Latest Error", + "expression": "`Flow Type of Latest Error`" + }, + { + "name": "Message of Latest Error", + "expression": "`Message of Latest Error`" + }, + { + "name": "Log Message of Latest Error", + "expression": "`Log Message of Latest Error`" + }, + { + "name": "Latest Pipeline Run with Error", + "expression": "`Latest Pipeline Run with Error`" + }, + { + "name": "Last Updated", + "expression": "`Last Updated`" + }, + { + "name": "Seconds Since Latest Status Update", + "expression": "`Seconds Since Latest Status Update`" + }, + { + "name": "Last Data Refresh Time", + "expression": "`Last Data Refresh Time`" + } + ], + "disaggregated": true + } + } + ], + "spec": { + "version": 1, + "widgetType": "table", + "encodings": { + "columns": [ + { + "fieldName": "Pipeline Link", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 1, + "title": "Pipeline Link", + "allowSearch": true, + "alignContent": "left", + "allowHTML": true, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false, + "defaultColumnWidth": 350 + }, + { + "fieldName": "Target Table", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 2, + "title": "Target Table", + "allowSearch": true, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Latest Run", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 3, + "title": "Latest Run", + "allowSearch": false, + "alignContent": "left", + "allowHTML": true, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false, + "defaultColumnWidth": 300 + }, + { + "fieldName": "Latest State", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 4, + "title": "Latest State", + "allowSearch": true, + "alignContent": "left", + "allowHTML": true, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false, + "defaultColumnWidth": 125 + }, + { + "fieldName": "Latest Schema", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + 
"linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "complex", + "displayAs": "json", + "visible": true, + "order": 5, + "title": "Latest Schema", + "allowSearch": false, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Latest CDC Changes Time", + "dateTimeFormat": "DD/MM/YYYY HH:mm:ss.SSS", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "datetime", + "displayAs": "datetime", + "visible": true, + "order": 6, + "title": "Latest CDC Changes Time", + "allowSearch": false, + "alignContent": "right", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Seconds Since Latest CDC Change", + "numberFormat": "0", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "integer", + "displayAs": "number", + "visible": true, + "order": 7, + "title": "Seconds Since Latest CDC Change", + "allowSearch": false, + "alignContent": "right", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Time of Latest Error", + "dateTimeFormat": "DD/MM/YYYY HH:mm:ss.SSS", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "datetime", + "displayAs": "datetime", + "visible": true, + "order": 9, + "title": "Time of Latest Error", + "allowSearch": false, + "alignContent": "right", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Code of Latest Error", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 10, + "title": "Code of Latest Error", + "allowSearch": true, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Flow Type of Latest Error", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 11, + "title": "Flow Type of Latest Error", + "allowSearch": false, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": 
"Message of Latest Error", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 12, + "title": "Message of Latest Error", + "allowSearch": true, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Log Message of Latest Error", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 13, + "title": "Log Message of Latest Error", + "allowSearch": false, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Latest Pipeline Run with Error", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 14, + "title": "Latest Pipeline Run with Error", + "allowSearch": false, + "alignContent": "left", + "allowHTML": true, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false, + "defaultColumnWidth": 325 + }, + { + "fieldName": "Last Updated", + "dateTimeFormat": "DD/MM/YYYY HH:mm:ss.SSS", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "datetime", + "displayAs": "datetime", + "visible": true, + "order": 15, + "title": "Last Updated", + "allowSearch": false, + "alignContent": "right", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Seconds Since Latest Status Update", + "numberFormat": "0", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "integer", + "displayAs": "number", + "visible": true, + "order": 16, + "title": "Seconds Since Latest Status Update", + "allowSearch": false, + "alignContent": "right", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Last Data Refresh Time", + "dateTimeFormat": "DD/MM/YYYY HH:mm:ss.SSS", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "datetime", + 
"displayAs": "datetime", + "visible": true, + "order": 17, + "title": "Last Data Refresh Time", + "allowSearch": false, + "alignContent": "right", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + } + ] + }, + "invisibleColumns": [ + { + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "name": "Pipeline Name", + "type": "string", + "displayAs": "string", + "order": 0, + "title": "Pipeline Name", + "allowSearch": false, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "dateTimeFormat": "DD/MM/YYYY HH:mm:ss.SSS", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "name": "Latest Snapshot Changes Time", + "type": "datetime", + "displayAs": "datetime", + "order": 8, + "title": "Latest Snapshot Changes Time", + "allowSearch": false, + "alignContent": "right", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + } + ], + "allowHTMLByDefault": false, + "itemsPerPage": 25, + "paginationSize": "default", + "condensed": true, + "withRowNumber": false, + "frame": { + "showTitle": true, + "showDescription": true, + "title": "Table Status", + "description": "Latest Status of Monitored Tables" + } + } + }, + "position": { + "x": 0, + "y": 0, + "width": 6, + "height": 10 + } + }, + { + "widget": { + "name": "cc2c8376", + "queries": [ + { + "name": "main_query", + "query": { + "datasetName": "7103e200", + "fields": [ + { + "name": "Pipeline Link", + "expression": "`Pipeline Link`" + }, + { + "name": "Target Table", + "expression": "`Target Table`" + }, + { + "name": "Pipeline Run", + "expression": "`Pipeline Run`" + }, + { + "name": "Table State", + "expression": "`Table State`" + }, + { + "name": "Error Code", + "expression": "`Error Code`" + }, + { + "name": "Error Message", + "expression": "`Error Message`" + }, + { + "name": "Error Message in Log", + "expression": "`Error Message in Log`" + }, + { + "name": "Error Time", + "expression": "`Error Time`" + }, + { + "name": "Latest Schema", + "expression": "`Latest Schema`" + }, + { + "name": "Pipeline Run Create Time", + "expression": "`Pipeline Run Create Time`" + }, + { + "name": "Pipeline Run End Time", + "expression": "`Pipeline Run End Time`" + }, + { + "name": "Pipeline Run State", + "expression": "`Pipeline Run State`" + }, + { + "name": "Last Updated", + "expression": "`Last Updated`" + } + ], + "disaggregated": true + } + } + ], + "spec": { + "version": 1, + "widgetType": "table", + "encodings": { + "columns": [ + { + "fieldName": "Pipeline Link", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 1, + "title": "Pipeline Link", + "allowSearch": true, + "alignContent": "left", + 
"allowHTML": true, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false, + "defaultColumnWidth": 350 + }, + { + "fieldName": "Target Table", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 2, + "title": "Target Table", + "allowSearch": true, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Pipeline Run", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 3, + "title": "Pipeline Run", + "allowSearch": false, + "alignContent": "left", + "allowHTML": true, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false, + "description": "", + "defaultColumnWidth": 325 + }, + { + "fieldName": "Table State", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 5, + "title": "Table State", + "allowSearch": false, + "alignContent": "left", + "allowHTML": true, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false, + "defaultColumnWidth": 125 + }, + { + "fieldName": "Error Code", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 6, + "title": "Error Code", + "allowSearch": true, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Error Message", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 7, + "title": "Error Message", + "allowSearch": true, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Error Message in Log", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 8, + "title": "Error 
Message in Log", + "allowSearch": true, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Error Time", + "dateTimeFormat": "DD/MM/YYYY HH:mm:ss.SSS", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "datetime", + "displayAs": "datetime", + "visible": true, + "order": 9, + "title": "Error Time", + "allowSearch": false, + "alignContent": "right", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Latest Schema", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "complex", + "displayAs": "json", + "visible": true, + "order": 10, + "title": "Latest Schema", + "allowSearch": false, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Pipeline Run Create Time", + "dateTimeFormat": "DD/MM/YYYY HH:mm:ss.SSS", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "datetime", + "displayAs": "datetime", + "visible": true, + "order": 11, + "title": "Pipeline Run Create Time", + "allowSearch": false, + "alignContent": "right", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Pipeline Run End Time", + "dateTimeFormat": "DD/MM/YYYY HH:mm:ss.SSS", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "datetime", + "displayAs": "datetime", + "visible": true, + "order": 12, + "title": "Pipeline Run End Time", + "allowSearch": false, + "alignContent": "right", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Pipeline Run State", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 13, + "title": "Pipeline Run State", + "allowSearch": false, + "alignContent": "left", + "allowHTML": true, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false, + "defaultColumnWidth": 150 + }, + { + "fieldName": "Last Updated", + "dateTimeFormat": "DD/MM/YYYY HH:mm:ss.SSS", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": 
"", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "datetime", + "displayAs": "datetime", + "visible": true, + "order": 14, + "title": "Last Updated", + "allowSearch": false, + "alignContent": "right", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + } + ] + }, + "invisibleColumns": [ + { + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "name": "Pipeline Name", + "type": "string", + "displayAs": "string", + "order": 0, + "title": "Pipeline Name", + "allowSearch": false, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "name": "Table State String", + "type": "string", + "displayAs": "string", + "order": 4, + "title": "Table State String", + "allowSearch": false, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + } + ], + "allowHTMLByDefault": false, + "itemsPerPage": 25, + "paginationSize": "default", + "condensed": true, + "withRowNumber": false, + "frame": { + "showTitle": true, + "showDescription": true, + "title": "Tables Status per Pipeline Run", + "description": "Table statuses for recent pipeline runs" + } + } + }, + "position": { + "x": 0, + "y": 20, + "width": 6, + "height": 9 + } + }, + { + "widget": { + "name": "0ec2e19e", + "queries": [ + { + "name": "main_query", + "query": { + "datasetName": "3b8b7d86", + "fields": [ + { + "name": "Pipeline Link", + "expression": "`Pipeline Link`" + }, + { + "name": "Pipeline Run", + "expression": "`Pipeline Run`" + }, + { + "name": "Target Table", + "expression": "`Target Table`" + }, + { + "name": "Error Code", + "expression": "`Error Code`" + }, + { + "name": "Error Message", + "expression": "`Error Message`" + }, + { + "name": "Error Log Message", + "expression": "`Error Log Message`" + }, + { + "name": "Error Details", + "expression": "`Error Details`" + }, + { + "name": "Error Time", + "expression": "`Error Time`" + } + ], + "disaggregated": true + } + } + ], + "spec": { + "version": 1, + "widgetType": "table", + "encodings": { + "columns": [ + { + "fieldName": "Pipeline Link", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 1, + "title": "Pipeline Link", + "allowSearch": true, + "alignContent": "left", + "allowHTML": true, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false, + "defaultColumnWidth": 350 + }, + { + "fieldName": "Pipeline Run", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ 
@ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 3, + "title": "Pipeline Run", + "allowSearch": false, + "alignContent": "left", + "allowHTML": true, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false, + "defaultColumnWidth": 325 + }, + { + "fieldName": "Target Table", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 4, + "title": "Target Table", + "allowSearch": true, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Error Code", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 5, + "title": "Error Code", + "allowSearch": true, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Error Message", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 6, + "title": "Error Message", + "allowSearch": true, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Error Log Message", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 7, + "title": "Error Log Message", + "allowSearch": true, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Error Details", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "complex", + "displayAs": "json", + "visible": true, + "order": 8, + "title": "Error Details", + "allowSearch": false, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Error Time", + "dateTimeFormat": "DD/MM/YYYY HH:mm:ss.SSS", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ 
}}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "datetime", + "displayAs": "datetime", + "visible": true, + "order": 9, + "title": "Error Time", + "allowSearch": false, + "alignContent": "right", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + } + ] + }, + "invisibleColumns": [ + { + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "name": "Pipeline Name", + "type": "string", + "displayAs": "string", + "order": 0, + "title": "Pipeline Name", + "allowSearch": false, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "name": "Pipeline Run Id", + "type": "string", + "displayAs": "string", + "order": 2, + "title": "Pipeline Run Id", + "allowSearch": false, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "dateTimeFormat": "DD/MM/YYYY HH:mm:ss.SSS", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "name": "Pipeline Start Time", + "type": "datetime", + "displayAs": "datetime", + "order": 10, + "title": "Pipeline Start Time", + "allowSearch": false, + "alignContent": "right", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + } + ], + "allowHTMLByDefault": false, + "itemsPerPage": 25, + "paginationSize": "default", + "condensed": true, + "withRowNumber": false, + "frame": { + "showTitle": true, + "showDescription": true, + "title": "Table Errors", + "description": "Table error details in recent pipeline runs" + } + } + }, + "position": { + "x": 0, + "y": 10, + "width": 6, + "height": 6 + } + }, + { + "widget": { + "name": "be0d5fae", + "queries": [ + { + "name": "main_query", + "query": { + "datasetName": "8897fc81", + "fields": [ + { + "name": "Pipeline Link", + "expression": "`Pipeline Link`" + }, + { + "name": "Pipeline Run", + "expression": "`Pipeline Run`" + }, + { + "name": "Target Table", + "expression": "`Target Table`" + }, + { + "name": "Warning Log Message", + "expression": "`Warning Log Message`" + }, + { + "name": "Warning Time", + "expression": "`Warning Time`" + } + ], + "disaggregated": true + } + } + ], + "spec": { + "version": 1, + "widgetType": "table", + "encodings": { + "columns": [ + { + "fieldName": "Pipeline Link", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + 
"linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 1, + "title": "Pipeline Link", + "allowSearch": true, + "alignContent": "left", + "allowHTML": true, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false, + "defaultColumnWidth": 350 + }, + { + "fieldName": "Pipeline Run", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 3, + "title": "Pipeline Run", + "allowSearch": false, + "alignContent": "left", + "allowHTML": true, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false, + "defaultColumnWidth": 325 + }, + { + "fieldName": "Target Table", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 4, + "title": "Target Table", + "allowSearch": true, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Warning Log Message", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 5, + "title": "Warning Log Message", + "allowSearch": true, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Warning Time", + "dateTimeFormat": "DD/MM/YYYY HH:mm:ss.SSS", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "datetime", + "displayAs": "datetime", + "visible": true, + "order": 6, + "title": "Warning Time", + "allowSearch": false, + "alignContent": "right", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + } + ] + }, + "invisibleColumns": [ + { + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "name": "Pipeline Name", + "type": "string", + "displayAs": "string", + "order": 0, + "title": "Pipeline Name", + "allowSearch": false, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + 
"linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "name": "Pipeline Run Id", + "type": "string", + "displayAs": "string", + "order": 2, + "title": "Pipeline Run Id", + "allowSearch": false, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "dateTimeFormat": "DD/MM/YYYY HH:mm:ss.SSS", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "name": "Pipeline Start Time", + "type": "datetime", + "displayAs": "datetime", + "order": 7, + "title": "Pipeline Start Time", + "allowSearch": false, + "alignContent": "right", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + } + ], + "allowHTMLByDefault": false, + "itemsPerPage": 25, + "paginationSize": "default", + "condensed": true, + "withRowNumber": false, + "frame": { + "showTitle": true, + "showDescription": true, + "title": "Table Warnings", + "description": "Table warnings details in recent pipeline runs" + } + } + }, + "position": { + "x": 0, + "y": 16, + "width": 6, + "height": 4 + } + } + ], + "pageType": "PAGE_TYPE_CANVAS" + }, + { + "name": "f0c5d746", + "displayName": "Global Filters", + "layout": [ + { + "widget": { + "name": "5315d004", + "queries": [ + { + "name": "dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f0997998601178b602e1d108727138_pipeline_id", + "query": { + "datasetName": "6740610b", + "fields": [ + { + "name": "pipeline_id", + "expression": "`pipeline_id`" + }, + { + "name": "pipeline_id_associativity", + "expression": "COUNT_IF(`associative_filter_predicate_group`)" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f0997998601159af484fd01d16be3e_pipeline_id", + "query": { + "datasetName": "0ac0bb93", + "parameters": [ + { + "name": "pipeline_id", + "keyword": "pipeline_id" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f0997998601180bf22ba62026daa2f_pipeline_id", + "query": { + "datasetName": "8db826d7", + "parameters": [ + { + "name": "pipeline_id", + "keyword": "pipeline_id" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f099799860118983e81bc3b6a962b2_pipeline_id", + "query": { + "datasetName": "b18ee129", + "parameters": [ + { + "name": "pipeline_id", + "keyword": "pipeline_id" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f09979986011938281b2e91deeb22e_pipeline_id", + "query": { + "datasetName": "5e3eb18f", + "parameters": [ + { + "name": "pipeline_id", + "keyword": "pipeline_id" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f0997998601166bd423ceb08ab0480_pipeline_id", + "query": { + "datasetName": "19d796af", + "parameters": [ + { + "name": "pipeline_id", + "keyword": "pipeline_id" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f09979986010f6ab5f17b26821c8ed_pipeline_id", + "query": { + "datasetName": "c91619f8", + "parameters": [ + { + 
"name": "pipeline_id", + "keyword": "pipeline_id" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f09b17b212157aaf06cacc9e20a714_pipeline_id", + "query": { + "datasetName": "82c4e3b5", + "parameters": [ + { + "name": "pipeline_id", + "keyword": "pipeline_id" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f09b1d1f751ec0a5513c9b7bbc36ba_pipeline_id", + "query": { + "datasetName": "c0688ffc", + "parameters": [ + { + "name": "pipeline_id", + "keyword": "pipeline_id" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f09b357c42177b884faa4ece765b86_pipeline_id", + "query": { + "datasetName": "3b8b7d86", + "parameters": [ + { + "name": "pipeline_id", + "keyword": "pipeline_id" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f09b368fb01653b59b6beab34c1a0e_pipeline_id", + "query": { + "datasetName": "8897fc81", + "parameters": [ + { + "name": "pipeline_id", + "keyword": "pipeline_id" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f09b3922e015488ee3e51f8258cb39_pipeline_id", + "query": { + "datasetName": "2faba057", + "parameters": [ + { + "name": "pipeline_id", + "keyword": "pipeline_id" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f0a07596bd183e8afc05567f618049_pipeline_id", + "query": { + "datasetName": "62d03d54", + "parameters": [ + { + "name": "pipeline_id", + "keyword": "pipeline_id" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f0a0b4e13f111593341bd3e635c51e_pipeline_id", + "query": { + "datasetName": "95ce6f6c", + "parameters": [ + { + "name": "pipeline_id", + "keyword": "pipeline_id" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f0a0b72e4f14a8a47d55a3ada231a4_pipeline_id", + "query": { + "datasetName": "2660d018", + "parameters": [ + { + "name": "pipeline_id", + "keyword": "pipeline_id" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f0a3a926d2197f977cc64094d50850_pipeline_id", + "query": { + "datasetName": "248b74c6", + "parameters": [ + { + "name": "pipeline_id", + "keyword": "pipeline_id" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f0a3b4ca4f13e4b04ebcec0af2eebc_pipeline_id", + "query": { + "datasetName": "7103e200", + "parameters": [ + { + "name": "pipeline_id", + "keyword": "pipeline_id" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f0a3be2f241744ab763e896db478c0_pipeline_id", + "query": { + "datasetName": "e816f2b3", + "parameters": [ + { + "name": "pipeline_id", + "keyword": "pipeline_id" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f0a3bf4f0515ad858b1dc97af26143_pipeline_id", + "query": { + "datasetName": "b18cca98", + "parameters": [ + { + "name": "pipeline_id", + "keyword": "pipeline_id" + } + ], + "disaggregated": false + } + } + ], + "spec": { + "version": 2, + "widgetType": 
"filter-single-select", + "encodings": { + "fields": [ + { + "fieldName": "pipeline_id", + "queryName": "dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f0997998601178b602e1d108727138_pipeline_id" + }, + { + "parameterName": "pipeline_id", + "queryName": "parameter_dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f0997998601159af484fd01d16be3e_pipeline_id" + }, + { + "parameterName": "pipeline_id", + "queryName": "parameter_dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f0997998601180bf22ba62026daa2f_pipeline_id" + }, + { + "parameterName": "pipeline_id", + "queryName": "parameter_dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f099799860118983e81bc3b6a962b2_pipeline_id" + }, + { + "parameterName": "pipeline_id", + "queryName": "parameter_dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f09979986011938281b2e91deeb22e_pipeline_id" + }, + { + "parameterName": "pipeline_id", + "queryName": "parameter_dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f0997998601166bd423ceb08ab0480_pipeline_id" + }, + { + "parameterName": "pipeline_id", + "queryName": "parameter_dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f09979986010f6ab5f17b26821c8ed_pipeline_id" + }, + { + "parameterName": "pipeline_id", + "queryName": "parameter_dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f09b17b212157aaf06cacc9e20a714_pipeline_id" + }, + { + "parameterName": "pipeline_id", + "queryName": "parameter_dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f09b1d1f751ec0a5513c9b7bbc36ba_pipeline_id" + }, + { + "parameterName": "pipeline_id", + "queryName": "parameter_dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f09b357c42177b884faa4ece765b86_pipeline_id" + }, + { + "parameterName": "pipeline_id", + "queryName": "parameter_dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f09b368fb01653b59b6beab34c1a0e_pipeline_id" + }, + { + "parameterName": "pipeline_id", + "queryName": "parameter_dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f09b3922e015488ee3e51f8258cb39_pipeline_id" + }, + { + "parameterName": "pipeline_id", + "queryName": "parameter_dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f0a07596bd183e8afc05567f618049_pipeline_id" + }, + { + "parameterName": "pipeline_id", + "queryName": "parameter_dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f0a0b4e13f111593341bd3e635c51e_pipeline_id" + }, + { + "parameterName": "pipeline_id", + "queryName": "parameter_dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f0a0b72e4f14a8a47d55a3ada231a4_pipeline_id" + }, + { + "parameterName": "pipeline_id", + "queryName": "parameter_dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f0a3a926d2197f977cc64094d50850_pipeline_id" + }, + { + "parameterName": "pipeline_id", + "queryName": "parameter_dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f0a3b4ca4f13e4b04ebcec0af2eebc_pipeline_id" + }, + { + "parameterName": "pipeline_id", + "queryName": "parameter_dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f0a3be2f241744ab763e896db478c0_pipeline_id" + }, + { + "parameterName": "pipeline_id", + "queryName": "parameter_dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f0a3bf4f0515ad858b1dc97af26143_pipeline_id" + } + ] + }, + "frame": { + "showTitle": true, + "showDescription": true, + "title": "Pipelines", + "description": "Select a pipeline to observe" + } + } + }, + "position": { + "x": 0, + "y": 2, + "width": 1, + "height": 2 + } + }, + { + "widget": { + "name": "38094f88", + "queries": [ + { + "name": 
"dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f0997998601180bf22ba62026daa2f_table_name", + "query": { + "datasetName": "8db826d7", + "fields": [ + { + "name": "table_name", + "expression": "`table_name`" + }, + { + "name": "table_name_associativity", + "expression": "COUNT_IF(`associative_filter_predicate_group`)" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f099799860118983e81bc3b6a962b2_table_name", + "query": { + "datasetName": "b18ee129", + "parameters": [ + { + "name": "table_name", + "keyword": "table_name" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f09b357c42177b884faa4ece765b86_table_name", + "query": { + "datasetName": "3b8b7d86", + "parameters": [ + { + "name": "table_name", + "keyword": "table_name" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f09b368fb01653b59b6beab34c1a0e_table_name", + "query": { + "datasetName": "8897fc81", + "parameters": [ + { + "name": "table_name", + "keyword": "table_name" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f0a0b72e4f14a8a47d55a3ada231a4_table_name", + "query": { + "datasetName": "2660d018", + "parameters": [ + { + "name": "table_name", + "keyword": "table_name" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f0a3a926d2197f977cc64094d50850_table_name", + "query": { + "datasetName": "248b74c6", + "parameters": [ + { + "name": "table_name", + "keyword": "table_name" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f0a3b4ca4f13e4b04ebcec0af2eebc_table_name", + "query": { + "datasetName": "7103e200", + "parameters": [ + { + "name": "table_name", + "keyword": "table_name" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f0a3be2f241744ab763e896db478c0_table_name", + "query": { + "datasetName": "e816f2b3", + "parameters": [ + { + "name": "table_name", + "keyword": "table_name" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f0a3bf4f0515ad858b1dc97af26143_table_name", + "query": { + "datasetName": "b18cca98", + "parameters": [ + { + "name": "table_name", + "keyword": "table_name" + } + ], + "disaggregated": false + } + } + ], + "spec": { + "version": 2, + "widgetType": "filter-single-select", + "encodings": { + "fields": [ + { + "fieldName": "table_name", + "queryName": "dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f0997998601180bf22ba62026daa2f_table_name" + }, + { + "parameterName": "table_name", + "queryName": "parameter_dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f099799860118983e81bc3b6a962b2_table_name" + }, + { + "parameterName": "table_name", + "queryName": "parameter_dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f09b357c42177b884faa4ece765b86_table_name" + }, + { + "parameterName": "table_name", + "queryName": "parameter_dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f09b368fb01653b59b6beab34c1a0e_table_name" + }, + { + "parameterName": "table_name", + "queryName": "parameter_dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f0a0b72e4f14a8a47d55a3ada231a4_table_name" + }, + { + 
"parameterName": "table_name", + "queryName": "parameter_dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f0a3a926d2197f977cc64094d50850_table_name" + }, + { + "parameterName": "table_name", + "queryName": "parameter_dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f0a3b4ca4f13e4b04ebcec0af2eebc_table_name" + }, + { + "parameterName": "table_name", + "queryName": "parameter_dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f0a3be2f241744ab763e896db478c0_table_name" + }, + { + "parameterName": "table_name", + "queryName": "parameter_dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f0a3bf4f0515ad858b1dc97af26143_table_name" + } + ] + }, + "frame": { + "showTitle": true, + "showDescription": true, + "title": "Tables", + "description": "Select a table to observe" + } + } + }, + "position": { + "x": 0, + "y": 4, + "width": 1, + "height": 2 + } + }, + { + "widget": { + "name": "cc47542d", + "queries": [ + { + "name": "dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f09979986011938281b2e91deeb22e_event_timestamp", + "query": { + "datasetName": "5e3eb18f", + "fields": [ + { + "name": "event_timestamp", + "expression": "`event_timestamp`" + }, + { + "name": "event_timestamp_associativity", + "expression": "COUNT_IF(`associative_filter_predicate_group`)" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f0997998601166bd423ceb08ab0480_event_range", + "query": { + "datasetName": "19d796af", + "parameters": [ + { + "name": "event_range", + "keyword": "event_range" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f09b17b212157aaf06cacc9e20a714_event_range", + "query": { + "datasetName": "82c4e3b5", + "parameters": [ + { + "name": "event_range", + "keyword": "event_range" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f09b1d1f751ec0a5513c9b7bbc36ba_event_range", + "query": { + "datasetName": "c0688ffc", + "parameters": [ + { + "name": "event_range", + "keyword": "event_range" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f09b357c42177b884faa4ece765b86_event_range", + "query": { + "datasetName": "3b8b7d86", + "parameters": [ + { + "name": "event_range", + "keyword": "event_range" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f09b368fb01653b59b6beab34c1a0e_event_range", + "query": { + "datasetName": "8897fc81", + "parameters": [ + { + "name": "event_range", + "keyword": "event_range" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f09b3922e015488ee3e51f8258cb39_event_range", + "query": { + "datasetName": "2faba057", + "parameters": [ + { + "name": "event_range", + "keyword": "event_range" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f0a07596bd183e8afc05567f618049_event_range", + "query": { + "datasetName": "62d03d54", + "parameters": [ + { + "name": "event_range", + "keyword": "event_range" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f0a0b4e13f111593341bd3e635c51e_event_range", + "query": { + "datasetName": "95ce6f6c", + "parameters": [ + { + "name": "event_range", + "keyword": 
"event_range" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f0a0b72e4f14a8a47d55a3ada231a4_event_range", + "query": { + "datasetName": "2660d018", + "parameters": [ + { + "name": "event_range", + "keyword": "event_range" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f0a3a926d2197f977cc64094d50850_event_range", + "query": { + "datasetName": "248b74c6", + "parameters": [ + { + "name": "event_range", + "keyword": "event_range" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f0a3b4ca4f13e4b04ebcec0af2eebc_event_range", + "query": { + "datasetName": "7103e200", + "parameters": [ + { + "name": "event_range", + "keyword": "event_range" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f0a3be2f241744ab763e896db478c0_event_range", + "query": { + "datasetName": "e816f2b3", + "parameters": [ + { + "name": "event_range", + "keyword": "event_range" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f0a3bf4f0515ad858b1dc97af26143_event_range", + "query": { + "datasetName": "b18cca98", + "parameters": [ + { + "name": "event_range", + "keyword": "event_range" + } + ], + "disaggregated": false + } + } + ], + "spec": { + "version": 2, + "widgetType": "filter-date-range-picker", + "encodings": { + "fields": [ + { + "fieldName": "event_timestamp", + "queryName": "dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f09979986011938281b2e91deeb22e_event_timestamp" + }, + { + "parameterName": "event_range", + "queryName": "parameter_dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f0997998601166bd423ceb08ab0480_event_range" + }, + { + "parameterName": "event_range", + "queryName": "parameter_dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f09b17b212157aaf06cacc9e20a714_event_range" + }, + { + "parameterName": "event_range", + "queryName": "parameter_dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f09b1d1f751ec0a5513c9b7bbc36ba_event_range" + }, + { + "parameterName": "event_range", + "queryName": "parameter_dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f09b357c42177b884faa4ece765b86_event_range" + }, + { + "parameterName": "event_range", + "queryName": "parameter_dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f09b368fb01653b59b6beab34c1a0e_event_range" + }, + { + "parameterName": "event_range", + "queryName": "parameter_dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f09b3922e015488ee3e51f8258cb39_event_range" + }, + { + "parameterName": "event_range", + "queryName": "parameter_dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f0a07596bd183e8afc05567f618049_event_range" + }, + { + "parameterName": "event_range", + "queryName": "parameter_dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f0a0b4e13f111593341bd3e635c51e_event_range" + }, + { + "parameterName": "event_range", + "queryName": "parameter_dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f0a0b72e4f14a8a47d55a3ada231a4_event_range" + }, + { + "parameterName": "event_range", + "queryName": "parameter_dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f0a3a926d2197f977cc64094d50850_event_range" + }, + { + "parameterName": "event_range", + "queryName": 
"parameter_dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f0a3b4ca4f13e4b04ebcec0af2eebc_event_range" + }, + { + "parameterName": "event_range", + "queryName": "parameter_dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f0a3be2f241744ab763e896db478c0_event_range" + }, + { + "parameterName": "event_range", + "queryName": "parameter_dashboards/01f097e16d3b145a851abc79c8c30922/datasets/01f0a3bf4f0515ad858b1dc97af26143_event_range" + } + ] + }, + "frame": { + "showTitle": true, + "showDescription": true, + "title": "Date/Time Range", + "description": "Select a event log data/time range to observe" + }, + "selection": { + "defaultSelection": { + "range": { + "dataType": "DATETIME", + "min": { + "value": "now-24h/h" + }, + "max": { + "value": "now/h" + } + } + } + } + } + }, + "position": { + "x": 0, + "y": 6, + "width": 1, + "height": 2 + } + }, + { + "widget": { + "name": "pipeline-tags", + "queries": [ + { + "name": "dashboards/01f0c0d48819180e85ebeb3770d6fd99/datasets/01f0c0d497c91592afe34a81c0611ed0_tag_value", + "query": { + "datasetName": "02dde459", + "fields": [ + { + "name": "tag_value", + "expression": "`tag_value`" + }, + { + "name": "tag_value_associativity", + "expression": "COUNT_IF(`associative_filter_predicate_group`)" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f0c0d48819180e85ebeb3770d6fd99/datasets/01f0c0d4881a19bcbaef4ff2bdbbb4ce_tag_value", + "query": { + "datasetName": "6740610b", + "parameters": [ + { + "name": "tag_value", + "keyword": "tag_value" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f0c0d48819180e85ebeb3770d6fd99/datasets/01f0c0d4881a18f2ab82a07a40d1f93e_pipeline_tag_value", + "query": { + "datasetName": "82c4e3b5", + "parameters": [ + { + "name": "pipeline_tag_value", + "keyword": "pipeline_tag_value" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f0c0d48819180e85ebeb3770d6fd99/datasets/01f0c0d4881a198789fe18d94c25750e_pipeline_tag_value", + "query": { + "datasetName": "c91619f8", + "parameters": [ + { + "name": "pipeline_tag_value", + "keyword": "pipeline_tag_value" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f0c0d48819180e85ebeb3770d6fd99/datasets/01f0c0d4881a199d8500d29a4648a1b3_pipeline_tag_value", + "query": { + "datasetName": "2660d018", + "parameters": [ + { + "name": "pipeline_tag_value", + "keyword": "pipeline_tag_value" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f0c0d48819180e85ebeb3770d6fd99/datasets/01f0c0d4881a19c794cc4a9be6c3c03f_pipeline_tag_value", + "query": { + "datasetName": "95ce6f6c", + "parameters": [ + { + "name": "pipeline_tag_value", + "keyword": "pipeline_tag_value" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f0c0d48819180e85ebeb3770d6fd99/datasets/01f0c0d4881a19d4bb0d55cb33323e6a_pipeline_tag_value", + "query": { + "datasetName": "b18cca98", + "parameters": [ + { + "name": "pipeline_tag_value", + "keyword": "pipeline_tag_value" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f0c0d48819180e85ebeb3770d6fd99/datasets/01f0c0d4881a19e2b06cdb5127dd4251_pipeline_tag_value", + "query": { + "datasetName": "8897fc81", + "parameters": [ + { + "name": "pipeline_tag_value", + "keyword": "pipeline_tag_value" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f0c0d48819180e85ebeb3770d6fd99/datasets/01f0c0d4881a19ef8e29f8cc7d12885f_pipeline_tag_value", + 
"query": { + "datasetName": "2faba057", + "parameters": [ + { + "name": "pipeline_tag_value", + "keyword": "pipeline_tag_value" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f0c0d48819180e85ebeb3770d6fd99/datasets/01f0c0d4881a1a128abc61f1b099ecc7_pipeline_tag_value", + "query": { + "datasetName": "248b74c6", + "parameters": [ + { + "name": "pipeline_tag_value", + "keyword": "pipeline_tag_value" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f0c0d48819180e85ebeb3770d6fd99/datasets/01f0c0d4881a1a2b97c852b27ed58850_pipeline_tag_value", + "query": { + "datasetName": "7103e200", + "parameters": [ + { + "name": "pipeline_tag_value", + "keyword": "pipeline_tag_value" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f0c0d48819180e85ebeb3770d6fd99/datasets/01f0c0d4881a1a3bb934c529089a2a16_pipeline_tag_value", + "query": { + "datasetName": "c0688ffc", + "parameters": [ + { + "name": "pipeline_tag_value", + "keyword": "pipeline_tag_value" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f0c0d48819180e85ebeb3770d6fd99/datasets/01f0c0d4881a1a489475dc731fcbb1cb_pipeline_tag_value", + "query": { + "datasetName": "8db826d7", + "parameters": [ + { + "name": "pipeline_tag_value", + "keyword": "pipeline_tag_value" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f0c0d48819180e85ebeb3770d6fd99/datasets/01f0c0d4881a1a5886ca43f11580bd33_pipeline_tag_value", + "query": { + "datasetName": "b18ee129", + "parameters": [ + { + "name": "pipeline_tag_value", + "keyword": "pipeline_tag_value" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f0c0d48819180e85ebeb3770d6fd99/datasets/01f0c0d4881a1a20bdb20f7f88ae9fee_pipeline_tag_value", + "query": { + "datasetName": "0ac0bb93", + "parameters": [ + { + "name": "pipeline_tag_value", + "keyword": "pipeline_tag_value" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f0c0d48819180e85ebeb3770d6fd99/datasets/01f0c0d4881a1a65a90b078c6abf6f1d_pipeline_tag_value", + "query": { + "datasetName": "19d796af", + "parameters": [ + { + "name": "pipeline_tag_value", + "keyword": "pipeline_tag_value" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f0c0d48819180e85ebeb3770d6fd99/datasets/01f0c0d4881a1a74b47b197d404bb4c6_pipeline_tag_value", + "query": { + "datasetName": "3b8b7d86", + "parameters": [ + { + "name": "pipeline_tag_value", + "keyword": "pipeline_tag_value" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f0c0d48819180e85ebeb3770d6fd99/datasets/01f0c0d4881a1a869a2ffaa2feafc39e_pipeline_tag_value", + "query": { + "datasetName": "e816f2b3", + "parameters": [ + { + "name": "pipeline_tag_value", + "keyword": "pipeline_tag_value" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f0c0d48819180e85ebeb3770d6fd99/datasets/01f0c0d4881a1a97997504978f76acc0_pipeline_tag_value", + "query": { + "datasetName": "62d03d54", + "parameters": [ + { + "name": "pipeline_tag_value", + "keyword": "pipeline_tag_value" + } + ], + "disaggregated": false + } + } + ], + "spec": { + "version": 2, + "widgetType": "filter-single-select", + "encodings": { + "fields": [ + { + "fieldName": "tag_value", + "queryName": "dashboards/01f0c0d48819180e85ebeb3770d6fd99/datasets/01f0c0d497c91592afe34a81c0611ed0_tag_value" + }, + { + "parameterName": "tag_value", + "queryName": 
"parameter_dashboards/01f0c0d48819180e85ebeb3770d6fd99/datasets/01f0c0d4881a19bcbaef4ff2bdbbb4ce_tag_value" + }, + { + "parameterName": "pipeline_tag_value", + "queryName": "parameter_dashboards/01f0c0d48819180e85ebeb3770d6fd99/datasets/01f0c0d4881a18f2ab82a07a40d1f93e_pipeline_tag_value" + }, + { + "parameterName": "pipeline_tag_value", + "queryName": "parameter_dashboards/01f0c0d48819180e85ebeb3770d6fd99/datasets/01f0c0d4881a198789fe18d94c25750e_pipeline_tag_value" + }, + { + "parameterName": "pipeline_tag_value", + "queryName": "parameter_dashboards/01f0c0d48819180e85ebeb3770d6fd99/datasets/01f0c0d4881a199d8500d29a4648a1b3_pipeline_tag_value" + }, + { + "parameterName": "pipeline_tag_value", + "queryName": "parameter_dashboards/01f0c0d48819180e85ebeb3770d6fd99/datasets/01f0c0d4881a19c794cc4a9be6c3c03f_pipeline_tag_value" + }, + { + "parameterName": "pipeline_tag_value", + "queryName": "parameter_dashboards/01f0c0d48819180e85ebeb3770d6fd99/datasets/01f0c0d4881a19d4bb0d55cb33323e6a_pipeline_tag_value" + }, + { + "parameterName": "pipeline_tag_value", + "queryName": "parameter_dashboards/01f0c0d48819180e85ebeb3770d6fd99/datasets/01f0c0d4881a19e2b06cdb5127dd4251_pipeline_tag_value" + }, + { + "parameterName": "pipeline_tag_value", + "queryName": "parameter_dashboards/01f0c0d48819180e85ebeb3770d6fd99/datasets/01f0c0d4881a19ef8e29f8cc7d12885f_pipeline_tag_value" + }, + { + "parameterName": "pipeline_tag_value", + "queryName": "parameter_dashboards/01f0c0d48819180e85ebeb3770d6fd99/datasets/01f0c0d4881a1a128abc61f1b099ecc7_pipeline_tag_value" + }, + { + "parameterName": "pipeline_tag_value", + "queryName": "parameter_dashboards/01f0c0d48819180e85ebeb3770d6fd99/datasets/01f0c0d4881a1a2b97c852b27ed58850_pipeline_tag_value" + }, + { + "parameterName": "pipeline_tag_value", + "queryName": "parameter_dashboards/01f0c0d48819180e85ebeb3770d6fd99/datasets/01f0c0d4881a1a3bb934c529089a2a16_pipeline_tag_value" + }, + { + "parameterName": "pipeline_tag_value", + "queryName": "parameter_dashboards/01f0c0d48819180e85ebeb3770d6fd99/datasets/01f0c0d4881a1a489475dc731fcbb1cb_pipeline_tag_value" + }, + { + "parameterName": "pipeline_tag_value", + "queryName": "parameter_dashboards/01f0c0d48819180e85ebeb3770d6fd99/datasets/01f0c0d4881a1a5886ca43f11580bd33_pipeline_tag_value" + }, + { + "parameterName": "pipeline_tag_value", + "queryName": "parameter_dashboards/01f0c0d48819180e85ebeb3770d6fd99/datasets/01f0c0d4881a1a20bdb20f7f88ae9fee_pipeline_tag_value" + }, + { + "parameterName": "pipeline_tag_value", + "queryName": "parameter_dashboards/01f0c0d48819180e85ebeb3770d6fd99/datasets/01f0c0d4881a1a65a90b078c6abf6f1d_pipeline_tag_value" + }, + { + "parameterName": "pipeline_tag_value", + "queryName": "parameter_dashboards/01f0c0d48819180e85ebeb3770d6fd99/datasets/01f0c0d4881a1a74b47b197d404bb4c6_pipeline_tag_value" + }, + { + "parameterName": "pipeline_tag_value", + "queryName": "parameter_dashboards/01f0c0d48819180e85ebeb3770d6fd99/datasets/01f0c0d4881a1a869a2ffaa2feafc39e_pipeline_tag_value" + }, + { + "parameterName": "pipeline_tag_value", + "queryName": "parameter_dashboards/01f0c0d48819180e85ebeb3770d6fd99/datasets/01f0c0d4881a1a97997504978f76acc0_pipeline_tag_value" + } + ] + }, + "frame": { + "showTitle": true, + "title": "Pipeline Tags", + "showDescription": true, + "description": "Filter pipelines by tag values" + }, + "disallowAll": true, + "selection": { + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "" + } + ] + } + } + } + } + }, + "position": { + "x": 0, + "y": 
0, + "width": 1, + "height": 2 + } + } + ], + "pageType": "PAGE_TYPE_GLOBAL_FILTERS" + }, + { + "name": "75df3911", + "displayName": "Tables Metrics", + "layout": [ + { + "widget": { + "name": "db12c0c1", + "queries": [ + { + "name": "main_query", + "query": { + "datasetName": "3b8b7d86", + "fields": [ + { + "name": "Target Table", + "expression": "`Target Table`" + }, + { + "name": "minutely(Error Time)", + "expression": "DATE_TRUNC(\"MINUTE\", `Error Time`)" + }, + { + "name": "count(*)", + "expression": "COUNT(`*`)" + } + ], + "disaggregated": false + } + } + ], + "spec": { + "version": 3, + "widgetType": "bar", + "encodings": { + "x": { + "fieldName": "minutely(Error Time)", + "scale": { + "type": "temporal" + }, + "displayName": "Error Time" + }, + "y": { + "fieldName": "count(*)", + "scale": { + "type": "quantitative" + }, + "displayName": "Count of Records" + }, + "color": { + "fieldName": "Target Table", + "scale": { + "type": "categorical" + }, + "displayName": "Target Table" + } + }, + "frame": { + "showTitle": true, + "showDescription": true, + "title": "Table Errors", + "description": "Detected errors while processing tables" + } + } + }, + "position": { + "x": 0, + "y": 2, + "width": 6, + "height": 6 + } + }, + { + "widget": { + "name": "841d82eb", + "queries": [ + { + "name": "main_query", + "query": { + "datasetName": "8897fc81", + "fields": [ + { + "name": "Target Table", + "expression": "`Target Table`" + }, + { + "name": "minutely(Warning Time)", + "expression": "DATE_TRUNC(\"MINUTE\", `Warning Time`)" + }, + { + "name": "count(*)", + "expression": "COUNT(`*`)" + } + ], + "disaggregated": false + } + } + ], + "spec": { + "version": 3, + "widgetType": "bar", + "encodings": { + "x": { + "fieldName": "minutely(Warning Time)", + "scale": { + "type": "categorical" + }, + "displayName": "Warning Time" + }, + "y": { + "fieldName": "count(*)", + "scale": { + "type": "quantitative" + }, + "displayName": "Count of Records" + }, + "color": { + "fieldName": "Target Table", + "scale": { + "type": "categorical" + }, + "displayName": "Target Table" + } + }, + "frame": { + "showTitle": true, + "showDescription": true, + "title": "Table Warnings", + "description": "Detected warnings while processing tables" + } + } + }, + "position": { + "x": 0, + "y": 16, + "width": 6, + "height": 5 + } + }, + { + "widget": { + "name": "6c9cec76", + "multilineTextboxSpec": { + "lines": [ + "# Errors and Warnings" + ] + } + }, + "position": { + "x": 0, + "y": 0, + "width": 6, + "height": 2 + } + }, + { + "widget": { + "name": "3f69a51c", + "multilineTextboxSpec": { + "lines": [ + "# Latencies" + ] + } + }, + "position": { + "x": 0, + "y": 35, + "width": 6, + "height": 2 + } + }, + { + "widget": { + "name": "e968324c", + "multilineTextboxSpec": { + "lines": [ + "# Rates and Throughput" + ] + } + }, + "position": { + "x": 0, + "y": 21, + "width": 6, + "height": 2 + } + }, + { + "widget": { + "name": "edb32903", + "queries": [ + { + "name": "main_query", + "query": { + "datasetName": "2660d018", + "fields": [ + { + "name": "Target Table", + "expression": "`Target Table`" + }, + { + "name": "hourly(Event Time)", + "expression": "DATE_TRUNC(\"HOUR\", `Event Time`)" + }, + { + "name": "sum(Applied Changes)", + "expression": "SUM(`Applied Changes`)" + } + ], + "parameterValues": [ + { + "keyword": "flow_type", + "selection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "cdc" + } + ] + } + } + } + ], + "disaggregated": false + } + } + ], + "spec": { + "version": 3, + "widgetType": "bar", + 
"encodings": { + "x": { + "fieldName": "hourly(Event Time)", + "scale": { + "type": "temporal" + } + }, + "y": { + "fieldName": "sum(Applied Changes)", + "scale": { + "type": "quantitative" + } + }, + "color": { + "fieldName": "Target Table", + "scale": { + "type": "categorical" + } + } + }, + "frame": { + "showTitle": true, + "showDescription": true, + "title": "Hourly CDC Throughput", + "description": "Number of CDC Changes Applied to the Target Tables per Hour" + } + } + }, + "position": { + "x": 0, + "y": 23, + "width": 6, + "height": 6 + } + }, + { + "widget": { + "name": "9ac662c8", + "queries": [ + { + "name": "main_query", + "query": { + "datasetName": "2660d018", + "fields": [ + { + "name": "Target Table", + "expression": "`Target Table`" + }, + { + "name": "hourly(Event Time)", + "expression": "DATE_TRUNC(\"HOUR\", `Event Time`)" + }, + { + "name": "sum(Applied Changes)", + "expression": "SUM(`Applied Changes`)" + } + ], + "parameterValues": [ + { + "keyword": "flow_type", + "selection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "snapshot" + } + ] + } + } + } + ], + "disaggregated": false + } + } + ], + "spec": { + "version": 3, + "widgetType": "bar", + "encodings": { + "x": { + "fieldName": "hourly(Event Time)", + "scale": { + "type": "categorical" + } + }, + "y": { + "fieldName": "sum(Applied Changes)", + "scale": { + "type": "quantitative" + } + }, + "color": { + "fieldName": "Target Table", + "scale": { + "type": "categorical" + } + } + }, + "frame": { + "showTitle": true, + "showDescription": true, + "title": "Hourly Snapshot Throughput", + "description": "Number of Snapshot Changes Applied to the Target Tables per Hour" + } + } + }, + "position": { + "x": 0, + "y": 29, + "width": 6, + "height": 6 + } + }, + { + "widget": { + "name": "c4b2aec5", + "queries": [ + { + "name": "main_query", + "query": { + "datasetName": "3b8b7d86", + "fields": [ + { + "name": "count(*)", + "expression": "COUNT(`*`)" + }, + { + "name": "Error Code", + "expression": "`Error Code`" + } + ], + "disaggregated": false + } + } + ], + "spec": { + "version": 3, + "widgetType": "pie", + "encodings": { + "angle": { + "fieldName": "count(*)", + "scale": { + "type": "quantitative" + }, + "displayName": "Number of occurrences" + }, + "color": { + "fieldName": "Error Code", + "scale": { + "type": "categorical" + } + }, + "label": { + "show": true + } + }, + "frame": { + "showTitle": true, + "showDescription": true, + "title": "Error Frequency", + "description": "Distribution of errors by error codes" + } + } + }, + "position": { + "x": 0, + "y": 8, + "width": 6, + "height": 8 + } + }, + { + "widget": { + "name": "3cea8af9", + "queries": [ + { + "name": "main_query", + "query": { + "datasetName": "e816f2b3", + "fields": [ + { + "name": "Target Table", + "expression": "`Target Table`" + }, + { + "name": "minutely(Event Time)", + "expression": "DATE_TRUNC(\"MINUTE\", `Event Time`)" + }, + { + "name": "min(Newest Event Latency)", + "expression": "MIN(`Newest Event Latency`)" + } + ], + "disaggregated": false + } + } + ], + "spec": { + "version": 3, + "widgetType": "bar", + "encodings": { + "x": { + "fieldName": "minutely(Event Time)", + "scale": { + "type": "temporal" + } + }, + "y": { + "fieldName": "min(Newest Event Latency)", + "scale": { + "type": "quantitative" + } + }, + "color": { + "fieldName": "Target Table", + "scale": { + "type": "categorical" + } + } + }, + "frame": { + "showTitle": true, + "showDescription": true, + "title": "Data Latency", + "description": "The difference 
between the time of ingestion and the newest event time" + } + } + }, + "position": { + "x": 0, + "y": 37, + "width": 6, + "height": 6 + } + }, + { + "widget": { + "name": "c753eda7", + "queries": [ + { + "name": "main_query", + "query": { + "datasetName": "b18cca98", + "fields": [ + { + "name": "Target Table", + "expression": "`Target Table`" + }, + { + "name": "max(Newest Event Latency)", + "expression": "MAX(`Newest Event Latency`)" + }, + { + "name": "minutely(Event Time)", + "expression": "DATE_TRUNC(\"MINUTE\", `Event Time`)" + }, + { + "name": "max(Oldest Event Latency)", + "expression": "MAX(`Oldest Event Latency`)" + } + ], + "disaggregated": false + } + } + ], + "spec": { + "version": 3, + "widgetType": "bar", + "encodings": { + "x": { + "fieldName": "minutely(Event Time)", + "scale": { + "type": "temporal" + } + }, + "y": { + "fieldName": "max(Oldest Event Latency)", + "scale": { + "type": "quantitative" + } + }, + "color": { + "fieldName": "Target Table", + "scale": { + "type": "categorical" + } + }, + "extra": [ + { + "fieldName": "max(Newest Event Latency)" + } + ] + }, + "frame": { + "showTitle": true, + "showDescription": true, + "title": "Data Latency Outliers", + "description": "The difference between the time of ingestion and the oldest event time" + } + } + }, + "position": { + "x": 0, + "y": 43, + "width": 6, + "height": 6 + } + } + ], + "pageType": "PAGE_TYPE_CANVAS" + } + ], + "uiSettings": { + "theme": { + "widgetHeaderAlignment": "ALIGNMENT_UNSPECIFIED" + }, + "applyModeEnabled": false + } +} diff --git a/contrib/dbx_ingestion_monitoring/cdc_connector_monitoring_dab/databricks.yml b/contrib/dbx_ingestion_monitoring/cdc_connector_monitoring_dab/databricks.yml new file mode 100644 index 0000000..7f8c060 --- /dev/null +++ b/contrib/dbx_ingestion_monitoring/cdc_connector_monitoring_dab/databricks.yml @@ -0,0 +1,81 @@ +# This DAB contains sample jobs, pipelines, and a Dashboard to build an observability solution for +# Lakeflow CDC Connector. + +bundle: + name: cdc_connector_monitoring_dab + +sync: + paths: + - ../lib + - ../jobs + - ../vars + - ../resources + +include: + # Shared variables and resources + - ../vars/common.vars.yml + - ../vars/import_event_logs.vars.yml + - ../vars/pipeline_tags_index.vars.yml + - ../vars/post_deploy.vars.yml + - ../vars/third_party_sink.vars.yml + - ../resources/monitoring_schema.schema.yml + - ../resources/import_event_logs.job.yml + - ../resources/build_pipeline_tags_index.job.yml + - ../resources/post_deploy.job.yml + + # Resources specific to this DAB + - resources/*.yml + +variables: + # See also included shared *.vars.yml files above + + # Monitoring ETL pipeline configuration + directly_monitored_pipeline_ids: + description: > + A comma-separated list of CDC connector pipeline ids to monitor. The pipelines must have their event log configured for a direct write to a + Delta table (see https://docs.databricks.com/api/workspace/pipelines/create#event_log). If not, use the `imported_pipeline_ids` variable. + default: "" + directly_monitored_pipeline_tags: + description: > + A semicolon-separated list of comma-separated tag[:value] pairs to filter pipelines for direct monitoring. 
+ Format: "tag1[:value1],tag2[:value2];tag3[:value3]" + - Semicolons (;) separate tag groups (OR logic between groups) + - Commas (,) separate tags within a group (ALL must match - AND logic) + - 'tag' is shorthand for 'tag:' (tag with empty value) + Example: "tier:T0;team:data,tier:T1" means (tier:T0) OR (team:data AND tier:T1) + This is an alternative to specifying pipeline IDs explicitly via `directly_monitored_pipeline_ids`. + If both are specified, pipelines matching either set of criteria will be included. + default: "" + imported_event_log_tables: + description: > + A comma-separated list of target tables for imported event logs. The format of those tables must be the same as the + event log format, though each table may contain events from multiple event logs. Typically, these tables are generated using the `import_event_logs` + job(s). + default: "" + serverless_monitoring_pipeline_enabled: + description: Controls whether the monitoring ETL pipeline should be run on serverless compute. + default: true + monitoring_etl_schedule_state: + description: Enable (`UNPAUSED`) or disable (`PAUSED`) the periodic ETL of observability data + default: UNPAUSED + monitoring_etl_cron_schedule: + description: > + The cron schedule (see http://www.quartz-scheduler.org/documentation/quartz-2.3.0/tutorials/crontrigger.html) to use for updating the observability + tables. Note that you also have to set `monitoring_etl_schedule_state` to `UNPAUSED` for this to take effect. The default is to run the + ETL hourly. + default: "0 30 0/1 * * ?" + +targets: +# dev: +# default: true +# mode: development +# variables: +# Configure the target monitoring catalog and schema. See variables in /vars/common.vars.yml +# dab_type: "CDC Connector" # recommended to distinguish from other monitoring DABs +# Configure imports of pipeline event logs not stored in a Delta table. See variables in /vars/import_event_logs.vars.yml +# Configure monitoring ETL. See variables above +# Dashboard configuration +# main_dashboard_template_path: ../cdc_connector_monitoring_dab/dashboards/CDC Connector Monitoring Dashboard Template.lvdash.json # Required +# main_dashboard_name: "CDC Connector Dashboard" # customize to any name +# Configure 3P observability integration (if desired).
See variables in /vars/third_party_sink.vars.yml + diff --git a/contrib/dbx_ingestion_monitoring/cdc_connector_monitoring_dab/monitoring_etl/cdc_monitoring_pipeline_main.py b/contrib/dbx_ingestion_monitoring/cdc_connector_monitoring_dab/monitoring_etl/cdc_monitoring_pipeline_main.py new file mode 100644 index 0000000..58823f3 --- /dev/null +++ b/contrib/dbx_ingestion_monitoring/cdc_connector_monitoring_dab/monitoring_etl/cdc_monitoring_pipeline_main.py @@ -0,0 +1,240 @@ +import dlt +import sys +import logging + +sys.path.append("../../lib") + +from dbx_ingestion_monitoring.common_ldp import * + +# Configure logging +logging.basicConfig( + level=logging.INFO, + format='%(asctime)s - %(name)s - %(levelname)s - %(message)s' +) +logger = logging.getLogger(__name__) +logger.info("Starting CDC Connector Monitoring ETL Pipeline") + +# Pipeline parameters + +conf = Configuration(spark.conf) + +class CdcConstants: + CDC_FLOW_TYPE = 'cdc' + SNAPSHOT_FLOW_TYPE = 'snapshot' + CDC_STAGING_TABLE_FLOW_TYPE = 'cdc_staging' + TABLE_STATUS_PER_PIPELINE_RUN = 'table_status_per_pipeline_run' + CDC_STAGING_TABLE = 'cdc_staging_table' + + +class CdcConnectorMonitoringEtlPipeline(MonitoringEtlPipeline): + def __init__(self, conf: Configuration, spark: SparkSession): + super().__init__(conf, spark) + + def _get_event_logs_bronze_sql(self, event_log_source: str): + """ + Override base definition for append flows from the event log sources into `event_logs_bronze` table. It adds + CDC Connector-specific fields + """ + sql = super()._get_event_logs_bronze_sql(event_log_source) + sql = sql.replace(Constants.sql_fields_def_extension_point, + f""", (CASE WHEN endswith(flow_name, "_snapshot_flow") THEN 'snapshot' + WHEN details:operation_progress.cdc_snapshot.table_name::string is not null THEN '{CdcConstants.SNAPSHOT_FLOW_TYPE}' + WHEN endswith(flow_name, "_cdc_flow") THEN '{CdcConstants.CDC_FLOW_TYPE}' + WHEN endswith(flow_name, ".{CdcConstants.CDC_STAGING_TABLE}") THEN '{CdcConstants.CDC_STAGING_TABLE_FLOW_TYPE}' + END) flow_type{Constants.sql_fields_def_extension_point} + """) + return sql + + + def _get_events_errors_sql(self): + sql = super()._get_events_errors_sql() + sql = sql.replace(Constants.sql_fields_def_extension_point, + f", flow_type{Constants.sql_fields_def_extension_point}") + return sql + + + def _get_events_warnings_sql(self): + sql = super()._get_events_warnings_sql() + sql = sql.replace(Constants.sql_fields_def_extension_point, + f", flow_type{Constants.sql_fields_def_extension_point}") + return sql + + + def _get_events_table_metrics_sql(self): + sql = super()._get_events_table_metrics_sql() + return sql.replace(Constants.sql_fields_def_extension_point, f", flow_type{Constants.sql_fields_def_extension_point}") + + + def register_base_tables_and_views(self, spark: SparkSession): + super().register_base_tables_and_views(spark) + + def _get_table_run_processing_state_sql(self): + sql = super()._get_table_run_processing_state_sql() + sql = sql.replace(Constants.where_clause_extension_point, f"AND (table_name not LIKE '%.{CdcConstants.CDC_STAGING_TABLE}') {Constants.where_clause_extension_point}") + sql = sql.replace(Constants.sql_fields_def_extension_point, f", flow_type{Constants.sql_fields_def_extension_point}") + return sql + + + def register_table_status(self, spark: SparkSession): + table_status_per_pipeline_run_cdf = f"{TABLE_STATUS_PER_PIPELINE_RUN.name}_cdf" + @dlt.view(name=table_status_per_pipeline_run_cdf) + def table_run_processing_state_cdf(): + return ( + spark.readStream + 
.option("readChangeFeed", "true") + .table(TABLE_STATUS_PER_PIPELINE_RUN.name) + .filter("_change_type IN ('insert', 'update_postimage')") + ) + + silver_table_name = f"{TABLE_STATUS.name}_silver" + dlt.create_streaming_table(name=silver_table_name, + comment="Capture information about the latest state, ingested data and errors for target tables", + cluster_by=['pipeline_id', 'table_name'], + table_properties={ + "delta.enableRowTracking": "true" + }) + + silver_latest_source_view_name = f"{silver_table_name}_latest_source" + @dlt.view(name=silver_latest_source_view_name) + def table_latest_run_processing_state_source(): + return spark.sql(f""" + SELECT pipeline_id, + table_name, + pipeline_run_id AS latest_pipeline_run_id, + pipeline_run_link AS latest_pipeline_run_link, + latest_state, + latest_state_level, + latest_state_color, + latest_state_with_color, + table_schema_json AS latest_table_schema_json, + table_schema AS latest_table_schema, + null AS latest_cdc_changes_time, + null AS latest_snapshot_changes_time, + (CASE WHEN latest_error_time IS NOT NULL THEN pipeline_run_id END) AS latest_error_pipeline_run_id, + (CASE WHEN latest_error_time IS NOT NULL THEN pipeline_run_link END) AS latest_error_pipeline_run_link, + latest_error_time, + latest_error_log_message, + latest_error_message, + latest_error_code, + latest_error_full, + (CASE WHEN latest_error_time IS NOT NULL THEN flow_type END) AS latest_error_flow_type, + updated_at + FROM STREAM(`{table_status_per_pipeline_run_cdf}`) + WHERE table_name NOT LIKE '%.{CdcConstants.CDC_STAGING_TABLE}' + """) + + dlt.create_auto_cdc_flow( + name=f"{silver_table_name}_apply_latest", + source=silver_latest_source_view_name, + target=silver_table_name, + keys=['pipeline_id', 'table_name'], + sequence_by='updated_at', + ignore_null_updates=True) + + silver_latest_cdc_changes_source_view_name = f"{silver_table_name}_latest_cdc_changes_source" + @dlt.view(name=silver_latest_cdc_changes_source_view_name) + def table_latest_run_processing_state_source(): + return spark.sql(f""" + SELECT pipeline_id, + table_name, + null AS latest_pipeline_run_id, + null AS latest_pipeline_run_link, + null AS latest_state, + null AS latest_state_level, + null AS latest_state_color, + null AS latest_state_with_color, + null AS latest_table_schema_json, + null AS latest_table_schema, + event_timestamp AS latest_cdc_changes_time, + null AS latest_snapshot_changes_time, + null AS latest_error_pipeline_run_id, + null AS latest_error_pipeline_run_link, + null AS latest_error_time, + null AS latest_error_log_message, + null AS latest_error_message, + null AS latest_error_code, + null AS latest_error_full, + null AS latest_error_flow_type, + event_timestamp AS updated_at + FROM STREAM(`{EVENTS_TABLE_METRICS.name}`) + WHERE table_name IS NOT null + AND num_written_rows > 0 + AND flow_type='cdc' + """) + + dlt.create_auto_cdc_flow( + name=f"{silver_table_name}_apply_latest_cdc_changes", + source=silver_latest_cdc_changes_source_view_name, + target=silver_table_name, + keys=['pipeline_id', 'table_name'], + sequence_by='updated_at', + ignore_null_updates=True) + + silver_latest_snapshot_changes_source_view_name = f"{silver_table_name}_latest_snapshot_changes_source" + @dlt.view(name=silver_latest_snapshot_changes_source_view_name) + def table_latest_run_processing_state_source(): + return spark.sql(f""" + SELECT pipeline_id, + table_name, + null AS latest_pipeline_run_id, + null AS latest_pipeline_run_link, + null AS latest_state, + null AS latest_state_level, + null AS 
latest_state_color, + null AS latest_state_with_color, + null AS latest_table_schema_json, + null AS latest_table_schema, + null AS latest_cdc_changes_time, + event_timestamp AS latest_snapshot_changes_time, + null AS latest_error_pipeline_run_id, + null AS latest_error_pipeline_run_link, + null AS latest_error_time, + null AS latest_error_log_message, + null AS latest_error_message, + null AS latest_error_code, + null AS latest_error_full, + null AS latest_error_flow_type, + event_timestamp AS updated_at + FROM STREAM(`{EVENTS_TABLE_METRICS.name}`) + WHERE table_name IS NOT null + AND num_written_rows > 0 + AND flow_type='snapshot' + """) + + dlt.create_auto_cdc_flow( + name=f"{silver_table_name}_apply_latest_snapshot_changes", + source=silver_latest_snapshot_changes_source_view_name, + target=silver_table_name, + keys=['pipeline_id', 'table_name'], + sequence_by='updated_at', + ignore_null_updates=True) + + @dlt.table(name=TABLE_STATUS.name, + comment=TABLE_STATUS.table_comment, + cluster_by=['pipeline_id', 'table_name'], + table_properties={ + "delta.enableRowTracking": "true" + }) + def table_status(): + return spark.sql(f""" + SELECT s.*, + latest_pipeline_run_num_written_cdc_changes, + latest_pipeline_run_num_written_snapshot_changes + FROM {silver_table_name} s + LEFT JOIN ( + SELECT pipeline_id, + pipeline_run_id, + table_name, + sum(ifnull(num_written_rows, 0)) FILTER (WHERE flow_type='{CdcConstants.CDC_FLOW_TYPE}') AS latest_pipeline_run_num_written_cdc_changes, + sum(ifnull(num_written_rows, 0)) FILTER (WHERE flow_type='{CdcConstants.SNAPSHOT_FLOW_TYPE}') AS latest_pipeline_run_num_written_snapshot_changes + FROM {EVENTS_TABLE_METRICS.name} + GROUP BY 1, 2, 3 + ) AS etm + ON s.pipeline_id = etm.pipeline_id + AND s.latest_pipeline_run_id = etm.pipeline_run_id + AND s.table_name = etm.table_name + """) + + +pipeline = CdcConnectorMonitoringEtlPipeline(conf, spark) +pipeline.register_base_tables_and_views(spark) \ No newline at end of file diff --git a/contrib/dbx_ingestion_monitoring/cdc_connector_monitoring_dab/resources/monitoring_etl.pipeline.yml b/contrib/dbx_ingestion_monitoring/cdc_connector_monitoring_dab/resources/monitoring_etl.pipeline.yml new file mode 100644 index 0000000..84b99b7 --- /dev/null +++ b/contrib/dbx_ingestion_monitoring/cdc_connector_monitoring_dab/resources/monitoring_etl.pipeline.yml @@ -0,0 +1,59 @@ +# This pipeline is responsible for the ETL of the data from the monitored sources and +# generating the observability tables + +resources: + pipelines: + cdc_connector_monitoring_etl: + name: "Monitoring ETL for CDC Connector Pipelines" + libraries: + - glob: + include: ../monitoring_etl/** + - glob: + include: ../../third_party_sinks/** + serverless: ${var.serverless_monitoring_pipeline_enabled} + development: true + catalog: ${var.monitoring_catalog} + schema: ${resources.schemas.monitoring_schema.name} + root_path: ${workspace.file_path}/cdc_connector_monitoring_dab/monitoring_etl + event_log: + catalog: ${var.monitoring_catalog} + schema: ${resources.schemas.monitoring_schema.name} + name: cdc_connector_monitoring_etl_event_log + configuration: + monitoring_catalog: ${var.monitoring_catalog} + monitoring_schema: ${resources.schemas.monitoring_schema.name} + directly_monitored_pipeline_ids: ${var.directly_monitored_pipeline_ids} + directly_monitored_pipeline_tags: ${var.directly_monitored_pipeline_tags} + imported_event_log_tables: ${var.imported_event_log_tables} + pipeline_tags_index_table_name: ${var.pipeline_tags_index_table_name} + 
pipeline_tags_index_enabled: ${var.pipeline_tags_index_enabled} + pipeline_tags_index_max_age_hours: ${var.pipeline_tags_index_max_age_hours} + pipeline_tags_index_api_fallback_enabled: ${var.pipeline_tags_index_api_fallback_enabled} + + # Third-party monitoring configuration + destination: ${var.third_party_destination} + host_name: ${var.third_party_host_name} + secrets_scope: ${var.third_party_secrets_scope} + endpoints.metrics: ${var.third_party_endpoints_metrics} + endpoints.logs: ${var.third_party_endpoints_logs} + endpoints.events: ${var.third_party_endpoints_events} + num_rows_per_batch: ${var.third_party_batch_size} + max_retry_duration_sec: ${var.third_party_max_retry_duration_sec} + request_timeout_sec: ${var.third_party_request_timeout_sec} + + # Datadog/New Relic API key (stored in secrets) + api_key: ${var.third_party_api_key} + + # Azure Monitor specific configuration + azure_client_id: ${var.azure_client_id} + azure_client_secret: ${var.azure_client_secret} + azure_tenant_id: ${var.azure_tenant_id} + azure_dcr_immutable_id: ${var.azure_dcr_immutable_id} + azure_authorization_endpoint: ${var.azure_authorization_endpoint} + azure_max_access_token_staleness: ${var.azure_max_access_token_staleness} + + # Splunk Observability specific configuration + splunk_access_token: ${var.splunk_access_token} + + # New Relic specific configuration + account_id: ${var.third_party_account_id} \ No newline at end of file diff --git a/contrib/dbx_ingestion_monitoring/cdc_connector_monitoring_dab/resources/monitoring_etl_scheduled_runner.job.yml b/contrib/dbx_ingestion_monitoring/cdc_connector_monitoring_dab/resources/monitoring_etl_scheduled_runner.job.yml new file mode 100644 index 0000000..9c9de9f --- /dev/null +++ b/contrib/dbx_ingestion_monitoring/cdc_connector_monitoring_dab/resources/monitoring_etl_scheduled_runner.job.yml @@ -0,0 +1,16 @@ +# This job is responsible for running the monitoring ETL + +resources: + jobs: + monitoring_etl_scheduled_runner: + name: "Scheduled runner for the Monitoring ETL Pipeline for CDC Connector Pipelines" + schedule: + pause_status: ${var.monitoring_etl_schedule_state} + quartz_cron_expression: ${var.monitoring_etl_cron_schedule} + timezone_id: UTC + email_notifications: + on_failure: ${var.notification_emails} + tasks: + - task_key: etl_runner + pipeline_task: + pipeline_id: ${resources.pipelines.cdc_connector_monitoring_etl.id} diff --git a/contrib/dbx_ingestion_monitoring/generic_sdp_monitoring_dab/README.md b/contrib/dbx_ingestion_monitoring/generic_sdp_monitoring_dab/README.md new file mode 100644 index 0000000..76bb202 --- /dev/null +++ b/contrib/dbx_ingestion_monitoring/generic_sdp_monitoring_dab/README.md @@ -0,0 +1,49 @@ +# Getting started + +1. Configure the monitoring ETL + +Refer to [COMMON_CONFIGURATION.md](../COMMON_CONFIGURATION.md) for common configuration options. + +Additionally, for this DAB, you need to include the following as part of your deployment target configuration: + +``` +main_dashboard_template_path: ../generic_sdp_monitoring_dab/dashboards/SDP Monitoring Dashboard Template.lvdash.json +``` + +The above includes the standard AI/BI dashboard as part of the deployment. If you have created a custom dashboard, replace this path with the path to your dashboard. + + +2. Deploy the DAB +3. Generate the monitoring tables - it is recommended to do this manually the first time: + 1. Run the `Import CDC Connector event logs` job if you have configured the `imported_pipeline_ids` variable + 2. Run the `Scheduled runner for Monitoring ETL ...
` job +4. On successful Step 3, run the `Post-deploy actions ...` job. This will create a sample monitoring Dashboard and also annotate the monitoring tables with column comments for easier use and exploration. + +# Architecture + +``` + |--------------| + | Import event | + |->| logs Job |--| + | |--------------| | +|-------------| | | +| Pipeline | | | ------------- +| without | | |-->( Imported )--| +| Delta event | | | ( Events ) | |-----------------| +| log |--| | ( Delta Tables) | |-->| 3P Observability| +|-------------| | | ------------- | | | Platforms sinks | + | |--------------| | | |-------------| | |-----------------| + | | Import Event |--| |-->| Monitoring |--| + |->| logs Job | | | ETL Pipeline| | |------------| |-----------| + |--------------| | |-------------| |-->| Monitoring |-->| Example | + | | tables | | AI/BI | +------------------- -------------- | |------------| | Dashboard | +| Pipelines with |-------------------> ( Event Log )---| |-----------| +| Delta event log | ( Delta Tables ) +------------------ -------------- +``` + +# Monitoring tables + +See the table and column comments in the target `monitoring_catalog`.`monitoring_schema`. Remember to run the `Post-deploy actions ...` job that populates those comments. + diff --git a/contrib/dbx_ingestion_monitoring/generic_sdp_monitoring_dab/dashboards/SDP Monitoring Dashboard Template.lvdash.json b/contrib/dbx_ingestion_monitoring/generic_sdp_monitoring_dab/dashboards/SDP Monitoring Dashboard Template.lvdash.json new file mode 100644 index 0000000..7535fbf --- /dev/null +++ b/contrib/dbx_ingestion_monitoring/generic_sdp_monitoring_dab/dashboards/SDP Monitoring Dashboard Template.lvdash.json @@ -0,0 +1,8248 @@ +{ + "datasets": [ + { + "name": "82c4e3b5", + "displayName": "Pipelines Runs Errors", + "queryLines": [ + "SELECT pipeline_name as `Pipeline Name`,\n", + " pipeline_link as `Pipeline Link`,\n", + " pipeline_run_id as `Pipeline Run Id`,\n", + " ee.pipeline_run_link as `Pipeline Run`,\n", + " ifnull(table_name, '') as `Target Table`,\n", + " create_time as `Pipeline Run Start Time`,\n", + " event_timestamp as `Error Time`,\n", + " error_log_message as `Error Message in Log`,\n", + " error_message as `Error Message`,\n", + " (CASE WHEN error_code is NULL THEN '' WHEN error_code='' THEN '' ELSE error_code END) as `Error Code`,\n", + " error_full as `Error Details`\n", + "FROM events_errors AS ee\n", + " LEFT JOIN monitored_pipelines USING (pipeline_id)\n", + " LEFT JOIN pipeline_runs_status USING (pipeline_id, pipeline_run_id)\n", + "WHERE (:pipeline_id = 'All' OR pipeline_id = :pipeline_id)\n", + " AND (event_timestamp >= :event_range.min AND event_timestamp <= :event_range.max)\n", + " AND ((:pipeline_tag_value = '')\n", + " OR (:pipeline_tag_value = '' and (tags_array IS NULL or array_size(tags_array) = 0))\n", + " OR (array_contains(tags_array, :pipeline_tag_value)))\n", + "ORDER BY event_timestamp DESC \n" + ], + "parameters": [ + { + "displayName": "pipeline_id", + "keyword": "pipeline_id", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "All" + } + ] + } + } + }, + { + "displayName": "event_range", + "keyword": "event_range", + "dataType": "DATETIME", + "complexType": "RANGE", + "defaultSelection": { + "range": { + "dataType": "DATETIME", + "min": { + "value": "now-7d/d" + }, + "max": { + "value": "now/d" + } + } + } + }, + { + "displayName": "pipeline_tag_value", + "keyword": "pipeline_tag_value", + "dataType": "STRING", + 
"defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "" + } + ] + } + } + } + ], + "catalog": "chavdarbotev_std", + "schema": "mydev_chavdar_botev_dbx_sdp_monitoring" + }, + { + "name": "c91619f8", + "displayName": "Pipelines Metrics Freshness", + "queryLines": [ + "SELECT pipeline_link as `Pipeline`, \n", + " latest_event_timestamp as `Observability data freshness`,\n", + " current_timestamp() as `Check time`,\n", + " (current_timestamp() - latest_event_timestamp) as `Latency at check time`\n", + "FROM (SELECT pipeline_id,\n", + " pipeline_link,\n", + " max(event_timestamp) AS latest_event_timestamp\n", + " FROM event_logs_bronze LEFT JOIN monitored_pipelines USING (pipeline_id)\n", + " WHERE (:pipeline_id = 'All' OR pipeline_id = :pipeline_id)\n", + " AND ((:pipeline_tag_value = '')\n", + " OR (:pipeline_tag_value = '' and (tags_array IS NULL or array_size(tags_array) = 0))\n", + " OR (array_contains(tags_array, :pipeline_tag_value)))\n", + " GROUP BY 1,2)\n", + "ORDER BY latest_event_timestamp" + ], + "parameters": [ + { + "displayName": "pipeline_id", + "keyword": "pipeline_id", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "All" + } + ] + } + } + }, + { + "displayName": "pipeline_tag_value", + "keyword": "pipeline_tag_value", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "" + } + ] + } + } + } + ], + "catalog": "chavdarbotev_std", + "schema": "mydev_chavdar_botev_dbx_sdp_monitoring" + }, + { + "name": "2660d018", + "displayName": "Table Changes", + "queryLines": [ + "SELECT pipeline_name as `Pipeline Name`,\n", + " pipeline_link as `Pipeline Link`,\n", + " pipeline_run_id as `Pipeline Run Id`,\n", + " ee.pipeline_run_link as `Pipeline Run`,\n", + " table_name as `Target Table`,\n", + " event_timestamp as `Event Time`,\n", + " ifnull(num_output_rows, 0) as `Appended Rows`,\n", + " ifnull(num_upserted_rows, 0) as `Upserted Rows`,\n", + " ifnull(num_deleted_rows, 0) as `Deleted Rows`,\n", + " ifnull(num_output_bytes, 0) as `Appended Bytes`,\n", + " ifnull(num_written_rows, 0) as `Total Changes`\n", + "FROM events_table_metrics AS ee\n", + " LEFT JOIN monitored_pipelines USING (pipeline_id)\n", + "WHERE \n", + " (:pipeline_id = 'All' or pipeline_id = :pipeline_id)\n", + " AND (event_timestamp >= :event_range.min AND event_timestamp <= :event_range.max)\n", + " AND (:table_name = 'All' OR table_name = :table_name)\n", + " AND table_name is NOT NULL\n", + " AND (num_written_rows IS NOT NULL)\n", + " AND ((:pipeline_tag_value = '')\n", + " OR (:pipeline_tag_value = '' and (tags_array IS NULL or array_size(tags_array) = 0))\n", + " OR (array_contains(tags_array, :pipeline_tag_value)))\n" + ], + "parameters": [ + { + "displayName": "pipeline_id", + "keyword": "pipeline_id", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "All" + } + ] + } + } + }, + { + "displayName": "event_range", + "keyword": "event_range", + "dataType": "DATETIME", + "complexType": "RANGE", + "defaultSelection": { + "range": { + "dataType": "DATETIME", + "min": { + "value": "now-7d/d" + }, + "max": { + "value": "now/d" + } + } + } + }, + { + "displayName": "table_name", + "keyword": "table_name", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "All" + } + ] + } + } + }, + { + "displayName": "pipeline_tag_value", + 
"keyword": "pipeline_tag_value", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "" + } + ] + } + } + } + ], + "catalog": "chavdarbotev_std", + "schema": "mydev_chavdar_botev_dbx_sdp_monitoring" + }, + { + "name": "5ef35e45", + "displayName": "Pipeline Tags", + "queryLines": [ + "SELECT '' as tag_value\n", + "UNION ALL\n", + "SELECT DISTINCT tag_value\n", + "FROM (SELECT explode(tags_array) as tag_value\n", + " FROM monitored_pipelines)" + ], + "catalog": "chavdarbotev_std", + "schema": "mydev_chavdar_botev_dbx_sdp_monitoring" + }, + { + "name": "6740610b", + "displayName": "Pipeline Ids", + "queryLines": [ + "SELECT pipeline_id\n", + "FROM monitored_pipelines\n", + "WHERE (:tag_value = '')\n", + " OR (:tag_value = '' and (tags_array IS NULL or array_size(tags_array) = 0))\n", + " OR (array_contains(tags_array, :tag_value))" + ], + "parameters": [ + { + "displayName": "tag_value", + "keyword": "tag_value", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "" + } + ] + } + } + } + ], + "catalog": "chavdarbotev_std", + "schema": "mydev_chavdar_botev_dbx_sdp_monitoring" + }, + { + "name": "8897fc81", + "displayName": "Table Warnings", + "queryLines": [ + "SELECT pipeline_name as `Pipeline Name`,\n", + " pipeline_link as `Pipeline Link`,\n", + " pipeline_run_id as `Pipeline Run Id`,\n", + " ee.pipeline_run_link as `Pipeline Run`,\n", + " table_name as `Target Table`,\n", + " create_time as `Pipeline Start Time`,\n", + " event_timestamp as `Warning Time`,\n", + " warning_log_message as `Warning Log Message`\n", + "FROM events_warnings AS ee\n", + " LEFT JOIN monitored_pipelines USING (pipeline_id)\n", + " LEFT JOIN pipeline_runs_status USING (pipeline_id, pipeline_run_id)\n", + "WHERE \n", + " (:pipeline_id = 'All' or pipeline_id = :pipeline_id)\n", + " AND (event_timestamp >= :event_range.min AND event_timestamp <= :event_range.max)\n", + " AND (:table_name = 'All' OR table_name = :table_name)\n", + " AND table_name is NOT NULL\n", + " AND ((:pipeline_tag_value = '')\n", + " OR (:pipeline_tag_value = '' and (tags_array IS NULL or array_size(tags_array) = 0))\n", + " OR (array_contains(tags_array, :pipeline_tag_value)))\n", + "ORDER BY event_timestamp DESC" + ], + "parameters": [ + { + "displayName": "pipeline_id", + "keyword": "pipeline_id", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "All" + } + ] + } + } + }, + { + "displayName": "event_range", + "keyword": "event_range", + "dataType": "DATETIME", + "complexType": "RANGE", + "defaultSelection": { + "range": { + "dataType": "DATETIME", + "min": { + "value": "now-7d/d" + }, + "max": { + "value": "now/d" + } + } + } + }, + { + "displayName": "table_name", + "keyword": "table_name", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "All" + } + ] + } + } + }, + { + "displayName": "pipeline_tag_value", + "keyword": "pipeline_tag_value", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "" + } + ] + } + } + } + ], + "catalog": "chavdarbotev_std", + "schema": "mydev_chavdar_botev_dbx_sdp_monitoring" + }, + { + "name": "2faba057", + "displayName": "Pipelines Hourly Error Rate", + "queryLines": [ + "SELECT pipeline_id as `Pipeline Id`,\n", + " pipeline_name as `Pipeline Name`,\n", + " pipeline_link as `Pipeline Link`,\n", + 
" hour as `Hour`,\n", + " num_errors as `Number of Errors`\n", + "FROM metric_pipeline_error_rate AS ee\n", + " LEFT JOIN monitored_pipelines USING (pipeline_id)\n", + "WHERE (:pipeline_id = 'All' OR pipeline_id = :pipeline_id)\n", + " AND (hour >= :event_range.min AND hour <= :event_range.max)\n", + " AND ((:pipeline_tag_value = '')\n", + " OR (:pipeline_tag_value = '' and (tags_array IS NULL or array_size(tags_array) = 0))\n", + " OR (array_contains(tags_array, :pipeline_tag_value)))\n" + ], + "parameters": [ + { + "displayName": "pipeline_id", + "keyword": "pipeline_id", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "All" + } + ] + } + } + }, + { + "displayName": "event_range", + "keyword": "event_range", + "dataType": "DATETIME", + "complexType": "RANGE", + "defaultSelection": { + "range": { + "dataType": "DATETIME", + "min": { + "value": "now-7d/d" + }, + "max": { + "value": "now/d" + } + } + } + }, + { + "displayName": "pipeline_tag_value", + "keyword": "pipeline_tag_value", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "" + } + ] + } + } + } + ], + "catalog": "chavdarbotev_std", + "schema": "mydev_chavdar_botev_dbx_sdp_monitoring" + }, + { + "name": "5e3eb18f", + "displayName": "Event Times", + "queryLines": [ + "SELECT event_timestamp\n", + "FROM event_logs_bronze\n", + "WHERE (:pipeline_id = 'All' OR pipeline_id = :pipeline_id)\n" + ], + "parameters": [ + { + "displayName": "pipeline_id", + "keyword": "pipeline_id", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "All" + } + ] + } + } + } + ], + "catalog": "chavdarbotev_std", + "schema": "mydev_chavdar_botev_dbx_sdp_monitoring" + }, + { + "name": "248b74c6", + "displayName": "Table Status", + "queryLines": [ + "SELECT pipeline_name as `Pipeline Name`,\n", + " pipeline_link as `Pipeline Link`,\n", + " table_name as `Target Table`,\n", + " latest_pipeline_run_link as `Latest Run`,\n", + " latest_state_with_color as `Latest State`,\n", + " latest_table_schema as `Latest Schema`,\n", + " current_timestamp() as `Last Data Refresh Time`,\n", + " latest_changes_time as `Latest Changes Time`,\n", + " timestampdiff(SECOND, latest_changes_time, current_timestamp()) as `Seconds Since Latest Change`,\n", + " latest_error_time as `Time of Latest Error`,\n", + " latest_error_code as `Code of Latest Error`,\n", + " latest_error_log_message as `Log Message of Latest Error`,\n", + " latest_error_message as `Message of Latest Error`,\n", + " latest_error_pipeline_run_link as `Latest Pipeline Run with Error`,\n", + " updated_at as `Last Updated`,\n", + " timestampdiff(SECOND, updated_at, current_timestamp()) as `Seconds Since Latest Status Update`\n", + "FROM table_status AS ts\n", + " LEFT JOIN monitored_pipelines USING (pipeline_id)\n", + "WHERE \n", + " (:pipeline_id = 'All' or pipeline_id = :pipeline_id)\n", + " AND (updated_at >= :event_range.min AND updated_at <= :event_range.max)\n", + " AND (:table_name = 'All' OR table_name = :table_name)\n", + " AND table_name is NOT NULL\n", + " AND ((:pipeline_tag_value = '')\n", + " OR (:pipeline_tag_value = '' and (tags_array IS NULL or array_size(tags_array) = 0))\n", + " OR (array_contains(tags_array, :pipeline_tag_value)))\n", + "ORDER BY latest_state_level DESC,\n", + " updated_at ASC\n" + ], + "parameters": [ + { + "displayName": "table_name", + "keyword": "table_name", + "dataType": "STRING", + 
"defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "All" + } + ] + } + } + }, + { + "displayName": "pipeline_id", + "keyword": "pipeline_id", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "All" + } + ] + } + } + }, + { + "displayName": "event_range", + "keyword": "event_range", + "dataType": "DATETIME", + "complexType": "RANGE", + "defaultSelection": { + "range": { + "dataType": "DATETIME", + "min": { + "value": "now-7d/d" + }, + "max": { + "value": "now/d" + } + } + } + }, + { + "displayName": "pipeline_tag_value", + "keyword": "pipeline_tag_value", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "" + } + ] + } + } + } + ], + "catalog": "chavdarbotev_std", + "schema": "mydev_chavdar_botev_dbx_sdp_monitoring" + }, + { + "name": "0ac0bb93", + "displayName": "Selected Pipelines", + "queryLines": [ + "SELECT pipeline_id as `Pipeline Id`,\n", + " pipeline_link as `Pipeline Name`,\n", + " pipeline_type as `Pipeline Type`,\n", + " tags_map as `Pipeline Tags`\n", + "FROM monitored_pipelines\n", + "WHERE (:pipeline_id = 'All' OR pipeline_id = :pipeline_id)\n", + " AND ((:pipeline_tag_value = '')\n", + " OR (:pipeline_tag_value = '' and (tags_array IS NULL or array_size(tags_array) = 0))\n", + " OR (array_contains(tags_array, :pipeline_tag_value)))" + ], + "parameters": [ + { + "displayName": "pipeline_id", + "keyword": "pipeline_id", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "All" + } + ] + } + } + }, + { + "displayName": "pipeline_tag_value", + "keyword": "pipeline_tag_value", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "" + } + ] + } + } + } + ], + "catalog": "chavdarbotev_std", + "schema": "mydev_chavdar_botev_dbx_sdp_monitoring" + }, + { + "name": "7103e200", + "displayName": "Table Status Per Pipeline Run", + "queryLines": [ + "SELECT pipeline_name as `Pipeline Name`,\n", + " pipeline_link as `Pipeline Link`,\n", + " table_name as `Target Table`,\n", + " ts.pipeline_run_link as `Pipeline Run`,\n", + " ts.latest_state_with_color as `Table State`,\n", + " ifnull(ts.latest_error_code, '') as `Error Code`,\n", + " ts.latest_error_time as `Error Time`,\n", + " ifnull(ts.latest_error_message, '') as `Error Message`,\n", + " ifnull(ts.latest_error_log_message, '') as `Error Message in Log`,\n", + " table_schema as `Latest Schema`,\n", + " create_time as `Pipeline Run Create Time`,\n", + " end_time as `Pipeline Run End Time`,\n", + " prs.latest_state_with_color as `Pipeline Run State`,\n", + " ts.updated_at as `Last Updated`\n", + "FROM table_status_per_pipeline_run AS ts\n", + " LEFT JOIN monitored_pipelines USING (pipeline_id)\n", + " LEFT JOIN pipeline_runs_status AS prs USING (pipeline_id, pipeline_run_id) \n", + "WHERE \n", + " (:pipeline_id = 'All' or pipeline_id = :pipeline_id)\n", + " AND (ts.updated_at >= :event_range.min AND ts.updated_at <= :event_range.max)\n", + " AND (:table_name = 'All' OR table_name = :table_name)\n", + " AND table_name is NOT NULL\n", + " AND ((:pipeline_tag_value = '')\n", + " OR (:pipeline_tag_value = '' and (tags_array IS NULL or array_size(tags_array) = 0))\n", + " OR (array_contains(tags_array, :pipeline_tag_value)))\n", + "ORDER BY create_time DESC,\n", + " ts.updated_at DESC\n" + ], + "parameters": [ + { + "displayName": "pipeline_id", + "keyword": 
"pipeline_id", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "All" + } + ] + } + } + }, + { + "displayName": "event_range", + "keyword": "event_range", + "dataType": "DATETIME", + "complexType": "RANGE", + "defaultSelection": { + "range": { + "dataType": "DATETIME", + "min": { + "value": "now-7d/d" + }, + "max": { + "value": "now/d" + } + } + } + }, + { + "displayName": "table_name", + "keyword": "table_name", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "All" + } + ] + } + } + }, + { + "displayName": "pipeline_tag_value", + "keyword": "pipeline_tag_value", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "" + } + ] + } + } + } + ], + "catalog": "chavdarbotev_std", + "schema": "mydev_chavdar_botev_dbx_sdp_monitoring" + }, + { + "name": "c0688ffc", + "displayName": "Pipelines Runs Warnings", + "queryLines": [ + "SELECT pipeline_name as `Pipeline Name`,\n", + " pipeline_link as `Pipeline Link`,\n", + " pipeline_run_id as `Pipeline Run Id`,\n", + " ew.pipeline_run_link as `Pipeline Run`,\n", + " create_time as `Pipeline Run Start Time`,\n", + " event_timestamp as `Warning Time`,\n", + " warning_log_message as `Warning Message`\n", + "FROM events_warnings AS ew\n", + " LEFT JOIN monitored_pipelines USING (pipeline_id)\n", + " LEFT JOIN pipeline_runs_status USING (pipeline_id, pipeline_run_id)\n", + "WHERE (:pipeline_id = 'All' OR pipeline_id = :pipeline_id)\n", + " AND (event_timestamp >= :event_range.min AND event_timestamp <= :event_range.max)\n", + " AND ((:pipeline_tag_value = '')\n", + " OR (:pipeline_tag_value = '' and (tags_array IS NULL or array_size(tags_array) = 0))\n", + " OR (array_contains(tags_array, :pipeline_tag_value)))\n" + ], + "parameters": [ + { + "displayName": "pipeline_id", + "keyword": "pipeline_id", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "All" + } + ] + } + } + }, + { + "displayName": "event_range", + "keyword": "event_range", + "dataType": "DATETIME", + "complexType": "RANGE", + "defaultSelection": { + "range": { + "dataType": "DATETIME", + "min": { + "value": "now-7d/d" + }, + "max": { + "value": "now/d" + } + } + } + }, + { + "displayName": "pipeline_tag_value", + "keyword": "pipeline_tag_value", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "" + } + ] + } + } + } + ], + "catalog": "chavdarbotev_std", + "schema": "mydev_chavdar_botev_dbx_sdp_monitoring" + }, + { + "name": "8db826d7", + "displayName": "Table Names", + "queryLines": [ + "SELECT DISTINCT table_name\n", + "FROM monitored_tables JOIN monitored_pipelines USING (pipeline_id)\n", + "WHERE (table_name is not NULL)\n", + " AND (table_name NOT LIKE '%.cdc_staging_table')\n", + " AND (:pipeline_id = 'All' OR pipeline_id = :pipeline_id)\n", + " AND ((:pipeline_tag_value = '')\n", + " OR (:pipeline_tag_value = '' and (tags_array IS NULL or array_size(tags_array) = 0))\n", + " OR (array_contains(tags_array, :pipeline_tag_value)))" + ], + "parameters": [ + { + "displayName": "pipeline_id", + "keyword": "pipeline_id", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "All" + } + ] + } + } + }, + { + "displayName": "pipeline_tag_value", + "keyword": "pipeline_tag_value", + "dataType": "STRING", + 
"defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "" + } + ] + } + } + } + ], + "catalog": "chavdarbotev_std", + "schema": "mydev_chavdar_botev_dbx_sdp_monitoring" + }, + { + "name": "b18ee129", + "displayName": "Selected Tables", + "queryLines": [ + "SELECT pipeline_link as `Pipeline`,\n", + " table_name as `Target Table`\n", + "FROM monitored_tables LEFT JOIN monitored_pipelines USING (pipeline_id)\n", + "WHERE (table_name is not NULL)\n", + " AND (:pipeline_id = 'All' OR pipeline_id = :pipeline_id)\n", + " AND (:table_name = 'All' OR table_name = :table_name)\n", + " AND ((:pipeline_tag_value = '')\n", + " OR (:pipeline_tag_value = '' and (tags_array IS NULL or array_size(tags_array) = 0))\n", + " OR (array_contains(tags_array, :pipeline_tag_value))) " + ], + "parameters": [ + { + "displayName": "pipeline_id", + "keyword": "pipeline_id", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "All" + } + ] + } + } + }, + { + "displayName": "table_name", + "keyword": "table_name", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "All" + } + ] + } + } + }, + { + "displayName": "pipeline_tag_value", + "keyword": "pipeline_tag_value", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "" + } + ] + } + } + } + ], + "catalog": "chavdarbotev_std", + "schema": "mydev_chavdar_botev_dbx_sdp_monitoring" + }, + { + "name": "19d796af", + "displayName": "Pipelines Runs Status", + "queryLines": [ + "SELECT pipeline_link as `Pipeline`,\n", + " pipeline_name as `Pipeline Name`,\n", + " pipeline_run_link as `Pipeline run`,\n", + " create_time as `Create time`,\n", + " end_time as `End time`,\n", + " latest_state as `Status String`,\n", + " latest_state_with_color as `Status`,\n", + " ifnull(latest_error_log_message, '') as `Latest error log message`,\n", + " ifnull(latest_error_message,'') as `Latest error message`,\n", + " ifnull(latest_error_code,'') as `Latest error code`,\n", + " latest_error_full as `Latest full error`,\n", + " timestampdiff(second, create_time, ifnull(end_time, current_timestamp())) AS `Total seconds`,\n", + " timestampdiff(second, create_time, initialization_start_time) AS `Starting seconds`,\n", + " timestampdiff(second, initialization_start_time, running_start_time) AS `Initialization seconds`,\n", + " timestampdiff(second, running_start_time, ifnull(end_time, current_timestamp())) AS `Running seconds`\n", + "FROM pipeline_runs_status LEFT JOIN monitored_pipelines USING (pipeline_id)\n", + "WHERE (:pipeline_id = 'All' OR pipeline_id = :pipeline_id)\n", + " AND ((create_time >= :event_range.min AND create_time <= :event_range.max )\n", + " OR (end_time >= :event_range.min AND end_time <= :event_range.max))\n", + " AND ((:pipeline_tag_value = '')\n", + " OR (:pipeline_tag_value = '' and (tags_array IS NULL or array_size(tags_array) = 0))\n", + " OR (array_contains(tags_array, :pipeline_tag_value)))\n", + "ORDER BY create_time DESC" + ], + "parameters": [ + { + "displayName": "pipeline_id", + "keyword": "pipeline_id", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "All" + } + ] + } + } + }, + { + "displayName": "event_range", + "keyword": "event_range", + "dataType": "DATETIME", + "complexType": "RANGE", + "defaultSelection": { + "range": { + "dataType": "DATETIME", + "min": { + "value": 
"now-7d/d" + }, + "max": { + "value": "now/d" + } + } + } + }, + { + "displayName": "pipeline_tag_value", + "keyword": "pipeline_tag_value", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "" + } + ] + } + } + } + ], + "catalog": "chavdarbotev_std", + "schema": "mydev_chavdar_botev_dbx_sdp_monitoring" + }, + { + "name": "3b8b7d86", + "displayName": "Table Errors", + "queryLines": [ + "SELECT pipeline_name as `Pipeline Name`,\n", + " pipeline_link as `Pipeline Link`,\n", + " pipeline_run_id as `Pipeline Run Id`,\n", + " ee.pipeline_run_link as `Pipeline Run`,\n", + " table_name as `Target Table`,\n", + " create_time as `Pipeline Start Time`,\n", + " event_timestamp as `Error Time`,\n", + " error_log_message as `Error Log Message`,\n", + " error_message as `Error Message`,\n", + " ifnull(error_code, '') as `Error Code`,\n", + " error_full as `Error Details`\n", + "FROM events_errors AS ee\n", + " LEFT JOIN monitored_pipelines USING (pipeline_id)\n", + " LEFT JOIN pipeline_runs_status USING (pipeline_id, pipeline_run_id)\n", + "WHERE \n", + " (:pipeline_id = 'All' or pipeline_id = :pipeline_id)\n", + " AND (event_timestamp >= :event_range.min AND event_timestamp <= :event_range.max)\n", + " AND (:table_name = 'All' OR table_name = :table_name)\n", + " AND table_name is NOT NULL\n", + " AND ((:pipeline_tag_value = '')\n", + " OR (:pipeline_tag_value = '' and (tags_array IS NULL or array_size(tags_array) = 0))\n", + " OR (array_contains(tags_array, :pipeline_tag_value)))\n", + "ORDER BY event_timestamp DESC" + ], + "parameters": [ + { + "displayName": "pipeline_id", + "keyword": "pipeline_id", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "All" + } + ] + } + } + }, + { + "displayName": "event_range", + "keyword": "event_range", + "dataType": "DATETIME", + "complexType": "RANGE", + "defaultSelection": { + "range": { + "dataType": "DATETIME", + "min": { + "value": "now-7d/d" + }, + "max": { + "value": "now/d" + } + } + } + }, + { + "displayName": "table_name", + "keyword": "table_name", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "All" + } + ] + } + } + }, + { + "displayName": "pipeline_tag_value", + "keyword": "pipeline_tag_value", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "" + } + ] + } + } + } + ], + "catalog": "chavdarbotev_std", + "schema": "mydev_chavdar_botev_dbx_sdp_monitoring" + }, + { + "name": "62d03d54", + "displayName": "Pipelines Status", + "queryLines": [ + "SELECT pipeline_link as `Pipeline`,\n", + " pipeline_name as `Pipeline Name`,\n", + " latest_pipeline_run_link as `Latest Pipeline Run`,\n", + " latest_pipeline_run_create_time as `Latest Create time`,\n", + " latest_pipeline_run_end_time as `Latest End time`,\n", + " latest_pipeline_run_state as `Latest Status String`,\n", + " latest_pipeline_run_state_with_color as `Latest Status`,\n", + " ifnull(latest_error_log_message, '') as `Latest error log message`,\n", + " ifnull(latest_error_message,'') as `Latest error message`,\n", + " ifnull(latest_error_code,'') as `Latest error code`,\n", + " ifnull(latest_pipeline_run_num_errors, 0) as `Latest run errors`,\n", + " ifnull(latest_pipeline_run_num_warnings, 0) as `Latest run warnings`,\n", + " timestampdiff(second, latest_pipeline_run_create_time, ifnull(latest_pipeline_run_end_time, current_timestamp())) 
AS `Latest Total Seconds`,\n", + " updated_at as `Last Updated`\n", + "FROM pipelines_status LEFT JOIN monitored_pipelines USING (pipeline_id)\n", + "WHERE (:pipeline_id = 'All' OR pipeline_id = :pipeline_id)\n", + " AND ((latest_pipeline_run_create_time >= :event_range.min AND latest_pipeline_run_create_time <= :event_range.max )\n", + " OR (latest_pipeline_run_end_time >= :event_range.min AND latest_pipeline_run_end_time <= :event_range.max))\n", + " AND ((:pipeline_tag_value = '')\n", + " OR (:pipeline_tag_value = '' and (tags_array IS NULL or array_size(tags_array) = 0))\n", + " OR (array_contains(tags_array, :pipeline_tag_value)))\n", + "ORDER BY latest_pipeline_run_state_level DESC , \n", + " latest_pipeline_run_create_time DESC" + ], + "parameters": [ + { + "displayName": "pipeline_id", + "keyword": "pipeline_id", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "All" + } + ] + } + } + }, + { + "displayName": "event_range", + "keyword": "event_range", + "dataType": "DATETIME", + "complexType": "RANGE", + "defaultSelection": { + "range": { + "dataType": "DATETIME", + "min": { + "value": "now-7d/d" + }, + "max": { + "value": "now/d" + } + } + } + }, + { + "displayName": "pipeline_tag_value", + "keyword": "pipeline_tag_value", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "" + } + ] + } + } + } + ], + "catalog": "chavdarbotev_std", + "schema": "mydev_chavdar_botev_dbx_sdp_monitoring" + }, + { + "name": "b72dbc41", + "displayName": "Table Backlog", + "queryLines": [ + "SELECT pipeline_name as `Pipeline Name`,\n", + " pipeline_link as `Pipeline Link`,\n", + " pipeline_run_id as `Pipeline Run Id`,\n", + " ee.pipeline_run_link as `Pipeline Run`,\n", + " table_name as `Target Table`,\n", + " event_timestamp as `Event Time`,\n", + " ifnull(backlog_bytes, 0) as `Backlog Bytes`,\n", + " ifnull(backlog_records, 0) as `Backlog Rows`,\n", + " ifnull(backlog_files, 0) as `Backlog Files`,\n", + " ifnull(backlog_seconds, 0) as `Backlog Seconds`\n", + "FROM events_table_metrics AS ee\n", + " LEFT JOIN monitored_pipelines USING (pipeline_id)\n", + "WHERE \n", + " (:pipeline_id = 'All' or pipeline_id = :pipeline_id)\n", + " AND (event_timestamp >= :event_range.min AND event_timestamp <= :event_range.max)\n", + " AND (:table_name = 'All' OR table_name = :table_name)\n", + " AND table_name is NOT NULL\n", + " AND ((:pipeline_tag_value = '')\n", + " OR (:pipeline_tag_value = '' and (tags_array IS NULL or array_size(tags_array) = 0))\n", + " OR (array_contains(tags_array, :pipeline_tag_value)))\n" + ], + "parameters": [ + { + "displayName": "pipeline_id", + "keyword": "pipeline_id", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "All" + } + ] + } + } + }, + { + "displayName": "event_range", + "keyword": "event_range", + "dataType": "DATETIME", + "complexType": "RANGE", + "defaultSelection": { + "range": { + "dataType": "DATETIME", + "min": { + "value": "now-7d/d" + }, + "max": { + "value": "now/d" + } + } + } + }, + { + "displayName": "table_name", + "keyword": "table_name", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "All" + } + ] + } + } + }, + { + "displayName": "pipeline_tag_value", + "keyword": "pipeline_tag_value", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "" 
+ } + ] + } + } + } + ], + "catalog": "chavdarbotev_std", + "schema": "mydev_chavdar_botev_dbx_sdp_monitoring" + }, + { + "name": "e9270b22", + "displayName": "Table Expectation Checks", + "queryLines": [ + "SELECT pipeline_name as `Pipeline Name`,\n", + " pipeline_link as `Pipeline Link`,\n", + " pipeline_run_id as `Pipeline Run Id`,\n", + " teec.pipeline_run_link as `Pipeline Run`,\n", + " table_name as `Target Table`,\n", + " create_time as `Pipeline Start Time`,\n", + " expectation_name as `Expectation Name`,\n", + " num_passed as `Rows Passed`,\n", + " num_failed as `Rows Failed`,\n", + " failure_pct as `Failure Percentage`,\n", + " event_timestamp as `Check Time`\n", + "FROM table_events_expectation_checks AS teec\n", + " LEFT JOIN monitored_pipelines USING (pipeline_id)\n", + " LEFT JOIN pipeline_runs_status USING (pipeline_id, pipeline_run_id)\n", + "WHERE \n", + " (:pipeline_id = 'All' or pipeline_id = :pipeline_id)\n", + " AND (event_timestamp >= :event_range.min AND event_timestamp <= :event_range.max)\n", + " AND (:table_name = 'All' OR table_name = :table_name)\n", + " AND table_name is NOT NULL\n", + " AND ((:pipeline_tag_value = '')\n", + " OR (:pipeline_tag_value = '' and (tags_array IS NULL or array_size(tags_array) = 0))\n", + " OR (array_contains(tags_array, :pipeline_tag_value)))\n", + "ORDER BY event_timestamp DESC" + ], + "parameters": [ + { + "displayName": "pipeline_id", + "keyword": "pipeline_id", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "All" + } + ] + } + } + }, + { + "displayName": "event_range", + "keyword": "event_range", + "dataType": "DATETIME", + "complexType": "RANGE", + "defaultSelection": { + "range": { + "dataType": "DATETIME", + "min": { + "value": "now-7d/d" + }, + "max": { + "value": "now/d" + } + } + } + }, + { + "displayName": "table_name", + "keyword": "table_name", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "All" + } + ] + } + } + }, + { + "displayName": "pipeline_tag_value", + "keyword": "pipeline_tag_value", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "" + } + ] + } + } + } + ], + "catalog": "chavdarbotev_std", + "schema": "mydev_chavdar_botev_dbx_sdp_monitoring" + }, + { + "name": "f786fd7e", + "displayName": "Table Expectation Failures", + "queryLines": [ + "SELECT pipeline_name as `Pipeline Name`,\n", + " pipeline_link as `Pipeline Link`,\n", + " pipeline_run_id as `Pipeline Run Id`,\n", + " teec.pipeline_run_link as `Pipeline Run`,\n", + " table_name as `Target Table`,\n", + " create_time as `Pipeline Start Time`,\n", + " expectation_name as `Expectation Name`,\n", + " num_passed as `Rows Passed`,\n", + " num_failed as `Rows Failed`,\n", + " failure_pct as `Failure Percentage`,\n", + " event_timestamp as `Check Time`\n", + "FROM table_events_expectation_checks AS teec\n", + " LEFT JOIN monitored_pipelines USING (pipeline_id)\n", + " LEFT JOIN pipeline_runs_status USING (pipeline_id, pipeline_run_id)\n", + "WHERE \n", + " (:pipeline_id = 'All' or pipeline_id = :pipeline_id)\n", + " AND (event_timestamp >= :event_range.min AND event_timestamp <= :event_range.max)\n", + " AND (:table_name = 'All' OR table_name = :table_name)\n", + " AND table_name is NOT NULL\n", + " AND num_failed > 0\n", + " AND ((:pipeline_tag_value = '')\n", + " OR (:pipeline_tag_value = '' and (tags_array IS NULL or array_size(tags_array) = 0))\n", + " OR 
(array_contains(tags_array, :pipeline_tag_value)))\n", + "ORDER BY event_timestamp DESC" + ], + "parameters": [ + { + "displayName": "pipeline_id", + "keyword": "pipeline_id", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "All" + } + ] + } + } + }, + { + "displayName": "event_range", + "keyword": "event_range", + "dataType": "DATETIME", + "complexType": "RANGE", + "defaultSelection": { + "range": { + "dataType": "DATETIME", + "min": { + "value": "now-7d/d" + }, + "max": { + "value": "now/d" + } + } + } + }, + { + "displayName": "table_name", + "keyword": "table_name", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "All" + } + ] + } + } + }, + { + "displayName": "pipeline_tag_value", + "keyword": "pipeline_tag_value", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "" + } + ] + } + } + } + ], + "catalog": "chavdarbotev_std", + "schema": "mydev_chavdar_botev_dbx_sdp_monitoring" + }, + { + "name": "bc1ad993", + "displayName": "Table Expectations Dropped Rows", + "queryLines": [ + "SELECT pipeline_name as `Pipeline Name`,\n", + " pipeline_link as `Pipeline Link`,\n", + " pipeline_run_id as `Pipeline Run Id`,\n", + " ee.pipeline_run_link as `Pipeline Run`,\n", + " table_name as `Target Table`,\n", + " event_timestamp as `Event Time`,\n", + " ifnull(num_expectation_dropped_records, 0) as `Expectations Dropped Rows`\n", + "FROM events_table_metrics AS ee\n", + " LEFT JOIN monitored_pipelines USING (pipeline_id)\n", + "WHERE \n", + " (:pipeline_id = 'All' or pipeline_id = :pipeline_id)\n", + " AND (event_timestamp >= :event_range.min AND event_timestamp <= :event_range.max)\n", + " AND (:table_name = 'All' OR table_name = :table_name)\n", + " AND table_name is NOT NULL\n", + " AND ((:pipeline_tag_value = '')\n", + " OR (:pipeline_tag_value = '' and (tags_array IS NULL or array_size(tags_array) = 0))\n", + " OR (array_contains(tags_array, :pipeline_tag_value)))\n", + "ORDER BY event_timestamp DESC" + ], + "parameters": [ + { + "displayName": "pipeline_id", + "keyword": "pipeline_id", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "All" + } + ] + } + } + }, + { + "displayName": "event_range", + "keyword": "event_range", + "dataType": "DATETIME", + "complexType": "RANGE", + "defaultSelection": { + "range": { + "dataType": "DATETIME", + "min": { + "value": "now-7d/d" + }, + "max": { + "value": "now/d" + } + } + } + }, + { + "displayName": "table_name", + "keyword": "table_name", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "All" + } + ] + } + } + }, + { + "displayName": "pipeline_tag_value", + "keyword": "pipeline_tag_value", + "dataType": "STRING", + "defaultSelection": { + "values": { + "dataType": "STRING", + "values": [ + { + "value": "" + } + ] + } + } + } + ], + "catalog": "chavdarbotev_std", + "schema": "mydev_chavdar_botev_dbx_sdp_monitoring" + } + ], + "pages": [ + { + "name": "cd63ec52", + "displayName": "Filtered", + "layout": [ + { + "widget": { + "name": "df5934b3", + "queries": [ + { + "name": "main_query", + "query": { + "datasetName": "0ac0bb93", + "fields": [ + { + "name": "Pipeline Id", + "expression": "`Pipeline Id`" + }, + { + "name": "Pipeline Name", + "expression": "`Pipeline Name`" + }, + { + "name": "Pipeline Type", + "expression": "`Pipeline 
Type`" + }, + { + "name": "Pipeline Tags", + "expression": "`Pipeline Tags`" + } + ], + "disaggregated": true + } + } + ], + "spec": { + "version": 1, + "widgetType": "table", + "encodings": { + "columns": [ + { + "fieldName": "Pipeline Id", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 100000, + "title": "Pipeline Id", + "allowSearch": false, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Pipeline Name", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 100001, + "title": "Pipeline Name", + "allowSearch": true, + "alignContent": "left", + "allowHTML": true, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false, + "defaultColumnWidth": 400 + }, + { + "fieldName": "Pipeline Type", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 100002, + "title": "Pipeline Type", + "allowSearch": false, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Pipeline Tags", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "complex", + "displayAs": "json", + "visible": true, + "order": 100003, + "title": "Pipeline Tags", + "allowSearch": true, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + } + ] + }, + "invisibleColumns": [], + "allowHTMLByDefault": false, + "itemsPerPage": 25, + "paginationSize": "default", + "condensed": true, + "withRowNumber": false, + "frame": { + "showTitle": true, + "showDescription": true, + "title": "Selected Pipelines", + "description": "Pipelines selected using the global filters" + } + } + }, + "position": { + "x": 0, + "y": 2, + "width": 6, + "height": 5 + } + }, + { + "widget": { + "name": "665329d1", + "queries": [ + { + "name": "main_query", + "query": { + "datasetName": "b18ee129", + "fields": [ + { + "name": "Pipeline", + "expression": "`Pipeline`" + }, + { + "name": "Target Table", + "expression": "`Target Table`" + } + ], + "disaggregated": true + } + } + ], + "spec": { + "version": 1, + "widgetType": "table", + "encodings": { + "columns": [ + { + "fieldName": "Pipeline", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + 
"imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 100000, + "title": "Pipeline", + "allowSearch": true, + "alignContent": "left", + "allowHTML": true, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false, + "defaultColumnWidth": 400 + }, + { + "fieldName": "Target Table", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 100001, + "title": "Target Table", + "allowSearch": true, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + } + ] + }, + "invisibleColumns": [], + "allowHTMLByDefault": false, + "itemsPerPage": 25, + "paginationSize": "default", + "condensed": true, + "withRowNumber": false, + "frame": { + "showTitle": true, + "showDescription": true, + "title": "Selected Tables", + "description": "Tables selected using the global filters" + } + } + }, + "position": { + "x": 0, + "y": 7, + "width": 6, + "height": 6 + } + }, + { + "widget": { + "name": "1fcb6a9c", + "multilineTextboxSpec": { + "lines": [ + "***Pipelines, pipeline runs and tables selected using the Global filters***" + ] + } + }, + "position": { + "x": 0, + "y": 0, + "width": 6, + "height": 2 + } + }, + { + "widget": { + "name": "5693a89a", + "queries": [ + { + "name": "main_query", + "query": { + "datasetName": "19d796af", + "fields": [ + { + "name": "Pipeline", + "expression": "`Pipeline`" + }, + { + "name": "Pipeline run", + "expression": "`Pipeline run`" + }, + { + "name": "Status", + "expression": "`Status`" + }, + { + "name": "Create time", + "expression": "`Create time`" + }, + { + "name": "End time", + "expression": "`End time`" + }, + { + "name": "Latest error log message", + "expression": "`Latest error log message`" + }, + { + "name": "Latest error message", + "expression": "`Latest error message`" + }, + { + "name": "Total seconds", + "expression": "`Total seconds`" + }, + { + "name": "Starting seconds", + "expression": "`Starting seconds`" + }, + { + "name": "Initialization seconds", + "expression": "`Initialization seconds`" + }, + { + "name": "Running seconds", + "expression": "`Running seconds`" + }, + { + "name": "Latest full error", + "expression": "`Latest full error`" + } + ], + "disaggregated": true + } + } + ], + "spec": { + "version": 1, + "widgetType": "table", + "encodings": { + "columns": [ + { + "fieldName": "Pipeline", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 0, + "title": "Pipeline", + "allowSearch": true, + "alignContent": "left", + "allowHTML": true, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false, + "defaultColumnWidth": 350 + }, + { + "fieldName": "Pipeline run", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", 
+ "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 1, + "title": "Pipeline run", + "allowSearch": false, + "alignContent": "left", + "allowHTML": true, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false, + "defaultColumnWidth": 350 + }, + { + "fieldName": "Status", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 2, + "title": "Status", + "allowSearch": false, + "alignContent": "left", + "allowHTML": true, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false, + "defaultColumnWidth": 150 + }, + { + "fieldName": "Create time", + "dateTimeFormat": "DD/MM/YYYY HH:mm:ss.SSS", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "datetime", + "displayAs": "datetime", + "visible": true, + "order": 3, + "title": "Create time", + "allowSearch": false, + "alignContent": "right", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "End time", + "dateTimeFormat": "DD/MM/YYYY HH:mm:ss.SSS", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "datetime", + "displayAs": "datetime", + "visible": true, + "order": 4, + "title": "End time", + "allowSearch": false, + "alignContent": "right", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Latest error log message", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 5, + "title": "Latest error log message", + "allowSearch": false, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Latest error message", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 6, + "title": "Latest error message", + "allowSearch": false, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + 
"preserveWhitespace": false + }, + { + "fieldName": "Total seconds", + "numberFormat": "0", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "integer", + "displayAs": "number", + "visible": true, + "order": 7, + "title": "Total seconds", + "allowSearch": false, + "alignContent": "right", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Starting seconds", + "numberFormat": "0", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "integer", + "displayAs": "number", + "visible": true, + "order": 8, + "title": "Starting seconds", + "allowSearch": false, + "alignContent": "right", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Initialization seconds", + "numberFormat": "0", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "integer", + "displayAs": "number", + "visible": true, + "order": 10, + "title": "Initialization seconds", + "allowSearch": false, + "alignContent": "right", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Running seconds", + "numberFormat": "0", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "integer", + "displayAs": "number", + "visible": true, + "order": 11, + "title": "Running seconds", + "allowSearch": false, + "alignContent": "right", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Latest full error", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "complex", + "displayAs": "json", + "visible": true, + "order": 12, + "title": "Latest full error", + "allowSearch": false, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + } + ] + }, + "invisibleColumns": [ + { + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "name": "Pipeline Name", + "type": "string", + "displayAs": "string", + "order": 100001, + "title": "Pipeline Name", + 
"allowSearch": false, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "name": "Status String", + "type": "string", + "displayAs": "string", + "order": 100005, + "title": "Status String", + "allowSearch": false, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "name": "Latest error code", + "type": "string", + "displayAs": "string", + "order": 100009, + "title": "Latest error code", + "allowSearch": false, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + } + ], + "allowHTMLByDefault": false, + "itemsPerPage": 25, + "paginationSize": "default", + "condensed": true, + "withRowNumber": false, + "frame": { + "showTitle": true, + "showDescription": true, + "title": "Selected pipeline runs", + "description": "Pipeline runs selected using the global filters" + } + } + }, + "position": { + "x": 0, + "y": 13, + "width": 6, + "height": 9 + } + } + ], + "pageType": "PAGE_TYPE_CANVAS" + }, + { + "name": "a44ce042", + "displayName": "Pipelines Health", + "layout": [ + { + "widget": { + "name": "dd476651", + "queries": [ + { + "name": "main_query", + "query": { + "datasetName": "c91619f8", + "fields": [ + { + "name": "Pipeline", + "expression": "`Pipeline`" + }, + { + "name": "Observability data freshness", + "expression": "`Observability data freshness`" + }, + { + "name": "Check time", + "expression": "`Check time`" + }, + { + "name": "Latency at check time", + "expression": "`Latency at check time`" + } + ], + "disaggregated": true + } + } + ], + "spec": { + "version": 1, + "widgetType": "table", + "encodings": { + "columns": [ + { + "fieldName": "Pipeline", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 100000, + "title": "Pipeline", + "allowSearch": true, + "alignContent": "left", + "allowHTML": true, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false, + "defaultColumnWidth": 350 + }, + { + "fieldName": "Observability data freshness", + "dateTimeFormat": "DD/MM/YYYY HH:mm:ss.SSS", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "datetime", + "displayAs": "datetime", + "visible": true, + "order": 100001, + "title": "Observability data freshness", + "allowSearch": false, + "alignContent": 
"right", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Check time", + "dateTimeFormat": "DD/MM/YYYY HH:mm:ss.SSS", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "datetime", + "displayAs": "datetime", + "visible": true, + "order": 100002, + "title": "Check time", + "allowSearch": false, + "alignContent": "right", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Latency at check time", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 100003, + "title": "Latency at check time", + "allowSearch": false, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + } + ] + }, + "invisibleColumns": [], + "allowHTMLByDefault": false, + "itemsPerPage": 25, + "paginationSize": "default", + "condensed": true, + "withRowNumber": false, + "frame": { + "showTitle": true, + "showDescription": true, + "title": "Observability Data Freshness", + "description": "Time and latency of the latest processed event by pipeline (oldest to newest)" + } + } + }, + "position": { + "x": 0, + "y": 18, + "width": 6, + "height": 6 + } + }, + { + "widget": { + "name": "6820f9d5", + "queries": [ + { + "name": "main_query", + "query": { + "datasetName": "82c4e3b5", + "fields": [ + { + "name": "Pipeline Link", + "expression": "`Pipeline Link`" + }, + { + "name": "Pipeline Run", + "expression": "`Pipeline Run`" + }, + { + "name": "Target Table", + "expression": "`Target Table`" + }, + { + "name": "Error Time", + "expression": "`Error Time`" + }, + { + "name": "Error Code", + "expression": "`Error Code`" + }, + { + "name": "Error Message in Log", + "expression": "`Error Message in Log`" + }, + { + "name": "Error Message", + "expression": "`Error Message`" + }, + { + "name": "Error Details", + "expression": "`Error Details`" + } + ], + "disaggregated": true + } + } + ], + "spec": { + "version": 1, + "widgetType": "table", + "encodings": { + "columns": [ + { + "fieldName": "Pipeline Link", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 1, + "title": "Pipeline Link", + "allowSearch": true, + "alignContent": "left", + "allowHTML": true, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false, + "defaultColumnWidth": 350 + }, + { + "fieldName": "Pipeline Run", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + 
"linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 3, + "title": "Pipeline Run", + "allowSearch": false, + "alignContent": "left", + "allowHTML": true, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false, + "defaultColumnWidth": 300 + }, + { + "fieldName": "Target Table", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 4, + "title": "Target Table", + "allowSearch": true, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Error Time", + "dateTimeFormat": "DD/MM/YYYY HH:mm:ss.SSS", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "datetime", + "displayAs": "datetime", + "visible": true, + "order": 6, + "title": "Error Time", + "allowSearch": false, + "alignContent": "right", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Error Code", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 7, + "title": "Error Code", + "allowSearch": true, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Error Message in Log", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 8, + "title": "Error Message in Log", + "allowSearch": true, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Error Message", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 9, + "title": "Error Message", + "allowSearch": true, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Error Details", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ 
@ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "complex", + "displayAs": "json", + "visible": true, + "order": 10, + "title": "Error Details", + "allowSearch": false, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + } + ] + }, + "invisibleColumns": [ + { + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "name": "Pipeline Name", + "type": "string", + "displayAs": "string", + "order": 0, + "title": "Pipeline Name", + "allowSearch": false, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "name": "Pipeline Run Id", + "type": "string", + "displayAs": "string", + "order": 2, + "title": "Pipeline Run Id", + "allowSearch": false, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "dateTimeFormat": "DD/MM/YYYY HH:mm:ss.SSS", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "name": "Pipeline Run Start Time", + "type": "datetime", + "displayAs": "datetime", + "order": 5, + "title": "Pipeline Run Start Time", + "allowSearch": false, + "alignContent": "right", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + } + ], + "allowHTMLByDefault": false, + "itemsPerPage": 25, + "paginationSize": "default", + "condensed": true, + "withRowNumber": false, + "frame": { + "showTitle": true, + "showDescription": true, + "title": "Pipelines Errors", + "description": "Detected errors in the pipeline runs" + } + } + }, + "position": { + "x": 0, + "y": 6, + "width": 6, + "height": 6 + } + }, + { + "widget": { + "name": "9ac0f134", + "queries": [ + { + "name": "main_query", + "query": { + "datasetName": "c0688ffc", + "fields": [ + { + "name": "Pipeline Link", + "expression": "`Pipeline Link`" + }, + { + "name": "Pipeline Run Id", + "expression": "`Pipeline Run Id`" + }, + { + "name": "Warning Time", + "expression": "`Warning Time`" + }, + { + "name": "Warning Message", + "expression": "`Warning Message`" + } + ], + "disaggregated": true + } + } + ], + "spec": { + "version": 1, + "widgetType": "table", + "encodings": { + "columns": [ + { + "fieldName": "Pipeline Link", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 100001, + "title": "Pipeline Link", + "allowSearch": true, + "alignContent": "left", + "allowHTML": true, 
+ "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false, + "defaultColumnWidth": 350 + }, + { + "fieldName": "Pipeline Run Id", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 100002, + "title": "Pipeline Run Id", + "allowSearch": false, + "alignContent": "left", + "allowHTML": true, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false, + "defaultColumnWidth": 300 + }, + { + "fieldName": "Warning Time", + "dateTimeFormat": "DD/MM/YYYY HH:mm:ss.SSS", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "datetime", + "displayAs": "datetime", + "visible": true, + "order": 100005, + "title": "Warning Time", + "allowSearch": false, + "alignContent": "right", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Warning Message", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 100006, + "title": "Warning Message", + "allowSearch": true, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + } + ] + }, + "invisibleColumns": [ + { + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "name": "Pipeline Name", + "type": "string", + "displayAs": "string", + "order": 100000, + "title": "Pipeline Name", + "allowSearch": false, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "name": "Pipeline Run", + "type": "string", + "displayAs": "string", + "order": 100003, + "title": "Pipeline Run", + "allowSearch": false, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "dateTimeFormat": "DD/MM/YYYY HH:mm:ss.SSS", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "name": "Pipeline Run Start Time", + "type": "datetime", + "displayAs": 
"datetime", + "order": 100004, + "title": "Pipeline Run Start Time", + "allowSearch": false, + "alignContent": "right", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + } + ], + "allowHTMLByDefault": false, + "itemsPerPage": 25, + "paginationSize": "default", + "condensed": true, + "withRowNumber": false, + "frame": { + "showTitle": true, + "showDescription": true, + "title": "Pipelines Warnings", + "description": "Detected warnings in the pipeline runs" + } + } + }, + "position": { + "x": 0, + "y": 12, + "width": 6, + "height": 6 + } + }, + { + "widget": { + "name": "5527b0d7", + "queries": [ + { + "name": "main_query", + "query": { + "datasetName": "62d03d54", + "fields": [ + { + "name": "Pipeline", + "expression": "`Pipeline`" + }, + { + "name": "Latest Pipeline Run", + "expression": "`Latest Pipeline Run`" + }, + { + "name": "Latest Status", + "expression": "`Latest Status`" + }, + { + "name": "Latest error code", + "expression": "`Latest error code`" + }, + { + "name": "Latest Create time", + "expression": "`Latest Create time`" + }, + { + "name": "Latest End time", + "expression": "`Latest End time`" + }, + { + "name": "Latest run errors", + "expression": "`Latest run errors`" + }, + { + "name": "Latest run warnings", + "expression": "`Latest run warnings`" + }, + { + "name": "Latest error message", + "expression": "`Latest error message`" + }, + { + "name": "Latest Total Seconds", + "expression": "`Latest Total Seconds`" + }, + { + "name": "Last Updated", + "expression": "`Last Updated`" + } + ], + "disaggregated": true + } + } + ], + "spec": { + "version": 1, + "widgetType": "table", + "encodings": { + "columns": [ + { + "fieldName": "Pipeline", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 0, + "title": "Pipeline", + "allowSearch": true, + "alignContent": "left", + "allowHTML": true, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false, + "defaultColumnWidth": 350 + }, + { + "fieldName": "Latest Pipeline Run", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 2, + "title": "Latest Pipeline Run", + "allowSearch": false, + "alignContent": "left", + "allowHTML": true, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false, + "defaultColumnWidth": 300 + }, + { + "fieldName": "Latest Status", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 3, + "title": "Latest Status", + "allowSearch": false, + "alignContent": "left", + "allowHTML": true, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false, + "defaultColumnWidth": 150 + 
}, + { + "fieldName": "Latest error code", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 4, + "title": "Latest error code", + "allowSearch": false, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Latest Create time", + "dateTimeFormat": "DD/MM/YYYY HH:mm:ss.SSS", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "datetime", + "displayAs": "datetime", + "visible": true, + "order": 5, + "title": "Latest Create time", + "allowSearch": false, + "alignContent": "right", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Latest End time", + "dateTimeFormat": "DD/MM/YYYY HH:mm:ss.SSS", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "datetime", + "displayAs": "datetime", + "visible": true, + "order": 6, + "title": "Latest End time", + "allowSearch": false, + "alignContent": "right", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Latest run errors", + "numberFormat": "0", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "integer", + "displayAs": "number", + "visible": true, + "order": 9, + "title": "Latest run errors", + "allowSearch": false, + "alignContent": "right", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Latest run warnings", + "numberFormat": "0", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "integer", + "displayAs": "number", + "visible": true, + "order": 10, + "title": "Latest run warnings", + "allowSearch": false, + "alignContent": "right", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Latest error message", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 11, + "title": 
"Latest error message", + "allowSearch": false, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Latest Total Seconds", + "numberFormat": "0", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "integer", + "displayAs": "number", + "visible": true, + "order": 12, + "title": "Latest Total Seconds", + "allowSearch": false, + "alignContent": "right", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Last Updated", + "dateTimeFormat": "DD/MM/YYYY HH:mm:ss.SSS", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "datetime", + "displayAs": "datetime", + "visible": true, + "order": 13, + "title": "Last Updated", + "allowSearch": false, + "alignContent": "right", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + } + ] + }, + "invisibleColumns": [ + { + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "name": "Pipeline Name", + "type": "string", + "displayAs": "string", + "order": 1, + "title": "Pipeline Name", + "allowSearch": false, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "name": "Latest Status String", + "type": "string", + "displayAs": "string", + "order": 7, + "title": "Latest Status String", + "allowSearch": false, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "name": "Latest error log message", + "type": "string", + "displayAs": "string", + "order": 8, + "title": "Latest error log message", + "allowSearch": false, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + } + ], + "allowHTMLByDefault": false, + "itemsPerPage": 25, + "paginationSize": "default", + "condensed": true, + "withRowNumber": false, + "frame": { + "showTitle": true, + "showDescription": true, + "title": "Latest Runs by Pipeline ", + "description": "The status of the latest run for each pipeline" + } + } + }, + "position": { + "x": 0, + 
"y": 0, + "width": 6, + "height": 6 + } + } + ], + "pageType": "PAGE_TYPE_CANVAS" + }, + { + "name": "0f6e96ec", + "displayName": "Pipelines Metrics", + "layout": [ + { + "widget": { + "name": "6e698544", + "queries": [ + { + "name": "main_query", + "query": { + "datasetName": "2faba057", + "fields": [ + { + "name": "Pipeline Name", + "expression": "`Pipeline Name`" + }, + { + "name": "hourly(Hour)", + "expression": "DATE_TRUNC(\"HOUR\", `Hour`)" + }, + { + "name": "sum(Number of Errors)", + "expression": "SUM(`Number of Errors`)" + } + ], + "disaggregated": false + } + } + ], + "spec": { + "version": 3, + "widgetType": "line", + "encodings": { + "x": { + "fieldName": "hourly(Hour)", + "scale": { + "type": "temporal" + }, + "displayName": "Hour" + }, + "y": { + "fieldName": "sum(Number of Errors)", + "scale": { + "type": "quantitative" + } + }, + "color": { + "fieldName": "Pipeline Name", + "scale": { + "type": "categorical" + }, + "displayName": "Pipeline Name" + }, + "label": { + "show": false + } + }, + "annotations": [], + "frame": { + "showDescription": true, + "showTitle": true, + "title": "Hourly error rate", + "description": "The number of errors per hour across selected pipelines" + } + } + }, + "position": { + "x": 0, + "y": 2, + "width": 6, + "height": 6 + } + }, + { + "widget": { + "name": "c972afff", + "queries": [ + { + "name": "main_query", + "query": { + "datasetName": "82c4e3b5", + "fields": [ + { + "name": "count(*)", + "expression": "COUNT(`*`)" + }, + { + "name": "Error Code", + "expression": "`Error Code`" + } + ], + "disaggregated": false + } + } + ], + "spec": { + "version": 3, + "widgetType": "pie", + "encodings": { + "angle": { + "fieldName": "count(*)", + "scale": { + "type": "quantitative" + } + }, + "color": { + "fieldName": "Error Code", + "scale": { + "type": "categorical" + } + }, + "label": { + "show": true + } + }, + "frame": { + "showTitle": true, + "showDescription": true, + "title": "Distribution of errors codes", + "description": "Relative share of different types of errors across pipeline runs" + } + } + }, + "position": { + "x": 0, + "y": 8, + "width": 6, + "height": 7 + } + }, + { + "widget": { + "name": "dd54ef0b", + "queries": [ + { + "name": "main_query", + "query": { + "datasetName": "19d796af", + "fields": [ + { + "name": "Status String", + "expression": "`Status String`" + }, + { + "name": "Pipeline Name", + "expression": "`Pipeline Name`" + }, + { + "name": "minutely(Create time)", + "expression": "DATE_TRUNC(\"MINUTE\", `Create time`)" + }, + { + "name": "sum(Total seconds)", + "expression": "SUM(`Total seconds`)" + } + ], + "disaggregated": false + } + } + ], + "spec": { + "version": 3, + "widgetType": "bar", + "encodings": { + "x": { + "fieldName": "minutely(Create time)", + "scale": { + "type": "categorical" + } + }, + "y": { + "fieldName": "sum(Total seconds)", + "scale": { + "type": "quantitative" + } + }, + "color": { + "fieldName": "Status String", + "scale": { + "type": "categorical", + "mappings": [ + { + "value": "COMPLETED", + "color": { + "themeColorType": "visualizationColors", + "position": 3 + } + }, + { + "value": "FAILED", + "color": { + "themeColorType": "visualizationColors", + "position": 4 + } + }, + { + "value": "WAITING_FOR_RESOURCES", + "color": { + "themeColorType": "visualizationColors", + "position": 5 + } + } + ] + } + }, + "extra": [ + { + "fieldName": "Pipeline Name" + } + ] + }, + "frame": { + "showTitle": true, + "showDescription": true, + "title": "Total time of pipeline runs", + "description": "Total 
execution time and final status for recent pipeline runs" + } + } + }, + "position": { + "x": 0, + "y": 25, + "width": 6, + "height": 6 + } + }, + { + "widget": { + "name": "c1f6e5a8", + "multilineTextboxSpec": { + "lines": [ + "# Errors" + ] + } + }, + "position": { + "x": 0, + "y": 0, + "width": 6, + "height": 2 + } + }, + { + "widget": { + "name": "f7a042ed", + "multilineTextboxSpec": { + "lines": [ + "# Pipeline Durations" + ] + } + }, + "position": { + "x": 0, + "y": 23, + "width": 6, + "height": 2 + } + }, + { + "widget": { + "name": "6b06b493", + "queries": [ + { + "name": "main_query", + "query": { + "datasetName": "19d796af", + "fields": [ + { + "name": "Pipeline Name", + "expression": "`Pipeline Name`" + }, + { + "name": "minutely(Create time)", + "expression": "DATE_TRUNC(\"MINUTE\", `Create time`)" + }, + { + "name": "sum(Starting seconds)", + "expression": "SUM(`Starting seconds`)" + } + ], + "disaggregated": false + } + } + ], + "spec": { + "version": 3, + "widgetType": "bar", + "encodings": { + "x": { + "fieldName": "minutely(Create time)", + "scale": { + "type": "categorical" + } + }, + "y": { + "fieldName": "sum(Starting seconds)", + "scale": { + "type": "quantitative" + } + }, + "color": { + "fieldName": "Pipeline Name", + "scale": { + "type": "categorical" + } + }, + "extra": [ + { + "fieldName": "Pipeline Name" + } + ] + }, + "frame": { + "showTitle": true, + "showDescription": true, + "title": "Starting time of pipeline runs", + "description": "Captures the time waiting for resources to start the pipeline" + } + } + }, + "position": { + "x": 0, + "y": 31, + "width": 6, + "height": 6 + } + }, + { + "widget": { + "name": "77547bb3", + "queries": [ + { + "name": "main_query", + "query": { + "datasetName": "19d796af", + "fields": [ + { + "name": "Pipeline Name", + "expression": "`Pipeline Name`" + }, + { + "name": "minutely(Create time)", + "expression": "DATE_TRUNC(\"MINUTE\", `Create time`)" + }, + { + "name": "sum(Initialization seconds)", + "expression": "SUM(`Initialization seconds`)" + } + ], + "disaggregated": false + } + } + ], + "spec": { + "version": 3, + "widgetType": "bar", + "encodings": { + "x": { + "fieldName": "minutely(Create time)", + "scale": { + "type": "categorical" + } + }, + "y": { + "fieldName": "sum(Initialization seconds)", + "scale": { + "type": "quantitative" + } + }, + "color": { + "fieldName": "Pipeline Name", + "scale": { + "type": "categorical" + } + }, + "extra": [ + { + "fieldName": "Pipeline Name" + } + ] + }, + "frame": { + "showTitle": true, + "showDescription": true, + "title": "Initialization time of pipeline runs", + "description": "Captures the time initializing the pipeline" + } + } + }, + "position": { + "x": 0, + "y": 37, + "width": 6, + "height": 7 + } + }, + { + "widget": { + "name": "530d0177", + "queries": [ + { + "name": "main_query", + "query": { + "datasetName": "19d796af", + "fields": [ + { + "name": "Pipeline Name", + "expression": "`Pipeline Name`" + }, + { + "name": "minutely(Create time)", + "expression": "DATE_TRUNC(\"MINUTE\", `Create time`)" + }, + { + "name": "sum(Running seconds)", + "expression": "SUM(`Running seconds`)" + } + ], + "disaggregated": false + } + } + ], + "spec": { + "version": 3, + "widgetType": "bar", + "encodings": { + "x": { + "fieldName": "minutely(Create time)", + "scale": { + "type": "categorical" + } + }, + "y": { + "fieldName": "sum(Running seconds)", + "scale": { + "type": "quantitative" + } + }, + "color": { + "fieldName": "Pipeline Name", + "scale": { + "type": "categorical" + } + }, + 
"extra": [ + { + "fieldName": "Pipeline Name" + } + ] + }, + "frame": { + "showTitle": true, + "showDescription": true, + "title": "Execution time of pipeline runs", + "description": "Captures the time evaluating the data flow graph of the pipeline" + } + } + }, + "position": { + "x": 0, + "y": 44, + "width": 6, + "height": 7 + } + }, + { + "widget": { + "name": "945b778a", + "queries": [ + { + "name": "main_query", + "query": { + "datasetName": "2660d018", + "fields": [ + { + "name": "Pipeline Name", + "expression": "`Pipeline Name`" + }, + { + "name": "hourly(Event Time)", + "expression": "DATE_TRUNC(\"HOUR\", `Event Time`)" + }, + { + "name": "sum(Total Changes)", + "expression": "SUM(`Total Changes`)" + } + ], + "disaggregated": false + } + } + ], + "spec": { + "version": 3, + "widgetType": "line", + "encodings": { + "x": { + "fieldName": "hourly(Event Time)", + "scale": { + "type": "temporal" + } + }, + "y": { + "fieldName": "sum(Total Changes)", + "scale": { + "type": "quantitative" + } + }, + "color": { + "fieldName": "Pipeline Name", + "scale": { + "type": "categorical" + } + } + }, + "frame": { + "showTitle": true, + "showDescription": true, + "title": "Hourly Traffic", + "description": "Number of changed per hour across all tables" + } + } + }, + "position": { + "x": 0, + "y": 17, + "width": 6, + "height": 6 + } + }, + { + "widget": { + "name": "bc0b65a1", + "multilineTextboxSpec": { + "lines": [ + "# Throughput and Rates" + ] + } + }, + "position": { + "x": 0, + "y": 15, + "width": 6, + "height": 2 + } + } + ], + "pageType": "PAGE_TYPE_CANVAS" + }, + { + "name": "6701599f", + "displayName": "Tables Health", + "layout": [ + { + "widget": { + "name": "057ead9b", + "queries": [ + { + "name": "main_query", + "query": { + "datasetName": "248b74c6", + "fields": [ + { + "name": "Pipeline Link", + "expression": "`Pipeline Link`" + }, + { + "name": "Target Table", + "expression": "`Target Table`" + }, + { + "name": "Latest Run", + "expression": "`Latest Run`" + }, + { + "name": "Latest State", + "expression": "`Latest State`" + }, + { + "name": "Latest Changes Time", + "expression": "`Latest Changes Time`" + }, + { + "name": "Seconds Since Latest Change", + "expression": "`Seconds Since Latest Change`" + }, + { + "name": "Latest Schema", + "expression": "`Latest Schema`" + }, + { + "name": "Time of Latest Error", + "expression": "`Time of Latest Error`" + }, + { + "name": "Code of Latest Error", + "expression": "`Code of Latest Error`" + }, + { + "name": "Message of Latest Error", + "expression": "`Message of Latest Error`" + }, + { + "name": "Log Message of Latest Error", + "expression": "`Log Message of Latest Error`" + }, + { + "name": "Latest Pipeline Run with Error", + "expression": "`Latest Pipeline Run with Error`" + }, + { + "name": "Last Updated", + "expression": "`Last Updated`" + }, + { + "name": "Seconds Since Latest Status Update", + "expression": "`Seconds Since Latest Status Update`" + }, + { + "name": "Last Data Refresh Time", + "expression": "`Last Data Refresh Time`" + } + ], + "disaggregated": true + } + } + ], + "spec": { + "version": 1, + "widgetType": "table", + "encodings": { + "columns": [ + { + "fieldName": "Pipeline Link", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, 
+ "order": 1, + "title": "Pipeline Link", + "allowSearch": true, + "alignContent": "left", + "allowHTML": true, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false, + "defaultColumnWidth": 350 + }, + { + "fieldName": "Target Table", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 2, + "title": "Target Table", + "allowSearch": true, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Latest Run", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 3, + "title": "Latest Run", + "allowSearch": false, + "alignContent": "left", + "allowHTML": true, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false, + "defaultColumnWidth": 300 + }, + { + "fieldName": "Latest State", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 4, + "title": "Latest State", + "allowSearch": true, + "alignContent": "left", + "allowHTML": true, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false, + "defaultColumnWidth": 125 + }, + { + "fieldName": "Latest Changes Time", + "dateTimeFormat": "DD/MM/YYYY HH:mm:ss.SSS", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "datetime", + "displayAs": "datetime", + "visible": true, + "order": 5, + "title": "Latest Changes Time", + "allowSearch": false, + "alignContent": "right", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Seconds Since Latest Change", + "numberFormat": "0", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "integer", + "displayAs": "number", + "visible": true, + "order": 6, + "title": "Seconds Since Latest Change", + "allowSearch": false, + "alignContent": "right", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Latest Schema", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + 
"linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "complex", + "displayAs": "json", + "visible": true, + "order": 7, + "title": "Latest Schema", + "allowSearch": false, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Time of Latest Error", + "dateTimeFormat": "DD/MM/YYYY HH:mm:ss.SSS", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "datetime", + "displayAs": "datetime", + "visible": true, + "order": 8, + "title": "Time of Latest Error", + "allowSearch": false, + "alignContent": "right", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Code of Latest Error", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 9, + "title": "Code of Latest Error", + "allowSearch": true, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Message of Latest Error", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 10, + "title": "Message of Latest Error", + "allowSearch": true, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Log Message of Latest Error", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 11, + "title": "Log Message of Latest Error", + "allowSearch": false, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Latest Pipeline Run with Error", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 12, + "title": "Latest Pipeline Run with Error", + "allowSearch": false, + "alignContent": "left", + "allowHTML": true, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false, + "defaultColumnWidth": 325 + }, + { + "fieldName": "Last Updated", + "dateTimeFormat": "DD/MM/YYYY 
HH:mm:ss.SSS", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "datetime", + "displayAs": "datetime", + "visible": true, + "order": 13, + "title": "Last Updated", + "allowSearch": false, + "alignContent": "right", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Seconds Since Latest Status Update", + "numberFormat": "0", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "integer", + "displayAs": "number", + "visible": true, + "order": 14, + "title": "Seconds Since Latest Status Update", + "allowSearch": false, + "alignContent": "right", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Last Data Refresh Time", + "dateTimeFormat": "DD/MM/YYYY HH:mm:ss.SSS", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "datetime", + "displayAs": "datetime", + "visible": true, + "order": 15, + "title": "Last Data Refresh Time", + "allowSearch": false, + "alignContent": "right", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + } + ] + }, + "invisibleColumns": [ + { + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "name": "Pipeline Name", + "type": "string", + "displayAs": "string", + "order": 0, + "title": "Pipeline Name", + "allowSearch": false, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + } + ], + "allowHTMLByDefault": false, + "itemsPerPage": 25, + "paginationSize": "default", + "condensed": true, + "withRowNumber": false, + "frame": { + "showTitle": true, + "showDescription": true, + "title": "Table Status", + "description": "Latest Status of Monitored Tables" + } + } + }, + "position": { + "x": 0, + "y": 0, + "width": 6, + "height": 10 + } + }, + { + "widget": { + "name": "cc2c8376", + "queries": [ + { + "name": "main_query", + "query": { + "datasetName": "7103e200", + "fields": [ + { + "name": "Pipeline Link", + "expression": "`Pipeline Link`" + }, + { + "name": "Target Table", + "expression": "`Target Table`" + }, + { + "name": "Pipeline Run", + "expression": "`Pipeline Run`" + }, + { + "name": "Table State", + "expression": "`Table State`" + }, + { + "name": "Error Code", + "expression": "`Error Code`" + }, + { + "name": "Error Message", + "expression": "`Error Message`" + }, + { + "name": "Error Message in Log", + "expression": "`Error Message in Log`" + }, + { + "name": "Error Time", + "expression": "`Error Time`" + }, + { + "name": 
"Latest Schema", + "expression": "`Latest Schema`" + }, + { + "name": "Pipeline Run Create Time", + "expression": "`Pipeline Run Create Time`" + }, + { + "name": "Pipeline Run End Time", + "expression": "`Pipeline Run End Time`" + }, + { + "name": "Pipeline Run State", + "expression": "`Pipeline Run State`" + }, + { + "name": "Last Updated", + "expression": "`Last Updated`" + } + ], + "disaggregated": true + } + } + ], + "spec": { + "version": 1, + "widgetType": "table", + "encodings": { + "columns": [ + { + "fieldName": "Pipeline Link", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 1, + "title": "Pipeline Link", + "allowSearch": true, + "alignContent": "left", + "allowHTML": true, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false, + "defaultColumnWidth": 350 + }, + { + "fieldName": "Target Table", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 2, + "title": "Target Table", + "allowSearch": true, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Pipeline Run", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 3, + "title": "Pipeline Run", + "allowSearch": false, + "alignContent": "left", + "allowHTML": true, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false, + "description": "", + "defaultColumnWidth": 325 + }, + { + "fieldName": "Table State", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 5, + "title": "Table State", + "allowSearch": false, + "alignContent": "left", + "allowHTML": true, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false, + "defaultColumnWidth": 125 + }, + { + "fieldName": "Error Code", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 6, + "title": "Error Code", + "allowSearch": true, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": 
"Error Message", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 7, + "title": "Error Message", + "allowSearch": true, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Error Message in Log", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 8, + "title": "Error Message in Log", + "allowSearch": true, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Error Time", + "dateTimeFormat": "DD/MM/YYYY HH:mm:ss.SSS", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "datetime", + "displayAs": "datetime", + "visible": true, + "order": 9, + "title": "Error Time", + "allowSearch": false, + "alignContent": "right", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Latest Schema", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "complex", + "displayAs": "json", + "visible": true, + "order": 10, + "title": "Latest Schema", + "allowSearch": false, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Pipeline Run Create Time", + "dateTimeFormat": "DD/MM/YYYY HH:mm:ss.SSS", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "datetime", + "displayAs": "datetime", + "visible": true, + "order": 11, + "title": "Pipeline Run Create Time", + "allowSearch": false, + "alignContent": "right", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Pipeline Run End Time", + "dateTimeFormat": "DD/MM/YYYY HH:mm:ss.SSS", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "datetime", + "displayAs": "datetime", + "visible": true, + "order": 12, + "title": "Pipeline Run End Time", + 
"allowSearch": false, + "alignContent": "right", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Pipeline Run State", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 13, + "title": "Pipeline Run State", + "allowSearch": false, + "alignContent": "left", + "allowHTML": true, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false, + "defaultColumnWidth": 150 + }, + { + "fieldName": "Last Updated", + "dateTimeFormat": "DD/MM/YYYY HH:mm:ss.SSS", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "datetime", + "displayAs": "datetime", + "visible": true, + "order": 14, + "title": "Last Updated", + "allowSearch": false, + "alignContent": "right", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + } + ] + }, + "invisibleColumns": [ + { + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "name": "Pipeline Name", + "type": "string", + "displayAs": "string", + "order": 0, + "title": "Pipeline Name", + "allowSearch": false, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + } + ], + "allowHTMLByDefault": false, + "itemsPerPage": 25, + "paginationSize": "default", + "condensed": true, + "withRowNumber": false, + "frame": { + "showTitle": true, + "showDescription": true, + "title": "Tables Status per Pipeline Run", + "description": "Table statuses for recent pipeline runs" + } + } + }, + "position": { + "x": 0, + "y": 40, + "width": 6, + "height": 9 + } + }, + { + "widget": { + "name": "0ec2e19e", + "queries": [ + { + "name": "main_query", + "query": { + "datasetName": "3b8b7d86", + "fields": [ + { + "name": "Pipeline Link", + "expression": "`Pipeline Link`" + }, + { + "name": "Pipeline Run", + "expression": "`Pipeline Run`" + }, + { + "name": "Target Table", + "expression": "`Target Table`" + }, + { + "name": "Error Code", + "expression": "`Error Code`" + }, + { + "name": "Error Message", + "expression": "`Error Message`" + }, + { + "name": "Error Log Message", + "expression": "`Error Log Message`" + }, + { + "name": "Error Details", + "expression": "`Error Details`" + }, + { + "name": "Error Time", + "expression": "`Error Time`" + } + ], + "disaggregated": true + } + } + ], + "spec": { + "version": 1, + "widgetType": "table", + "encodings": { + "columns": [ + { + "fieldName": "Pipeline Link", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": 
"string", + "displayAs": "string", + "visible": true, + "order": 1, + "title": "Pipeline Link", + "allowSearch": true, + "alignContent": "left", + "allowHTML": true, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false, + "defaultColumnWidth": 350 + }, + { + "fieldName": "Pipeline Run", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 3, + "title": "Pipeline Run", + "allowSearch": false, + "alignContent": "left", + "allowHTML": true, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false, + "defaultColumnWidth": 325 + }, + { + "fieldName": "Target Table", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 4, + "title": "Target Table", + "allowSearch": true, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Error Code", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 5, + "title": "Error Code", + "allowSearch": true, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Error Message", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 6, + "title": "Error Message", + "allowSearch": true, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Error Log Message", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 7, + "title": "Error Log Message", + "allowSearch": true, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Error Details", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": 
true, + "type": "complex", + "displayAs": "json", + "visible": true, + "order": 8, + "title": "Error Details", + "allowSearch": false, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Error Time", + "dateTimeFormat": "DD/MM/YYYY HH:mm:ss.SSS", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "datetime", + "displayAs": "datetime", + "visible": true, + "order": 9, + "title": "Error Time", + "allowSearch": false, + "alignContent": "right", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + } + ] + }, + "invisibleColumns": [ + { + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "name": "Pipeline Name", + "type": "string", + "displayAs": "string", + "order": 0, + "title": "Pipeline Name", + "allowSearch": false, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "name": "Pipeline Run Id", + "type": "string", + "displayAs": "string", + "order": 2, + "title": "Pipeline Run Id", + "allowSearch": false, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "dateTimeFormat": "DD/MM/YYYY HH:mm:ss.SSS", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "name": "Pipeline Start Time", + "type": "datetime", + "displayAs": "datetime", + "order": 10, + "title": "Pipeline Start Time", + "allowSearch": false, + "alignContent": "right", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + } + ], + "allowHTMLByDefault": false, + "itemsPerPage": 25, + "paginationSize": "default", + "condensed": true, + "withRowNumber": false, + "frame": { + "showTitle": true, + "showDescription": true, + "title": "Table Errors", + "description": "Table error details in recent pipeline runs" + } + } + }, + "position": { + "x": 0, + "y": 10, + "width": 6, + "height": 6 + } + }, + { + "widget": { + "name": "be0d5fae", + "queries": [ + { + "name": "main_query", + "query": { + "datasetName": "8897fc81", + "fields": [ + { + "name": "Pipeline Link", + "expression": "`Pipeline Link`" + }, + { + "name": "Pipeline Run", + "expression": "`Pipeline Run`" + }, + { + "name": "Target Table", + "expression": "`Target Table`" + }, + { + "name": "Warning Log Message", + "expression": "`Warning Log Message`" + }, + { + "name": "Warning Time", + "expression": "`Warning 
Time`" + } + ], + "disaggregated": true + } + } + ], + "spec": { + "version": 1, + "widgetType": "table", + "encodings": { + "columns": [ + { + "fieldName": "Pipeline Link", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 1, + "title": "Pipeline Link", + "allowSearch": true, + "alignContent": "left", + "allowHTML": true, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false, + "defaultColumnWidth": 350 + }, + { + "fieldName": "Pipeline Run", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 3, + "title": "Pipeline Run", + "allowSearch": false, + "alignContent": "left", + "allowHTML": true, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false, + "defaultColumnWidth": 325 + }, + { + "fieldName": "Target Table", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 4, + "title": "Target Table", + "allowSearch": true, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Warning Log Message", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "string", + "displayAs": "string", + "visible": true, + "order": 5, + "title": "Warning Log Message", + "allowSearch": true, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "fieldName": "Warning Time", + "dateTimeFormat": "DD/MM/YYYY HH:mm:ss.SSS", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "type": "datetime", + "displayAs": "datetime", + "visible": true, + "order": 6, + "title": "Warning Time", + "allowSearch": false, + "alignContent": "right", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + } + ] + }, + "invisibleColumns": [ + { + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "name": "Pipeline Name", + "type": "string", + 
"displayAs": "string", + "order": 0, + "title": "Pipeline Name", + "allowSearch": false, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "name": "Pipeline Run Id", + "type": "string", + "displayAs": "string", + "order": 2, + "title": "Pipeline Run Id", + "allowSearch": false, + "alignContent": "left", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + }, + { + "dateTimeFormat": "DD/MM/YYYY HH:mm:ss.SSS", + "booleanValues": [ + "false", + "true" + ], + "imageUrlTemplate": "{{ @ }}", + "imageTitleTemplate": "{{ @ }}", + "imageWidth": "", + "imageHeight": "", + "linkUrlTemplate": "{{ @ }}", + "linkTextTemplate": "{{ @ }}", + "linkTitleTemplate": "{{ @ }}", + "linkOpenInNewTab": true, + "name": "Pipeline Start Time", + "type": "datetime", + "displayAs": "datetime", + "order": 7, + "title": "Pipeline Start Time", + "allowSearch": false, + "alignContent": "right", + "allowHTML": false, + "highlightLinks": false, + "useMonospaceFont": false, + "preserveWhitespace": false + } + ], + "allowHTMLByDefault": false, + "itemsPerPage": 25, + "paginationSize": "default", + "condensed": true, + "withRowNumber": false, + "frame": { + "showTitle": true, + "showDescription": true, + "title": "Table Warnings", + "description": "Table warnings details in recent pipeline runs" + } + } + }, + "position": { + "x": 0, + "y": 34, + "width": 6, + "height": 6 + } + }, + { + "widget": { + "name": "expectation-failures", + "queries": [ + { + "name": "main_query", + "query": { + "datasetName": "f786fd7e", + "fields": [ + { + "name": "secondly(Check Time)", + "expression": "DATE_TRUNC(\"SECOND\", `Check Time`)" + }, + { + "name": "sum(Rows Failed)", + "expression": "SUM(`Rows Failed`)" + }, + { + "name": "max(Failure Percentage)", + "expression": "MAX(`Failure Percentage`)" + } + ], + "disaggregated": false + } + } + ], + "spec": { + "version": 1, + "widgetType": "combo", + "encodings": { + "x": { + "fieldName": "secondly(Check Time)", + "scale": { + "type": "categorical" + } + }, + "label": { + "show": false + }, + "y": { + "primary": { + "fields": [ + { + "fieldName": "sum(Rows Failed)" + } + ], + "scale": { + "type": "quantitative" + } + }, + "secondary": { + "fields": [ + { + "fieldName": "max(Failure Percentage)" + } + ], + "scale": { + "type": "quantitative" + } + } + } + }, + "frame": { + "showTitle": true, + "showDescription": true, + "title": "Expectation Failures", + "description": "Expectation failures across all selected tables and expectations" + } + } + }, + "position": { + "x": 0, + "y": 16, + "width": 6, + "height": 6 + } + }, + { + "widget": { + "name": "break-down-by-expectation", + "queries": [ + { + "name": "main_query", + "query": { + "datasetName": "f786fd7e", + "fields": [ + { + "name": "sum(Rows Failed)", + "expression": "SUM(`Rows Failed`)" + }, + { + "name": "Expectation Name", + "expression": "`Expectation Name`" + }, + { + "name": "sum(Rows Passed)", + "expression": "SUM(`Rows Passed`)" + } + ], + "disaggregated": false + } + } + ], + "spec": { + "version": 3, + "widgetType": "pie", + "encodings": { + "angle": { + "fieldName": "sum(Rows Failed)", + "scale": { + 
"type": "quantitative" + } + }, + "color": { + "fieldName": "Expectation Name", + "scale": { + "type": "categorical" + } + }, + "extra": [ + { + "fieldName": "sum(Rows Passed)" + } + ] + }, + "frame": { + "showTitle": true, + "showDescription": true, + "title": "Expectation failure break down by expectation", + "description": "The share of failed rows by expectation name" + } + } + }, + "position": { + "x": 0, + "y": 22, + "width": 6, + "height": 6 + } + }, + { + "widget": { + "name": "803d2c81", + "queries": [ + { + "name": "main_query", + "query": { + "datasetName": "f786fd7e", + "fields": [ + { + "name": "sum(Rows Failed)", + "expression": "SUM(`Rows Failed`)" + }, + { + "name": "Target Table", + "expression": "`Target Table`" + }, + { + "name": "sum(Rows Passed)", + "expression": "SUM(`Rows Passed`)" + } + ], + "disaggregated": false + } + } + ], + "spec": { + "version": 3, + "widgetType": "pie", + "encodings": { + "angle": { + "fieldName": "sum(Rows Failed)", + "scale": { + "type": "quantitative" + } + }, + "color": { + "fieldName": "Target Table", + "scale": { + "type": "categorical" + } + }, + "extra": [ + { + "fieldName": "sum(Rows Passed)" + } + ] + }, + "frame": { + "showTitle": true, + "showDescription": true, + "title": "Expectation failure break down by table", + "description": "The share of failed rows by table" + } + } + }, + "position": { + "x": 0, + "y": 28, + "width": 6, + "height": 6 + } + } + ], + "pageType": "PAGE_TYPE_CANVAS" + }, + { + "name": "f0c5d746", + "displayName": "Global Filters", + "layout": [ + { + "widget": { + "name": "5315d004", + "queries": [ + { + "name": "dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0bc0f49ad1c4fbf8dbce3c80c3104_pipeline_id", + "query": { + "datasetName": "6740610b", + "fields": [ + { + "name": "pipeline_id", + "expression": "`pipeline_id`" + }, + { + "name": "pipeline_id_associativity", + "expression": "COUNT_IF(`associative_filter_predicate_group`)" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0bc0f49ad1c8abd83949a26602d02_pipeline_id", + "query": { + "datasetName": "0ac0bb93", + "parameters": [ + { + "name": "pipeline_id", + "keyword": "pipeline_id" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0bc0f49ad1cac982b741d5b37f100_pipeline_id", + "query": { + "datasetName": "8db826d7", + "parameters": [ + { + "name": "pipeline_id", + "keyword": "pipeline_id" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0bc0f49ad1cb78329052ef490c91c_pipeline_id", + "query": { + "datasetName": "b18ee129", + "parameters": [ + { + "name": "pipeline_id", + "keyword": "pipeline_id" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0bc0f49ad1c728a1aa93a52ea3ee5_pipeline_id", + "query": { + "datasetName": "5e3eb18f", + "parameters": [ + { + "name": "pipeline_id", + "keyword": "pipeline_id" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0bc0f49ad1cc28199bd02a5cf014e_pipeline_id", + "query": { + "datasetName": "19d796af", + "parameters": [ + { + "name": "pipeline_id", + "keyword": "pipeline_id" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0bc0f49ad1c31b162ca86027b2214_pipeline_id", + "query": { + 
"datasetName": "c91619f8", + "parameters": [ + { + "name": "pipeline_id", + "keyword": "pipeline_id" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0bc0f49ad1bbb9db33d0671b7e5c4_pipeline_id", + "query": { + "datasetName": "82c4e3b5", + "parameters": [ + { + "name": "pipeline_id", + "keyword": "pipeline_id" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0bc0f49ad1ca1afbff3b634a7e223_pipeline_id", + "query": { + "datasetName": "c0688ffc", + "parameters": [ + { + "name": "pipeline_id", + "keyword": "pipeline_id" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0bc0f49ad1cd2b93a998b3c7419d8_pipeline_id", + "query": { + "datasetName": "3b8b7d86", + "parameters": [ + { + "name": "pipeline_id", + "keyword": "pipeline_id" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0bc0f49ad1c599c5f556138790884_pipeline_id", + "query": { + "datasetName": "8897fc81", + "parameters": [ + { + "name": "pipeline_id", + "keyword": "pipeline_id" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0bc0f49ad1c669e0bb96854f3e5b5_pipeline_id", + "query": { + "datasetName": "2faba057", + "parameters": [ + { + "name": "pipeline_id", + "keyword": "pipeline_id" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0bc0f49ad1cdf9b2bfde5833021f0_pipeline_id", + "query": { + "datasetName": "62d03d54", + "parameters": [ + { + "name": "pipeline_id", + "keyword": "pipeline_id" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0bc0f49ad1c418d770b5ee1faf6bc_pipeline_id", + "query": { + "datasetName": "2660d018", + "parameters": [ + { + "name": "pipeline_id", + "keyword": "pipeline_id" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0bc0f49ad1c7d8bf5c5e1b0b4198b_pipeline_id", + "query": { + "datasetName": "248b74c6", + "parameters": [ + { + "name": "pipeline_id", + "keyword": "pipeline_id" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0bc0f49ad1c94954d994de52a3bc2_pipeline_id", + "query": { + "datasetName": "7103e200", + "parameters": [ + { + "name": "pipeline_id", + "keyword": "pipeline_id" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0bc0f49ad1cec87c918c5462cfd91_pipeline_id", + "query": { + "datasetName": "b72dbc41", + "parameters": [ + { + "name": "pipeline_id", + "keyword": "pipeline_id" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0bc0f49ad1cf9ae98dbea28d3df14_pipeline_id", + "query": { + "datasetName": "e9270b22", + "parameters": [ + { + "name": "pipeline_id", + "keyword": "pipeline_id" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0bc0f49ad1d058384a13825d03ba9_pipeline_id", + "query": { + "datasetName": "f786fd7e", + "parameters": [ + { + "name": "pipeline_id", + "keyword": "pipeline_id" + } + ], + "disaggregated": false + } + }, + 
{ + "name": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0bc0f49ad1d129699ed571ce2f3ea_pipeline_id", + "query": { + "datasetName": "bc1ad993", + "parameters": [ + { + "name": "pipeline_id", + "keyword": "pipeline_id" + } + ], + "disaggregated": false + } + } + ], + "spec": { + "version": 2, + "widgetType": "filter-single-select", + "encodings": { + "fields": [ + { + "fieldName": "pipeline_id", + "queryName": "dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0bc0f49ad1c4fbf8dbce3c80c3104_pipeline_id" + }, + { + "parameterName": "pipeline_id", + "queryName": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0bc0f49ad1c8abd83949a26602d02_pipeline_id" + }, + { + "parameterName": "pipeline_id", + "queryName": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0bc0f49ad1cac982b741d5b37f100_pipeline_id" + }, + { + "parameterName": "pipeline_id", + "queryName": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0bc0f49ad1cb78329052ef490c91c_pipeline_id" + }, + { + "parameterName": "pipeline_id", + "queryName": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0bc0f49ad1c728a1aa93a52ea3ee5_pipeline_id" + }, + { + "parameterName": "pipeline_id", + "queryName": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0bc0f49ad1cc28199bd02a5cf014e_pipeline_id" + }, + { + "parameterName": "pipeline_id", + "queryName": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0bc0f49ad1c31b162ca86027b2214_pipeline_id" + }, + { + "parameterName": "pipeline_id", + "queryName": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0bc0f49ad1bbb9db33d0671b7e5c4_pipeline_id" + }, + { + "parameterName": "pipeline_id", + "queryName": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0bc0f49ad1ca1afbff3b634a7e223_pipeline_id" + }, + { + "parameterName": "pipeline_id", + "queryName": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0bc0f49ad1cd2b93a998b3c7419d8_pipeline_id" + }, + { + "parameterName": "pipeline_id", + "queryName": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0bc0f49ad1c599c5f556138790884_pipeline_id" + }, + { + "parameterName": "pipeline_id", + "queryName": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0bc0f49ad1c669e0bb96854f3e5b5_pipeline_id" + }, + { + "parameterName": "pipeline_id", + "queryName": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0bc0f49ad1cdf9b2bfde5833021f0_pipeline_id" + }, + { + "parameterName": "pipeline_id", + "queryName": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0bc0f49ad1c418d770b5ee1faf6bc_pipeline_id" + }, + { + "parameterName": "pipeline_id", + "queryName": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0bc0f49ad1c7d8bf5c5e1b0b4198b_pipeline_id" + }, + { + "parameterName": "pipeline_id", + "queryName": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0bc0f49ad1c94954d994de52a3bc2_pipeline_id" + }, + { + "parameterName": "pipeline_id", + "queryName": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0bc0f49ad1cec87c918c5462cfd91_pipeline_id" + }, + { + "parameterName": "pipeline_id", + "queryName": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0bc0f49ad1cf9ae98dbea28d3df14_pipeline_id" + }, + { + "parameterName": "pipeline_id", + "queryName": 
"parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0bc0f49ad1d058384a13825d03ba9_pipeline_id" + }, + { + "parameterName": "pipeline_id", + "queryName": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0bc0f49ad1d129699ed571ce2f3ea_pipeline_id" + } + ] + }, + "frame": { + "showTitle": true, + "showDescription": true, + "title": "Pipelines", + "description": "Select a pipeline to observe with selected tags" + } + } + }, + "position": { + "x": 0, + "y": 2, + "width": 1, + "height": 2 + } + }, + { + "widget": { + "name": "38094f88", + "queries": [ + { + "name": "dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0b68e549b1087b64f359a100390be_table_name", + "query": { + "datasetName": "8db826d7", + "fields": [ + { + "name": "table_name", + "expression": "`table_name`" + }, + { + "name": "table_name_associativity", + "expression": "COUNT_IF(`associative_filter_predicate_group`)" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0b68e549b1091ba6a3afac7047a0f_table_name", + "query": { + "datasetName": "b18ee129", + "parameters": [ + { + "name": "table_name", + "keyword": "table_name" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0b68e549b10ab970ed31098c78e69_table_name", + "query": { + "datasetName": "3b8b7d86", + "parameters": [ + { + "name": "table_name", + "keyword": "table_name" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0b68e549b1024aea58a5922819fe1_table_name", + "query": { + "datasetName": "8897fc81", + "parameters": [ + { + "name": "table_name", + "keyword": "table_name" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0b68e549b100c8f076556b0de8fb5_table_name", + "query": { + "datasetName": "2660d018", + "parameters": [ + { + "name": "table_name", + "keyword": "table_name" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0b68e549b105597260198f0433892_table_name", + "query": { + "datasetName": "248b74c6", + "parameters": [ + { + "name": "table_name", + "keyword": "table_name" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0b68e549b106e85f1f1d9342869c3_table_name", + "query": { + "datasetName": "7103e200", + "parameters": [ + { + "name": "table_name", + "keyword": "table_name" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0b68e549b10c4a4c246ef94cd279b_table_name", + "query": { + "datasetName": "b72dbc41", + "parameters": [ + { + "name": "table_name", + "keyword": "table_name" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0b68e926b1b3daa57707d853060ae_table_name", + "query": { + "datasetName": "e9270b22", + "parameters": [ + { + "name": "table_name", + "keyword": "table_name" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0b6904a0e12d39620b3d5d0055c4c_table_name", + "query": { + "datasetName": "f786fd7e", + "parameters": [ + { + "name": "table_name", + "keyword": "table_name" + } + ], + "disaggregated": false + } + }, + { + "name": 
"parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0b690d5ab1755bf99123705eecdfe_table_name", + "query": { + "datasetName": "bc1ad993", + "parameters": [ + { + "name": "table_name", + "keyword": "table_name" + } + ], + "disaggregated": false + } + } + ], + "spec": { + "version": 2, + "widgetType": "filter-single-select", + "encodings": { + "fields": [ + { + "fieldName": "table_name", + "queryName": "dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0b68e549b1087b64f359a100390be_table_name" + }, + { + "parameterName": "table_name", + "queryName": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0b68e549b1091ba6a3afac7047a0f_table_name" + }, + { + "parameterName": "table_name", + "queryName": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0b68e549b10ab970ed31098c78e69_table_name" + }, + { + "parameterName": "table_name", + "queryName": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0b68e549b1024aea58a5922819fe1_table_name" + }, + { + "parameterName": "table_name", + "queryName": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0b68e549b100c8f076556b0de8fb5_table_name" + }, + { + "parameterName": "table_name", + "queryName": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0b68e549b105597260198f0433892_table_name" + }, + { + "parameterName": "table_name", + "queryName": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0b68e549b106e85f1f1d9342869c3_table_name" + }, + { + "parameterName": "table_name", + "queryName": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0b68e549b10c4a4c246ef94cd279b_table_name" + }, + { + "parameterName": "table_name", + "queryName": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0b68e926b1b3daa57707d853060ae_table_name" + }, + { + "parameterName": "table_name", + "queryName": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0b6904a0e12d39620b3d5d0055c4c_table_name" + }, + { + "parameterName": "table_name", + "queryName": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0b690d5ab1755bf99123705eecdfe_table_name" + } + ] + }, + "frame": { + "showTitle": true, + "showDescription": true, + "title": "Tables", + "description": "Select a table to observe" + } + } + }, + "position": { + "x": 0, + "y": 4, + "width": 1, + "height": 2 + } + }, + { + "widget": { + "name": "cc47542d", + "queries": [ + { + "name": "dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0b68e549b1049bab36f0d89f9f773_event_timestamp", + "query": { + "datasetName": "5e3eb18f", + "fields": [ + { + "name": "event_timestamp", + "expression": "`event_timestamp`" + }, + { + "name": "event_timestamp_associativity", + "expression": "COUNT_IF(`associative_filter_predicate_group`)" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0b68e549b109daa983b86d2ea0e85_event_range", + "query": { + "datasetName": "19d796af", + "parameters": [ + { + "name": "event_range", + "keyword": "event_range" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0b68e549a1f9682759c5da45cd926_event_range", + "query": { + "datasetName": "82c4e3b5", + "parameters": [ + { + "name": "event_range", + "keyword": "event_range" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0b68e549b107b8ce250ca5a5ac326_event_range", 
+ "query": { + "datasetName": "c0688ffc", + "parameters": [ + { + "name": "event_range", + "keyword": "event_range" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0b68e549b10ab970ed31098c78e69_event_range", + "query": { + "datasetName": "3b8b7d86", + "parameters": [ + { + "name": "event_range", + "keyword": "event_range" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0b68e549b1024aea58a5922819fe1_event_range", + "query": { + "datasetName": "8897fc81", + "parameters": [ + { + "name": "event_range", + "keyword": "event_range" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0b68e549b103181194647c214b1aa_event_range", + "query": { + "datasetName": "2faba057", + "parameters": [ + { + "name": "event_range", + "keyword": "event_range" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0b68e549b10b9badc5ee6b91b8d89_event_range", + "query": { + "datasetName": "62d03d54", + "parameters": [ + { + "name": "event_range", + "keyword": "event_range" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0b68e549b100c8f076556b0de8fb5_event_range", + "query": { + "datasetName": "2660d018", + "parameters": [ + { + "name": "event_range", + "keyword": "event_range" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0b68e549b105597260198f0433892_event_range", + "query": { + "datasetName": "248b74c6", + "parameters": [ + { + "name": "event_range", + "keyword": "event_range" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0b68e549b106e85f1f1d9342869c3_event_range", + "query": { + "datasetName": "7103e200", + "parameters": [ + { + "name": "event_range", + "keyword": "event_range" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0b68e549b10c4a4c246ef94cd279b_event_range", + "query": { + "datasetName": "b72dbc41", + "parameters": [ + { + "name": "event_range", + "keyword": "event_range" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0b68e926b1b3daa57707d853060ae_event_range", + "query": { + "datasetName": "e9270b22", + "parameters": [ + { + "name": "event_range", + "keyword": "event_range" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0b6904a0e12d39620b3d5d0055c4c_event_range", + "query": { + "datasetName": "f786fd7e", + "parameters": [ + { + "name": "event_range", + "keyword": "event_range" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0b690d5ab1755bf99123705eecdfe_event_range", + "query": { + "datasetName": "bc1ad993", + "parameters": [ + { + "name": "event_range", + "keyword": "event_range" + } + ], + "disaggregated": false + } + } + ], + "spec": { + "version": 2, + "widgetType": "filter-date-range-picker", + "encodings": { + "fields": [ + { + "fieldName": "event_timestamp", + "queryName": "dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0b68e549b1049bab36f0d89f9f773_event_timestamp" + }, + 
{ + "parameterName": "event_range", + "queryName": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0b68e549b109daa983b86d2ea0e85_event_range" + }, + { + "parameterName": "event_range", + "queryName": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0b68e549a1f9682759c5da45cd926_event_range" + }, + { + "parameterName": "event_range", + "queryName": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0b68e549b107b8ce250ca5a5ac326_event_range" + }, + { + "parameterName": "event_range", + "queryName": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0b68e549b10ab970ed31098c78e69_event_range" + }, + { + "parameterName": "event_range", + "queryName": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0b68e549b1024aea58a5922819fe1_event_range" + }, + { + "parameterName": "event_range", + "queryName": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0b68e549b103181194647c214b1aa_event_range" + }, + { + "parameterName": "event_range", + "queryName": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0b68e549b10b9badc5ee6b91b8d89_event_range" + }, + { + "parameterName": "event_range", + "queryName": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0b68e549b100c8f076556b0de8fb5_event_range" + }, + { + "parameterName": "event_range", + "queryName": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0b68e549b105597260198f0433892_event_range" + }, + { + "parameterName": "event_range", + "queryName": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0b68e549b106e85f1f1d9342869c3_event_range" + }, + { + "parameterName": "event_range", + "queryName": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0b68e549b10c4a4c246ef94cd279b_event_range" + }, + { + "parameterName": "event_range", + "queryName": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0b68e926b1b3daa57707d853060ae_event_range" + }, + { + "parameterName": "event_range", + "queryName": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0b6904a0e12d39620b3d5d0055c4c_event_range" + }, + { + "parameterName": "event_range", + "queryName": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0b690d5ab1755bf99123705eecdfe_event_range" + } + ] + }, + "frame": { + "showTitle": true, + "showDescription": true, + "title": "Date/Time Range", + "description": "Select a event log data/time range to observe" + }, + "selection": { + "defaultSelection": { + "range": { + "dataType": "DATETIME", + "min": { + "value": "now-24h/h" + }, + "max": { + "value": "now/h" + } + } + } + } + } + }, + "position": { + "x": 0, + "y": 6, + "width": 1, + "height": 2 + } + }, + { + "widget": { + "name": "pipeline-tags", + "queries": [ + { + "name": "dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0bff6d9d6134890a05e2f11e34007_tag_value", + "query": { + "datasetName": "5ef35e45", + "fields": [ + { + "name": "tag_value", + "expression": "`tag_value`" + }, + { + "name": "tag_value_associativity", + "expression": "COUNT_IF(`associative_filter_predicate_group`)" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0bc0f49ad1c4fbf8dbce3c80c3104_tag_value", + "query": { + "datasetName": "6740610b", + "parameters": [ + { + "name": "tag_value", + "keyword": "tag_value" + } + ], + "disaggregated": false + } + }, + { + "name": 
"parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0bc0f49ad1c8abd83949a26602d02_pipeline_tag_value", + "query": { + "datasetName": "0ac0bb93", + "parameters": [ + { + "name": "pipeline_tag_value", + "keyword": "pipeline_tag_value" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0bc0f49ad1cac982b741d5b37f100_pipeline_tag_value", + "query": { + "datasetName": "8db826d7", + "parameters": [ + { + "name": "pipeline_tag_value", + "keyword": "pipeline_tag_value" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0bc0f49ad1cb78329052ef490c91c_pipeline_tag_value", + "query": { + "datasetName": "b18ee129", + "parameters": [ + { + "name": "pipeline_tag_value", + "keyword": "pipeline_tag_value" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0bc0f49ad1bbb9db33d0671b7e5c4_pipeline_tag_value", + "query": { + "datasetName": "82c4e3b5", + "parameters": [ + { + "name": "pipeline_tag_value", + "keyword": "pipeline_tag_value" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0bc0f49ad1ca1afbff3b634a7e223_pipeline_tag_value", + "query": { + "datasetName": "c0688ffc", + "parameters": [ + { + "name": "pipeline_tag_value", + "keyword": "pipeline_tag_value" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0bc0f49ad1cc28199bd02a5cf014e_pipeline_tag_value", + "query": { + "datasetName": "19d796af", + "parameters": [ + { + "name": "pipeline_tag_value", + "keyword": "pipeline_tag_value" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0bc0f49ad1cdf9b2bfde5833021f0_pipeline_tag_value", + "query": { + "datasetName": "62d03d54", + "parameters": [ + { + "name": "pipeline_tag_value", + "keyword": "pipeline_tag_value" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0bc0f49ad1c31b162ca86027b2214_pipeline_tag_value", + "query": { + "datasetName": "c91619f8", + "parameters": [ + { + "name": "pipeline_tag_value", + "keyword": "pipeline_tag_value" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0bc0f49ad1c669e0bb96854f3e5b5_pipeline_tag_value", + "query": { + "datasetName": "2faba057", + "parameters": [ + { + "name": "pipeline_tag_value", + "keyword": "pipeline_tag_value" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0bc0f49ad1c418d770b5ee1faf6bc_pipeline_tag_value", + "query": { + "datasetName": "2660d018", + "parameters": [ + { + "name": "pipeline_tag_value", + "keyword": "pipeline_tag_value" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0bc0f49ad1c599c5f556138790884_pipeline_tag_value", + "query": { + "datasetName": "8897fc81", + "parameters": [ + { + "name": "pipeline_tag_value", + "keyword": "pipeline_tag_value" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0bc0f49ad1c7d8bf5c5e1b0b4198b_pipeline_tag_value", + "query": { + "datasetName": "248b74c6", + "parameters": [ + { + "name": 
"pipeline_tag_value", + "keyword": "pipeline_tag_value" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0bc0f49ad1c94954d994de52a3bc2_pipeline_tag_value", + "query": { + "datasetName": "7103e200", + "parameters": [ + { + "name": "pipeline_tag_value", + "keyword": "pipeline_tag_value" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0bc0f49ad1cd2b93a998b3c7419d8_pipeline_tag_value", + "query": { + "datasetName": "3b8b7d86", + "parameters": [ + { + "name": "pipeline_tag_value", + "keyword": "pipeline_tag_value" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0bc0f49ad1cec87c918c5462cfd91_pipeline_tag_value", + "query": { + "datasetName": "b72dbc41", + "parameters": [ + { + "name": "pipeline_tag_value", + "keyword": "pipeline_tag_value" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0bc0f49ad1cf9ae98dbea28d3df14_pipeline_tag_value", + "query": { + "datasetName": "e9270b22", + "parameters": [ + { + "name": "pipeline_tag_value", + "keyword": "pipeline_tag_value" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0bc0f49ad1d058384a13825d03ba9_pipeline_tag_value", + "query": { + "datasetName": "f786fd7e", + "parameters": [ + { + "name": "pipeline_tag_value", + "keyword": "pipeline_tag_value" + } + ], + "disaggregated": false + } + }, + { + "name": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0bc0f49ad1d129699ed571ce2f3ea_pipeline_tag_value", + "query": { + "datasetName": "bc1ad993", + "parameters": [ + { + "name": "pipeline_tag_value", + "keyword": "pipeline_tag_value" + } + ], + "disaggregated": false + } + } + ], + "spec": { + "version": 2, + "widgetType": "filter-single-select", + "encodings": { + "fields": [ + { + "fieldName": "tag_value", + "queryName": "dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0bff6d9d6134890a05e2f11e34007_tag_value" + }, + { + "parameterName": "tag_value", + "queryName": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0bc0f49ad1c4fbf8dbce3c80c3104_tag_value" + }, + { + "parameterName": "pipeline_tag_value", + "queryName": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0bc0f49ad1c8abd83949a26602d02_pipeline_tag_value" + }, + { + "parameterName": "pipeline_tag_value", + "queryName": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0bc0f49ad1cac982b741d5b37f100_pipeline_tag_value" + }, + { + "parameterName": "pipeline_tag_value", + "queryName": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0bc0f49ad1cb78329052ef490c91c_pipeline_tag_value" + }, + { + "parameterName": "pipeline_tag_value", + "queryName": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0bc0f49ad1bbb9db33d0671b7e5c4_pipeline_tag_value" + }, + { + "parameterName": "pipeline_tag_value", + "queryName": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0bc0f49ad1ca1afbff3b634a7e223_pipeline_tag_value" + }, + { + "parameterName": "pipeline_tag_value", + "queryName": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0bc0f49ad1cc28199bd02a5cf014e_pipeline_tag_value" + }, + { + "parameterName": "pipeline_tag_value", + "queryName": 
"parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0bc0f49ad1cdf9b2bfde5833021f0_pipeline_tag_value" + }, + { + "parameterName": "pipeline_tag_value", + "queryName": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0bc0f49ad1c31b162ca86027b2214_pipeline_tag_value" + }, + { + "parameterName": "pipeline_tag_value", + "queryName": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0bc0f49ad1c669e0bb96854f3e5b5_pipeline_tag_value" + }, + { + "parameterName": "pipeline_tag_value", + "queryName": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0bc0f49ad1c418d770b5ee1faf6bc_pipeline_tag_value" + }, + { + "parameterName": "pipeline_tag_value", + "queryName": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0bc0f49ad1c599c5f556138790884_pipeline_tag_value" + }, + { + "parameterName": "pipeline_tag_value", + "queryName": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0bc0f49ad1c7d8bf5c5e1b0b4198b_pipeline_tag_value" + }, + { + "parameterName": "pipeline_tag_value", + "queryName": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0bc0f49ad1c94954d994de52a3bc2_pipeline_tag_value" + }, + { + "parameterName": "pipeline_tag_value", + "queryName": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0bc0f49ad1cd2b93a998b3c7419d8_pipeline_tag_value" + }, + { + "parameterName": "pipeline_tag_value", + "queryName": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0bc0f49ad1cec87c918c5462cfd91_pipeline_tag_value" + }, + { + "parameterName": "pipeline_tag_value", + "queryName": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0bc0f49ad1cf9ae98dbea28d3df14_pipeline_tag_value" + }, + { + "parameterName": "pipeline_tag_value", + "queryName": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0bc0f49ad1d058384a13825d03ba9_pipeline_tag_value" + }, + { + "parameterName": "pipeline_tag_value", + "queryName": "parameter_dashboards/01f0b68e549a192ea15560791af33ce6/datasets/01f0bc0f49ad1d129699ed571ce2f3ea_pipeline_tag_value" + } + ] + }, + "frame": { + "showTitle": true, + "showDescription": true, + "title": "Pipeline Tags", + "description": "Filter pipelines by tag values" + } + } + }, + "position": { + "x": 0, + "y": 0, + "width": 1, + "height": 2 + } + } + ], + "pageType": "PAGE_TYPE_GLOBAL_FILTERS" + }, + { + "name": "75df3911", + "displayName": "Tables Metrics", + "layout": [ + { + "widget": { + "name": "db12c0c1", + "queries": [ + { + "name": "main_query", + "query": { + "datasetName": "3b8b7d86", + "fields": [ + { + "name": "Target Table", + "expression": "`Target Table`" + }, + { + "name": "minutely(Error Time)", + "expression": "DATE_TRUNC(\"MINUTE\", `Error Time`)" + }, + { + "name": "count(*)", + "expression": "COUNT(`*`)" + } + ], + "disaggregated": false + } + } + ], + "spec": { + "version": 3, + "widgetType": "bar", + "encodings": { + "x": { + "fieldName": "minutely(Error Time)", + "scale": { + "type": "temporal" + }, + "displayName": "Error Time" + }, + "y": { + "fieldName": "count(*)", + "scale": { + "type": "quantitative" + }, + "displayName": "Count of Records" + }, + "color": { + "fieldName": "Target Table", + "scale": { + "type": "categorical" + }, + "displayName": "Target Table" + } + }, + "frame": { + "showTitle": true, + "showDescription": true, + "title": "Table Errors", + "description": "Detected errors while processing tables" + } + } + }, + "position": { + "x": 0, + "y": 2, + "width": 6, + "height": 6 + } 
+ }, + { + "widget": { + "name": "841d82eb", + "queries": [ + { + "name": "main_query", + "query": { + "datasetName": "8897fc81", + "fields": [ + { + "name": "Target Table", + "expression": "`Target Table`" + }, + { + "name": "minutely(Warning Time)", + "expression": "DATE_TRUNC(\"MINUTE\", `Warning Time`)" + }, + { + "name": "count(*)", + "expression": "COUNT(`*`)" + } + ], + "disaggregated": false + } + } + ], + "spec": { + "version": 3, + "widgetType": "bar", + "encodings": { + "x": { + "fieldName": "minutely(Warning Time)", + "scale": { + "type": "categorical" + }, + "displayName": "Warning Time" + }, + "y": { + "fieldName": "count(*)", + "scale": { + "type": "quantitative" + }, + "displayName": "Count of Records" + }, + "color": { + "fieldName": "Target Table", + "scale": { + "type": "categorical" + }, + "displayName": "Target Table" + } + }, + "frame": { + "showTitle": true, + "showDescription": true, + "title": "Table Warnings", + "description": "Detected warnings while processing tables" + } + } + }, + "position": { + "x": 0, + "y": 23, + "width": 6, + "height": 5 + } + }, + { + "widget": { + "name": "6c9cec76", + "multilineTextboxSpec": { + "lines": [ + "# Errors and Warnings" + ] + } + }, + "position": { + "x": 0, + "y": 0, + "width": 6, + "height": 2 + } + }, + { + "widget": { + "name": "e968324c", + "multilineTextboxSpec": { + "lines": [ + "# Rates and Throughput" + ] + } + }, + "position": { + "x": 0, + "y": 46, + "width": 6, + "height": 2 + } + }, + { + "widget": { + "name": "edb32903", + "queries": [ + { + "name": "main_query", + "query": { + "datasetName": "2660d018", + "fields": [ + { + "name": "Target Table", + "expression": "`Target Table`" + }, + { + "name": "hourly(Event Time)", + "expression": "DATE_TRUNC(\"HOUR\", `Event Time`)" + }, + { + "name": "sum(Appended Rows)", + "expression": "SUM(`Appended Rows`)" + } + ], + "disaggregated": false + } + } + ], + "spec": { + "version": 3, + "widgetType": "bar", + "encodings": { + "x": { + "fieldName": "hourly(Event Time)", + "scale": { + "type": "temporal" + } + }, + "y": { + "fieldName": "sum(Appended Rows)", + "scale": { + "type": "quantitative" + } + }, + "color": { + "fieldName": "Target Table", + "scale": { + "type": "categorical" + } + } + }, + "frame": { + "showTitle": true, + "showDescription": true, + "title": "Hourly Output Rows Throughput", + "description": "Number of output rows to the Target Tables per Hour" + } + } + }, + "position": { + "x": 0, + "y": 48, + "width": 6, + "height": 6 + } + }, + { + "widget": { + "name": "c4b2aec5", + "queries": [ + { + "name": "main_query", + "query": { + "datasetName": "3b8b7d86", + "fields": [ + { + "name": "count(*)", + "expression": "COUNT(`*`)" + }, + { + "name": "Error Code", + "expression": "`Error Code`" + } + ], + "disaggregated": false + } + } + ], + "spec": { + "version": 3, + "widgetType": "pie", + "encodings": { + "angle": { + "fieldName": "count(*)", + "scale": { + "type": "quantitative" + }, + "displayName": "Number of occurrences" + }, + "color": { + "fieldName": "Error Code", + "scale": { + "type": "categorical" + } + }, + "label": { + "show": true + } + }, + "frame": { + "showTitle": true, + "showDescription": true, + "title": "Error Frequency", + "description": "Distribution of errors by error codes" + } + } + }, + "position": { + "x": 0, + "y": 8, + "width": 6, + "height": 8 + } + }, + { + "widget": { + "name": "115fa69e", + "queries": [ + { + "name": "main_query", + "query": { + "datasetName": "2660d018", + "fields": [ + { + "name": "Target Table", + 
"expression": "`Target Table`" + }, + { + "name": "hourly(Event Time)", + "expression": "DATE_TRUNC(\"HOUR\", `Event Time`)" + }, + { + "name": "sum(Appended Bytes)", + "expression": "SUM(`Appended Bytes`)" + } + ], + "disaggregated": false + } + } + ], + "spec": { + "version": 3, + "widgetType": "bar", + "encodings": { + "x": { + "fieldName": "hourly(Event Time)", + "scale": { + "type": "temporal" + } + }, + "y": { + "fieldName": "sum(Appended Bytes)", + "scale": { + "type": "quantitative" + } + }, + "color": { + "fieldName": "Target Table", + "scale": { + "type": "categorical" + } + } + }, + "frame": { + "showTitle": true, + "showDescription": true, + "title": "Hourly Appended Bytes Throughput", + "description": "Number of appended Bytes to the Target Tables per Hour" + } + } + }, + "position": { + "x": 0, + "y": 54, + "width": 6, + "height": 6 + } + }, + { + "widget": { + "name": "afdff795", + "queries": [ + { + "name": "main_query", + "query": { + "datasetName": "2660d018", + "fields": [ + { + "name": "Target Table", + "expression": "`Target Table`" + }, + { + "name": "hourly(Event Time)", + "expression": "DATE_TRUNC(\"HOUR\", `Event Time`)" + }, + { + "name": "sum(Upserted Rows)", + "expression": "SUM(`Upserted Rows`)" + } + ], + "disaggregated": false + } + } + ], + "spec": { + "version": 3, + "widgetType": "bar", + "encodings": { + "x": { + "fieldName": "hourly(Event Time)", + "scale": { + "type": "temporal" + } + }, + "y": { + "fieldName": "sum(Upserted Rows)", + "scale": { + "type": "quantitative" + } + }, + "color": { + "fieldName": "Target Table", + "scale": { + "type": "categorical" + } + } + }, + "frame": { + "showTitle": true, + "showDescription": true, + "title": "Hourly Upserted Rows Throughput", + "description": "Number of Upserted Rows (apply_changes flows) to the Target Tables per Hour" + } + } + }, + "position": { + "x": 0, + "y": 60, + "width": 6, + "height": 6 + } + }, + { + "widget": { + "name": "05a978f3", + "queries": [ + { + "name": "main_query", + "query": { + "datasetName": "2660d018", + "fields": [ + { + "name": "Target Table", + "expression": "`Target Table`" + }, + { + "name": "hourly(Event Time)", + "expression": "DATE_TRUNC(\"HOUR\", `Event Time`)" + }, + { + "name": "sum(Deleted Rows)", + "expression": "SUM(`Deleted Rows`)" + } + ], + "disaggregated": false + } + } + ], + "spec": { + "version": 3, + "widgetType": "bar", + "encodings": { + "x": { + "fieldName": "hourly(Event Time)", + "scale": { + "type": "temporal" + } + }, + "y": { + "fieldName": "sum(Deleted Rows)", + "scale": { + "type": "quantitative" + } + }, + "color": { + "fieldName": "Target Table", + "scale": { + "type": "categorical" + } + } + }, + "frame": { + "showTitle": true, + "showDescription": true, + "title": "Hourly Deleted Rows Throughput", + "description": "Number of Deleted Rows (apply_changes flows) to the Target Tables per Hour" + } + } + }, + "position": { + "x": 0, + "y": 66, + "width": 6, + "height": 6 + } + }, + { + "widget": { + "name": "86b20f22", + "multilineTextboxSpec": { + "lines": [ + "# Latencies" + ] + } + }, + "position": { + "x": 0, + "y": 72, + "width": 6, + "height": 2 + } + }, + { + "widget": { + "name": "20b6da4a", + "queries": [ + { + "name": "main_query", + "query": { + "datasetName": "b72dbc41", + "fields": [ + { + "name": "Target Table", + "expression": "`Target Table`" + }, + { + "name": "hourly(Event Time)", + "expression": "DATE_TRUNC(\"HOUR\", `Event Time`)" + }, + { + "name": "sum(Backlog Rows)", + "expression": "SUM(`Backlog Rows`)" + } + ], + 
"disaggregated": false + } + } + ], + "spec": { + "version": 3, + "widgetType": "bar", + "encodings": { + "x": { + "fieldName": "hourly(Event Time)", + "scale": { + "type": "temporal" + } + }, + "y": { + "fieldName": "sum(Backlog Rows)", + "scale": { + "type": "quantitative" + } + }, + "color": { + "fieldName": "Target Table", + "scale": { + "type": "categorical" + } + } + }, + "frame": { + "showTitle": true, + "showDescription": true, + "title": "Hourly Backlogged Rows", + "description": "Number of backlogged rows to the Target Tables per Hour" + } + } + }, + "position": { + "x": 0, + "y": 74, + "width": 6, + "height": 6 + } + }, + { + "widget": { + "name": "2828faa0", + "queries": [ + { + "name": "main_query", + "query": { + "datasetName": "b72dbc41", + "fields": [ + { + "name": "Target Table", + "expression": "`Target Table`" + }, + { + "name": "hourly(Event Time)", + "expression": "DATE_TRUNC(\"HOUR\", `Event Time`)" + }, + { + "name": "sum(Backlog Bytes)", + "expression": "SUM(`Backlog Bytes`)" + } + ], + "disaggregated": false + } + } + ], + "spec": { + "version": 3, + "widgetType": "bar", + "encodings": { + "x": { + "fieldName": "hourly(Event Time)", + "scale": { + "type": "temporal" + } + }, + "y": { + "fieldName": "sum(Backlog Bytes)", + "scale": { + "type": "quantitative" + } + }, + "color": { + "fieldName": "Target Table", + "scale": { + "type": "categorical" + } + } + }, + "frame": { + "showTitle": true, + "showDescription": true, + "title": "Hourly Backlogged Bytes", + "description": "Number of backlogged bytes to the Target Tables per Hour" + } + } + }, + "position": { + "x": 0, + "y": 80, + "width": 6, + "height": 6 + } + }, + { + "widget": { + "name": "bbe94522", + "queries": [ + { + "name": "main_query", + "query": { + "datasetName": "b72dbc41", + "fields": [ + { + "name": "Target Table", + "expression": "`Target Table`" + }, + { + "name": "hourly(Event Time)", + "expression": "DATE_TRUNC(\"HOUR\", `Event Time`)" + }, + { + "name": "sum(Backlog Files)", + "expression": "SUM(`Backlog Files`)" + } + ], + "disaggregated": false + } + } + ], + "spec": { + "version": 3, + "widgetType": "bar", + "encodings": { + "x": { + "fieldName": "hourly(Event Time)", + "scale": { + "type": "temporal" + } + }, + "y": { + "fieldName": "sum(Backlog Files)", + "scale": { + "type": "quantitative" + } + }, + "color": { + "fieldName": "Target Table", + "scale": { + "type": "categorical" + } + } + }, + "frame": { + "showTitle": true, + "showDescription": true, + "title": "Hourly Backlogged Files", + "description": "Number of backlogged files to the Target Tables per Hour" + } + } + }, + "position": { + "x": 0, + "y": 86, + "width": 6, + "height": 6 + } + }, + { + "widget": { + "name": "9d57fd5f", + "queries": [ + { + "name": "main_query", + "query": { + "datasetName": "b72dbc41", + "fields": [ + { + "name": "Target Table", + "expression": "`Target Table`" + }, + { + "name": "hourly(Event Time)", + "expression": "DATE_TRUNC(\"HOUR\", `Event Time`)" + }, + { + "name": "max(Backlog Seconds)", + "expression": "MAX(`Backlog Seconds`)" + } + ], + "disaggregated": false + } + } + ], + "spec": { + "version": 3, + "widgetType": "bar", + "encodings": { + "x": { + "fieldName": "hourly(Event Time)", + "scale": { + "type": "temporal" + } + }, + "y": { + "fieldName": "max(Backlog Seconds)", + "scale": { + "type": "quantitative" + } + }, + "color": { + "fieldName": "Target Table", + "scale": { + "type": "categorical" + } + } + }, + "frame": { + "showTitle": true, + "showDescription": true, + "title": "Hourly 
Backlog Seconds", + "description": "Max of backlog seconds to the Target Tables per Hour" + } + } + }, + "position": { + "x": 0, + "y": 92, + "width": 6, + "height": 6 + } + }, + { + "widget": { + "name": "dropped-rows-due-to-expectations-by-table", + "queries": [ + { + "name": "main_query", + "query": { + "datasetName": "bc1ad993", + "fields": [ + { + "name": "Target Table", + "expression": "`Target Table`" + }, + { + "name": "minutely(Event Time)", + "expression": "DATE_TRUNC(\"MINUTE\", `Event Time`)" + }, + { + "name": "sum(Expectations Dropped Rows)", + "expression": "SUM(`Expectations Dropped Rows`)" + } + ], + "disaggregated": false + } + } + ], + "spec": { + "version": 3, + "widgetType": "bar", + "encodings": { + "x": { + "fieldName": "minutely(Event Time)", + "scale": { + "type": "categorical" + } + }, + "y": { + "fieldName": "sum(Expectations Dropped Rows)", + "scale": { + "type": "quantitative" + } + }, + "color": { + "fieldName": "Target Table", + "scale": { + "type": "categorical" + } + } + }, + "frame": { + "showTitle": true, + "showDescription": true, + "title": "Dropped rows", + "description": "Number of dropped rows due to expectation failures by table" + } + } + }, + "position": { + "x": 0, + "y": 16, + "width": 6, + "height": 7 + } + }, + { + "widget": { + "name": "hourly-expectations-failures-by-expectation", + "queries": [ + { + "name": "main_query", + "query": { + "datasetName": "e9270b22", + "fields": [ + { + "name": "Expectation Name", + "expression": "`Expectation Name`" + }, + { + "name": "Target Table", + "expression": "`Target Table`" + }, + { + "name": "minutely(Check Time)", + "expression": "DATE_TRUNC(\"MINUTE\", `Check Time`)" + }, + { + "name": "sum(Rows Failed)", + "expression": "SUM(`Rows Failed`)" + } + ], + "disaggregated": false + } + } + ], + "spec": { + "version": 3, + "widgetType": "bar", + "encodings": { + "x": { + "fieldName": "minutely(Check Time)", + "scale": { + "type": "temporal" + } + }, + "y": { + "fieldName": "sum(Rows Failed)", + "scale": { + "type": "quantitative" + } + }, + "color": { + "fieldName": "Expectation Name", + "scale": { + "type": "categorical" + } + }, + "extra": [ + { + "fieldName": "Target Table" + } + ] + }, + "frame": { + "showTitle": true, + "showDescription": true, + "title": "Expectations failures by expectation", + "description": "Number of rows that failed expectations" + } + } + }, + "position": { + "x": 0, + "y": 28, + "width": 6, + "height": 6 + } + }, + { + "widget": { + "name": "548dd03c", + "queries": [ + { + "name": "main_query", + "query": { + "datasetName": "e9270b22", + "fields": [ + { + "name": "Target Table", + "expression": "`Target Table`" + }, + { + "name": "Expectation Name", + "expression": "`Expectation Name`" + }, + { + "name": "minutely(Check Time)", + "expression": "DATE_TRUNC(\"MINUTE\", `Check Time`)" + }, + { + "name": "sum(Rows Failed)", + "expression": "SUM(`Rows Failed`)" + } + ], + "disaggregated": false + } + } + ], + "spec": { + "version": 3, + "widgetType": "bar", + "encodings": { + "x": { + "fieldName": "minutely(Check Time)", + "scale": { + "type": "temporal" + } + }, + "y": { + "fieldName": "sum(Rows Failed)", + "scale": { + "type": "quantitative" + } + }, + "color": { + "fieldName": "Target Table", + "scale": { + "type": "categorical" + } + }, + "extra": [ + { + "fieldName": "Expectation Name" + } + ] + }, + "frame": { + "showTitle": true, + "showDescription": true, + "title": "Expectations failures by table", + "description": "Number of rows that failed expectations" + } + } + }, + 
"position": { + "x": 0, + "y": 40, + "width": 6, + "height": 6 + } + }, + { + "widget": { + "name": "a8b05c55", + "queries": [ + { + "name": "main_query", + "query": { + "datasetName": "e9270b22", + "fields": [ + { + "name": "Expectation Name", + "expression": "`Expectation Name`" + }, + { + "name": "sum(Rows Failed)", + "expression": "SUM(`Rows Failed`)" + }, + { + "name": "sum(Rows Passed)", + "expression": "SUM(`Rows Passed`)" + }, + { + "name": "Target Table", + "expression": "`Target Table`" + }, + { + "name": "minutely(Check Time)", + "expression": "DATE_TRUNC(\"MINUTE\", `Check Time`)" + }, + { + "name": "avg(Failure Percentage)", + "expression": "AVG(`Failure Percentage`)" + } + ], + "disaggregated": false + } + } + ], + "spec": { + "version": 3, + "widgetType": "bar", + "encodings": { + "x": { + "fieldName": "minutely(Check Time)", + "scale": { + "type": "categorical" + } + }, + "y": { + "fieldName": "avg(Failure Percentage)", + "scale": { + "type": "quantitative" + } + }, + "color": { + "fieldName": "Expectation Name", + "scale": { + "type": "categorical" + } + }, + "extra": [ + { + "fieldName": "sum(Rows Failed)" + }, + { + "fieldName": "sum(Rows Passed)" + }, + { + "fieldName": "Target Table" + } + ] + }, + "frame": { + "showTitle": true, + "showDescription": true, + "title": "Expectations failure percentage by expectation", + "description": "Percentage of rows that failed expectations" + } + } + }, + "position": { + "x": 0, + "y": 34, + "width": 6, + "height": 6 + } + } + ], + "pageType": "PAGE_TYPE_CANVAS" + } + ], + "uiSettings": { + "theme": { + "widgetHeaderAlignment": "ALIGNMENT_UNSPECIFIED" + }, + "applyModeEnabled": false + } +} diff --git a/contrib/dbx_ingestion_monitoring/generic_sdp_monitoring_dab/databricks.yml b/contrib/dbx_ingestion_monitoring/generic_sdp_monitoring_dab/databricks.yml new file mode 100644 index 0000000..3d2dfbb --- /dev/null +++ b/contrib/dbx_ingestion_monitoring/generic_sdp_monitoring_dab/databricks.yml @@ -0,0 +1,80 @@ +# This DAB contains sample jobs, pipelines, and a Dashboard to build an observability solution for +# Spark Declarative Pipelines (SDP). + +bundle: + name: generic_sdp_monitoring_dab + +sync: + paths: + - ../lib + - ../jobs + - ../vars + - ../resources + +include: + # Shared variables and resources + - ../vars/common.vars.yml + - ../vars/import_event_logs.vars.yml + - ../vars/pipeline_tags_index.vars.yml + - ../vars/post_deploy.vars.yml + - ../vars/third_party_sink.vars.yml + - ../resources/monitoring_schema.schema.yml + - ../resources/import_event_logs.job.yml + - ../resources/build_pipeline_tags_index.job.yml + - ../resources/post_deploy.job.yml + + # Resources specific to this DAB + - resources/*.yml + +variables: + # See also included shared *.vars.yml files above + + # Monitoring ETL pipeline configuration + directly_monitored_pipeline_ids: + description: > + A comma-separated list of CDC connector pipeline ids to monitor. The pipelines must have their event log configured for a direct write to a + Delta table (see https://docs.databricks.com/api/workspace/pipelines/create#event_log). If not, use the `imported_pipeline_ids` variable. + default: "" + directly_monitored_pipeline_tags: + description: > + A semicolon-separated list of comma-separated tag[:value] pairs to filter pipelines for direct monitoring. 
+ Format: "tag1[:value1],tag2[:value2];tag3[:value3]" + - Semicolons (;) separate tag groups (OR logic between groups) + - Commas (,) separate tags within a group (ALL must match - AND logic) + - 'tag' is shorthand for 'tag:' (tag with empty value) + Example: "tier:T0;team:data,tier:T1" means (tier:T0) OR (team:data AND tier:T1) + This is an alternative to specifying pipeline IDs explicitly via `directly_monitored_pipeline_ids`. + If both are specified, pipelines matching either criteria will be included. + default: "" + imported_event_log_tables: + description: > + A comma-separated list of target tables for imported event logs. The format of those tables must be the same as the + event log format though each table may contain events from multiple event logs. Typically, these tables are generated using the `import_event_logs` + job(s). + default: "" + serverless_monitoring_pipeline_enabled: + description: Controls whether the monitoring ETL pipeline should be run on serverless compute. + default: true + monitoring_etl_schedule_state: + description: Enable (`UNPAUSED`) or disable (`PAUSED`) the periodic ETL of observability data + default: UNPAUSED + monitoring_etl_cron_schedule: + description: > + The cron schedule (see http://www.quartz-scheduler.org/documentation/quartz-2.3.0/tutorials/crontrigger.html) to use for updating the observability + tables. Note that you also have to set `monitoring_etl_schedule_state` to `UNPAUSED` for this to take effect. The default is to run the + import hourly. + default: "0 30 0/1 * * ?" + +targets: +# dev: +# default: true +# mode: development +# variables: +# Configure the target monitoring catalog and schema. See variables in /vars/common.vars.yml +# Configure imports of pipeline event logs not stored in a Delta table. See variables in /vars/import_event_logs.vars.yml +# Configure monitoring ETL. See variables above +# Dashboard configuration +# main_dashboard_template_path: ../generic_sdp_monitoring_dab/dashboards/SDP Monitoring Dashboard Template.lvdash.json # Required +# main_dashboard_name: "Generic SDP Dashboard" # customize to any name +# Configure 3P observability integration (if desired). 
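For reference, below is a minimal sketch of what an uncommented deployment target for this DAB could look like. It is only an illustration: the workspace host URL, catalog name, and schema name are placeholders, and only variables that actually appear in this file (and in the shared `vars/common.vars.yml`) are used.

```yaml
targets:
  dev:
    default: true
    mode: development
    workspace:
      host: https://my-workspace.cloud.databricks.com   # placeholder host
    variables:
      monitoring_catalog: main                # placeholder; the catalog must already exist
      monitoring_schema: sdp_monitoring       # placeholder; created automatically if missing
      # Monitor pipelines tagged tier:T0, OR tagged with both team:data and tier:T1
      directly_monitored_pipeline_tags: "tier:T0;team:data,tier:T1"
      main_dashboard_template_path: ../generic_sdp_monitoring_dab/dashboards/SDP Monitoring Dashboard Template.lvdash.json
      main_dashboard_name: "Generic SDP Dashboard"
```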
See variables in /vars/third_party_sink.vars.yml + diff --git a/contrib/dbx_ingestion_monitoring/generic_sdp_monitoring_dab/monitoring_etl/sdp_monitoring_pipeline_main.py b/contrib/dbx_ingestion_monitoring/generic_sdp_monitoring_dab/monitoring_etl/sdp_monitoring_pipeline_main.py new file mode 100644 index 0000000..b634032 --- /dev/null +++ b/contrib/dbx_ingestion_monitoring/generic_sdp_monitoring_dab/monitoring_etl/sdp_monitoring_pipeline_main.py @@ -0,0 +1,21 @@ +import dlt +import sys +import logging + +sys.path.append("../../lib") + +from dbx_ingestion_monitoring.common_ldp import * + +# Configure logging +logging.basicConfig( + level=logging.INFO, + format='%(asctime)s - %(name)s - %(levelname)s - %(message)s' +) +logger = logging.getLogger(__name__) +logger.info("Starting Generic SDP Monitoring ETL Pipeline") + +# Pipeline parameters + +conf = Configuration(spark.conf) +pipeline = MonitoringEtlPipeline(conf, spark) +pipeline.register_base_tables_and_views(spark) \ No newline at end of file diff --git a/contrib/dbx_ingestion_monitoring/generic_sdp_monitoring_dab/resources/monitoring_etl.pipeline.yml b/contrib/dbx_ingestion_monitoring/generic_sdp_monitoring_dab/resources/monitoring_etl.pipeline.yml new file mode 100644 index 0000000..db02cf9 --- /dev/null +++ b/contrib/dbx_ingestion_monitoring/generic_sdp_monitoring_dab/resources/monitoring_etl.pipeline.yml @@ -0,0 +1,61 @@ +# This pipeline is responsible for the ETL of the data from the monitored sources and +# generating the observability tables + +resources: + pipelines: + cdc_connector_monitoring_etl: + name: "Monitoring ETL for SDP Pipelines" + libraries: + - glob: + include: ../monitoring_etl/** + - glob: + include: ../../third_party_sinks/** + serverless: ${var.serverless_monitoring_pipeline_enabled} + #photon: true + #channel: PREVIEW + development: true + catalog: ${var.monitoring_catalog} + schema: ${resources.schemas.monitoring_schema.name} + root_path: ${workspace.file_path}/generic_sdp_monitoring_dab/monitoring_etl + event_log: + catalog: ${var.monitoring_catalog} + schema: ${resources.schemas.monitoring_schema.name} + name: generic_sdp_monitoring_etl_event_log + configuration: + monitoring_catalog: ${var.monitoring_catalog} + monitoring_schema: ${resources.schemas.monitoring_schema.name} + directly_monitored_pipeline_ids: ${var.directly_monitored_pipeline_ids} + directly_monitored_pipeline_tags: ${var.directly_monitored_pipeline_tags} + imported_event_log_tables: ${var.imported_event_log_tables} + pipeline_tags_index_table_name: ${var.pipeline_tags_index_table_name} + pipeline_tags_index_enabled: ${var.pipeline_tags_index_enabled} + pipeline_tags_index_max_age_hours: ${var.pipeline_tags_index_max_age_hours} + pipeline_tags_index_api_fallback_enabled: ${var.pipeline_tags_index_api_fallback_enabled} + + # Third-party monitoring configuration + destination: ${var.third_party_destination} + host_name: ${var.third_party_host_name} + secrets_scope: ${var.third_party_secrets_scope} + endpoints.metrics: ${var.third_party_endpoints_metrics} + endpoints.logs: ${var.third_party_endpoints_logs} + endpoints.events: ${var.third_party_endpoints_events} + num_rows_per_batch: ${var.third_party_batch_size} + max_retry_duration_sec: ${var.third_party_max_retry_duration_sec} + request_timeout_sec: ${var.third_party_request_timeout_sec} + + # Datadog/New Relic API key (stored in secrets) + api_key: ${var.third_party_api_key} + + # Azure Monitor specific configuration + azure_client_id: ${var.azure_client_id} + azure_client_secret: 
${var.azure_client_secret} + azure_tenant_id: ${var.azure_tenant_id} + azure_dcr_immutable_id: ${var.azure_dcr_immutable_id} + azure_authorization_endpoint: ${var.azure_authorization_endpoint} + azure_max_access_token_staleness: ${var.azure_max_access_token_staleness} + + # Splunk Observability specific configuration + splunk_access_token: ${var.splunk_access_token} + + # New Relic specific configuration + account_id: ${var.third_party_account_id} \ No newline at end of file diff --git a/contrib/dbx_ingestion_monitoring/generic_sdp_monitoring_dab/resources/monitoring_etl_scheduled_runner.job.yml b/contrib/dbx_ingestion_monitoring/generic_sdp_monitoring_dab/resources/monitoring_etl_scheduled_runner.job.yml new file mode 100644 index 0000000..14e4b8f --- /dev/null +++ b/contrib/dbx_ingestion_monitoring/generic_sdp_monitoring_dab/resources/monitoring_etl_scheduled_runner.job.yml @@ -0,0 +1,16 @@ +# This job is responsible for running the monitoring ETL + +resources: + jobs: + monitoring_etl_scheduled_runner: + name: "Scheduled runner for the Monitoring ETL Pipeline for SDP Pipelines" + schedule: + pause_status: ${var.monitoring_etl_schedule_state} + quartz_cron_expression: ${var.monitoring_etl_cron_schedule} + timezone_id: UTC + email_notifications: + on_failure: ${var.notification_emails} + tasks: + - task_key: etl_runner + pipeline_task: + pipeline_id: ${resources.pipelines.cdc_connector_monitoring_etl.id} diff --git a/contrib/dbx_ingestion_monitoring/jobs/build_pipeline_tags_index.ipynb b/contrib/dbx_ingestion_monitoring/jobs/build_pipeline_tags_index.ipynb new file mode 100644 index 0000000..12af364 --- /dev/null +++ b/contrib/dbx_ingestion_monitoring/jobs/build_pipeline_tags_index.ipynb @@ -0,0 +1,60 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Overview\n", + "\n", + "This notebook builds an inverted index that maps pipeline tags to pipeline IDs. 
The index is stored in a Delta table and enables efficient discovery of pipelines by tags without having to query the Databricks API for every pipeline.\n", + "\n", + "# Parameters\n", + "- `monitoring_catalog` - the catalog for the index table\n", + "- `monitoring_schema` - the schema for the index table\n", + "- `pipeline_tags_index_table_name` - the name of the index table" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import logging\n", + "import sys\n", + "\n", + "sys.path.append(\"../lib\")\n", + "\n", + "from dbx_ingestion_monitoring.common import PipelineTagsIndexBuilder\n", + "\n", + "dbutils.widgets.text(\"monitoring_catalog\", \"\")\n", + "dbutils.widgets.text(\"monitoring_schema\", \"\")\n", + "dbutils.widgets.text(\"pipeline_tags_index_table_name\", \"pipeline_tags_index\")\n", + "\n", + "logging.basicConfig(level=logging.INFO, format=\"%(asctime)s [%(levelname)s] (%(name)s) %(message)s\")\n", + "\n", + "# Build the index\n", + "builder = PipelineTagsIndexBuilder(\n", + " monitoring_catalog=dbutils.widgets.get(\"monitoring_catalog\"),\n", + " monitoring_schema=dbutils.widgets.get(\"monitoring_schema\"),\n", + " index_table_name=dbutils.widgets.get(\"pipeline_tags_index_table_name\")\n", + ")\n", + "\n", + "builder.build_index(spark)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "name": "python", + "version": "3.8.0" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/contrib/dbx_ingestion_monitoring/jobs/create_imported_event_logs_target_table.ipynb b/contrib/dbx_ingestion_monitoring/jobs/create_imported_event_logs_target_table.ipynb new file mode 100644 index 0000000..8e176e7 --- /dev/null +++ b/contrib/dbx_ingestion_monitoring/jobs/create_imported_event_logs_target_table.ipynb @@ -0,0 +1,166 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "application/vnd.databricks.v1+cell": { + "cellMetadata": {}, + "inputWidgets": {}, + "nuid": "f74d9e5f-4753-47f9-a157-f5c857d1668d", + "showTitle": false, + "tableResultSettingsMap": {}, + "title": "" + } + }, + "source": [ + "# Overview\n", + "\n", + "This notebook will import event logs from SDP pipelines that are not configured to store the event log\n", + "directly into a Delta table (see the [`event_log` option](https://docs.databricks.com/api/workspace/pipelines/create#event_log) in the\n", + "Pipelines API). This can happen, for example, if the pipeline was created prior to the introduction of ability to [Publish to Multiple Catalogs and Schemas from a Single DLT/SDP Pipeline](https://www.databricks.com/blog/publish-multiple-catalogs-and-schemas-single-dlt-pipeline).\n", + "\n", + "\n", + "The logs will be imported into the Delta table `{monitoring_catalog}.{monitoring_schema}.{imported_event_logs_table}`. This enables incremental processing of these logs and also access from principals that are not the owners of these pipelines.\n", + "\n", + "Note that the import of such event logs uses a `MERGE` statement (to allow for incremental import) which is a fairly expensive operation. 
The preferred approach is to configure the event log into a Delta table.\n", + "\n", + "# Parameters\n", + "- `monitoring_catalog` - the catalog for the table with the imported logs.\n", + "- `monitoring_schema` - the schema for the table with the imported logs\n", + "- `imported_event_logs_table_name` - the name for the table with the imported events logs\n", + "- `imported_pipeline_ids` - a comma-separated list of pipelines whose event logs are to be imported" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "application/vnd.databricks.v1+cell": { + "cellMetadata": {}, + "showTitle": false + } + }, + "outputs": [], + "source": [ + "import logging\n", + "import sys\n", + "\n", + "sys.path.append(\"../lib\")\n", + "\n", + "from dbx_ingestion_monitoring.common import *\n", + "\n", + "logging.basicConfig(level=logging.INFO, format=\"%(asctime)s [%(levelname)s] (%(name)s) %(message)s\")\n", + "\n", + "dbutils.widgets.text(\"monitoring_catalog\", \"\")\n", + "dbutils.widgets.text(\"monitoring_schema\", \"\")\n", + "dbutils.widgets.text(\"imported_event_logs_table_name\", \"imported_event_logs\")\n", + "\n", + "importer = EventLogImporter(monitoring_catalog = dbutils.widgets.get(\"monitoring_catalog\"),\n", + " monitoring_schema = dbutils.widgets.get(\"monitoring_schema\"),\n", + " imported_event_logs_table = dbutils.widgets.get(\"imported_event_logs_table_name\"))\n", + "importer.create_target_table(spark)" + ] + } + ], + "metadata": { + "application/vnd.databricks.v1+notebook": { + "computePreferences": null, + "dashboards": [], + "environmentMetadata": { + "base_environment": "", + "environment_version": "4" + }, + "inputWidgetPreferences": null, + "language": "python", + "notebookMetadata": { + "pythonIndentUnit": 2 + }, + "notebookName": "create_imported_event_logs_target_table", + "widgets": { + "imported_event_logs_table_name": { + "currentValue": "imported_event_logs", + "nuid": "7c0b6452-269e-40d7-9b6b-b1a15d306971", + "typedWidgetInfo": { + "autoCreated": false, + "defaultValue": "imported_event_logs", + "label": "", + "name": "imported_event_logs_table_name", + "options": { + "validationRegex": null, + "widgetDisplayType": "Text" + }, + "parameterDataType": "String" + }, + "widgetInfo": { + "defaultValue": "imported_event_logs", + "label": "", + "name": "imported_event_logs_table_name", + "options": { + "autoCreated": false, + "validationRegex": null, + "widgetType": "text" + }, + "widgetType": "text" + } + }, + "monitoring_catalog": { + "currentValue": "", + "nuid": "3a03db86-6e55-4ed7-9a4b-49f285d9e47a", + "typedWidgetInfo": { + "autoCreated": false, + "defaultValue": "", + "label": null, + "name": "monitoring_catalog", + "options": { + "validationRegex": null, + "widgetDisplayType": "Text" + }, + "parameterDataType": "String" + }, + "widgetInfo": { + "defaultValue": "", + "label": null, + "name": "monitoring_catalog", + "options": { + "autoCreated": false, + "validationRegex": null, + "widgetType": "text" + }, + "widgetType": "text" + } + }, + "monitoring_schema": { + "currentValue": "", + "nuid": "77699cac-9876-409c-8685-ace9cf322bc6", + "typedWidgetInfo": { + "autoCreated": false, + "defaultValue": "", + "label": null, + "name": "monitoring_schema", + "options": { + "validationRegex": null, + "widgetDisplayType": "Text" + }, + "parameterDataType": "String" + }, + "widgetInfo": { + "defaultValue": "", + "label": null, + "name": "monitoring_schema", + "options": { + "autoCreated": false, + "validationRegex": null, + "widgetType": "text" + }, + 
"widgetType": "text" + } + } + } + }, + "language_info": { + "name": "python" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} diff --git a/contrib/dbx_ingestion_monitoring/jobs/import_event_logs.ipynb b/contrib/dbx_ingestion_monitoring/jobs/import_event_logs.ipynb new file mode 100644 index 0000000..9b5e2f0 --- /dev/null +++ b/contrib/dbx_ingestion_monitoring/jobs/import_event_logs.ipynb @@ -0,0 +1,182 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "application/vnd.databricks.v1+cell": { + "cellMetadata": {}, + "inputWidgets": {}, + "nuid": "f74d9e5f-4753-47f9-a157-f5c857d1668d", + "showTitle": false, + "tableResultSettingsMap": {}, + "title": "" + } + }, + "source": [ + "# Overview\n", + "\n", + "This notebook will import event logs from SDP pipelines that are not configured to store the event log\n", + "directly into a Delta table (see the [`event_log` option](https://docs.databricks.com/api/workspace/pipelines/create#event_log) in the\n", + "Pipelines API). This can happen, for example, if the pipeline was created prior to the introduction of ability to [Publish to Multiple Catalogs and Schemas from a Single DLT/SDP Pipeline](https://www.databricks.com/blog/publish-multiple-catalogs-and-schemas-single-dlt-pipeline).\n", + "\n", + "\n", + "The logs will be imported into the Delta table `{monitoring_catalog}.{monitoring_schema}.{imported_event_logs_table}`. This enables incremental processing of these logs and also access from principals that are not the owners of these pipelines.\n", + "\n", + "Note that the import of such event logs uses a `MERGE` statement (to allow for incremental import) which is a fairly expensive operation. The preferred approach is to configure the event log into a Delta table.\n", + "\n", + "# Parameters\n", + "- `monitoring_catalog` - the catalog for the table with the imported logs.\n", + "- `monitoring_schema` - the schema for the table with the imported logs\n", + "- `imported_event_logs_table_name` - the name for the table with the imported events logs\n", + "- `imported_pipeline_ids` - a comma-separated list of pipelines whose event logs are to be imported\n", + "- `imported_pipeline_tags` - Semi-colon-separated lists of comma-separated `tag[:value]` pairs. See the documentation for the `imported_pipeline_tags` variable in `vars/import_event_logs.vars.yml` for more information." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "application/vnd.databricks.v1+cell": { + "cellMetadata": {}, + "showTitle": false + } + }, + "outputs": [], + "source": [ + "import logging\n", + "import sys\n", + "\n", + "sys.path.append(\"../lib\")\n", + "\n", + "from dbx_ingestion_monitoring.common import *\n", + "\n", + "dbutils.widgets.text(\"monitoring_catalog\", \"\")\n", + "dbutils.widgets.text(\"monitoring_schema\", \"\")\n", + "dbutils.widgets.text(\"imported_event_logs_table_name\", \"imported_event_logs\")\n", + "dbutils.widgets.text(\"imported_pipeline_ids\", \"\")\n", + "dbutils.widgets.text(\"imported_pipeline_tags\", \"\")\n", + "dbutils.widgets.text(\"pipeline_tags_index_table_name\", \"pipeline_tags_index\")\n", + "dbutils.widgets.text(\"pipeline_tags_index_enabled\", \"true\")\n", + "dbutils.widgets.text(\"pipeline_tags_index_max_age_hours\", \"24\")\n", + "dbutils.widgets.text(\"pipeline_tags_index_api_fallback_enabled\", \"true\")\n", + "\n", + "logging.basicConfig(level=logging.INFO, format=\"%(asctime)s [%(levelname)s] (%(name)s) %(message)s\")\n", + "\n", + "importer = EventLogImporter(\n", + " monitoring_catalog=dbutils.widgets.get(\"monitoring_catalog\"),\n", + " monitoring_schema=dbutils.widgets.get(\"monitoring_schema\"),\n", + " imported_event_logs_table=dbutils.widgets.get(\"imported_event_logs_table_name\"),\n", + " index_table_name=dbutils.widgets.get(\"pipeline_tags_index_table_name\"),\n", + " index_enabled=dbutils.widgets.get(\"pipeline_tags_index_enabled\").lower() == \"true\",\n", + " index_max_age_hours=int(dbutils.widgets.get(\"pipeline_tags_index_max_age_hours\")),\n", + " api_fallback_enabled=dbutils.widgets.get(\"pipeline_tags_index_api_fallback_enabled\").lower() == \"true\"\n", + ")\n", + "importer.import_event_logs_for_pipelines_by_ids_and_tags(\n", + " dbutils.widgets.get(\"imported_pipeline_ids\"),\n", + " dbutils.widgets.get(\"imported_pipeline_tags\"),\n", + " spark)" + ] + } + ], + "metadata": { + "application/vnd.databricks.v1+notebook": { + "computePreferences": null, + "dashboards": [], + "environmentMetadata": { + "base_environment": "", + "environment_version": "4" + }, + "inputWidgetPreferences": null, + "language": "python", + "notebookMetadata": { + "pythonIndentUnit": 2 + }, + "notebookName": "import_event_logs", + "widgets": { + "imported_event_logs_table_name": { + "currentValue": "imported_event_logs", + "nuid": "7c0b6452-269e-40d7-9b6b-b1a15d306971", + "typedWidgetInfo": { + "autoCreated": false, + "defaultValue": "imported_event_logs", + "label": "", + "name": "imported_event_logs_table_name", + "options": { + "validationRegex": null, + "widgetDisplayType": "Text" + }, + "parameterDataType": "String" + }, + "widgetInfo": { + "defaultValue": "imported_event_logs", + "label": "", + "name": "imported_event_logs_table_name", + "options": { + "autoCreated": false, + "validationRegex": null, + "widgetType": "text" + }, + "widgetType": "text" + } + }, + "monitoring_catalog": { + "currentValue": "", + "nuid": "3a03db86-6e55-4ed7-9a4b-49f285d9e47a", + "typedWidgetInfo": { + "autoCreated": false, + "defaultValue": "", + "label": null, + "name": "monitoring_catalog", + "options": { + "validationRegex": null, + "widgetDisplayType": "Text" + }, + "parameterDataType": "String" + }, + "widgetInfo": { + "defaultValue": "", + "label": null, + "name": "monitoring_catalog", + "options": { + "autoCreated": false, + "validationRegex": null, + "widgetType": "text" + }, + "widgetType": "text" + } + }, + 
"monitoring_schema": { + "currentValue": "", + "nuid": "77699cac-9876-409c-8685-ace9cf322bc6", + "typedWidgetInfo": { + "autoCreated": false, + "defaultValue": "", + "label": null, + "name": "monitoring_schema", + "options": { + "validationRegex": null, + "widgetDisplayType": "Text" + }, + "parameterDataType": "String" + }, + "widgetInfo": { + "defaultValue": "", + "label": null, + "name": "monitoring_schema", + "options": { + "autoCreated": false, + "validationRegex": null, + "widgetType": "text" + }, + "widgetType": "text" + } + } + } + }, + "language_info": { + "name": "python" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} diff --git a/contrib/dbx_ingestion_monitoring/jobs/publish_dashboard.ipynb b/contrib/dbx_ingestion_monitoring/jobs/publish_dashboard.ipynb new file mode 100644 index 0000000..ac02ee6 --- /dev/null +++ b/contrib/dbx_ingestion_monitoring/jobs/publish_dashboard.ipynb @@ -0,0 +1,104 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "application/vnd.databricks.v1+cell": { + "cellMetadata": {}, + "inputWidgets": {}, + "nuid": "1ee90055-2c00-4102-8289-bedfe55c90c2", + "showTitle": false, + "tableResultSettingsMap": {}, + "title": "" + } + }, + "source": [ + "# Overview\n", + "\n", + "This notebook is a workaround for limitations in current AI/BI dashboards that provide limitted parameterization. We want to adjust the monitoring tables used in the monitoring dashboard based on the deployment environment. The notebook will modify the dashboard definition and set the default catalog and schema for all datasets in the dashboard, to match the `default_dataset_catalog` and `default_dataset_schema` parameters.\n", + "\n", + "# Notebook Parameters\n", + "- `dashboard_template_path` - (required) the path to the JSON file of the dashboard to use as a template for publishing\n", + "- `dashboard_id` - (optional) the name of the AI/BI dashboard to update if known. If not specified, the notebook will attempt to find a dashboard with the specified `display_name`. If none is found, a new one will be created. If multiple such dashboards aere found, the notebook will fail with an error and an explicit `dashboard_id` must be specified.\n", + "- `published_dashboard_name` - (optional) the display name of the dashboard. If not specified, the name of the file (without the `.lvdash.json` extension and \"Template\") will be used.\n", + "- `default_dataset_catalog` - (optional) the default catalog for datasets to be set\n", + "- `default_dataset_schema` - (optional) the detault schema for datasets to be set\n", + "- `warehouse_id` - (optional) the ID of the warehouse to use for the AI/BI dashboard. If not specified, the first suitable one will be used." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "application/vnd.databricks.v1+cell": { + "cellMetadata": {}, + "showTitle": false, + "title": "" + } + }, + "outputs": [], + "source": [ + "dbutils.widgets.text(name=\"dashboard_template_path\", defaultValue=\"\")\n", + "dbutils.widgets.text(name=\"dashboard_id\", defaultValue=\"\")\n", + "dbutils.widgets.text(name=\"published_dashboard_name\", defaultValue=\"\")\n", + "dbutils.widgets.text(name=\"default_dataset_catalog\", defaultValue=\"\")\n", + "dbutils.widgets.text(name=\"default_dataset_schema\", defaultValue=\"\")\n", + "dbutils.widgets.text(name=\"warehouse_id\", defaultValue=\"\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "application/vnd.databricks.v1+cell": { + "cellMetadata": {}, + "showTitle": false, + "title": "" + } + }, + "outputs": [], + "source": [ + "import logging\n", + "import sys\n", + "from databricks.sdk import WorkspaceClient\n", + "\n", + "sys.path.append('../lib')\n", + "\n", + "from dbx_ingestion_monitoring.common import *\n", + "\n", + "logging.basicConfig(level=logging.INFO, format=\"%(asctime)s [%(levelname)s] (%(name)s) %(message)s\")\n", + "\n", + "d = DashboardTemplate.from_notebook_widgets(widgets=dbutils.widgets, wc=WorkspaceClient())\n", + "d.publish()" + ] + } + ], + "metadata": { + "application/vnd.databricks.v1+notebook": { + "computePreferences": { + "hardware": { + "accelerator": null, + "gpuPoolId": null, + "memory": null + } + }, + "dashboards": [], + "environmentMetadata": { + "base_environment": "", + "environment_version": "4" + }, + "inputWidgetPreferences": null, + "language": "python", + "notebookMetadata": { + "pythonIndentUnit": 2 + }, + "notebookName": "publish_dashboard", + "widgets": {} + }, + "language_info": { + "name": "python" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} diff --git a/contrib/dbx_ingestion_monitoring/jobs/update_monitoring_tables_meta.ipynb b/contrib/dbx_ingestion_monitoring/jobs/update_monitoring_tables_meta.ipynb new file mode 100644 index 0000000..750f233 --- /dev/null +++ b/contrib/dbx_ingestion_monitoring/jobs/update_monitoring_tables_meta.ipynb @@ -0,0 +1,72 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "application/vnd.databricks.v1+cell": { + "cellMetadata": {}, + "inputWidgets": {}, + "nuid": "34e95409-db35-4e3c-8c74-00a65b532fd1", + "showTitle": false, + "tableResultSettingsMap": {}, + "title": "" + } + }, + "source": [ + "This is a helper notebook to update the column comments for all monitoring tables. This is a workaround for the current limitation in SDP where it is not possible to set the column comments for streaming tables and materialized views." 
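The heavy lifting here is done by `set_all_table_column_comments` from the bundled library, which presumably applies the column comments defined for the standard monitoring tables. For orientation only, setting a single column comment can be sketched roughly as below (catalog, schema, and the example names are placeholders; this is not the library helper itself):

```python
# Hedged sketch: apply one column comment with an ALTER TABLE statement.
def set_column_comment(spark, catalog: str, schema: str, table: str, column: str, comment: str):
    escaped = comment.replace("'", "\\'")  # naive escaping for the SQL string literal
    spark.sql(
        f"ALTER TABLE `{catalog}`.`{schema}`.`{table}` "
        f"ALTER COLUMN `{column}` COMMENT '{escaped}'"
    )

# Example usage (hypothetical catalog/schema):
# set_column_comment(spark, "main", "monitoring", "pipeline_runs_status", "latest_state",
#                    "Most recent state reported for the pipeline run")
```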
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "application/vnd.databricks.v1+cell": { + "cellMetadata": {}, + "showTitle": false + } + }, + "outputs": [], + "source": [ + "import logging\n", + "import sys\n", + "\n", + "sys.path.append(\"../lib\")\n", + "\n", + "from dbx_ingestion_monitoring.standard_tables import *\n", + "from dbx_ingestion_monitoring.common import *\n", + "\n", + "dbutils.widgets.text(\"monitoring_catalog\", \"\")\n", + "dbutils.widgets.text(\"monitoring_schema\", \"\")\n", + "\n", + "logging.basicConfig(level=logging.INFO, format=\"%(asctime)s [%(levelname)s] (%(name)s) %(message)s\")\n", + "logging.getLogger(\"dbx_ingestion_monitoring.MonitoringTable\").setLevel(logging.DEBUG)\n", + "\n", + "monitoring_catalog = get_required_widget_parameter(dbutils.widgets, 'monitoring_catalog')\n", + "monitoring_schema = get_required_widget_parameter(dbutils.widgets, 'monitoring_schema')\n", + "set_all_table_column_comments(monitoring_catalog, monitoring_schema, spark)\n" + ] + } + ], + "metadata": { + "application/vnd.databricks.v1+notebook": { + "computePreferences": null, + "dashboards": [], + "environmentMetadata": { + "base_environment": "", + "environment_version": "4" + }, + "inputWidgetPreferences": null, + "language": "python", + "notebookMetadata": { + "pythonIndentUnit": 2 + }, + "notebookName": "update_monitoring_tables_meta", + "widgets": {} + }, + "language_info": { + "name": "python" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} diff --git a/contrib/dbx_ingestion_monitoring/lib/dbx_ingestion_monitoring/__init__.py b/contrib/dbx_ingestion_monitoring/lib/dbx_ingestion_monitoring/__init__.py new file mode 100644 index 0000000..e69de29 diff --git a/contrib/dbx_ingestion_monitoring/lib/dbx_ingestion_monitoring/common.py b/contrib/dbx_ingestion_monitoring/lib/dbx_ingestion_monitoring/common.py new file mode 100644 index 0000000..bd641f6 --- /dev/null +++ b/contrib/dbx_ingestion_monitoring/lib/dbx_ingestion_monitoring/common.py @@ -0,0 +1,674 @@ +""" +Common observability classes and functions. +""" + +import json +import logging +from pyspark.sql import SparkSession +import os +import re +from typing import List, Optional + +from databricks.sdk import WorkspaceClient +from databricks.sdk.service.dashboards import Dashboard +from databricks.sdk.service.sql import State + + +def parse_comma_separated_list(s: Optional[str]) ->List[str]: + """ + Parses a notebook parameter that contains a comma-separated list of items. It strips whitespace and + skips empty items. + :return: The parsed list of items + """ + if s is None: + return [] + return [j for j in [i.strip() for i in s.strip().split(',')] if len(j) > 0] + + +def is_parameter_defined(s: Optional[str]) -> bool: + return s is not None and len(s.strip()) > 0 + +def parse_tag_value_pairs(tags_str: Optional[str]) -> List[List[tuple]]: + """ + Parses a tag filter expression with OR of ANDs semantics. + + Format: Semi-colon-separated groups where each group is comma-separated tag[:value] pairs. 
+ - Semicolons separate groups (OR logic between groups) + - Commas separate tags within a group (AND logic within group) + - 'tag' is shorthand for 'tag:' (tag with empty value) + + :param tags_str: String like "tier:T0;team:data,tier:T1" meaning (tier:T0) OR (team:data AND tier:T1) + :return: List of groups, where each group is a list of tuples like [[("tier", "T0")], [("team", "data"), ("tier", "T1")]] + + Examples: + - "env:prod" -> [[("env", "prod")]] + - "env:prod,tier:T0" -> [[("env", "prod"), ("tier", "T0")]] + - "env:prod;env:staging" -> [[("env", "prod")], [("env", "staging")]] + - "tier:T0;team:data,tier:T1" -> [[("tier", "T0")], [("team", "data"), ("tier", "T1")]] + - "monitoring" -> [[("monitoring", "")]] + """ + if not is_parameter_defined(tags_str): + return [] + + result = [] + # Split by semicolon to get groups (OR logic) + groups = [g.strip() for g in tags_str.strip().split(';') if g.strip()] + + for group in groups: + # Split by comma to get individual tags in this group (AND logic) + tag_pairs = [] + items = [item.strip() for item in group.split(',') if item.strip()] + + for item in items: + if ':' in item: + # tag:value format + parts = item.split(':', 1) + tag_pairs.append((parts[0].strip(), parts[1].strip())) + else: + # tag format (shorthand for tag:) + tag_pairs.append((item.strip(), "")) + + if tag_pairs: + result.append(tag_pairs) + + return result + +def get_pipeline_tags(wc: WorkspaceClient, pipeline_id: str) -> Optional[dict]: + # For now we use the REST API directly as older Python SDK versions may not support tags + pipeline_spec = wc.api_client.do(method='get', path=f'/api/2.0/pipelines/{pipeline_id}')['spec'] + + # Check if pipeline has tags + return pipeline_spec.get('tags') + + +def get_pipeline_ids_by_tags( + wc: WorkspaceClient, + tag_groups: List[List[tuple]], + spark: Optional[SparkSession] = None, + index_table_fqn: Optional[str] = None, + index_enabled: bool = True, + index_max_age_hours: int = 24, + api_fallback_enabled: bool = True, + log: Optional[logging.Logger] = None +) -> List[str]: + """ + Fetches pipeline IDs using OR of ANDs logic for tag matching. + This is a common helper function used by both EventLogImporter and MonitoringEtlPipeline. + + Logic: A pipeline matches if it satisfies ALL tags in ANY group. + - Within a group: ALL tags must match (AND logic) + - Between groups: ANY group can match (OR logic) + + Tries to use the pipeline tags index table first (if enabled and fresh), falls back to API-based discovery if needed. 
+ + :param wc: WorkspaceClient instance + :param tag_groups: List of tag groups, e.g., [[("env", "prod")], [("team", "data"), ("tier", "T1")]] + means (env:prod) OR (team:data AND tier:T1) + :param spark: Optional SparkSession for index table queries + :param index_table_fqn: Fully qualified name of the pipeline tags index table + :param index_enabled: Whether to use the index table + :param index_max_age_hours: Maximum age of index (in hours) before falling back to API + :param api_fallback_enabled: Whether to fall back to API if index is unavailable/stale + :param log: Optional logger for logging + :return: List of pipeline IDs matching the tag filter expression + """ + if not tag_groups: + return [] + + if log is None: + log = logging.getLogger("get_pipeline_ids_by_tags") + + # Try index-based lookup first + if index_enabled and spark is not None and index_table_fqn: + try: + # Check index freshness + try: + freshness_check = spark.sql(f""" + SELECT + MAX(last_updated) as last_updated, + timestampdiff(HOUR, MAX(last_updated), CURRENT_TIMESTAMP()) as age_hours + FROM {index_table_fqn} + """).collect()[0] + age_hours = freshness_check['age_hours'] + except Exception as e: + if log: + log.warning(f"Failed to check pipeline tags index: {e}") + age_hours = None + + if age_hours is None or age_hours <= index_max_age_hours: + # Index is fresh, use it + log.info(f"Using pipeline tags index table (age: {age_hours:.1f} hours)") + + # Step 1: Collect all unique (tag_key, tag_value) pairs across all groups + all_tag_pairs = set() + for group in tag_groups: + all_tag_pairs.update(group) + + # Step 2: Query database once for all tag pairs + where_conditions = " OR ".join([ + f"(tag_key = '{tag_key}' AND tag_value = '{tag_value}')" + for tag_key, tag_value in all_tag_pairs + ]) + + log.info(f"Querying pipeline tags index table {index_table_fqn} with where conditions: {where_conditions}") + + query = f""" + SELECT DISTINCT tag_key, tag_value, explode(pipeline_ids) as pipeline_id + FROM {index_table_fqn} + WHERE {where_conditions} + """ + + result = spark.sql(query).collect() + + # Step 3: Build map from (tag_key, tag_value) -> set of pipeline_ids + tag_to_pipelines = {} + for row in result: + tag_pair = (row['tag_key'], row['tag_value']) + if tag_pair not in tag_to_pipelines: + tag_to_pipelines[tag_pair] = set() + tag_to_pipelines[tag_pair].add(row['pipeline_id']) + + # Step 4: For each group, intersect pipeline_ids (AND logic) + matching_pipeline_ids = set() + for group in tag_groups: + if not group: + continue + + # Get pipeline_ids for each tag in the group + group_pipeline_sets = [] + for tag_pair in group: + if tag_pair in tag_to_pipelines: + group_pipeline_sets.append(tag_to_pipelines[tag_pair]) + else: + # Tag doesn't exist in index, so no pipelines match this group + group_pipeline_sets = [] + break + + # Intersect all sets in this group (AND logic) + if group_pipeline_sets: + group_result = set.intersection(*group_pipeline_sets) + if group_result: + log.info(f"Found {len(group_result)} pipeline(s) matching group {group}") + # Step 5: Union with results from other groups (OR logic) + matching_pipeline_ids.update(group_result) + + return list(matching_pipeline_ids) + else: + # Index is stale + log.warning(f"Pipeline tags index is stale (age: {age_hours:.1f} hours > max: {index_max_age_hours} hours)") + if not api_fallback_enabled: + raise ValueError(f"Index is stale and API fallback is disabled") + except Exception as e: + log.warning(f"Failed to use pipeline tags index: {e}") + if not 
api_fallback_enabled: + raise + + # Fall back to API-based discovery + log.warning("Falling back to API-based pipeline discovery (this may be slow)") + + matching_pipeline_ids = set() + + # List all pipelines in the workspace (this returns basic info only) + all_pipeline_ids = [(pi.pipeline_id, pi.name) for pi in wc.pipelines.list_pipelines()] + + for pipeline_id, pipeline_name in all_pipeline_ids: + try: + # Fetch the full pipeline spec to get tags + pipeline_tags = get_pipeline_tags(wc, pipeline_id) + + if not pipeline_tags: + continue + + # Check if this pipeline matches any group (OR of ANDs) + for group in tag_groups: + # Check if pipeline has ALL tags in this group + group_matches = True + for tag_key, tag_value in group: + if tag_key not in pipeline_tags or pipeline_tags[tag_key] != tag_value: + group_matches = False + break + + if group_matches: + matching_pipeline_ids.add(pipeline_id) + log.info(f"Pipeline {pipeline_name} ({pipeline_id}) matches group {group}") + break # Pipeline matches at least one group, no need to check other groups + except Exception as e: + log.warning(f"Failed to fetch pipeline {pipeline_id}: {e}") + continue + + return list(matching_pipeline_ids) + + +def get_optional_parameter(value: Optional[str]) -> str: + return value.strip() if is_parameter_defined(value) else None + +def get_required_parameter(name: str, value: Optional[str]) -> str: + if is_parameter_defined(value): + return value.strip() + + raise ValueError(f"Missing required parameter: {name}") + +def get_required_widget_parameter(widgets, param_name: str): + return get_required_parameter(param_name, widgets.get(param_name)) + +SDP_EVENT_LOG_SCHEMA=""" + id STRING, + sequence STRUCT, + control_plane_seq_no: BIGINT>, + origin STRUCT, + timestamp TIMESTAMP, + message STRING, + level STRING, + maturity_level STRING, + error STRUCT>>>>, + details STRING, + event_type STRING + """ + + +class EventLogImporter: + """ + A helper class to incrementally import SDP event logs from pipelines that are not configured to store the event log + directly in a Delta table (see the [`event_log` option](https://docs.databricks.com/api/workspace/pipelines/create#event_log) in the + Pipelines API). This can happen for example, if the pipeline was created prior to the introduction of ability to [Publish to Multiple Catalogs and Schemas from a Single DLT/SDP Pipeline](https://www.databricks.com/blog/publish-multiple-catalogs-and-schemas-single-dlt-pipeline). + + The import is done into a Delta table that can be used to store the logs from multiple pipelines. + + Note that is an expensive operation (it uses `MERGE` statements to achieve incrementalization) and should be used only if + direct write of the event log to a Delta table is not possible. + """ + + def __init__(self, monitoring_catalog: str, monitoring_schema: str, imported_event_logs_table: str, + index_table_name: str = "pipeline_tags_index", + index_enabled: bool = True, + index_max_age_hours: int = 24, + api_fallback_enabled: bool = True, + wc: Optional[WorkspaceClient] = None): + """ + Constructor. 
+ :param monitoring_catalog: The catalog for the table with the imported event logs + :param monitoring_schema: The schema for the table with the imported event logs + :param imported_event_logs_table: The name of the table where the imported event logs are to be stored + :param index_table_name: The name of the pipeline tags index table + :param index_enabled: Whether to use the pipeline tags index + :param index_max_age_hours: Maximum age of the index (in hours) before falling back to API + :param api_fallback_enabled: Whether to fall back to API if index is unavailable/stale + :param wc: The WorkspaceClient to use; if none is specified, a new one will be instantiated + """ + if monitoring_catalog is None or len(monitoring_catalog.strip()) == 0: + raise ValueError("Monitoring catalog cannot be empty") + if monitoring_schema is None or len(monitoring_schema.strip()) == 0: + raise ValueError("Monitoring schema cannot be empty") + if imported_event_logs_table is None or len(imported_event_logs_table) == 0: + raise ValueError("Imported event logs table cannot be empty") + + self.monitoring_catalog = monitoring_catalog.strip() + self.monitoring_schema = monitoring_schema.strip() + self.imported_event_logs_table = imported_event_logs_table.strip() + self.imported_event_logs_table_fqname = f"`{self.monitoring_catalog}`.`{self.monitoring_schema}`.`{self.imported_event_logs_table}`" + self.index_table_fqn = f"`{self.monitoring_catalog}`.`{self.monitoring_schema}`.`{index_table_name}`" + self.index_enabled = index_enabled + self.index_max_age_hours = index_max_age_hours + self.api_fallback_enabled = api_fallback_enabled + self.wc = wc if wc else WorkspaceClient() + self.log = logging.getLogger("EventLogImporter") + + + def create_target_table(self, spark: SparkSession): + """ + Creates the target table where the event logs will be imported if it does not exists. + """ + spark.sql(f"CREATE TABLE IF NOT EXISTS {self.imported_event_logs_table_fqname} ({SDP_EVENT_LOG_SCHEMA}) CLUSTER BY AUTO") + + + def import_event_log_for_one_pipeline(self, pipeline_id: str, spark: SparkSession): + """ + Imports current contents of the event log for the pipeline with the specified `pipeline_id` + """ + self.log.info(f"Merging changes from event log for pipeline {pipeline_id} ...") + merge_res_df = spark.sql(f""" + MERGE INTO {self.imported_event_logs_table_fqname} AS t + USING (SELECT * FROM event_log('{pipeline_id}')) as s + ON t.origin.pipeline_id = s.origin.pipeline_id and t.id = s.id + WHEN NOT MATCHED THEN INSERT * + """) + merge_res_df.show(truncate=False) + latest_event_timestamp = spark.sql(f""" + SELECT max(`timestamp`) + FROM {self.imported_event_logs_table_fqname} + WHERE origin.pipeline_id='{pipeline_id}' """).collect()[0][0] + self.log.info(f"Latest imported event for pipeline {pipeline_id} as of {latest_event_timestamp}") + + + def import_event_logs_for_pipelines(self, pipeline_ids: List[str], spark: SparkSession): + """ + Imports current contents of the event logs for the pipelines in the `pipeline_ids` list + """ + if len(pipeline_ids) == 0: + print("Nothing to import") + else: + for pipeline_id in pipeline_ids: + self.import_event_log_for_one_pipeline(pipeline_id, spark) + + + def import_event_logs_for_pipelines_comma_list(self, pipeline_ids_list: str, spark: SparkSession): + """ + Imports current contents of the event logs for the pipelines in comma-separated list in + `pipeline_ids_list`. This is primarily for use with notebook parameters. 
+ """ + self.import_event_logs_for_pipelines(parse_comma_separated_list(pipeline_ids_list), spark) + + + def import_event_logs_for_pipelines_by_tags(self, tags_str: str, spark: SparkSession): + """ + Imports current contents of the event logs for pipelines matching ANY of the specified tag:value pairs. + :param tags_str: Comma-separated list of tag:value pairs (e.g., "env:prod,team:data") + :param spark: SparkSession instance + """ + tag_value_pairs = parse_tag_value_pairs(tags_str) + if not tag_value_pairs: + self.log.info("No tags specified for pipeline filtering") + return + + self.log.info(f"Fetching pipelines matching tags: {tags_str}") + pipeline_ids = get_pipeline_ids_by_tags(self.wc, tag_value_pairs, self.log) + + if not pipeline_ids: + self.log.warning(f"No pipelines found matching any of the tags: {tags_str}") + else: + self.log.info(f"Found {len(pipeline_ids)} pipeline(s) matching tags") + self.import_event_logs_for_pipelines(pipeline_ids, spark) + + + def import_event_logs_for_pipelines_by_ids_and_tags(self, pipeline_ids_list: str, tags_str: str, spark: SparkSession): + """ + Imports current contents of the event logs for pipelines specified by IDs or matching tags. + Pipelines matching either criteria will be included. + :param pipeline_ids_list: Comma-separated list of pipeline IDs + :param tags_str: Comma-separated list of tag:value pairs + :param spark: SparkSession instance + """ + # Collect pipeline IDs from explicit list + explicit_ids = set(parse_comma_separated_list(pipeline_ids_list)) if pipeline_ids_list else set() + + # Collect pipeline IDs from tags + tag_value_pairs = parse_tag_value_pairs(tags_str) if tags_str else [] + tag_ids = set(get_pipeline_ids_by_tags( + self.wc, + tag_value_pairs, + spark=spark, + index_table_fqn=self.index_table_fqn, + index_enabled=self.index_enabled, + index_max_age_hours=self.index_max_age_hours, + api_fallback_enabled=self.api_fallback_enabled, + log=self.log + )) if tag_value_pairs else set() + + # Combine both sets + all_pipeline_ids = explicit_ids.union(tag_ids) + + if not all_pipeline_ids: + self.log.info("No pipelines specified (neither by ID nor by tags)") + return + + self.log.info(f"Importing event logs for {len(all_pipeline_ids)} pipeline(s)") + self.import_event_logs_for_pipelines(list(all_pipeline_ids), spark) + + +class PipelineTagsIndexBuilder: + """ + A helper class to build an inverted index mapping pipeline tags to pipeline IDs. + The index is stored in a Delta table and enables efficient discovery of pipelines by tags + without having to query the Databricks API for every pipeline. + """ + + def __init__(self, monitoring_catalog: str, monitoring_schema: str, index_table_name: str, wc: Optional[WorkspaceClient] = None): + """ + Constructor. + :param monitoring_catalog: The catalog for the index table + :param monitoring_schema: The schema for the index table + :param index_table_name: The name of the index table + :param wc: The WorkspaceClient to use; if none is specified, a new one will be instantiated + """ + self.monitoring_catalog = monitoring_catalog.strip() + self.monitoring_schema = monitoring_schema.strip() + self.index_table_name = index_table_name.strip() + self.index_table_fqn = f"`{self.monitoring_catalog}`.`{self.monitoring_schema}`.`{self.index_table_name}`" + self.wc = wc if wc else WorkspaceClient() + self.log = logging.getLogger("PipelineTagsIndexBuilder") + + + def build_index(self, spark: SparkSession): + """ + Builds the pipeline tags index and writes it to a Delta table. 
+ The index maps tag:value pairs to lists of pipeline IDs. + """ + from datetime import datetime + from pyspark.sql import Row + + self.log.info(f"Building pipeline tags index in table: {self.index_table_fqn}") + + # List all pipelines + self.log.info("Listing all pipelines...") + all_pipelines_id = [pi.pipeline_id for pi in self.wc.pipelines.list_pipelines()] + self.log.info(f"Found {len(all_pipelines_id)} pipelines") + + # Build inverted index: tag:value -> [pipeline_ids] + tags_index = {} # {(tag_key, tag_value): [pipeline_ids]} + processed_count = 0 + error_count = 0 + + for pipeline_id in all_pipelines_id: + try: + # Check if pipeline has tags + pipeline_tags = get_pipeline_tags(self.wc, pipeline_id) + if pipeline_tags: + # Add to inverted index + for tag_key, tag_value in pipeline_tags.items(): + key = (tag_key, tag_value) + if key not in tags_index: + tags_index[key] = [] + tags_index[key].append(pipeline_id) + + processed_count += 1 + if processed_count % 100 == 0: + self.log.info(f"Processed {processed_count}/{len(all_pipelines_id)} pipelines...") + + except Exception as e: + error_count += 1 + self.log.warning(f"Failed to process pipeline {pipeline_id}: {e}") + continue + + self.log.info(f"Processed {processed_count} pipelines ({error_count} errors)") + self.log.info(f"Found {len(tags_index)} unique tag:value pairs") + + # Convert to DataFrame and write to Delta table + if tags_index: + # Create rows for the DataFrame + rows = [ + Row( + tag_key=tag_key, + tag_value=tag_value, + pipeline_ids=pipeline_ids, + last_updated=datetime.utcnow() + ) + for (tag_key, tag_value), pipeline_ids in tags_index.items() + ] + + # Create DataFrame + df = spark.createDataFrame(rows) + + # Write to Delta table (overwrite to ensure freshness) + self.log.info(f"Writing index to {self.index_table_fqn}...") + df.write \ + .mode("overwrite") \ + .option("overwriteSchema", "true") \ + .saveAsTable(self.index_table_fqn) + + self.log.info(f"Successfully built pipeline tags index with {len(tags_index)} entries") + + else: + self.log.warning("No tags found in any pipelines. Index table will not be created/updated.") + + +class DashboardTemplate: + """ + A helper class to transform the definition of dashboard based on DAB configuration variables. This is a workaround as + currently AI/BI dashboards have limitted parametrization capabilites. + + Currently, the only transformation supported is setting the default catalog and schema for all datasets in the dashboard. + """ + def __init__(self, + dashboard_template_path: str, + dashboard_id: Optional[str] = None, + published_dashboard_name: Optional[str] = None, + default_dataset_catalog: Optional[str] = None, + default_dataset_schema: Optional[str] = None, + warehouse_id: Optional[str] = None, + wc: Optional[WorkspaceClient] = None): + """ + Constructor + :param dashboard_template_path: (required) the path to the `.lvdash.json` file of the dashboard to use as a template for publishing + :param dashboard_id: the name of the AI/BI dashboard to update if known. If not specified, the notebook will attempt to find a + dashboard with the specified `published_dashboard_name`. If none is found, a new one will be created. If + multiple such dashboards aere found, the notebook will fail with an error and an explicit dashboard_id must + be specified. + :param published_dashboard_name: (optional) the display name of the dashboard. If not specified, the name of the file (without + the .lvdash.json extension and "Template") will be used. 
+ :param default_dataset_catalog: (optional) the default catalog for datasets to be set + :param default_dataset_schema: (optional) the default schema for datasets to be set + :param warehouse_id: (optional) the ID of the warehouse to use for the AI/BI dashboard. If not specified, the first suitable one will be used. + :param wc: the WorkspaceClient to use; if none is specified, a new one will be instantiated + """ + self.log = logging.getLogger("DashboardTemplate") + self.wc = wc if wc else WorkspaceClient() + self.dashboard_template_path = get_required_parameter(name='dashboard_template_path', value=dashboard_template_path) + if not os.path.exists(dashboard_template_path): + raise ValueError(f"Dashboard at path {dashboard_template_path} does not exist") + self.dashboard_id = get_optional_parameter(dashboard_id) + self.published_dashboard_name = published_dashboard_name if is_parameter_defined(published_dashboard_name) else self._extract_dashboard_name_from_path(dashboard_template_path) + self.default_dataset_catalog = get_optional_parameter(default_dataset_catalog) + self.default_dataset_schema = get_optional_parameter(default_dataset_schema) + self.warehouse_id = warehouse_id if is_parameter_defined(warehouse_id) else self._get_default_warehouse_id() + if self.warehouse_id is None: + raise Exception("Unable to find a suitable warehouse for the AI/BI dashboard. Please set `warehouse_id` with the ID of the warehouse to use.") + + @staticmethod + def from_notebook_widgets(widgets, wc: Optional[WorkspaceClient] = None): + return DashboardTemplate(dashboard_template_path=widgets.get("dashboard_template_path"), + dashboard_id=widgets.get("dashboard_id"), + published_dashboard_name=widgets.get("published_dashboard_name"), + default_dataset_catalog=widgets.get("default_dataset_catalog"), + default_dataset_schema=widgets.get("default_dataset_schema"), + warehouse_id=widgets.get("warehouse_id"), + wc=wc) + + + @staticmethod + def _extract_dashboard_name_from_path(dashboard_path: str) -> str: + display_name = os.path.basename(dashboard_path).replace(".lvdash.json", "") + return re.sub(r'\s+Template', '', display_name) + + + def _get_default_warehouse_id(self): + # Prefer a running serverless warehouse (sorted first by the key below); return None if none is running. + preferred_warehouse = min([w for w in self.wc.warehouses.list() if w.state == State.RUNNING], + key=lambda w: f"{'0' if w.enable_serverless_compute else '9'}{w.name}", + default=None) + if preferred_warehouse is not None: + self.log.info(f"Using warehouse: {preferred_warehouse.name} ({preferred_warehouse.id})") + return preferred_warehouse.id + else: + self.log.warning("No suitable warehouse found") + + + def _find_all_dashboards_with_name(self, display_name: str): + dashboard_ids = [] + for d in self.wc.lakeview.list(): + if d.display_name == display_name: + dashboard_ids.append(d.dashboard_id) + self.log.info(f"Found existing dashboard with display name '{d.display_name}' ({d.dashboard_id})") + else: + self.log.debug(f"Ignoring dashboard with display name '{d.display_name}' != '{display_name}'") + return dashboard_ids + + + def _get_dashboard_id(self) -> Optional[str]: + if self.dashboard_id is not None: + return self.dashboard_id + + candidate_ids = self._find_all_dashboards_with_name(self.published_dashboard_name) + if len(candidate_ids) > 1: + raise ValueError(f"Multiple dashboards found with display name {self.published_dashboard_name}.
Please specify an explicit `dashboard_id`.") + return None if len(candidate_ids) == 0 else candidate_ids[0] + + + def _process_dataset(self, dataset_elem: dict): + dataset_elem['catalog'] = self.default_dataset_catalog + dataset_elem['schema'] = self.default_dataset_schema + + + def publish(self): + """ + Publishes the dashboard + """ + with open(self.dashboard_template_path) as f: + dashboard_json = json.load(f) + + for ds in dashboard_json.get("datasets", []): + self._process_dataset(ds) + + real_dashboard_id = self._get_dashboard_id() + if real_dashboard_id is None: + self.log.info(f"Creating new dashboard with display name '{self.published_dashboard_name}'") + else: + self.log.info(f"Using existing dashboard with ID {real_dashboard_id}") + + d_json = { + "display_name": self.published_dashboard_name, + "serialized_dashboard": json.dumps(dashboard_json), + "warehouse_id": self.warehouse_id + } + + if real_dashboard_id is None: + d = self.wc.lakeview.create(dashboard=Dashboard.from_dict(d_json)) + self.log.info(f"Created dashboard '{d.display_name}' (ID={d.dashboard_id} ETAG={d.etag})") + real_dashboard_id = d.dashboard_id + else: + d_json["dashboard_id"] = real_dashboard_id + d = self.wc.lakeview.update(dashboard_id=real_dashboard_id, dashboard=Dashboard.from_dict(d_json)) + self.log.info(f"Updated dashboard '{d.display_name}' (ID={d.dashboard_id} ETAG={d.etag})") + + pd = self.wc.lakeview.publish(dashboard_id=real_dashboard_id, embed_credentials=True, warehouse_id=self.warehouse_id) + self.log.info(f"Published dashboard '{pd.display_name}' revision time {pd.revision_create_time}") + + diff --git a/contrib/dbx_ingestion_monitoring/lib/dbx_ingestion_monitoring/common_ldp.py b/contrib/dbx_ingestion_monitoring/lib/dbx_ingestion_monitoring/common_ldp.py new file mode 100644 index 0000000..151da64 --- /dev/null +++ b/contrib/dbx_ingestion_monitoring/lib/dbx_ingestion_monitoring/common_ldp.py @@ -0,0 +1,1118 @@ +""" +Common observability functions to be used within SDP pipelines. +""" + +from collections import namedtuple +import logging +from typing import Callable, Dict, Iterable, List, Optional, Set + +import dlt +from databricks.sdk import WorkspaceClient +from databricks.sdk.errors.platform import ResourceDoesNotExist +from pyspark.sql import SparkSession, DataFrame + +from .common import parse_comma_separated_list +from .standard_tables import * + +def sanitize_string_for_dlt_name(s: str) -> str: + res = "" + for c in s: + if c == '.' 
or c == '-': + res += '_' + elif c != '`': + res += c + return res + + +class Constants: + """ + Shared names and other constants + """ + # Shared table names + created_pipeline_runs="created_pipeline_runs" + standard_pipeline_runs="standard_pipeline_runs" + + # Miscellaneous + sql_fields_def_extension_point = "-- fields def extension point" + where_clause_extension_point = "-- where clause extension point" + + +class Configuration: + """ + Base monitoring ETL pipeline configuration + """ + def __init__(self, conf: Dict[str, str]): + self.monitoring_catalog = self._required_string_param(conf, "monitoring_catalog") + self.monitoring_schema = self._required_string_param(conf, "monitoring_schema") + self.directly_monitored_pipeline_ids=conf.get("directly_monitored_pipeline_ids", "") + self.directly_monitored_pipeline_tags=conf.get("directly_monitored_pipeline_tags", "") + self.imported_event_log_tables = conf.get("imported_event_log_tables", "") + + # Pipeline tags index configuration + self.pipeline_tags_index_table_name = conf.get("pipeline_tags_index_table_name", "pipeline_tags_index") + self.pipeline_tags_index_enabled = conf.get("pipeline_tags_index_enabled", "true").lower() == "true" + self.pipeline_tags_index_max_age_hours = int(conf.get("pipeline_tags_index_max_age_hours", "24")) + self.pipeline_tags_index_api_fallback_enabled = conf.get("pipeline_tags_index_api_fallback_enabled", "true").lower() == "true" + + @staticmethod + def _required_string_param(conf: Dict[str, str], param_name: str): + val = conf.get(param_name) + if val is None or len(val.strip()) == 0: + raise ValueError(f"Missing required parameter '{param_name}'") + return val + + +# A helper class to capture metadata about monitored pipelines +PipelineInfo = namedtuple( + 'PipelineInfo', + field_names=[ + "pipeline_id", + "pipeline_name", + "pipeline_link", + "pipeline_type", + "default_catalog", + "default_schema", + "event_log_source", + "tags_map", + "tags_array" + ] +) + + +class MonitoringEtlPipeline: + """ + A helper class to keep track of monitored pipelines. + """ + + def __init__(self, conf: Configuration, spark: SparkSession): + self.conf = conf + self.spark = spark + self.monitored_pipeline_ids = [] + self.imported_event_log_tables = [] + # a dict from a pipeline id to a imported event log table for pipelines detected in these tables + self.other_pipeline_event_logs: Dict[str, str] = {} + # a dict from a monitored pipeline id to all metadata about this pipeline + self.pipeline_infos: Dict[str, PipelineInfo] = {} + # The set of all unique sources of event logs; this includes both tables with imported logs and also Delta event logs + self.event_log_sources: Set[str] = set() + self.wc = WorkspaceClient() + self.log = logging.getLogger("MonitoredPipelines") + self.event_log_source_views_mapping = {} + + # Automatically register pipelines in configuration + self.register_delta_event_logs_from_pipelines_comma_list(self.conf.directly_monitored_pipeline_ids) + self.register_delta_event_logs_from_pipelines_by_tags(self.conf.directly_monitored_pipeline_tags) + self.register_imported_logs_tables_from_comma_list(self.conf.imported_event_log_tables, spark) + + + def register_delta_event_logs_for_one_pipeline(self, pipeline_id: str): + """ + Registers a pipeline that is being monitored. This method will extract all necessary metadata. 
+ """ + + self.log.info(f"Detecting configuration for pipeline {pipeline_id} ...") + try: + spec = self.wc.api_client.do("GET", f"/api/2.0/pipelines/{pipeline_id}").get('spec', {}) + except ResourceDoesNotExist: + self.log.warn(f"Skipping pipeline {pipeline_id} that no longer exists...") + return + + event_log_info = spec.get("event_log", {}) + if ('name' not in event_log_info) and (pipeline_id not in self.other_pipeline_event_logs): + raise Exception(f"""Pipeline {spec.get('name')} ({pipeline_id}) is not configured for Delta table event log and is not imported. + Either configure the event log to be written to a Delta table or imported it using the import_event_logs job: {spec}""") + + if spec.get('gateway_definition') is not None: + pipeline_type = 'gateway' + elif spec.get('ingestion_definition') is not None: + pipeline_type = 'ingestion' + else: + pipeline_type = 'etl' + + event_log_source = ( + f"`{event_log_info['catalog']}`.`{event_log_info['schema']}`.`{event_log_info['name']}`" if 'name' in event_log_info + else self.other_pipeline_event_logs[pipeline_id] + ) + + # Extract tags from pipeline spec + tags = spec.get('tags', {}) + # Create a map representation of tags + tags_map = tags if tags else None + # Create an array of "tag:value" strings for AI/BI dashboard filtering + tags_array = [f"{k}:{v}" for k, v in tags.items()] if tags else None + + self.pipeline_infos[pipeline_id] = PipelineInfo(pipeline_id=pipeline_id, + pipeline_name=spec['name'], + pipeline_link=f"{spec['name']}", + pipeline_type=pipeline_type, + default_catalog = spec['catalog'], + default_schema = spec.get('schema', spec.get('target')), + event_log_source=event_log_source, + tags_map=tags_map, + tags_array=tags_array) + self.event_log_sources.add(event_log_source) + self.log.info(f"Registered pipeline {spec.get('name')} ({pipeline_id}) ...") + + + def register_delta_event_logs_for_pipelines(self, pipeline_ids: Iterable[str]): + """ + Registers a collection of pipelines that are being monitored. This method will extract all necessary metadata. + """ + for pipeline_id in pipeline_ids: + self.register_delta_event_logs_for_one_pipeline(pipeline_id=pipeline_id) + + + def register_delta_event_logs_from_pipelines_comma_list(self, pipelines_comma_list: str): + """ + Registers a list of pipelines that are being monitored as a comma-separted list. This is primarily to be + used with spark configuration and notebook parameters + """ + self.register_delta_event_logs_for_pipelines(parse_comma_separated_list(pipelines_comma_list)) + + + def register_delta_event_logs_from_pipelines_by_tags(self, tags_str: str): + """ + Registers pipelines that match ANY of the specified tag:value pairs for monitoring. 
+ :param tags_str: Comma-separated list of tag:value pairs (e.g., "env:prod,team:data") + """ + from .common import parse_tag_value_pairs, get_pipeline_ids_by_tags + + tag_groups = parse_tag_value_pairs(tags_str) + if not tag_groups: + self.log.info("No tags specified for pipeline filtering") + return + + self.log.info(f"Fetching pipelines matching tags: {tags_str}") + + # Construct fully qualified table name for the index + index_table_fqn = f"`{self.conf.monitoring_catalog}`.`{self.conf.monitoring_schema}`.`{self.conf.pipeline_tags_index_table_name}`" + + pipeline_ids = get_pipeline_ids_by_tags( + wc=self.wc, + tag_groups=tag_groups, + spark=self.spark, + index_table_fqn=index_table_fqn, + index_enabled=self.conf.pipeline_tags_index_enabled, + index_max_age_hours=self.conf.pipeline_tags_index_max_age_hours, + api_fallback_enabled=self.conf.pipeline_tags_index_api_fallback_enabled, + log=self.log + ) + + if not pipeline_ids: + self.log.warning(f"No pipelines found matching any of the tags: {tags_str}") + else: + self.log.info(f"Found {len(pipeline_ids)} pipeline(s) matching tags, registering for monitoring") + self.register_delta_event_logs_for_pipelines(pipeline_ids) + + def register_one_imported_logs_table(self, imported_logs_table: str, spark: SparkSession): + """ + Detects all pipelines in an imported logs table + """ + if len(imported_logs_table.split('.')) < 3: + # Create a fully qualified name if it is not already + imported_logs_table = ( + f"`{self.conf.monitoring_catalog}`.`{self.conf.monitoring_schema}`.`{imported_logs_table}`" if imported_logs_table[0] != '`' + else f"`{self.conf.monitoring_catalog}`.`{self.conf.monitoring_schema}`.{imported_logs_table}" + ) + + self.log.info(f"Detecting pipelines in imported logs table log {imported_logs_table} ...") + self.imported_event_log_tables.append(imported_logs_table) + other_pipeline_ids = [ r.pipeline_id for r in spark.sql(f"SELECT DISTINCT origin.pipeline_id FROM {imported_logs_table}").collect()] + for pid in other_pipeline_ids: + self.other_pipeline_event_logs[pid] = imported_logs_table + self.register_delta_event_logs_for_one_pipeline(pipeline_id=pid) + + + def register_base_tables_and_views(self, spark: SparkSession): + """ + Registers a set of standard views and tables + """ + self.register_monitored_pipelines(spark) + self.register_event_log_source_views(spark) + self.register_created_pipeline_runs(spark) + self.register_event_logs_bronze(spark) + self.register_monitored_tables(spark) + self.register_pipeline_run_status(spark) + self.register_events_errors(spark) + self.register_events_warnings(spark) + self.register_metric_pipeline_hourly_error_rate(spark) + self.register_pipeline_status(spark) + self.register_events_table_metrics(spark) + self.register_table_status_per_pipeline_run(spark) + self.register_table_status(spark) + self.register_table_expectation_checks(spark) + + def register_imported_logs_tables(self, imported_logs_tables: Iterable[str], spark: SparkSession): + """ + Detects all pipelines in a collection of imported logs tables + """ + for imported_logs_table in imported_logs_tables: + self.register_one_imported_logs_table(imported_logs_table, spark) + + + def register_imported_logs_tables_from_comma_list(self, imported_logs_tables_comma_list: str, spark: SparkSession): + """ + Detects all pipelines in a comma-separated of imported logs table + """ + self.register_imported_logs_tables(parse_comma_separated_list(imported_logs_tables_comma_list), spark) + + + def register_monitored_pipelines(self, spark: 
SparkSession): + @dlt.table(name=MONITORED_PIPELINES.name, + cluster_by=['pipeline_id'], + comment=MONITORED_PIPELINES.table_comment, + table_properties={ + "delta.enableRowTracking": "true" + }) + def monitored_pipelines(): + return spark.createDataFrame( + self.pipeline_infos.values(), + schema="pipeline_id STRING, pipeline_name STRING, pipeline_link STRING, pipeline_type STRING, default_catalog STRING, default_schema STRING, event_log_source STRING, tags_map MAP, tags_array ARRAY" + ) + + + def register_event_log_source_views(self, spark: SparkSession) -> Dict[str, str]: + """ + Generates a view for each event log table. We need to ensure that "skipChangeCommits" is set to true + so we don't break if modification or deletions are done in those tables. + + :return: A mapping from event logs source table to its corresponding view + """ + def create_event_log_source_view(event_log_source: str) -> str: + view_name = f"source_{sanitize_string_for_dlt_name(event_log_source)}" + print(f"Defining source view {view_name} for event log source {event_log_source}") + + @dlt.view(name=view_name) + def event_logs_source_view(): + return spark.readStream.option("skipChangeCommits", "true").table(event_log_source) + + return view_name + + self.event_log_source_views_mapping = { + event_log_source: create_event_log_source_view(event_log_source) + for event_log_source in self.event_log_sources} + + return self.event_log_source_views_mapping + + + def transfom_and_append_event_log_sources(self, + target: str, + flow_prefix: str, + append_def: Callable[[str], DataFrame]): + """ + Creates append flows per event log source into a target table or sink + + :param target: the name of the target table or sink + :param flow_prefix: the string to prepend to the name of each flow into the target + :param append_def: a function that defines the append flow; it takes the name of the event log + stream source as a parameter + """ + + def process_el_source(el_source: str): + flow_name = f"{flow_prefix}_{sanitize_string_for_dlt_name(el_source)}" + log_source = f"STREAM(`{self.event_log_source_views_mapping[el_source]}`)" + print(f"Defining event log flow {flow_name} from {log_source} into {target}") + + @dlt.append_flow(name=flow_name, target=target) + def el_append_flow(): + return append_def(log_source) + + for el_source in self.event_log_sources: + process_el_source(el_source) + + + def register_created_pipeline_runs(self, spark: SparkSession): + """ + Creates a table and a view of all basic metadata about all pipeline runs detected in event logs of monitored + pipelines. This allows to easily filter out runs that are not part of the normal data processing. + """ + + dlt.create_streaming_table(name=Constants.created_pipeline_runs, + cluster_by=['pipeline_id', 'pipeline_run_id'], + comment=""" + A table to keep track of created pipeline runs with some metadata about each one. + It is used filter out runs that are not part of the normal data processing. 
+ """, + table_properties={ + "delta.enableRowTracking": "true" + }) + + # Definition for flows from event log sources into `created_pipeline_runs` + def append_to_created_pipeline_runs(event_log_source: str): + details_partial_schema="STRUCT>" + return spark.sql(f""" + SELECT pipeline_id, + pipeline_run_id, + create_time, + create_update_details.validate_only, + create_update_details.explore_only, + (create_update_details.explore_only + OR maintenance_id IS NOT NULL) AS is_internal_run -- used to filter out internal runs + FROM (SELECT origin.pipeline_id, + origin.update_id AS pipeline_run_id, + origin.maintenance_id, + `timestamp` AS create_time, + from_json(details, '{details_partial_schema}').create_update AS create_update_details + FROM {event_log_source} + WHERE event_type == 'create_update') + """) + + @dlt.view(name=Constants.standard_pipeline_runs) + def generate_standard_pipeline_runs(): + return spark.sql(f""" + SELECT pipeline_id, pipeline_run_id + FROM `{Constants.created_pipeline_runs}` + WHERE NOT is_internal_run + """) + + self.transfom_and_append_event_log_sources( + target=Constants.created_pipeline_runs, + flow_prefix='cpr', + append_def=append_to_created_pipeline_runs) + + + def _get_event_logs_bronze_sql(self, event_log_source: str): + """ + Base definition for append flows from the event log sources into `event_logs_bronze` table. Subclasses can override + this and replace {Constants.sql_fields_def_extension_point} with additional fiels they want to include + """ + return f""" + SELECT id, + seq_num, + pipeline_id, + pipeline_run_id, + ('' || pipeline_run_id || '') AS pipeline_run_link, + + coalesce(CASE WHEN els.table_name IS NULL + OR mp.default_catalog IS NULL + OR INSTR(els.table_name, '.') > 0 + THEN els.table_name + ELSE CONCAT(mp.default_catalog, '.', mp.default_schema, '.', els.table_name) + END, + CASE WHEN dataset_name IS NULL + OR mp.default_catalog IS NULL + OR INSTR(dataset_name, '.') > 0 + THEN dataset_name + ELSE CONCAT(mp.default_catalog, '.', mp.default_schema, '.', dataset_name) + END, + details:operation_progress.cdc_snapshot.table_name::string, + ft.table_name) AS table_name, + flow_name, + batch_id, + event_timestamp, + message, + level, + error_message, + regexp_extract(error_message, r'^\\[([a-zA-Z.:0-9_]+)\\]', 1) as error_code, + event_type, + error_full, + details{Constants.sql_fields_def_extension_point} + FROM (SELECT id, + sequence.data_plane_id.seq_no as seq_num, + origin.pipeline_id, + origin.pipeline_name, + origin.update_id as pipeline_run_id, + origin.table_name, + origin.dataset_name, + origin.flow_name, + origin.batch_id, + `timestamp` as event_timestamp, + message, + level, + error.exceptions[0].message as error_message, + (CASE WHEN error.exceptions IS NOT NULL THEN error ELSE NULL END) AS error_full, + event_type, + parse_json(details) as details -- TODO: Should we parse with a fixed schema + FROM {event_log_source}) AS els + JOIN `{Constants.standard_pipeline_runs}` USING (pipeline_id, pipeline_run_id) + LEFT JOIN flow_targets AS ft USING (pipeline_id, pipeline_run_id, flow_name) + LEFT JOIN {MONITORED_PIPELINES.name} AS mp USING (pipeline_id) + WHERE event_type in ('create_update', + 'update_progress', -- Pipeline update start and progress events + 'flow_definition', -- Flow initialization + 'dataset_definition', -- Table initialization + 'flow_progress', -- metric and data-quality related events, errors + 'operation_progress' -- Snapshot progress + ) + """ + + + def register_event_logs_bronze(self, spark: SparkSession): + """ 
+ Registers tables and views for the bronze layer of the event logs that contains basic common event log + filters and transformations. This is the root source for most of observability tables. + """ + + def qualify_table_name_if_needed(table_name: str, default_catalog: str, default_schema: str) -> str: + """ + Event logs sometimes contain a fully qualified table name and sometimes just the base name. This + helper UDF uses the pipeline's default catalog and schema and would include those to unqualified + table names. + """ + if table_name is None or default_catalog is None or table_name.find('.') >= 0: + return table_name + return f"{default_catalog}.{default_schema}.{table_name}" + # Comment out due to ES-1633439 + # spark.udf.register("qualify_table_name_if_needed", qualify_table_name_if_needed) + + # Create a helper table to map flows to target table as the target table names are currently not included + # in the event log consistently + dlt.create_streaming_table( + name="flow_targets", + cluster_by=["pipeline_id", "pipeline_run_id", "flow_name"], + comment="""Keeps track of the target tables for each flow so we can attribute flow_progress events to + specific tables. + """, + table_properties={ + "delta.enableRowTracking": "true" + }) + + # The common transformation of event log sources going into the `flow_targets` table + def append_to_flow_targets(event_log_source: str): + partial_flow_definition_details_schema = """STRUCT>>, + schema_json: STRING, + spark_conf: ARRAY> > >""" + return spark.sql(f""" + SELECT pipeline_id, + pipeline_run_id, + flow_name, + (CASE WHEN details.output_dataset IS NULL + OR mp.default_catalog IS NULL + OR INSTR(details.output_dataset, '.') > 0 + THEN details.output_dataset + ELSE CONCAT(mp.default_catalog, '.', mp.default_schema, '.', details.output_dataset) + END) AS table_name, + details.schema, + details.schema_json, + details.spark_conf + FROM (SELECT origin.pipeline_id, + origin.pipeline_name, + origin.update_id as pipeline_run_id, + origin.flow_name, + from_json(details, '{partial_flow_definition_details_schema}').flow_definition as details + FROM {event_log_source} + WHERE event_type='flow_definition') AS fd + LEFT JOIN {MONITORED_PIPELINES.name} as mp USING (pipeline_id) + """) + + self.transfom_and_append_event_log_sources( + target="flow_targets", + flow_prefix='ft', + append_def=append_to_flow_targets) + + dlt.create_streaming_table(name=EVENT_LOGS_BRONZE.name, + cluster_by=['pipeline_id', 'pipeline_run_id', 'table_name'], + comment=EVENT_LOGS_BRONZE.table_comment, + table_properties={ + "delta.enableRowTracking": "true", + 'delta.feature.variantType-preview': 'supported' + }) + + # Definition of the transformations from the event logs sources into `event_logs_bronze` + def append_to_event_logs_bronze(event_log_source: str): + return spark.sql(self._get_event_logs_bronze_sql(event_log_source)) + + self.transfom_and_append_event_log_sources( + target=EVENT_LOGS_BRONZE.name, + flow_prefix="elb", + append_def=append_to_event_logs_bronze) + + def register_monitored_tables(self, spark: SparkSession): + @dlt.table( + name=MONITORED_TABLES.name, + comment=MONITORED_TABLES.table_comment, + table_properties={ + "delta.enableRowTracking": "true" + }) + def monitored_tables(): + return spark.sql(f""" + SELECT DISTINCT pipeline_id, table_name + FROM `{EVENT_LOGS_BRONZE.name}` + WHERE table_name is not null + """) + + def register_pipeline_run_status(self, spark: SparkSession): + """ + Register the flows and tables needed to maintain the latest status of 
runs of monitored pipelines. + """ + # We filter update_progress event from pipeline runs and use apply_changes() to maintain the latest status of each pipeline run + source_view_name = f"{PIPELINE_RUNS_STATUS.name}_source" + + @dlt.view(name=source_view_name) + def pipeline_runs_status_source(): + """ + Generates an apply_changes() stream for pipeline_updates_agg + """ + return spark.sql(f""" + SELECT *, + ('' || latest_state || '') AS latest_state_with_color + FROM (SELECT pipeline_id, + pipeline_run_id, + pipeline_run_link, + latest_state, + (CASE WHEN latest_state = 'FAILED' THEN 'red' + WHEN latest_state = 'COMPLETED' THEN 'green' + WHEN latest_state = 'RUNNING' THEN 'blue' + WHEN latest_state = 'CANCELED' THEN 'gray' + ELSE 'black' + END) AS state_color, + (CASE WHEN latest_state = 'FAILED' THEN 100 + WHEN latest_state = 'CANCELED' THEN 11 + WHEN latest_state = 'COMPLETED' THEN 10 + WHEN latest_state = 'STOPPING' THEN 8 + WHEN latest_state = 'RUNNING' THEN 7 + WHEN latest_state = 'SETTING_UP_TABLES' THEN 6 + WHEN latest_state = 'RESETTING' THEN 5 + WHEN latest_state = 'INITIALIZING' THEN 4 + WHEN latest_state = 'WAITING_FOR_RESOURCES' THEN 3 + WHEN latest_state = 'QUEUED' THEN 2 + WHEN latest_state = 'CREATED' THEN 1 + ELSE 0 END) AS latest_state_level, + (CASE WHEN event_type = 'create_update' THEN event_timestamp END) AS create_time, + (CASE WHEN event_type = 'update_progress' AND latest_state = 'WAITING_FOR_RESOURCES' THEN event_timestamp END) AS queued_time, + (CASE WHEN event_type = 'update_progress' AND latest_state = 'INITIALIZING' THEN event_timestamp END) AS initialization_start_time, + (CASE WHEN event_type = 'update_progress' AND latest_state = 'RUNNING' THEN event_timestamp END) AS running_start_time, + (CASE WHEN event_type = 'update_progress' AND latest_state in ('COMPLETED', 'CANCELED', 'FAILED') THEN event_timestamp END) AS end_time, + (CASE WHEN event_type = 'update_progress' AND latest_state in ('COMPLETED', 'CANCELED', 'FAILED') THEN true END) AS is_complete, + latest_error_log_message, + latest_error_message, + latest_error_code, + latest_error_full, + event_timestamp AS updated_at, + seq_num + FROM (SELECT pipeline_id, + pipeline_run_id, + pipeline_run_link, + event_timestamp, + details:update_progress.state::string AS latest_state, + (CASE WHEN error_full is not null or level='ERROR' THEN message END) AS latest_error_log_message, + error_message as latest_error_message, + error_code as latest_error_code, + error_full AS latest_error_full, + event_type, + seq_num + FROM STREAM(`{EVENT_LOGS_BRONZE.name}`)) + WHERE event_type == 'create_update' OR event_type == 'update_progress') + """) + + dlt.create_streaming_table(name=PIPELINE_RUNS_STATUS.name, + cluster_by=['pipeline_id', 'pipeline_run_id'], + comment=PIPELINE_RUNS_STATUS.table_comment, + table_properties={ + "delta.enableRowTracking": "true", + "delta.enableChangeDataFeed": "true" + }) + dlt.apply_changes( + source=source_view_name, + target=PIPELINE_RUNS_STATUS.name, + keys = ["pipeline_id", "pipeline_run_id"], + sequence_by = "seq_num", + except_column_list = ['seq_num'], + ignore_null_updates = True) + + def _get_events_errors_sql(self): + return f""" + SELECT pipeline_id, + pipeline_run_id, + pipeline_run_link, + flow_name, + table_name, + event_timestamp, + message AS error_log_message, + error_message, + error_code, + error_full{Constants.sql_fields_def_extension_point} + FROM STREAM(`{EVENT_LOGS_BRONZE.name}`) + WHERE error_full is not null or level="ERROR" + """ + + def register_events_errors(self, 
spark: SparkSession): + @dlt.table(name=EVENTS_ERRORS.name, + cluster_by=["pipeline_id", "pipeline_run_id"], + comment=EVENTS_ERRORS.table_comment, + table_properties={ + "delta.enableRowTracking": "true" + }) + def generate_events_errors(): + return spark.sql(self._get_events_errors_sql()) + + def _get_events_warnings_sql(self): + return f""" + SELECT pipeline_id, + pipeline_run_id, + pipeline_run_link, + flow_name, + table_name, + event_timestamp, + message AS warning_log_message{Constants.sql_fields_def_extension_point} + FROM STREAM(`{EVENT_LOGS_BRONZE.name}`) + WHERE level="WARN" + """ + + def register_events_warnings(self, spark: SparkSession): + @dlt.table(name=EVENTS_WARNINGS.name, + cluster_by=["pipeline_id", "pipeline_run_id"], + comment=EVENTS_WARNINGS.table_comment, + table_properties={ + "delta.enableRowTracking": "true" + }) + def generate_events_warnings(): + return spark.sql(self._get_events_warnings_sql()) + + def register_metric_pipeline_hourly_error_rate(self, spark: SparkSession): + @dlt.table(name=METRIC_PIPELINE_HOURLY_ERROR_RATE.name, + comment=METRIC_PIPELINE_HOURLY_ERROR_RATE.table_comment, + cluster_by=['pipeline_id'], + table_properties={ + "delta.enableRowTracking": "true" + }) + def generate_metric_pipeline_hourly_error_rate(): + return spark.sql(f""" + SELECT pipeline_id, + date_trunc('hour', event_timestamp) AS hour, + count(*) FILTER (WHERE level='ERROR' OR error_full IS NOT NULL) AS num_errors + FROM `{EVENT_LOGS_BRONZE.name}` + GROUP BY 1, 2 + """) + + def register_pipeline_status(self, spark: SparkSession): + pipeline_runs_status_fqname=f"{self.conf.monitoring_catalog}.{self.conf.monitoring_schema}.{PIPELINE_RUNS_STATUS.name}" + + @dlt.view(name=f"{PIPELINE_RUNS_STATUS.name}_cdf") + def pipeline_runs_status_cdf(): + return ( + spark.readStream + .option("readChangeFeed", "true") + .table(PIPELINE_RUNS_STATUS.name) + .filter("_change_type IN ('insert', 'update_postimage')") + ) + + dlt.create_streaming_table(name=PIPELINES_STATUS_SILVER.name, + cluster_by = ["pipeline_id"], + comment=PIPELINES_STATUS_SILVER.table_comment, + table_properties={ + "delta.enableRowTracking": "true" + }) + latest_runs_view_name = f"{PIPELINE_RUNS_STATUS.name}_latest" + @dlt.view(name=latest_runs_view_name) + def latest_pipeline_run_progress(): + return spark.sql(f""" + SELECT pipeline_id, + pipeline_run_id as latest_pipeline_run_id, + pipeline_run_link as latest_pipeline_run_link, + create_time as latest_pipeline_run_create_time, + end_time as latest_pipeline_run_end_time, + latest_state as latest_pipeline_run_state, + state_color as latest_pipeline_run_state_color, + latest_state_with_color as latest_pipeline_run_state_with_color, + latest_state_level as latest_pipeline_run_state_level, + is_complete as latest_pipeline_run_is_complete, + -- use an empty strings so that it overwrites any errors from previous runs + ifnull(latest_error_log_message, '') AS latest_error_log_message, + ifnull(latest_error_message, '') AS latest_error_message, + ifnull(latest_error_code, '') AS latest_error_code, + (CASE WHEN latest_error_log_message is not NULL AND latest_error_log_message != '' THEN updated_at END) as latest_error_time, + null as latest_successful_run_id, + null as latest_successful_run_link, + null as latest_successful_run_create_time, + null as latest_successful_run_end_time, + null as latest_failed_run_id, + null as latest_failed_run_link, + null as latest_failed_run_create_time, + null as latest_failed_run_end_time, + null as latest_failed_run_error_log_message, + null as 
latest_failed_run_error_message, + null as latest_failed_run_error_code, + updated_at + FROM STREAM(`{PIPELINE_RUNS_STATUS.name}_cdf`) + """) + dlt.create_auto_cdc_flow( + name=f"apply_{latest_runs_view_name}", + source=latest_runs_view_name, + target=PIPELINES_STATUS_SILVER.name, + keys=['pipeline_id'], + sequence_by='updated_at', + ignore_null_updates=True + ) + + successful_runs_view_name = f"{PIPELINE_RUNS_STATUS.name}_successful" + @dlt.view(name=successful_runs_view_name) + def latest_pipeline_successful_run(): + return spark.sql(f""" + SELECT pipeline_id, + null as latest_pipeline_run_id, + null as latest_pipeline_run_link, + null as latest_pipeline_run_create_time, + null as latest_pipeline_run_end_time, + null as latest_pipeline_run_state, + null as latest_pipeline_run_state_color, + null as latest_pipeline_run_state_with_color, + null as latest_pipeline_run_state_level, + null as latest_pipeline_run_is_complete, + null as latest_error_log_message, + null AS latest_error_message, + null AS latest_error_code, + null as latest_error_time, + pipeline_run_id as latest_successful_run_id, + pipeline_run_link as latest_successful_run_link, + create_time as latest_successful_run_create_time, + end_time as latest_successful_run_end_time, + null as latest_failed_run_id, + null as latest_failed_run_link, + null as latest_failed_run_create_time, + null as latest_failed_run_end_time, + null as latest_failed_run_error_log_message, + null as latest_failed_run_error_message, + null as latest_failed_run_error_code, + updated_at + FROM STREAM(`{PIPELINE_RUNS_STATUS.name}_cdf`) + WHERE latest_state == 'COMPLETED' + """) + + dlt.create_auto_cdc_flow( + name=f"apply_{successful_runs_view_name}", + source=successful_runs_view_name, + target=PIPELINES_STATUS_SILVER.name, + keys=['pipeline_id'], + sequence_by='updated_at', + ignore_null_updates=True + ) + + failed_runs_view_name = f"{PIPELINE_RUNS_STATUS.name}_failed" + @dlt.view(name=failed_runs_view_name) + def latest_pipeline_failed_run(): + return spark.sql(f""" + SELECT pipeline_id, + null as latest_pipeline_run_id, + null as latest_pipeline_run_link, + null as latest_pipeline_run_create_time, + null as latest_pipeline_run_end_time, + null as latest_pipeline_run_state, + null as latest_pipeline_run_state_color, + null as latest_pipeline_run_state_with_color, + null as latest_pipeline_run_state_level, + null as latest_pipeline_run_is_complete, + null as latest_error_log_message, + null AS latest_error_message, + null AS latest_error_code, + null as latest_error_time, + null as latest_successful_run_id, + null as latest_successful_run_link, + null as latest_successful_run_create_time, + null as latest_successful_run_end_time, + pipeline_run_id as latest_failed_run_id, + pipeline_run_link as latest_failed_run_link, + create_time as latest_failed_run_create_time, + end_time as latest_failed_run_end_time, + -- use empty strings so that it overwrites any errors from previous runs + ifnull(latest_error_log_message, '') as latest_failed_run_error_log_message, + ifnull(latest_error_message, '') as latest_failed_run_error_message, + ifnull(latest_error_code, '') as latest_failed_run_error_code, + updated_at + FROM STREAM(`{PIPELINE_RUNS_STATUS.name}_cdf`) + WHERE latest_state == 'FAILED' + """) + + dlt.create_auto_cdc_flow( + name=f"apply_{failed_runs_view_name}", + source=failed_runs_view_name, + target=PIPELINES_STATUS_SILVER.name, + keys=['pipeline_id'], + sequence_by='updated_at', + ignore_null_updates=True + ) + + @dlt.table(name=PIPELINES_STATUS.name, + 
comment=PIPELINES_STATUS.table_comment, + cluster_by=['pipeline_id'], + table_properties={ + "delta.enableRowTracking": "true" + }) + def pipeline_status(): + return spark.sql(f""" + SELECT latest.*, + ifnull(pe.num_errors, 0) latest_pipeline_run_num_errors, + ifnull(pw.num_warnings, 0) latest_pipeline_run_num_warnings + FROM `{PIPELINES_STATUS_SILVER.name}` as latest + LEFT JOIN ( + SELECT pipeline_id, pipeline_run_id, count(*) num_errors + FROM `{EVENTS_ERRORS.name}` + GROUP BY 1, 2 + ) as pe ON latest.pipeline_id = pe.pipeline_id and latest.latest_pipeline_run_id = pe.pipeline_run_id + LEFT JOIN ( + SELECT pipeline_id, pipeline_run_id, count(*) num_warnings + FROM `{EVENTS_WARNINGS.name}` + GROUP BY 1, 2 + ) as pw ON latest.pipeline_id = pw.pipeline_id and latest.latest_pipeline_run_id = pw.pipeline_run_id + """) + + def _get_events_table_metrics_sql(self): + return f""" + SELECT pipeline_id, + pipeline_run_id, + pipeline_run_link, + flow_name, + table_name, + event_timestamp, + details:flow_progress.metrics.num_output_rows::bigint as num_output_rows, + details:flow_progress.metrics.backlog_bytes::bigint as backlog_bytes, + details:flow_progress.metrics.backlog_records::bigint as backlog_records, + details:flow_progress.metrics.backlog_files::bigint as backlog_files, + details:flow_progress.metrics.backlog_seconds::bigint as backlog_seconds, + details:flow_progress.metrics.executor_time_ms::bigint as executor_time_ms, + details:flow_progress.metrics.executor_cpu_time_ms::bigint as executor_cpu_time_ms, + details:flow_progress.metrics.num_upserted_rows::bigint as num_upserted_rows, + details:flow_progress.metrics.num_deleted_rows::bigint as num_deleted_rows, + details:flow_progress.metrics.num_output_bytes::bigint as num_output_bytes, + (CASE WHEN details:flow_progress.metrics.num_output_rows::bigint IS NULL + AND details:flow_progress.metrics.num_upserted_rows::bigint IS NULL + AND details:flow_progress.metrics.num_deleted_rows::bigint IS NULL THEN NULL + ELSE ifnull(details:flow_progress.metrics.num_output_rows::bigint, 0) + + ifnull(details:flow_progress.metrics.num_upserted_rows::bigint, 0) + + ifnull(details:flow_progress.metrics.num_deleted_rows::bigint, 0) END) AS num_written_rows, + details:flow_progress.streaming_metrics.event_time.min::timestamp AS min_event_time, + details:flow_progress.streaming_metrics.event_time.max::timestamp AS max_event_time, + details:flow_progress.data_quality.dropped_records::bigint as num_expectation_dropped_records{Constants.sql_fields_def_extension_point} + FROM STREAM(`{EVENT_LOGS_BRONZE.name}`) + WHERE table_name is not null + AND (details:flow_progress.metrics IS NOT NULL + OR details:flow_progress.streaming_metrics IS NOT NULL + OR details:flow_progress.data_quality IS NOT NULL) + """ + + def register_events_table_metrics(self, spark: SparkSession): + @dlt.table(name=EVENTS_TABLE_METRICS.name, + comment=EVENTS_TABLE_METRICS.table_comment, + cluster_by=['pipeline_id', 'pipeline_run_id', 'table_name'], + table_properties={ + "delta.enableRowTracking": "true" + }) + def generate_events_table_metrics(): + return spark.sql(self._get_events_table_metrics_sql()) + + def _get_table_run_processing_state_sql(self): + return f""" + SELECT *, + ('' || latest_state || '') as latest_state_with_color + FROM (SELECT *, + (CASE WHEN latest_state = 'FAILED' THEN 100 + WHEN latest_state = 'SKIPPED' THEN 50 + WHEN latest_state = 'STOPPED' THEN 13 + WHEN latest_state = 'EXCLUDED' THEN 12 + WHEN latest_state = 'IDLE' THEN 11 + WHEN latest_state = 'COMPLETED' THEN 10 
+ WHEN latest_state = 'RUNNING' THEN 5 + WHEN latest_state = 'PLANNING' THEN 3 + WHEN latest_state = 'STARTING' THEN 2 + WHEN latest_state = 'QUEUED' THEN 1 + ELSE 0 END) AS latest_state_level, + (CASE WHEN latest_state = 'FAILED' THEN 'red' + WHEN latest_state = 'SKIPPED' THEN 'red' + WHEN latest_state = 'STOPPED' THEN 'gray' + WHEN latest_state = 'EXCLUDED' THEN 'gray' + WHEN latest_state = 'IDLE' THEN 'green' + WHEN latest_state = 'COMPLETED' THEN 'green' + WHEN latest_state = 'RUNNING' THEN 'blue' + ELSE 'black' + END) AS latest_state_color + FROM (SELECT pipeline_id, + pipeline_run_id, + pipeline_run_link, + table_name, + seq_num, + event_timestamp AS updated_at, + details:flow_progress.status::string AS latest_state, + (CASE WHEN event_type='dataset_definition' THEN details:dataset_definition.schema_json::string END) table_schema_json, + (CASE WHEN event_type='dataset_definition' THEN details:dataset_definition.schema::array> END) table_schema, + (CASE WHEN level='ERROR' OR error_full IS NOT NULL THEN event_timestamp END) AS latest_error_time, + (CASE WHEN level='ERROR' OR error_full IS NOT NULL THEN message END) AS latest_error_log_message, + error_message AS latest_error_message, + error_code AS latest_error_code, + error_full AS latest_error_full{Constants.sql_fields_def_extension_point} + FROM STREAM(`{EVENT_LOGS_BRONZE.name}`) + WHERE event_type in ('dataset_definition', 'flow_progress') + AND table_name IS NOT NULL + {Constants.where_clause_extension_point} + )) + """ + + def register_table_status_per_pipeline_run(self, spark: SparkSession): + dlt.create_streaming_table(name=TABLE_STATUS_PER_PIPELINE_RUN.name, + comment=TABLE_STATUS_PER_PIPELINE_RUN.table_comment, + cluster_by=['pipeline_id', 'pipeline_run_id', 'table_name'], + table_properties={ + "delta.enableRowTracking": "true", + "delta.enableChangeDataFeed": "true", + }) + + source_view_name=f"{TABLE_STATUS_PER_PIPELINE_RUN.name}_source" + @dlt.view(name=source_view_name) + def table_run_processing_state_source(): + return spark.sql(self._get_table_run_processing_state_sql()) + + dlt.create_auto_cdc_flow( + name=f"apply_{TABLE_STATUS_PER_PIPELINE_RUN.name}", + source=source_view_name, + target=TABLE_STATUS_PER_PIPELINE_RUN.name, + keys=['pipeline_id', 'pipeline_run_id', 'table_name'], + sequence_by='seq_num', + ignore_null_updates=True, + except_column_list=['seq_num']) + + + def register_table_status(self, spark: SparkSession): + # Use CDF because apply_changes can generate MERGE commits + table_status_per_pipeline_run_cdf = f"{TABLE_STATUS_PER_PIPELINE_RUN.name}_cdf" + + @dlt.view(name=table_status_per_pipeline_run_cdf) + def table_run_processing_state_cdf(): + return ( + spark.readStream + .option("readChangeFeed", "true") + .table(TABLE_STATUS_PER_PIPELINE_RUN.name) + .filter("_change_type IN ('insert', 'update_postimage')") + ) + + silver_table_name = f"{TABLE_STATUS.name}_silver" + dlt.create_streaming_table(name=silver_table_name, + comment="Capture information about the latest state, ingested data and errors for target tables", + cluster_by=['pipeline_id', 'table_name'], + table_properties={ + "delta.enableRowTracking": "true" + }) + + silver_latest_source_view_name = f"{silver_table_name}_latest_source" + @dlt.view(name=silver_latest_source_view_name) + def table_latest_run_processing_state_source(): + return spark.sql(f""" + SELECT pipeline_id, + table_name, + pipeline_run_id AS latest_pipeline_run_id, + pipeline_run_link AS latest_pipeline_run_link, + latest_state, + latest_state_level, + latest_state_color, + 
latest_state_with_color, + table_schema_json AS latest_table_schema_json, + table_schema AS latest_table_schema, + null AS latest_changes_time, + (CASE WHEN latest_error_time IS NOT NULL THEN pipeline_run_id END) AS latest_error_pipeline_run_id, + (CASE WHEN latest_error_time IS NOT NULL THEN pipeline_run_link END) AS latest_error_pipeline_run_link, + latest_error_time, + latest_error_log_message, + latest_error_message, + latest_error_code, + latest_error_full, + updated_at + FROM STREAM(`{table_status_per_pipeline_run_cdf}`) + """) + + dlt.create_auto_cdc_flow( + name=f"{silver_table_name}_apply_latest", + source=silver_latest_source_view_name, + target=silver_table_name, + keys=['pipeline_id', 'table_name'], + sequence_by='updated_at', + ignore_null_updates=True) + + silver_latest_changes_source_view_name = f"{silver_table_name}_latest_changes_source" + @dlt.view(name=silver_latest_changes_source_view_name) + def table_latest_run_processing_state_source(): + return spark.sql(f""" + SELECT pipeline_id, + table_name, + null AS latest_pipeline_run_id, + null AS latest_pipeline_run_link, + null AS latest_state, + null AS latest_state_level, + null AS latest_state_color, + null AS latest_state_with_color, + null AS latest_table_schema_json, + null AS latest_table_schema, + event_timestamp AS latest_changes_time, + null AS latest_error_pipeline_run_id, + null AS latest_error_pipeline_run_link, + null AS latest_error_time, + null AS latest_error_log_message, + null AS latest_error_message, + null AS latest_error_code, + null AS latest_error_full, + event_timestamp AS updated_at + FROM STREAM(`{EVENTS_TABLE_METRICS.name}`) + WHERE table_name IS NOT null AND num_written_rows > 0 + """) + + dlt.create_auto_cdc_flow( + name=f"{silver_table_name}_apply_latest_changes", + source=silver_latest_changes_source_view_name, + target=silver_table_name, + keys=['pipeline_id', 'table_name'], + sequence_by='updated_at', + ignore_null_updates=True) + + @dlt.table(name=TABLE_STATUS.name, + comment=TABLE_STATUS.table_comment, + cluster_by=['pipeline_id', 'table_name'], + table_properties={ + "delta.enableRowTracking": "true" + }) + def table_status(): + return spark.sql(f""" + SELECT s.*, + latest_pipeline_run_num_written_rows + FROM {silver_table_name} s + LEFT JOIN ( + SELECT pipeline_id, + pipeline_run_id, + table_name, + sum(ifnull(num_written_rows, 0)) AS latest_pipeline_run_num_written_rows + FROM {EVENTS_TABLE_METRICS.name} + GROUP BY 1, 2, 3 + ) AS etm + ON s.pipeline_id = etm.pipeline_id + AND s.latest_pipeline_run_id = etm.pipeline_run_id + AND s.table_name = etm.table_name + """) + + def register_table_expectation_checks(self, spark: SparkSession): + @dlt.table(name=TABLE_EVENTS_EXPECTATION_CHECKS.name, + comment=TABLE_EVENTS_EXPECTATION_CHECKS.table_comment, + cluster_by=['pipeline_id', 'pipeline_run_id', 'table_name', 'expectation_name'], + table_properties={ + "delta.enableRowTracking": "true" + }) + def table_expectation_checks(): + return spark.sql(f""" + SELECT pipeline_id, + pipeline_run_id, + pipeline_run_link, + table_name, + flow_name, + event_timestamp, + expectation_name, + num_passed, + num_failed, + (100.0 * num_failed / (num_failed + num_passed)) AS failure_pct + FROM(SELECT pipeline_id, + pipeline_run_id, + pipeline_run_link, + table_name, + flow_name, + event_timestamp, + expectation_metrics.name as expectation_name, + ifnull(expectation_metrics.passed_records, 0) as num_passed, + ifnull(expectation_metrics.failed_records, 0) as num_failed + FROM (SELECT pipeline_id, + 
pipeline_run_id, + pipeline_run_link, + table_name, + flow_name, + event_timestamp, + explode(details:flow_progress.data_quality.expectations::array>) AS expectation_metrics + FROM STREAM(`{EVENT_LOGS_BRONZE.name}`) + WHERE details:flow_progress.data_quality IS NOT NULL + )) + """) + pass diff --git a/contrib/dbx_ingestion_monitoring/lib/dbx_ingestion_monitoring/standard_tables.py b/contrib/dbx_ingestion_monitoring/lib/dbx_ingestion_monitoring/standard_tables.py new file mode 100644 index 0000000..7f8e4c1 --- /dev/null +++ b/contrib/dbx_ingestion_monitoring/lib/dbx_ingestion_monitoring/standard_tables.py @@ -0,0 +1,405 @@ +from enum import Enum +import logging +from typing import Dict, Optional + +from pyspark.sql import SparkSession, DataFrame + +class TableType(Enum): + STREAMING_TABLE = 1 + MATERIALIZED_VIEW = 2 + DELTA_TABLE = 3 + +class MonitoringTable: + """ + A helper class that encapsulates logic to generate a monitoring table. All tables are generated within the + default catalog and schema for the pipeline. + """ + def __init__(self, name: str, table_type: TableType, table_comment:str, column_comments: Optional[Dict[str, str]] = None): + """ + Constructor + :param name: the name of the table + :param type: 'st' or 'mv' + """ + self.name = name + self.table_type = table_type + self.table_comment = table_comment + self.column_comments = column_comments + self.log = logging.getLogger(f"dbx_ingestion_monitoring.MonitoringTable.{self.name}") + + def add_column_comments(self, monitoring_catalog: str, monitoring_schema: str, spark: SparkSession): + """ + Add comments to the columns of the table. This is a workaround because SDP currently does not support + adding those as part of the table definition. This method will use ALTER TABLE ALTER COLUMN ... COMMENT ... + to set the comments on the intersection of the keys of `self.column_comments` and the columns from the + table schema. + """ + if self.column_comments is None or len(self.column_comments) == 0: + return # Nothing to do + + + self.log.info(f"Adding column comments to table {self.name}") + fq_name = f"`{monitoring_catalog}`.`{monitoring_schema}`.`{self.name}`" + + if not spark.catalog.tableExists(fq_name): + self.log.warn(f"Table {fq_name} does not exist. 
Skipping column comments.") + return + + table_schema = spark.table(fq_name).schema + table_column_names = set(table_schema.fieldNames()) + + + if TableType.STREAMING_TABLE == self.table_type: + alter_type = 'STREAMING TABLE' + elif TableType.MATERIALIZED_VIEW == self.table_type: + alter_type = "MATERIALIZED VIEW" + elif TableType.DELTA_TABLE == self.table_type: + alter_type = "TABLE" + else: + raise AssertionError(f"Unexpected table_type: {self.table_type}") + + for column_name, column_comment in self.column_comments.items(): + if column_name in table_column_names: + column_comment_parts = column_comment.splitlines() + comment_sql=" ".join([p.replace("'","''") for p in column_comment_parts]) + sql = f"ALTER {alter_type} {fq_name} ALTER COLUMN `{column_name}` COMMENT '{comment_sql}'" + self.log.debug(f"Running {sql} ...") + spark.sql(sql) + else: + self.log.warn(f"Column {column_name} not found in table {self.name}") + +STANDARD_COLUMN_COMMENTS = { + "error_full": "Contains full details about the error that happened (if any)", + "error_message": "Short human-readable message describing the error (if any)", + "event_timestamp": "When this event occurred", + "flow_name": "The name of the flow (if any) that triggered this event", + "level": "The severity level of this event, one of: INFO, WARN, ERROR", + "message": "A human-readable message about the contents of this event", + "pipeline_id": "The unique identifier of the monitored pipeline", + "pipeline_link": "An HTML-formated link to the pipeline in the current workspace. Useful in dashboards.", + "pipeline_name": "The name of the monitored pipeline", + "pipeline_run_id": "The unique identifier of a specific pipeline run (update)", + "pipeline_run_link": "An HTML-formated link to the pipeline run in the current workspace. Useful in dashboards.", + "table_name": "Fully qualified replication target table name", + "latest_state": """The latest known state of the %s pipeline run. Can be one of: + QUEUED, CREATED, WAITING_FOR_RESOURCES, INITIALIZING, RESETTING, SETTING_UP_TABLES, + RUNNING, STOPPING, COMPLETED, FAILED, CANCELED""", + "state_color": "A helper color associated with the current state of the %s pipeline run", + "latest_state_with_color": "An HTML-formated string for %s pipeline run state. Useful in dashboards.", + "latest_state_level": """An integer that represents higher level of the %s pipeline run state progress + or severity if the pipeline run has finished.""", + "create_time": "Time when the %s pipeline run was created", + "end_time": "Time when the %s pipeline run finished its execution (entered COMPLETED, FAILED, CANCELED state)", + "is_complete": "If the %s pipeline run state is a final state -- COMPLETED, FAILED, CANCELED", + "latest_error_log_message": "Latest event log message in the %s pipeline run with ERROR level", + "latest_error_message": "Short description of the latest error (exception) message in the %s pipeline run", + "latest_error_code": "The error code (if any) of the latest error (exception) in the %s pipeline run", + "flow_type": "The logical type of the flow in the CDC Connector: 'cdc', 'snapshot', 'cdc_staging'", + "latest_table_state": """The latest state of a flow (writing to the target table). 
Can be: + QUEUED, STARTING, RUNNING, COMPLETED, FAILED, SKIPPED, STOPPED, IDLE, EXCLUDED""", + "latest_table_state_level": """An integer that represents higher level of the table state progress + or severity if the table processing has finished.""", + "latest_table_state_color": "A color associated with the latest state", + "latest_table_state_with_color": "An HTML-formated string with the latest state; useful in dashboards.", + "table_schema_json": "The schema of the target table in %s pipeline run as a JSON string", + "table_schema": "A list of schema information for each column in the target table in %s pipeline run", + "latest_table_error_time": "The time of the latest observed error in %s pipeline run for this table (if any)", + "latest_table_error_log_message": "The message in the event log for an error for this target table in %s pipeline run (if any)", + "latest_table_error_message": "The latest error message for the target table in %s pipeline run (if any)", + "latest_table_error_full": "Error details for the latest error for the target table in %s pipeline run (if any)", +} + + +MONITORED_PIPELINES = MonitoringTable( + name='monitored_pipelines', + table_type=TableType.MATERIALIZED_VIEW, + table_comment="Contains metadata about all monitored pipelines.", + column_comments={ + "pipeline_id": STANDARD_COLUMN_COMMENTS['pipeline_id'], + "pipeline_name": STANDARD_COLUMN_COMMENTS['pipeline_name'], + "pipeline_link": STANDARD_COLUMN_COMMENTS['pipeline_link'], + "pipeline_type": """One of: 'gateway' for CDC Connector gateways, 'ingestion' for other ingestion pipelines, + 'etl' for all other pipelines""", + "default_catalog": "The default catalog for the pipeline", + "default_schema": "The default schema for the pipeline", + "event_log_source": """The fully qualified name to a Delta table containing the event log for this pipeline. + This could be the Delta table explicitly configured in the 'event_log' property of the pipeline spec or + a table where that log has been imported using the import_event_logs job.""", + "tags_map": "A map of tag keys to tag values for this pipeline. Useful for filtering and grouping pipelines by tags.", + "tags_array": """An array of 'tag:value' strings for this pipeline. Designed for easy filtering in AI/BI dashboards + where you can select a single value as a filtering expression.""" + }) + +MONITORED_TABLES = MonitoringTable( + name='monitored_tables', + table_type=TableType.MATERIALIZED_VIEW, + table_comment="Contains a list of all tables detected in monitored pipelines. 
Used in the observability dashboard for filtering by table.",
+    column_comments={
+        "pipeline_id": STANDARD_COLUMN_COMMENTS['pipeline_id'],
+        "table_name": STANDARD_COLUMN_COMMENTS['table_name']
+    }
+)
+
+EVENT_LOGS_BRONZE = MonitoringTable(
+    name='event_logs_bronze',
+    table_type=TableType.STREAMING_TABLE,
+    table_comment="Initial filtering and transformations of the input event logs that are shared by most observability tables",
+    column_comments={
+        "id": "This event's unique identifier",
+        "seq_num": "Contains information about the position of this event in the event log",
+        "pipeline_id": "The unique identifier of the pipeline that emitted this event",
+        "pipeline_run_id": STANDARD_COLUMN_COMMENTS["pipeline_run_id"],
+        "pipeline_run_link": STANDARD_COLUMN_COMMENTS["pipeline_run_link"],
+        "table_name": STANDARD_COLUMN_COMMENTS["table_name"],
+        "flow_name": STANDARD_COLUMN_COMMENTS["flow_name"],
+        "batch_id": "The micro-batch id that triggered this event (typically used in metric events)",
+        "event_timestamp": STANDARD_COLUMN_COMMENTS["event_timestamp"],
+        "message": STANDARD_COLUMN_COMMENTS["message"],
+        "level": STANDARD_COLUMN_COMMENTS["level"],
+        "error_message": STANDARD_COLUMN_COMMENTS["error_message"],
+        "error_full": STANDARD_COLUMN_COMMENTS["error_full"],
+        "event_type": """The type of the event. For example 'update_progress' captures state transitions for the current
+            pipeline run, 'flow_progress' captures state transitions in the evaluation of a specific flow, etc. Look for
+            more information in `details`.
+            """,
+        "details": "Contains `event_type`-specific information in a sub-field named after the event type (e.g. `details:flow_progress`)."
+    }
+)
+
+PIPELINE_RUNS_STATUS = MonitoringTable(
+    name='pipeline_runs_status',
+    table_type=TableType.STREAMING_TABLE,
+    table_comment="Contains the latest status of monitored pipeline runs",
+    column_comments={
+        "pipeline_id": STANDARD_COLUMN_COMMENTS['pipeline_id'],
+        "pipeline_run_id": STANDARD_COLUMN_COMMENTS["pipeline_run_id"],
+        "pipeline_run_link": STANDARD_COLUMN_COMMENTS["pipeline_run_link"],
+        "latest_state": STANDARD_COLUMN_COMMENTS["latest_state"] % (''),
+        "state_color": STANDARD_COLUMN_COMMENTS["state_color"] % (''),
+        "latest_state_with_color": STANDARD_COLUMN_COMMENTS["latest_state_with_color"] % (''),
+        "latest_state_level": STANDARD_COLUMN_COMMENTS["latest_state_level"] % (''),
+        "create_time": STANDARD_COLUMN_COMMENTS["create_time"] % (''),
+        "queued_time": "Time when the pipeline run was queued for compute resources (entered WAITING_FOR_RESOURCES state)",
+        "initialization_start_time": "Time when the pipeline run started initialization (entered INITIALIZING state)",
+        "running_start_time": "Time when the pipeline run started its execution (entered RUNNING state)",
+        "end_time": STANDARD_COLUMN_COMMENTS["end_time"] % (''),
+        "is_complete": STANDARD_COLUMN_COMMENTS["is_complete"] % (''),
+        "latest_error_log_message": STANDARD_COLUMN_COMMENTS["latest_error_log_message"] % (''),
+        "latest_error_message": STANDARD_COLUMN_COMMENTS["latest_error_message"] % (''),
+        "latest_error_code": STANDARD_COLUMN_COMMENTS["latest_error_code"] % (''),
+        "latest_error_full": "Full stack trace of the latest error in the log",
+        "updated_at": "Timestamp of latest update (based on the event log timestamp) applied to this row"
+    }
+)
+
+EVENTS_ERRORS = MonitoringTable(
+    name='events_errors',
+    table_type=TableType.STREAMING_TABLE,
+    table_comment="The stream of all errors in pipeline runs",
+    column_comments={
+        "pipeline_id": STANDARD_COLUMN_COMMENTS['pipeline_id'],
+        "pipeline_run_id": STANDARD_COLUMN_COMMENTS["pipeline_run_id"],
+        "pipeline_run_link": STANDARD_COLUMN_COMMENTS["pipeline_run_link"],
+        "table_name": STANDARD_COLUMN_COMMENTS["table_name"] + " affected by the error (if any)",
+        "flow_name": STANDARD_COLUMN_COMMENTS["flow_name"],
+        "event_timestamp": STANDARD_COLUMN_COMMENTS["event_timestamp"],
+        "error_log_message": STANDARD_COLUMN_COMMENTS["message"],
+        "error_message": STANDARD_COLUMN_COMMENTS["error_message"],
+        "error_full": STANDARD_COLUMN_COMMENTS["error_full"],
+        "flow_type": STANDARD_COLUMN_COMMENTS['flow_type'],
+    }
+)
+
+EVENTS_WARNINGS = MonitoringTable(
+    name='events_warnings',
+    table_type=TableType.STREAMING_TABLE,
+    table_comment="The stream of all warnings in pipeline runs",
+    column_comments={
+        "pipeline_id": STANDARD_COLUMN_COMMENTS['pipeline_id'],
+        "pipeline_run_id": STANDARD_COLUMN_COMMENTS["pipeline_run_id"],
+        "pipeline_run_link": STANDARD_COLUMN_COMMENTS["pipeline_run_link"],
+        "table_name": STANDARD_COLUMN_COMMENTS["table_name"] + " affected by the warning (if any)",
+        "flow_name": STANDARD_COLUMN_COMMENTS["flow_name"],
+        "event_timestamp": STANDARD_COLUMN_COMMENTS["event_timestamp"],
+        "warning_log_message": STANDARD_COLUMN_COMMENTS["message"],
+        "flow_type": STANDARD_COLUMN_COMMENTS['flow_type'],
+    }
+)
+
+METRIC_PIPELINE_HOURLY_ERROR_RATE = MonitoringTable(
+    name='metric_pipeline_error_rate',
+    table_type=TableType.MATERIALIZED_VIEW,
+    table_comment="Error rate per hour for all monitored pipelines",
+    column_comments={
+        "pipeline_id": STANDARD_COLUMN_COMMENTS['pipeline_id'],
+        "hour": "The hour for which the error rate is calculated",
+        "num_errors": "The number of errors per hour for the pipeline"
+    }
+)
+
+PIPELINES_STATUS_SILVER = MonitoringTable(
+    name='pipelines_status_silver',
+    table_type=TableType.STREAMING_TABLE,
+    table_comment="Keeps track of the latest pipeline run, latest successful run and latest failed run for each pipeline",
+    column_comments={
+
+    }
+)
+
+PIPELINES_STATUS = MonitoringTable(
+    name='pipelines_status',
+    table_type=TableType.MATERIALIZED_VIEW,
+    table_comment="Keeps track of the latest status for each monitored pipeline",
+    column_comments={
+        "pipeline_id": STANDARD_COLUMN_COMMENTS['pipeline_id'],
+        "latest_pipeline_run_id": f"Latest {STANDARD_COLUMN_COMMENTS['pipeline_run_id']}",
+        "latest_pipeline_run_link": f"Latest {STANDARD_COLUMN_COMMENTS['pipeline_run_link']}",
+        "latest_pipeline_run_state": STANDARD_COLUMN_COMMENTS["latest_state"] % ('latest'),
+        "latest_pipeline_run_state_color": STANDARD_COLUMN_COMMENTS["state_color"] % ('latest'),
+        "latest_pipeline_run_state_with_color": STANDARD_COLUMN_COMMENTS["latest_state_with_color"] % ('latest'),
+        "latest_pipeline_run_state_level": STANDARD_COLUMN_COMMENTS["latest_state_level"] % ('latest'),
+        "latest_pipeline_run_create_time": STANDARD_COLUMN_COMMENTS["create_time"] % ('latest'),
+        "latest_pipeline_run_end_time": STANDARD_COLUMN_COMMENTS["end_time"] % ('latest'),
+        "latest_pipeline_run_is_complete": STANDARD_COLUMN_COMMENTS["is_complete"] % ('latest'),
+        "latest_error_log_message": STANDARD_COLUMN_COMMENTS["latest_error_log_message"] % ('latest'),
+        "latest_error_message": STANDARD_COLUMN_COMMENTS["latest_error_message"] % ('latest'),
+        "latest_error_code": STANDARD_COLUMN_COMMENTS["latest_error_code"] % ('latest'),
+        "latest_error_time": "The time of the latest error (event)",
+        "latest_successful_run_id": f"Latest successful {STANDARD_COLUMN_COMMENTS['pipeline_run_id']}",
+        "latest_successful_run_link": f"Latest successful 
{STANDARD_COLUMN_COMMENTS['pipeline_run_link']}", + "latest_successful_run_create_time": STANDARD_COLUMN_COMMENTS["create_time"] % ('successful'), + "latest_successful_run_end_time": STANDARD_COLUMN_COMMENTS["end_time"] % ('successful'), + "latest_failed_run_id": f"Latest failed {STANDARD_COLUMN_COMMENTS['pipeline_run_id']}", + "latest_failed_run_link": f"Latest failed {STANDARD_COLUMN_COMMENTS['pipeline_run_link']}", + "latest_failed_run_create_time": STANDARD_COLUMN_COMMENTS["create_time"] % ('failed'), + "latest_failed_run_end_time": STANDARD_COLUMN_COMMENTS["end_time"] % ('failed'), + "latest_failed_run_error_log_message": STANDARD_COLUMN_COMMENTS["latest_error_log_message"] % ('failed'), + "latest_failed_run_error_message": STANDARD_COLUMN_COMMENTS["latest_error_message"] % ('failed'), + "latest_failed_run_error_code": STANDARD_COLUMN_COMMENTS["latest_error_code"] % ('failed'), + "updated_at": "Timestamp of latest update (based on the event log timestamp) applied to this row", + "latest_pipeline_run_num_errors": "The number of errors in the latest pipeline run", + "latest_pipeline_run_num_warnings": "The number of warnings in the latest pipeline run", + } +) + +EVENTS_TABLE_METRICS = MonitoringTable( + name='events_table_metrics', + table_type=TableType.STREAMING_TABLE, + table_comment="The stream of metric events to target tables", + column_comments={ + "pipeline_id": STANDARD_COLUMN_COMMENTS['pipeline_id'], + "pipeline_run_id": STANDARD_COLUMN_COMMENTS["pipeline_run_id"], + "pipeline_run_link": STANDARD_COLUMN_COMMENTS["pipeline_run_link"], + "table_name": STANDARD_COLUMN_COMMENTS["table_name"], + "flow_name": STANDARD_COLUMN_COMMENTS["flow_name"], + "event_timestamp": STANDARD_COLUMN_COMMENTS["event_timestamp"], + "num_output_rows": "Number of output rows appended to the target table.", + "backlog_bytes": "Total backlog as bytes across all input sources in the flow.", + "backlog_records": "Total backlog records across all input sources in the flow.", + "backlog_files": "Total backlog files across all input sources in the flow.", + "backlog_seconds": "Maximum backlog seconds across all input sources in the flow.", + "executor_time_ms": "Sum of all task execution times in milliseconds of this flow over the reporting period.", + "executor_cpu_time_ms": "Sum of all task execution CPU times in milliseconds of this flow over the reporting period.", + "num_upserted_rows": "Number of output rows upserted into the dataset by an update of this flow.", + "num_deleted_rows": "Number of existing output rows deleted from the dataset by an update of this flow.", + "num_output_bytes": "Number of output bytes written by an update of this flow.", + "num_written_rows": "Total number of rows written to the target table -- combines num_output_rows, num_upserted_rows, num_deleted_rows", + "min_event_time": "The minimum event/commit time of a row processed in the specific micro-batch", + "max_event_time": "The maximum event/commit time of a row processed in the specific micro-batch", + "flow_type": STANDARD_COLUMN_COMMENTS['flow_type'], + "num_expectation_dropped_records": "The number of rows/records that were dropped due to failed DROP expectations.", + } +) + +TABLE_STATUS_PER_PIPELINE_RUN = MonitoringTable( + name='table_status_per_pipeline_run', + table_type=TableType.STREAMING_TABLE, + table_comment="Keeps track of the progress of processing a specific target table in pipeline runs", + column_comments={ + "pipeline_id": STANDARD_COLUMN_COMMENTS['pipeline_id'], + "pipeline_run_id": 
STANDARD_COLUMN_COMMENTS["pipeline_run_id"], + "pipeline_run_link": STANDARD_COLUMN_COMMENTS["pipeline_run_link"], + "table_name": STANDARD_COLUMN_COMMENTS["table_name"], + "updated_at": "Timestamp of latest update (based on the event log timestamp) applied to this row", + "latest_state": STANDARD_COLUMN_COMMENTS["latest_table_state"], + "table_schema_json": STANDARD_COLUMN_COMMENTS["table_schema_json"] % ('this'), + "table_schema": STANDARD_COLUMN_COMMENTS["table_schema"] % ('this'), + "latest_error_time": STANDARD_COLUMN_COMMENTS["latest_table_error_time"] % ('this'), + "latest_error_log_message": STANDARD_COLUMN_COMMENTS["latest_table_error_log_message"] % ('this'), + "latest_error_message": STANDARD_COLUMN_COMMENTS["latest_table_error_message"] % ('this'), + "latest_error_full": STANDARD_COLUMN_COMMENTS["latest_table_error_full"] % ('this'), + "flow_type": STANDARD_COLUMN_COMMENTS['flow_type'], + "latest_state_level": STANDARD_COLUMN_COMMENTS['latest_table_state_level'], + "latest_state_color": STANDARD_COLUMN_COMMENTS['latest_table_state_color'], + "latest_state_with_color": STANDARD_COLUMN_COMMENTS['latest_table_state_with_color'], + } +) + +TABLE_STATUS = MonitoringTable( + name='table_status', + table_type=TableType.MATERIALIZED_VIEW, + table_comment="Keeps track of the latest progress of processing a specific target table", + column_comments={ + "pipeline_id": STANDARD_COLUMN_COMMENTS['pipeline_id'], + "table_name": STANDARD_COLUMN_COMMENTS["table_name"], + "latest_pipeline_run_id": f"Latest {STANDARD_COLUMN_COMMENTS['pipeline_run_id']}", + "latest_pipeline_run_link": f"Latest {STANDARD_COLUMN_COMMENTS['pipeline_run_link']}", + "latest_state": STANDARD_COLUMN_COMMENTS["latest_table_state"], + "latest_state_level": STANDARD_COLUMN_COMMENTS['latest_table_state_level'], + "latest_state_color": STANDARD_COLUMN_COMMENTS['latest_table_state_color'], + "latest_state_with_color": STANDARD_COLUMN_COMMENTS['latest_table_state_with_color'], + "latest_table_schema_json": STANDARD_COLUMN_COMMENTS["table_schema_json"] % ('the latest'), + "latest_table_schema": STANDARD_COLUMN_COMMENTS["table_schema"] % ('the latest'), + "latest_cdc_changes_time": "The latest time when the CDC changes were applied to the target table", + "latest_snapshot_changes_time": "The latest time when the snapshot changes were applied to the target table", + "latest_error_pipeline_run_id": "The pipeline run id with the latest error for the target table", + "latest_error_pipeline_run_link": """An HTML-formatted link for the pipeline run with the latest error + for the target table; useful in dashboards""", + "latest_error_time": STANDARD_COLUMN_COMMENTS["latest_table_error_time"] % ('a'), + "latest_error_log_message": STANDARD_COLUMN_COMMENTS["latest_table_error_log_message"] % ('a'), + "latest_error_message": STANDARD_COLUMN_COMMENTS["latest_table_error_message"] % ('a'), + "latest_error_full": STANDARD_COLUMN_COMMENTS["latest_table_error_full"] % ('a'), + "latest_error_flow_type": "The flow type ('cdc', 'snapshot') where the latest error occurred for this target table", + } +) + +TABLE_EVENTS_EXPECTATION_CHECKS = MonitoringTable( + name='table_events_expectation_checks', + table_type=TableType.STREAMING_TABLE, + table_comment="Keeps track of the results of expectation checks for each pipeline run", + column_comments={ + "pipeline_id": STANDARD_COLUMN_COMMENTS['pipeline_id'], + "pipeline_run_id": STANDARD_COLUMN_COMMENTS['pipeline_run_id'], + "pipeline_run_link": STANDARD_COLUMN_COMMENTS['pipeline_run_link'], + 
"table_name": STANDARD_COLUMN_COMMENTS["table_name"], + "flow_name": STANDARD_COLUMN_COMMENTS["flow_name"], + "event_timestamp": STANDARD_COLUMN_COMMENTS["event_timestamp"], + "expectation_name": "The name of the expectation", + "num_passed": "The number of rows/records that passed the expectation check", + "num_failed": "The number of rows/records that failed the expectation check", + "failure_pct": "The percentage of rows/records that failed the expectation check", + } +) + +PIPELINE_TAGS_INDEX = MonitoringTable( + name='pipeline_tags_index', + table_type=TableType.DELTA_TABLE, + table_comment="""Inverted index mapping pipeline tags to pipeline IDs for efficient tag-based pipeline discovery. + Built and maintained by the 'Build pipeline tags index' job. Used to optimize performance when discovering + pipelines by tags instead of querying the Databricks API for every pipeline.""", + column_comments={ + "tag_key": "The tag key (e.g., 'env', 'team', 'critical')", + "tag_value": "The tag value (e.g., 'prod', 'data', 'true')", + "pipeline_ids": """Array of pipeline IDs that have this tag:value pair. Used for efficient lookup when + discovering pipelines by tags without expensive API calls.""", + "index_build_time": "Timestamp when this index was last built. Used to determine if the index is stale." + } +) + + +def set_all_table_column_comments(monitoring_catalog: str, monitoring_schema: str, spark: SparkSession): + for st in [MONITORED_PIPELINES, MONITORED_TABLES, EVENT_LOGS_BRONZE, PIPELINE_RUNS_STATUS, EVENTS_ERRORS, + EVENTS_WARNINGS, METRIC_PIPELINE_HOURLY_ERROR_RATE, PIPELINES_STATUS_SILVER, PIPELINES_STATUS, + EVENTS_TABLE_METRICS, TABLE_STATUS_PER_PIPELINE_RUN, TABLE_STATUS, + TABLE_EVENTS_EXPECTATION_CHECKS, PIPELINE_TAGS_INDEX]: + st.add_column_comments(monitoring_catalog, monitoring_schema, spark) + diff --git a/contrib/dbx_ingestion_monitoring/resources/build_pipeline_tags_index.job.yml b/contrib/dbx_ingestion_monitoring/resources/build_pipeline_tags_index.job.yml new file mode 100644 index 0000000..f29c4e1 --- /dev/null +++ b/contrib/dbx_ingestion_monitoring/resources/build_pipeline_tags_index.job.yml @@ -0,0 +1,24 @@ +# This job builds an inverted index mapping pipeline tags to pipeline IDs. +# The index is stored in a Delta table and used by import_event_logs and monitoring ETL pipelines +# to efficiently discover pipelines by tags without having to query the Databricks API for every pipeline. +# +# The job should be scheduled to run periodically (e.g., daily) to keep the index up-to-date. 
+
+resources:
+  jobs:
+    build_pipeline_tags_index:
+      name: Build ${var.dab_type} pipeline tags index
+      schedule:
+        pause_status: ${var.pipeline_tags_index_schedule_state}
+        quartz_cron_expression: ${var.pipeline_tags_index_cron_schedule}
+        timezone_id: UTC
+      email_notifications:
+        on_failure: ${var.notification_emails}
+      tasks:
+        - task_key: build_index
+          notebook_task:
+            notebook_path: ../jobs/build_pipeline_tags_index.ipynb
+            base_parameters:
+              monitoring_catalog: ${var.monitoring_catalog}
+              monitoring_schema: ${resources.schemas.monitoring_schema.name}
+              pipeline_tags_index_table_name: ${var.pipeline_tags_index_table_name}
diff --git a/contrib/dbx_ingestion_monitoring/resources/import_event_logs.job.yml b/contrib/dbx_ingestion_monitoring/resources/import_event_logs.job.yml
new file mode 100644
index 0000000..7326714
--- /dev/null
+++ b/contrib/dbx_ingestion_monitoring/resources/import_event_logs.job.yml
@@ -0,0 +1,48 @@
+# This job imports event logs from pipelines that are not configured to store those logs
+# in Delta tables. This is an expensive operation, and it is recommended that you configure
+# the pipelines to write their event logs to Delta tables instead.
+#
+# This job uses the `event_log()` TVF, which requires the reader to be the owner of the pipeline.
+# If that is not the case, you'll need to configure the `run_as` setting. If the pipelines are
+# owned by different principals, you'll need to create a separate instance of this job
+# per owner principal.
+#
+# Note that you need to add the target table name to the `imported_event_log_tables` variable for
+# it to be loaded by the monitoring ETL pipeline.
+resources:
+  jobs:
+    import_event_logs:
+      name: Import ${var.dab_type} event logs
+      # Set this to the owner principal of all imported pipelines
+      # run_as:
+      #   service_principal_name:
+      #   user_name:
+      schedule:
+        pause_status: ${var.import_event_log_schedule_state}
+        quartz_cron_expression: ${var.import_event_log_cron_schedule}
+        timezone_id: UTC
+      email_notifications:
+        on_failure: ${var.notification_emails}
+      tasks:
+        - task_key: init_target_table
+          notebook_task:
+            notebook_path: ../jobs/create_imported_event_logs_target_table.ipynb
+            base_parameters:
+              monitoring_catalog: ${var.monitoring_catalog}
+              monitoring_schema: ${resources.schemas.monitoring_schema.name}
+              imported_event_logs_table_name: ${var.imported_event_logs_table_name}
+        - task_key: merge_logs
+          depends_on:
+            - task_key: init_target_table
+          notebook_task:
+            notebook_path: ../jobs/import_event_logs.ipynb
+            base_parameters:
+              monitoring_catalog: ${var.monitoring_catalog}
+              monitoring_schema: ${resources.schemas.monitoring_schema.name}
+              imported_pipeline_ids: ${var.imported_pipeline_ids}
+              imported_pipeline_tags: ${var.imported_pipeline_tags}
+              imported_event_logs_table_name: ${var.imported_event_logs_table_name}
+              pipeline_tags_index_table_name: ${var.pipeline_tags_index_table_name}
+              pipeline_tags_index_enabled: ${var.pipeline_tags_index_enabled}
+              pipeline_tags_index_max_age_hours: ${var.pipeline_tags_index_max_age_hours}
+              pipeline_tags_index_api_fallback_enabled: ${var.pipeline_tags_index_api_fallback_enabled}
\ No newline at end of file
diff --git a/contrib/dbx_ingestion_monitoring/resources/monitoring_schema.schema.yml b/contrib/dbx_ingestion_monitoring/resources/monitoring_schema.schema.yml
new file mode 100644
index 0000000..36faf7d
--- /dev/null
+++ b/contrib/dbx_ingestion_monitoring/resources/monitoring_schema.schema.yml
@@ -0,0 +1,6 @@
+resources:
+  schemas:
+    monitoring_schema:
+      name: ${var.monitoring_schema}
+      catalog_name: 
${var.monitoring_catalog} + comment: A schema used for aggregation of monitoring information for CDC Connector pipelines \ No newline at end of file diff --git a/contrib/dbx_ingestion_monitoring/resources/post_deploy.job.yml b/contrib/dbx_ingestion_monitoring/resources/post_deploy.job.yml new file mode 100644 index 0000000..f60112b --- /dev/null +++ b/contrib/dbx_ingestion_monitoring/resources/post_deploy.job.yml @@ -0,0 +1,31 @@ +# A helper job to: +# +# - Publish an AI/BI dashboard from a template. See jobs/publish_dashboard.ipynb or more info. +# - Update the column comments for monitoring tables. See jobs/update_monitoring_tables_meta.ipynb for more info. +# +# This job should be run after the first successful execution of the monitoring ETL pipeline +# and after updates to the logic in the monitoring ETL pipeline. + +resources: + jobs: + post_deploy: + name: Post-deploy actions for ${var.dab_type} monitoring DAB + email_notifications: + on_failure: ${var.notification_emails} + tasks: + - task_key: publish_main_dashboard + notebook_task: + notebook_path: ../jobs/publish_dashboard.ipynb + base_parameters: + dashboard_template_path: ${var.main_dashboard_template_path} + dashboard_id: ${var.main_dashboard_id} + published_dashboard_name: ${var.main_dashboard_name} + default_dataset_catalog: ${var.monitoring_catalog} + default_dataset_schema: ${resources.schemas.monitoring_schema.name} + warehouse_id: ${var.warehouse_id} + - task_key: update_table_metadata + notebook_task: + notebook_path: ../jobs/update_monitoring_tables_meta.ipynb + base_parameters: + monitoring_catalog: ${var.monitoring_catalog} + monitoring_schema: ${resources.schemas.monitoring_schema.name} diff --git a/contrib/dbx_ingestion_monitoring/scripts/azure_setup.sh b/contrib/dbx_ingestion_monitoring/scripts/azure_setup.sh new file mode 100644 index 0000000..ae0c9c3 --- /dev/null +++ b/contrib/dbx_ingestion_monitoring/scripts/azure_setup.sh @@ -0,0 +1,232 @@ +#!/bin/bash + +################################################################################ +# Azure Monitor DCE and DCR Setup Script +# +# This script creates Data Collection Endpoint (DCE) and Data Collection Rule (DCR) +# for sending Databricks telemetry to Azure Monitor. +# +# Resources created: +# - Data Collection Endpoint (DCE) +# - Custom tables (DatabricksMetrics_CL, DatabricksLogs_CL, DatabricksEvents_CL) +# - Data Collection Rule (DCR) with three streams +# +# Usage: +# ./azure_setup.sh --resource-group --location --workspace-name +# +# Example: +# ./azure_setup.sh \ +# --resource-group databricks-monitoring-rg \ +# --location "East US" \ +# --workspace-name databricks-monitoring-workspace +# +# Note: The script will create data collection rule "databricks-monitoring-dcr" and +# a public data collection endpoint "databricks-monitoring-dce". To override these values change the +# global variables $DCE_NAME and $DCR_NAME. 
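+#
+# Prerequisites (assumed by this script, not created by it):
+# - Azure CLI installed and authenticated (run `az login`) with permissions on the target subscription
+# - The resource group and the Log Analytics workspace must already exist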
+# +# Output: +# - Azure Host Name (DCE hostname) +# - Azure DCR Immutable ID +################################################################################ + +set -e # Exit on error +set -o pipefail # Exit on pipe failure + +# DCE and DCR names - modify these if needed +DCE_NAME="databricks-monitoring-dce" +DCR_NAME="databricks-monitoring-dcr" + +# Function to show usage +usage() { + cat << EOF +Usage: $0 --resource-group --location --workspace-name + +Required arguments: + --resource-group Name of the existing Azure resource group + --location Azure region (e.g., 'East US', 'West Europe') + --workspace-name Name of the Log Analytics workspace + +Optional arguments: + --help Show this help message + +Example: + $0 --resource-group databricks-monitoring-rg \\ + --location "East US" \\ + --workspace-name databricks-monitoring-workspace + +Note: DCE and DCR names are set at the top of this script. + Current values: DCE_NAME=$DCE_NAME, DCR_NAME=$DCR_NAME + +For more information, see: README-third-party-monitoring.md +EOF + exit 1 +} + +# Parse command line arguments +while [[ $# -gt 0 ]]; do + case $1 in + --resource-group) + RESOURCE_GROUP="$2" + shift 2 + ;; + --location) + LOCATION="$2" + shift 2 + ;; + --workspace-name) + WORKSPACE_NAME="$2" + shift 2 + ;; + --help) + usage + ;; + *) + echo "ERROR: Unknown argument: $1" >&2 + usage + ;; + esac +done + +# Validate required arguments +if [[ -z "$RESOURCE_GROUP" || -z "$LOCATION" || -z "$WORKSPACE_NAME" ]]; then + echo "ERROR: Missing required arguments" >&2 + usage +fi + +echo "Starting Azure Monitor DCE and DCR setup..." +echo "Resource Group: $RESOURCE_GROUP" +echo "Location: $LOCATION" +echo "Workspace Name: $WORKSPACE_NAME" +echo "DCE Name: $DCE_NAME" +echo "DCR Name: $DCR_NAME" +echo "" + +# Step 1: Create Data Collection Endpoint +echo "[1/2] Creating Data Collection Endpoint..." +az monitor data-collection endpoint create \ + --resource-group "$RESOURCE_GROUP" \ + --name "$DCE_NAME" \ + --location "$LOCATION" \ + --public-network-access "Enabled" + +DCE_ENDPOINT=$(az monitor data-collection endpoint show \ + --resource-group "$RESOURCE_GROUP" \ + --name "$DCE_NAME" \ + --query "logsIngestion.endpoint" -o tsv) + +HOST_NAME="${DCE_ENDPOINT#https://}" + +# Step 2: Create Custom Tables and Data Collection Rule +echo "[2/2] Creating custom tables and data collection rule..." 
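+
+# Note: DCR-based custom log tables must use the "_CL" suffix, and each stream ingested through the
+# Logs Ingestion API needs a "TimeGenerated" column of type datetime; this is why every schema below
+# starts with TimeGenerated=datetime and every table name ends with _CL.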
+ +# Define table schemas in column_name=column_type format +declare -A TABLE_SCHEMAS +TABLE_SCHEMAS["DatabricksMetrics_CL"]="TimeGenerated=datetime metric_name=string metric_value=real timestamp=long tags=dynamic additional_attributes=dynamic" +TABLE_SCHEMAS["DatabricksLogs_CL"]="TimeGenerated=datetime message=string status=string timestamp=long tags=dynamic additional_attributes=dynamic" +TABLE_SCHEMAS["DatabricksEvents_CL"]="TimeGenerated=datetime message=string status=string timestamp=long tags=dynamic additional_attributes=dynamic" + +for TABLE_NAME in "${!TABLE_SCHEMAS[@]}"; do + az monitor log-analytics workspace table create \ + --resource-group "$RESOURCE_GROUP" \ + --workspace-name "$WORKSPACE_NAME" \ + --name "$TABLE_NAME" \ + --columns ${TABLE_SCHEMAS[$TABLE_NAME]} +done + +# Create Data Collection Rule + +SUBSCRIPTION_ID=$(az account show --query "id" -o tsv) +DCE_ID="/subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP/providers/Microsoft.Insights/dataCollectionEndpoints/$DCE_NAME" +WORKSPACE_RESOURCE_ID="/subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP/providers/Microsoft.OperationalInsights/workspaces/$WORKSPACE_NAME" + +# Stream declarations +STREAM_DECLARATIONS='{ + "Custom-DatabricksMetrics_CL": { + "columns": [ + {"name": "TimeGenerated", "type": "datetime"}, + {"name": "metric_name", "type": "string"}, + {"name": "metric_value", "type": "real"}, + {"name": "timestamp", "type": "long"}, + {"name": "tags", "type": "dynamic"}, + {"name": "additional_attributes", "type": "dynamic"} + ] + }, + "Custom-DatabricksLogs_CL": { + "columns": [ + {"name": "TimeGenerated", "type": "datetime"}, + {"name": "message", "type": "string"}, + {"name": "status", "type": "string"}, + {"name": "timestamp", "type": "long"}, + {"name": "tags", "type": "dynamic"}, + {"name": "additional_attributes", "type": "dynamic"} + ] + }, + "Custom-DatabricksEvents_CL": { + "columns": [ + {"name": "TimeGenerated", "type": "datetime"}, + {"name": "message", "type": "string"}, + {"name": "status", "type": "string"}, + {"name": "timestamp", "type": "long"}, + {"name": "tags", "type": "dynamic"}, + {"name": "additional_attributes", "type": "dynamic"} + ] + } +}' + +# Data flows +DATA_FLOWS='[ + { + "streams": ["Custom-DatabricksMetrics_CL"], + "destinations": ["databricks-workspace"], + "transformKql": "source", + "outputStream": "Custom-DatabricksMetrics_CL" + }, + { + "streams": ["Custom-DatabricksLogs_CL"], + "destinations": ["databricks-workspace"], + "transformKql": "source", + "outputStream": "Custom-DatabricksLogs_CL" + }, + { + "streams": ["Custom-DatabricksEvents_CL"], + "destinations": ["databricks-workspace"], + "transformKql": "source", + "outputStream": "Custom-DatabricksEvents_CL" + } +]' + +# Destinations +DESTINATIONS=$(cat <, + "last_load_timestamp": + } + """ + payload = { + "grant_type": "client_credentials", + "client_id": config["client_id"], + "client_secret": config["client_secret"], + "scope": "https://monitor.azure.com//.default" + } + + now = int(datetime.now(timezone.utc).timestamp()) + token_url = config['access_token_url'] + + try: + response = requests.post(token_url, data=payload, timeout=config['request_timeout_sec']) + response.raise_for_status() + + token_json = response.json() + if "access_token" not in token_json: + raise RuntimeError(f"Token response missing 'access_token': {token_json}") + + return { + "access_token": token_json["access_token"], + "last_load_timestamp": now + } + + except requests.RequestException as e: + error_message = ( + 
f"Failed to fetch Azure access token.\n" + f"Request URL: {token_url}\n" + f"Request payload: {payload}\n" + f"Response status: {getattr(e.response, 'status_code', 'N/A')}\n" + f"Response body: {getattr(e.response, 'text', 'N/A')}\n" + f"Original exception: {str(e)}" + ) + raise RuntimeError(error_message) from e + + + +def initialize_global_config(spark_conf): + """Initialize global configuration from Spark configuration.""" + global _global_config, _log_converter, _events_converter, _metrics_converter + + _global_config = getThirdPartySinkConfigFromSparkConfig(spark_conf) + _log_converter = AzureMonitorLogsConverter() + _events_converter = AzureMonitorEventsConverter() + _metrics_converter = AzureMonitorMetricsConverter() + +def getParam(spark_conf, key: str, default=None): + value = spark_conf.get(key, default) + if value == "" or value is None: + return None + return value + +def getThirdPartySinkConfigFromSparkConfig(spark_conf): + """ + Extract and merge configuration from Spark configuration and secret scope. + + This function extracts configuration variables from Spark configuration and merges + them with key-value pairs from a secret scope (if provided) to build common_params. + Secret store values take precedence over Spark configuration values when both exist. + + Args: + spark_conf: Spark configuration object containing required parameters + + Returns: + dict: Merged configuration parameters with secrets taking precedence + + The function looks for a 'secrets_scope' parameter in Spark config. If found, + it will retrieve all secrets from that scope and merge them with the base + configuration, giving preference to secret values. + """ + destination = getParam(spark_conf, "destination") + if destination is None: + raise ValueError("Destination must be provided for third party sinks.") + + common_params = { + "destination": destination, + "num_rows_per_batch": int(spark_conf.get("num_rows_per_batch", "100")), + "max_retry_duration_sec": int(spark_conf.get("max_retry_duration_sec", "300")), + "request_timeout_sec": int(spark_conf.get("request_timeout_sec", "30")), + "max_access_token_staleness": int(spark_conf.get("azure_max_access_token_staleness", "3300")), + "client_id": getParam(spark_conf, "azure_client_id"), + "client_secret": getParam(spark_conf, "azure_client_secret"), + "tenant_id": getParam(spark_conf, "azure_tenant_id"), + "host_name": getParam(spark_conf, "host_name"), + "dcr_immutable_id": getParam(spark_conf, "azure_dcr_immutable_id") + } + + scope = getParam(spark_conf, "secrets_scope") + if scope is not None: + secrets = { + s.key: dbutils.secrets.get(scope=scope, key=s.key) + for s in dbutils.secrets.list(scope) + } + common_params.update(secrets) + + # Get endpoints (allow override) + metrics_endpoint = getParam(spark_conf, "endpoints.metrics") + logs_endpoint = getParam(spark_conf, "endpoints.logs") + events_endpoint = getParam(spark_conf, "endpoints.events") + authorization_endpoint = getParam(spark_conf, "azure_authorization_endpoint") + + # Auto-generate authorization endpoint if not provided + if authorization_endpoint is None: + if common_params['tenant_id'] is None: + raise ValueError( + "Either 'azure_tenant_id' must be provided to auto-generate authorization endpoint, " + "or 'azure_authorization_endpoint' must be explicitly configured." 
+ ) + authorization_endpoint = f"https://login.microsoftonline.com/{common_params['tenant_id']}/oauth2/v2.0/token" + + common_params["access_token_url"] = authorization_endpoint + + # Auto-generate data ingestion endpoints if not provided + if not all([metrics_endpoint, logs_endpoint, events_endpoint]): + if common_params['host_name'] is None: + raise ValueError( + "Either 'host_name' must be provided to auto-generate DCE endpoint, " + "or all three endpoints (endpoints.metrics, endpoints.logs, endpoints.events) " + "must be explicitly configured." + ) + dce_endpoint = f"https://{common_params['host_name']}" + + if common_params['dcr_immutable_id'] is None: + raise ValueError( + "Either 'dcr_immutable_id' must be provided to auto-generate DCE endpoint, " + "or all three endpoints (endpoints.metrics, endpoints.logs, endpoints.events) " + "must be explicitly configured." + ) + + metrics_endpoint = f"{dce_endpoint}/dataCollectionRules/{common_params['dcr_immutable_id']}/streams/Custom-DatabricksMetrics_CL?api-version=2023-01-01" + logs_endpoint = f"{dce_endpoint}/dataCollectionRules/{common_params['dcr_immutable_id']}/streams/Custom-DatabricksLogs_CL?api-version=2023-01-01" + events_endpoint = f"{dce_endpoint}/dataCollectionRules/{common_params['dcr_immutable_id']}/streams/Custom-DatabricksEvents_CL?api-version=2023-01-01" + + common_params["endpoints"] = { + "metrics": metrics_endpoint, + "logs": logs_endpoint, + "events": events_endpoint + } + + return common_params + + +def unix_to_iso(timestamp: int) -> str: + """Convert Unix timestamp in milliseconds/seconds to ISO format string.""" + ts = float(timestamp) + # If timestamp is unusually large, assume milliseconds + if ts > 1e12: + ts /= 1000 + dt = datetime.fromtimestamp(ts, tz=timezone.utc) + return dt.isoformat().replace("+00:00", "Z") + +def timestamp_in_unix_milliseconds(timestamp) -> int: + """Convert datetime to Unix timestamp in milliseconds.""" + if isinstance(timestamp, datetime): + return int(timestamp.timestamp() * 1000) + return int(timestamp) + +def get_status(status_display: str) -> str: + """Map pipeline status to appropriate status level.""" + status_lower = status_display.lower() + if status_lower in ['failed', 'error']: + return 'error' + elif status_lower in ['running', 'starting']: + return 'info' + elif status_lower in ['completed', 'success']: + return 'ok' + else: + return 'warn' + +def serialize_datetime(data): + if isinstance(data, dict): + return { + key: serialize_datetime(value) + for key, value in data.items() + } + elif isinstance(data, list): + return [serialize_datetime(item) for item in data] + elif isinstance(data, datetime): + return data.isoformat() + else: + return data + +def filter_null_fields(data): + if isinstance(data, dict): + return { + key: filter_null_fields(value) + for key, value in data.items() + if value is not None + } + elif isinstance(data, list): + return [filter_null_fields(item) for item in data if item is not None] + else: + return data + +def enforce_schema(data, schema, path = "root"): + # Nothing to enforce. 
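+    # (The early return below covers that case.) Otherwise the function walks `data`
+    # recursively against a JSON-Schema-style dict: it raises ValueError on type
+    # mismatches, missing required fields and disallowed additional properties,
+    # truncates strings longer than `maxLength`, and converts datetime/integer values
+    # into the string or date-time form the schema asks for.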
+ if schema is None or data is None: + return data + + + schema_type = schema.get("type") + if not schema_type: + raise ValueError(f"Failed to get type of the object at {path}.") + + # Validate dictionary + if isinstance(data, dict): + if schema_type != "object": + raise ValueError(f"Expected object at {path}, got {type(data).__name__}") + props = schema.get("properties", {}) + required_keys = schema.get("required", []) + additional_properties = schema.get("additionalProperties", False) + + # Validate defined properties + for k, v in props.items(): + if k in data: + data[k] = enforce_schema(data[k], v, f"{path}.{k}") + elif k in required_keys: + raise ValueError(f"Missing required field '{k}' at {path}") + + # Handle additional properties + for k, v in data.items(): + if k not in props: # This is an additional property + if additional_properties is False: + raise ValueError(f"Additional property '{k}' not allowed at {path}") + elif additional_properties is True: + # Allow any additional property, no validation + pass + elif isinstance(additional_properties, dict): + # Handle oneOf for additional properties + if "oneOf" in additional_properties: + type_map = { + "string": str, + "number": (int, float), + "integer": int, + "boolean": bool, + } + + for sub_schema in additional_properties["oneOf"]: + expected_type = type_map.get(sub_schema.get("type")) + if expected_type and isinstance(v, expected_type): + data[k] = enforce_schema(v, sub_schema, f"{path}.{k}") + break + else: + raise ValueError( + f"Additional property '{k}' at {path} does not match any oneOf schema" + ) + else: + data[k] = enforce_schema(v, additional_properties, f"{path}.{k}") + + return data + + # Validate list + elif isinstance(data, list): + if schema_type != "array": + raise ValueError(f"Expected array at {path}, got {type(data).__name__}") + items_schema = schema.get("items", {}) + return [enforce_schema(item, items_schema, f"{path}[{i}]") for i, item in enumerate(data)] + + # Handle string + elif isinstance(data, str): + if schema_type != "string": + raise ValueError(f"Expected string at {path}, got {type(data).__name__}") + acceptable_values = schema.get("enum", []) + if acceptable_values and data not in acceptable_values: + raise ValueError(f"Invalid value at {path}: {data}. 
Allowed: {acceptable_values}") + max_length = schema.get("maxLength") + if max_length and len(data) > max_length: + return data[:max_length] + return data + + # Handle datetime + elif isinstance(data, datetime): + if schema_type == "string": + return data.isoformat() + elif schema_type == "integer": + return data.timestamp() + else: + raise ValueError(f"Cannot convert datetime to {schema_type}") + + # Handle integer + elif isinstance(data, int): + if schema_type == "integer": + return data + elif schema_type == "number": + return float(data) + elif schema_type == "string" and schema.get("format") == "date-time": + return unix_to_iso(data) + else: + raise ValueError(f"Cannot convert integer to {schema_type}") + + elif isinstance(data, float): + if schema_type != "number": + raise ValueError(f"Expected number at {path}, got {type(data).__name__}") + return data + elif isinstance(data, bool): + if schema_type != "boolean": + raise ValueError(f"Expected boolean at {path}, got {type(data).__name__}") + return data + return data + +def create_valid_json_or_fail_with_error(data, schema): + data = serialize_datetime(data) + data = filter_null_fields(data) + data = enforce_schema(data, schema) + return json.dumps(data) + +# ================================================================================ +# HTTP Layer +# ================================================================================ + +# Global session for connection pooling +session: Optional[requests.Session] = None + +class HTTPClient: + """ + HTTP client for batched POST requests using a persistent session. + + Input: Spark DataFrame with columns: + - endpoint (StringType): Target URL. + - header (StringType, JSON-encoded): HTTP headers. + - payload (binary data): Serialized request body. + """ + + def __init__(self, max_retry_duration_sec: int = 300, request_timeout_sec: int = 30): + """ + Initialize the HTTP client. + + Args: + max_retry_duration_sec: Maximum time in seconds to retry requests with exponential backoff + request_timeout_sec: Timeout in seconds for a single request + """ + self.max_retry_duration_sec = max_retry_duration_sec + self.request_timeout_sec = request_timeout_sec + + + def get_session(self) -> requests.Session: + """ + Get the global session instance. If not present, create a new one. + + Returns: + session: The global session instance + """ + global session + if session is None: + session = requests.Session() + return session + + def _make_request_with_retry(self, url: str, headers: Dict[str, str], payload: bytes): + """ + Make a POST request to the provided url. + + Args: + url: The endpoint URL + headers: Request headers + payload: Request payload + + Throws: + Exception: If the request fails and the retries are exhausted. 
+ """ + # Compress payload + compressed_payload = gzip.compress(payload) + headers['Content-Encoding'] = 'gzip' + + response = None + try: + response = self.get_session().post( + url, + headers=headers, + data=compressed_payload, + timeout=self.request_timeout_sec + ) + response.raise_for_status() + print(f"Successfully sent request to URL: {url}, Payload: {payload.decode('utf-8')}, Response: {response.text}") + except Exception as e: + response_text = "No response" + if response is not None: + try: + response_text = str(response.json()) + except: + response_text = response.text if hasattr(response, 'text') else "Unable to read response" + print(f"Request failed for URL: {url}, headers: {str(headers)}, Payload: {payload.decode('utf-8')}, Error: {str(e)}, Response: {response_text}") + raise type(e)(f"Request failed for URL: {url}, headers: {str(headers)}, Payload: {payload.decode('utf-8')}, Error: {str(e)}, Response: {response_text}") from e + + def post(self, http_request_specs_df) -> None: + """ + Make POST requests for each row in the DataFrame. + Serially makes requests for each row in the DataFrame. + + Args: + http_request_specs_df: Spark DataFrame with columns 'endpoint', 'header', 'payloadBytes' + """ + + for row in http_request_specs_df.collect(): + try: + headers = json.loads(getattr(row, 'header', '{}')) + retry_wrapper = retry( + stop=stop_after_delay(self.max_retry_duration_sec), + wait=wait_exponential(multiplier=1, min=1, max=10), + reraise=True + ) + retry_wrapper(self._make_request_with_retry)(row.endpoint, headers, row.payloadBytes) + except Exception as e: + print(f"ERROR: {str(e)}") + continue # Continue with other requests regardless of success/failure + + +# ================================================================================ +# CONVERSION LAYER +# ================================================================================ + +class AzureMonitorMetricsConverter: + """Converter class to convert metrics to Azure Monitor format.""" + + def create_metric( + self, + metric_name: str, + metric_value: float, + tags: Dict[str, str], + timestamp: int, + additional_attributes: Optional[Dict[str, Any]] = None) -> str: + """Create an Azure Monitor metric in the proper format. + + Args: + metric_name: Name of the metric (e.g., "pipeline.run.execution_time_seconds") + metric_value: Numeric value for the gauge metric + tags: Dictionary of tags (e.g., {"pipeline_id": "123", "phase": "execution"}) + timestamp: Unix timestamp in milliseconds for the metric + additional_attributes: Optional additional attributes to include + + Returns: + JSON string representing the Azure Monitor metric + + Raises: + ValueError if the fields are of unsupported types. 
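+
+        Example (illustrative only, not taken from a real run):
+            {"TimeGenerated": "2024-01-01T00:00:00Z", "metric_name": "pipeline.run.total_seconds",
+             "metric_value": 123.0, "tags": {"pipeline_id": "123", "phase": "total"},
+             "timestamp": 1704067200000, "additional_attributes": {}}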
+ """ + # Enforce the schema + return create_valid_json_or_fail_with_error({ + "TimeGenerated": unix_to_iso(timestamp), + "metric_name": metric_name, + "metric_value": metric_value, + "tags": tags, + "timestamp": timestamp, + "additional_attributes": additional_attributes + }, METRICS_SCHEMA) + + def create_http_requests_spec(self, df, num_rows_per_batch: int, headers: dict, endpoint: str): + """Create HTTP request spec DataFrame for metrics.""" + df_with_batch_id = df.withColumn("batch_id", + expr(f"int((row_number() over (order by 1) - 1) / {num_rows_per_batch})")) \ + .withColumn("metrics", regexp_replace(col("metrics"), "\n", "")) + return df_with_batch_id.groupBy("batch_id") \ + .agg(collect_list("metrics").alias("batch_metrics")) \ + .withColumn("payload", concat(lit('['), + expr("concat_ws(',', batch_metrics)"), + lit(']'))) \ + .withColumn("payloadBytes", col("payload").cast("binary")) \ + .withColumn("endpoint", lit(endpoint)) \ + .withColumn("header", lit(json.dumps(headers))) \ + .select("endpoint", "header", "payloadBytes") + + +class AzureMonitorEventsConverter: + """Converter class to convert events to Azure Monitor format.""" + + def create_event( + self, + message: str, + status: str, + tags: Dict[str, str], + timestamp: int, + additional_attributes: Optional[Dict[str, Any]] = None) -> str: + """ + Create an Azure Monitor event in the proper format. + + Args: + message: The event message + status: The status of the event (e.g., "success", "error", "warning") + tags: Dictionary of tags + timestamp: Unix timestamp in milliseconds for the event + additional_attributes: Optional additional attributes to include + + Returns: + JSON string representing the Azure Monitor event + + Raises: + ValueError if the fields are of unsupported types. + """ + return create_valid_json_or_fail_with_error({ + "TimeGenerated": unix_to_iso(timestamp), + "message": message, + "status": status, + "tags": tags, + "timestamp": timestamp, + "additional_attributes": additional_attributes, + }, EVENTS_SCHEMA) + + def create_http_requests_spec(self, df, num_rows_per_batch: int, headers: dict, endpoint: str): + """Create HTTP request spec DataFrame for events.""" + df_with_batch_id = df.withColumn("batch_id", + expr(f"int((row_number() over (order by 1) - 1) / {num_rows_per_batch})")) \ + .withColumn("events", regexp_replace(col("events"), "\n", "")) + return df_with_batch_id.groupBy("batch_id") \ + .agg(collect_list("events").alias("batch_events")) \ + .withColumn("payload", concat(lit('['), + expr("concat_ws(',', batch_events)"), + lit(']'))) \ + .withColumn("payloadBytes", col("payload").cast("binary")) \ + .withColumn("endpoint", lit(endpoint)) \ + .withColumn("header", lit(json.dumps(headers))) \ + .select("endpoint", "header", "payloadBytes") + +class AzureMonitorLogsConverter: + """Converter class to convert logs to Azure Monitor format.""" + + def create_log( + self, + message: str, + status: str, + tags: Dict[str, str], + timestamp: int, + additional_attributes: Optional[Dict[str, Any]] = None) -> str: + """ + Create an Azure Monitor log in the proper format. + + Args: + message: The log message + status: The status of the log (e.g., "error", "info", "warning") + tags: Dictionary of tags + timestamp: Unix timestamp in milliseconds for the log + additional_attributes: Optional additional attributes to include + + Returns: + JSON string representing the Azure Monitor log + + Raises: + ValueError if the fields are of unsupported types. 
+ """ + return create_valid_json_or_fail_with_error({ + "TimeGenerated": unix_to_iso(timestamp), + "message": message, + "status": status, + "tags": tags, + "timestamp": timestamp, + "additional_attributes": additional_attributes, + }, LOGS_SCHEMA) + + def create_http_requests_spec(self, df, num_rows_per_batch: int, headers: dict, endpoint: str): + """Create HTTP request spec DataFrame for logs.""" + df_with_batch_id = df.withColumn("batch_id", + expr(f"int((row_number() over (order by 1) - 1) / {num_rows_per_batch})")) \ + .withColumn("logs", regexp_replace(col("logs"), "\n", "")) + return df_with_batch_id.groupBy("batch_id") \ + .agg(collect_list("logs").alias("batch_logs")) \ + .withColumn("payload", concat(lit('['), + expr("concat_ws(',', batch_logs)"), + lit(']'))) \ + .withColumn("payloadBytes", col("payload").cast("binary")) \ + .withColumn("endpoint", lit(endpoint)) \ + .withColumn("header", lit(json.dumps(headers))) \ + .select("endpoint", "header", "payloadBytes") + +# ================================================================================ +# INFERENCE LAYER +# ================================================================================ + +def convert_row_to_error_log(row): + """Convert a row to error log format.""" + params = { + "message": str(getattr(row, "message", "")), + "status": "error", + "tags": { + "pipeline_id": getattr(row, 'pipeline_id', ''), + "pipeline_run_id": getattr(row, 'pipeline_run_id', ''), + "table_name": getattr(row, 'table_name', ''), + "flow_name": getattr(row, 'flow_name', ''), + "level": "error" + }, + "timestamp": timestamp_in_unix_milliseconds(row.event_timestamp), + "additional_attributes": { + "pipeline_run_link": getattr(row, "pipeline_run_link", None), + "error": getattr(row, "error", None), + } + } + return _log_converter.create_log(**params) + +def convert_row_to_table_metrics(row): + """Convert a row to table metrics format.""" + # Base tags for all metrics + base_tags = { + "pipeline_id": getattr(row, "pipeline_id", ""), + "pipeline_run_id": getattr(row, "pipeline_run_id", ""), + "table_name": getattr(row, "table_name", ""), + "flow_name": getattr(row, "flow_name", ""), + "source": SOURCE_NAME + } + + # Timestamp for all metrics + timestamp = timestamp_in_unix_milliseconds(row.event_timestamp) + + return [ + _metrics_converter.create_metric( + metric_name="dlt.table.throughput.upserted_rows", + metric_value=getattr(row, "num_upserted_rows", 0) or 0, + tags={**base_tags, "metric_type": "count"}, + timestamp=timestamp, + additional_attributes={} + ), + _metrics_converter.create_metric( + metric_name="dlt.table.throughput.deleted_rows", + metric_value=getattr(row, "num_deleted_rows", 0) or 0, + tags={**base_tags, "metric_type": "count"}, + timestamp=timestamp, + additional_attributes={} + ), + _metrics_converter.create_metric( + metric_name="dlt.table.throughput.output_rows", + metric_value=getattr(row, "num_output_rows", 0) or 0, + tags={**base_tags, "metric_type": "count"}, + timestamp=timestamp, + additional_attributes={} + ), + ] + +def convert_row_to_pipeline_status_event(row): + """Convert a row to pipeline status event format.""" + # Determine pipeline status for message + status_display = row.latest_state.upper() if row.latest_state else 'UNKNOWN' + pipeline_id = getattr(row, "pipeline_id", "") + + params = { + "message": f"Pipeline {pipeline_id} {status_display}", + "status": get_status(status_display), + "tags": { + "pipeline_id": pipeline_id, + "latest_run_id": getattr(row, "pipeline_run_id", ""), + "status": 
status_display.lower(), + "source": SOURCE_NAME, + "service": SERVICE_NAME + }, + "timestamp": timestamp_in_unix_milliseconds(row.updated_at), + "additional_attributes": { + "pipeline_link": getattr(row, "pipeline_link", None), + "pipeline_run_link": getattr(row, "pipeline_run_link", None), + "is_complete": getattr(row, "is_complete", None), + "running_start_time": getattr(row, "running_start_time", None), + "end_time": getattr(row, "end_time", None), + "updated_at": getattr(row, "updated_at", None) , + "latest_error_log_message": getattr(row, "latest_error_log_message", None), + "latest_error_message": getattr(row, "latest_error_message", None), + } + } + return _events_converter.create_event(**params) + +def convert_row_to_pipeline_metrics(row): + """Convert a row to pipeline metrics format.""" + def has_attr(obj, attr): + return hasattr(obj, attr) and getattr(obj, attr) is not None + + if not has_attr(row, "queued_time") or not has_attr(row, "create_time"): + return [] + + base_tags = { + "pipeline_id": getattr(row, "pipeline_id", ""), + "pipeline_run_id": getattr(row, "pipeline_run_id", ""), + "source": SOURCE_NAME + } + metrics = [] + timestamp = timestamp_in_unix_milliseconds(getattr(row, "create_time", None)) + + end_time = getattr(row, "end_time", None) or datetime.now(timezone.utc) + + # Starting seconds: queued_time - create_time + starting_seconds = (row.queued_time - row.create_time).total_seconds() + metrics.append(_metrics_converter.create_metric( + metric_name="pipeline.run.starting_seconds", + metric_value=starting_seconds, + tags={**base_tags, "metric_type": "duration", "phase": "starting"}, + timestamp=timestamp + )) + + # Seconds waiting for resources: initialization_start_time - queued_time + if not has_attr(row, "initialization_start_time"): + return metrics + waiting_seconds = (row.initialization_start_time - row.queued_time).total_seconds() + metrics.append(_metrics_converter.create_metric( + metric_name="pipeline.run.waiting_for_resources_seconds", + metric_value=waiting_seconds, + tags={**base_tags, "metric_type": "duration", "phase": "waiting"}, + timestamp=timestamp + )) + + # Initialization seconds: running_start_time - initialization_start_time + if not has_attr(row, "running_start_time"): + return metrics + initialization_seconds = (row.running_start_time - row.initialization_start_time).total_seconds() + metrics.append(_metrics_converter.create_metric( + metric_name="pipeline.run.initialization_seconds", + metric_value=initialization_seconds, + tags={**base_tags, "metric_type": "duration", "phase": "initialization"}, + timestamp=timestamp + )) + + # Running seconds: end_time - running_start_time + running_seconds = (end_time - row.running_start_time).total_seconds() + metrics.append(_metrics_converter.create_metric( + metric_name="pipeline.run.running_seconds", + metric_value=running_seconds, + tags={**base_tags, "metric_type": "duration", "phase": "running"}, + timestamp=timestamp + )) + + # Total seconds: end_time - create_time + total_seconds = (end_time - row.create_time).total_seconds() + metrics.append(_metrics_converter.create_metric( + metric_name="pipeline.run.total_seconds", + metric_value=total_seconds, + tags={**base_tags, "metric_type": "duration", "phase": "total"}, + timestamp=timestamp + )) + + return metrics + +# ================================================================================ +# MAIN +# ================================================================================ + +# Source streams +event_logs_bronze = "event_logs_bronze" 
+pipeline_runs_status = "pipeline_runs_status" + + +http_client = None +def getClient(config): + """Global HTTP client getter.""" + global http_client + if http_client is None: + http_client = HTTPClient( + max_retry_duration_sec=config["max_retry_duration_sec"], + request_timeout_sec=config["request_timeout_sec"] + ) + return http_client + + + +token_details = { + "access_token": None, + "last_load_timestamp": 0 +} +token_lock = threading.Lock() + +def get_access_token(config): + global token_details + now = int(datetime.now(timezone.utc).timestamp()) + if ((now - token_details["last_load_timestamp"]) < config["max_access_token_staleness"]): + return token_details["access_token"] + + # Token does not exist or is stale, fetch a new one. + with token_lock: + if token_details["access_token"] is None or (now - token_details["last_load_timestamp"]) >= config["max_access_token_staleness"]: + token_details = fetch_access_token(config) + return token_details["access_token"] + + +def register_sink_for_pipeline_events(): + @dlt.foreach_batch_sink(name="send_pipeline_status_to_3p_monitoring") + def send_pipeline_status_to_3p_monitoring(batch_df, batch_id): + destination_format_udf = udf(convert_row_to_pipeline_status_event, StringType()) + events_df = batch_df.withColumn("events", destination_format_udf(struct("*"))).select("events").filter(col("events").isNotNull()) + http_request_spec = _events_converter.create_http_requests_spec( + events_df, + _global_config["num_rows_per_batch"], + get_azure_header(get_access_token(_global_config)), + _global_config["endpoints"]["events"] + ) + getClient(_global_config).post(http_request_spec) + + @dlt.append_flow(target="send_pipeline_status_to_3p_monitoring") + def send_pipeline_status_to_sink(): + return spark.readStream.table(f"{pipeline_runs_status}_cdf") + + +def register_sink_for_errors(): + @dlt.foreach_batch_sink(name="send_errors_to_3p_monitoring") + def send_errors_to_3p_monitoring(batch_df, batch_id): + destination_format_udf = udf(convert_row_to_error_log, StringType()) + logs_df = batch_df.withColumn("logs", destination_format_udf(struct("*"))).select("logs").filter(col("logs").isNotNull()) + http_request_spec = _log_converter.create_http_requests_spec( + logs_df, + _global_config["num_rows_per_batch"], + get_azure_header(get_access_token(_global_config)), + _global_config["endpoints"]["logs"] + ) + getClient(_global_config).post(http_request_spec) + + @dlt.append_flow(target="send_errors_to_3p_monitoring") + def send_errors_to_sink(): + return spark.readStream.option("skipChangeCommits", "true").table(event_logs_bronze).filter("error IS NOT NULL OR level = 'ERROR'") + +def register_sink_for_pipeline_metrics(): + @dlt.foreach_batch_sink(name="send_pipeline_metrics_to_3p_monitoring") + def send_pipeline_metrics_to_3p_monitoring(batch_df, batch_id): + destination_format_udf = udf(convert_row_to_pipeline_metrics, ArrayType(StringType())) + metrics_df = batch_df.withColumn("metrics_array", destination_format_udf(struct("*"))).select(explode("metrics_array").alias("metrics")).filter(col("metrics").isNotNull()) + http_request_spec = _metrics_converter.create_http_requests_spec( + metrics_df, + _global_config["num_rows_per_batch"], + get_azure_header(get_access_token(_global_config)), + _global_config["endpoints"]["metrics"] + ) + getClient(_global_config).post(http_request_spec) + + @dlt.append_flow(target="send_pipeline_metrics_to_3p_monitoring") + def send_pipeline_metrics_to_sink(): + return spark.readStream.table(f"{pipeline_runs_status}_cdf") + 
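+# Illustrative only: the configuration keys read by getThirdPartySinkConfigFromSparkConfig
+# above typically come from the monitoring pipeline's configuration, e.g.
+#   destination            = "azuremonitor"
+#   azure_tenant_id        = "<tenant id>"
+#   host_name              = "<DCE logs-ingestion hostname>"
+#   azure_dcr_immutable_id = "<DCR immutable id>"
+#   secrets_scope          = "<secret scope holding azure_client_id / azure_client_secret>"
+# with num_rows_per_batch, max_retry_duration_sec and request_timeout_sec as optional overrides.
+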
+def register_sink_for_table_metrics(): + @dlt.foreach_batch_sink(name="send_table_metrics_to_3p_monitoring") + def send_table_metrics_to_3p_monitoring(batch_df, batch_id): + destination_format_udf = udf(convert_row_to_table_metrics, ArrayType(StringType())) + metrics_df = batch_df.withColumn("metrics_array", destination_format_udf(struct("*"))).select(explode("metrics_array").alias("metrics")).filter(col("metrics").isNotNull()) + http_request_spec = _metrics_converter.create_http_requests_spec( + metrics_df, + _global_config["num_rows_per_batch"], + get_azure_header(get_access_token(_global_config)), + _global_config["endpoints"]["metrics"] + ) + getClient(_global_config).post(http_request_spec) + + @dlt.append_flow(target="send_table_metrics_to_3p_monitoring") + def send_table_metrics_to_sink(): + return spark.readStream.option("skipChangeCommits", "true").table(event_logs_bronze) \ + .filter("table_name is not null AND details:flow_progress.metrics is not null AND event_type = 'flow_progress'") \ + .selectExpr( + "pipeline_id", + "pipeline_run_id", + "table_name", + "flow_name", + "event_timestamp", + "details:flow_progress.metrics.num_upserted_rows::bigint as num_upserted_rows", + "details:flow_progress.metrics.num_deleted_rows::bigint as num_deleted_rows", + "(details:flow_progress.metrics.num_upserted_rows::bigint + details:flow_progress.metrics.num_deleted_rows::bigint) as num_output_rows" + ) \ + .filter("num_upserted_rows is not null OR num_deleted_rows is not null OR num_output_rows is not null") + + +# ================================================================================ +# MAIN INITIALIZATION +# ================================================================================ + +# Initialize global configuration and register sinks. +if getParam(spark.conf, "destination") == "azuremonitor": + initialize_global_config(spark.conf) + register_sink_for_errors() + register_sink_for_pipeline_events() + register_sink_for_table_metrics() + register_sink_for_pipeline_metrics() \ No newline at end of file diff --git a/contrib/dbx_ingestion_monitoring/third_party_sinks/datadog_sink.py b/contrib/dbx_ingestion_monitoring/third_party_sinks/datadog_sink.py new file mode 100644 index 0000000..d3a47c3 --- /dev/null +++ b/contrib/dbx_ingestion_monitoring/third_party_sinks/datadog_sink.py @@ -0,0 +1,1093 @@ +""" +Datadog Sink for Monitoring ETL Pipeline. + +For configuration details, refer to README-third-party-monitoring.md +""" + +import json +import logging +import requests +import gzip +from typing import List, Dict, Any, Optional +from tenacity import retry, stop_after_delay, wait_exponential +from datetime import datetime, timezone +from pyspark.sql import SparkSession +from pyspark.sql.types import StringType, ArrayType +from pyspark.sql.functions import lit, col, collect_list, concat, expr, udf, struct, explode, regexp_replace +import dlt + +# Global Configuration. +SERVICE_NAME = "databricks-lakeflow-connect" +SOURCE_NAME = "databricks" +_global_config = None +_log_converter = None +_events_converter = None +_metrics_converter = None + +# Global Schemas +# Schema Validation enforces field types, trims oversized strings (maxLength), +# converts datetime objects to appropriate formats, validates required fields, +# and handles oneOf constraints for additionalProperties with type safety. +METRICS_SCHEMA = { + "type": "object", + "required": ["metric", "points", "type"], + "properties": { + "metric": { + "type": "string", + "description": "The name of the timeseries." 
+ }, + "type": { + "type": "integer", + "enum": [3], + "description": "The type of metric. 0=unspecified, 1=count, 2=rate, 3=gauge." + }, + "points": { + "type": "array", + "minItems": 1, + "items": { + "type": "object", + "required": ["timestamp", "value"], + "properties": { + "timestamp": { + "type": "integer", + "description": "The timestamp should be in seconds, not more than 10 minutes in the future or more than 1 hour in the past." + }, + "value": { + "type": "number", + "description": "The numeric value format should be a 64bit float gauge-type value." + } + } + } + }, + "tags": { + "type": "array", + "items": { + "type": "string" + }, + "description": "A list of tags associated with the metric." + } + } +} + +LOGS_SCHEMA = { + "type": "object", + "required": ["message", "ddsource", "ddtags", "timestamp", "status", "service"], + "properties": { + "message": { + "type": "string", + "description": "The message reserved attribute of your log." + }, + "ddsource": { + "type": "string", + "description": "The integration name associated with your log: the technology from which the log originated." + }, + "ddtags": { + "type": "string", + "description": "Tags associated with your logs." + }, + "timestamp": { + "type": "integer", + "description": "Unix timestamp for the log entry." + }, + "status": { + "type": "string", + "description": "The status/level of the log entry." + }, + "service": { + "type": "string", + "description": "The name of the application or service generating the log events." + } + }, + "additionalProperties": True +} + +EVENTS_SCHEMA = { + "type": "object", + "required": ["data"], + "properties": { + "data": { + "type": "object", + "required": ["type", "attributes"], + "properties": { + "type": { + "type": "string", + "enum": ["event"] + }, + "attributes": { + "type": "object", + "required": ["category", "title", "message", "timestamp", "tags", "attributes"], + "properties": { + "category": { + "type": "string", + "enum": ["alert"] + }, + "title": { + "type": "string", + "maxLength": 500 + }, + "message": { + "type": "string", + "maxLength": 2000 + }, + "timestamp": { + "type": "string", + "format": "date-time", + "description": "ISO 8601 timestamp, must be within 18 hours." 
+ }, + "tags": { + "type": "array", + "maxItems": 100, + "items": { + "type": "string" + } + }, + "attributes": { + "type": "object", + "required": ["status", "custom"], + "properties": { + "status": { + "type": "string", + "enum": ["warn", "error", "ok"] + }, + "custom": { + "type": "object", + "description": "Custom key-value attributes for the event.", + "additionalProperties": { + "type": ["string", "number", "boolean", "null"] + } + } + } + } + } + } + } + } + } +} + +# ================================================================================ +# UTILITIES +# ================================================================================ + +def get_datadog_headers(api_key: str): + """Get headers for the Datadog API.""" + return {"Content-Type": "application/json", "DD-API-KEY": api_key} + + +def initialize_global_config(spark_conf): + """Initialize global configuration from Spark configuration.""" + global _global_config, _log_converter, _events_converter, _metrics_converter + + _global_config = getThirdPartySinkConfigFromSparkConfig(spark_conf) + _log_converter = DatadogLogsConverter() + _events_converter = DatadogEventsConverter() + _metrics_converter = DatadogMetricsConverter() + +def getParam(spark_conf, key: str, default=None): + value = spark_conf.get(key, default) + if value == "" or value is None: + return None + return value + +def getThirdPartySinkConfigFromSparkConfig(spark_conf): + """ + Extract and merge configuration from Spark configuration and secret scope. + + This function extracts configuration variables from Spark configuration and merges + them with key-value pairs from a secret scope (if provided) to build common_params. + Secret store values take precedence over Spark configuration values when both exist. + + Args: + spark_conf: Spark configuration object containing required parameters + + Returns: + dict: Merged configuration parameters with secrets taking precedence + + The function looks for a 'secrets_scope' parameter in Spark config. If found, + it will retrieve all secrets from that scope and merge them with the base + configuration, giving preference to secret values. 
+ """ + destination = getParam(spark_conf, "destination") + if destination is None: + raise ValueError("Destination must be provided for third party sinks.") + + common_params = { + "destination": destination, + "num_rows_per_batch": int(spark_conf.get("num_rows_per_batch", "100")), + "max_retry_duration_sec": int(spark_conf.get("max_retry_duration_sec", "300")), + "request_timeout_sec": int(spark_conf.get("request_timeout_sec", "30")), + } + + api_key = getParam(spark_conf, "api_key") + + scope = getParam(spark_conf, "secrets_scope") + if scope is not None: + secrets = { + s.key: dbutils.secrets.get(scope=scope, key=s.key) + for s in dbutils.secrets.list(scope) + } + common_params.update(secrets) + if "api_key" in secrets: + api_key = secrets["api_key"] + + if api_key is None: + raise ValueError(f"API key is required for {destination} destination") + common_params["api_key"] = api_key + + host_name = getParam(spark_conf, "host_name") + + metrics_endpoint = getParam(spark_conf, "endpoints.metrics") + logs_endpoint = getParam(spark_conf, "endpoints.logs") + events_endpoint = getParam(spark_conf, "endpoints.events") + + if not all([metrics_endpoint, logs_endpoint, events_endpoint]): + if host_name is None: + raise ValueError( + "Either 'host_name' must be provided to auto-generate endpoints, " + "or all three endpoints (endpoints.metrics, endpoints.logs, endpoints.events) " + "must be explicitly configured." + ) + + if metrics_endpoint is None: + metrics_endpoint = f"https://api.{host_name}/api/v2/series" + if logs_endpoint is None: + logs_endpoint = f"https://http-intake.logs.{host_name}/api/v2/logs" + if events_endpoint is None: + events_endpoint = f"https://event-management-intake.{host_name}/api/v2/events" + + common_params["endpoints"] = { + "metrics": metrics_endpoint, + "logs": logs_endpoint, + "events": events_endpoint, + } + + return common_params + + +def unix_to_iso(timestamp: int) -> str: + """Convert Unix timestamp in milliseconds/seconds to ISO format string.""" + ts = float(timestamp) + # If timestamp is unusually large, assume milliseconds + if ts > 1e12: + ts /= 1000 + dt = datetime.fromtimestamp(ts, tz=timezone.utc) + return dt.isoformat().replace("+00:00", "Z") + +def timestamp_in_unix_milliseconds(timestamp) -> int: + """Convert datetime to Unix timestamp in milliseconds.""" + if isinstance(timestamp, datetime): + return int(timestamp.timestamp() * 1000) + return int(timestamp) + +def get_status(status_display: str) -> str: + """Map pipeline status to appropriate status level.""" + status_lower = status_display.lower() + if status_lower in ['failed', 'error']: + return 'error' + elif status_lower in ['running', 'starting', 'completed', 'success']: + return 'ok' + else: + return 'warn' + +def serialize_datetime(data): + if isinstance(data, dict): + return { + key: serialize_datetime(value) + for key, value in data.items() + } + elif isinstance(data, list): + return [serialize_datetime(item) for item in data] + elif isinstance(data, datetime): + return data.isoformat() + else: + return data + +def filter_null_fields(data): + if isinstance(data, dict): + return { + key: filter_null_fields(value) + for key, value in data.items() + if value is not None + } + elif isinstance(data, list): + return [filter_null_fields(item) for item in data if item is not None] + else: + return data + +def enforce_schema(data, schema, path = "root"): + # Nothing to enforce. 
+ if schema is None or data is None: + return data + + # Handle oneOf before checking for type + if "oneOf" in schema: + type_map = { + "string": str, + "number": (int, float), + "integer": int, + "boolean": bool, + "object": dict, + "array": list, + } + for sub_schema in schema["oneOf"]: + sub_type = sub_schema.get("type") + expected_python_type = type_map.get(sub_type) + if expected_python_type and isinstance(data, expected_python_type): + return enforce_schema(data, sub_schema, path) + raise ValueError(f"Value at {path} does not match any oneOf schema options") + + schema_type = schema.get("type") + if not schema_type: + raise ValueError(f"Failed to get type of the object at {path}.") + + # Handle array of types (e.g., ["string", "number", "boolean", "null"]) + if isinstance(schema_type, list): + # Check if data matches any of the allowed types + type_map = { + "string": str, + "number": (int, float), + "integer": int, + "boolean": bool, + "null": type(None), + } + for allowed_type in schema_type: + expected_python_type = type_map.get(allowed_type) + if expected_python_type and isinstance(data, expected_python_type): + # Use the matched type for further validation + schema_type = allowed_type + break + else: + raise ValueError(f"Value at {path} does not match any allowed types: {schema_type}") + + # Validate dictionary + if isinstance(data, dict): + if schema_type != "object": + raise ValueError(f"Expected object at {path}, got {type(data).__name__}") + props = schema.get("properties", {}) + required_keys = schema.get("required", []) + additional_properties = schema.get("additionalProperties", False) + + # Validate defined properties + for k, v in props.items(): + if k in data: + data[k] = enforce_schema(data[k], v, f"{path}.{k}") + elif k in required_keys: + raise ValueError(f"Missing required field '{k}' at {path}") + + # Handle additional properties + for k, v in data.items(): + if k not in props: # This is an additional property + if additional_properties is False: + raise ValueError(f"Additional property '{k}' not allowed at {path}") + elif additional_properties is True: + # Allow any additional property, no validation + pass + elif isinstance(additional_properties, dict): + # Handle oneOf for additional properties + if "oneOf" in additional_properties: + type_map = { + "string": str, + "number": (int, float), + "integer": int, + "boolean": bool, + } + + for sub_schema in additional_properties["oneOf"]: + expected_type = type_map.get(sub_schema.get("type")) + if expected_type and isinstance(v, expected_type): + data[k] = enforce_schema(v, sub_schema, f"{path}.{k}") + break + else: + raise ValueError( + f"Additional property '{k}' at {path} does not match any oneOf schema" + ) + else: + data[k] = enforce_schema(v, additional_properties, f"{path}.{k}") + + return data + + # Validate list + elif isinstance(data, list): + if schema_type != "array": + raise ValueError(f"Expected array at {path}, got {type(data).__name__}") + items_schema = schema.get("items", {}) + return [enforce_schema(item, items_schema, f"{path}[{i}]") for i, item in enumerate(data)] + + # Handle string + elif isinstance(data, str): + if schema_type != "string": + raise ValueError(f"Expected string at {path}, got {type(data).__name__}") + acceptable_values = schema.get("enum", []) + if acceptable_values and data not in acceptable_values: + raise ValueError(f"Invalid value at {path}: {data}. 
Allowed: {acceptable_values}") + max_length = schema.get("maxLength") + if max_length and len(data) > max_length: + return data[:max_length] + return data + + # Handle datetime + elif isinstance(data, datetime): + if schema_type == "string": + return data.isoformat() + elif schema_type == "integer": + return data.timestamp() + else: + raise ValueError(f"Cannot convert datetime to {schema_type}") + + # Handle integer + elif isinstance(data, int): + if schema_type == "integer" or schema_type == "number": + return data + elif schema_type == "string" and schema.get("format") == "date-time": + return unix_to_iso(data) + else: + raise ValueError(f"Cannot convert integer to {schema_type}") + + elif isinstance(data, float): + if schema_type != "number" and schema_type != "integer": + raise ValueError(f"Expected number at {path}, got {type(data).__name__}") + return data + elif isinstance(data, bool): + if schema_type != "boolean": + raise ValueError(f"Expected boolean at {path}, got {type(data).__name__}") + return data + return data + +def create_valid_json_or_fail_with_error(data, schema): + data = serialize_datetime(data) + data = filter_null_fields(data) + data = enforce_schema(data, schema) + return json.dumps(data) + +# ================================================================================ +# HTTP Layer +# ================================================================================ + +# Global session for connection pooling +session: Optional[requests.Session] = None + +class HTTPClient: + """ + HTTP client for batched POST requests using a persistent session. + + Input: Spark DataFrame with columns: + - endpoint (StringType): Target URL. + - header (StringType, JSON-encoded): HTTP headers. + - payload (binary data): Serialized request body. + """ + + def __init__(self, max_retry_duration_sec: int = 300, request_timeout_sec: int = 30): + """ + Initialize the HTTP client. + + Args: + max_retry_duration_sec: Maximum time in seconds to retry requests with exponential backoff + request_timeout_sec: Timeout in seconds for a single request + """ + self.max_retry_duration_sec = max_retry_duration_sec + self.request_timeout_sec = request_timeout_sec + + + def get_session(self) -> requests.Session: + """ + Get the global session instance. If not present, create a new one. + + Returns: + session: The global session instance + """ + global session + if session is None: + session = requests.Session() + return session + + def _make_request_with_retry(self, url: str, headers: Dict[str, str], payload: bytes): + """ + Make a POST request to the provided url. + + Args: + url: The endpoint URL + headers: Request headers + payload: Request payload + + Throws: + Exception: If the request fails and the retries are exhausted. 
+ """ + # Compress payload + compressed_payload = gzip.compress(payload) + headers['Content-Encoding'] = 'gzip' + + response = None + try: + response = self.get_session().post( + url, + headers=headers, + data=compressed_payload, + timeout=self.request_timeout_sec + ) + if response.status_code >= 400 and response.status_code < 500: + logging.warning(f"Ignoring client-side error for URL: {url}, headers: {str(headers)}, Payload: {payload.decode('utf-8')}, Response: {response.text}") + else: + response.raise_for_status() + logging.debug(f"Successfully sent request to URL: {url}, Payload: {payload.decode('utf-8')}, Response: {response.text}") + except Exception as e: + response_text = "No response" + if response is not None: + try: + response_text = str(response.json()) + except: + response_text = response.text if hasattr(response, 'text') else "Unable to read response" + logging.error(f"Request failed for URL: {url}, headers: {str(headers)}, Payload: {payload.decode('utf-8')}, Error: {str(e)}, Response: {response_text}") + raise type(e)(f"Request failed for URL: {url}, headers: {str(headers)}, Payload: {payload.decode('utf-8')}, Error: {str(e)}, Response: {response_text}") from e + + def post(self, http_request_specs_df) -> None: + """ + Make POST requests for each row in the DataFrame. + Serially makes requests for each row in the DataFrame. + + Args: + http_request_specs_df: Spark DataFrame with columns 'endpoint', 'header', 'payloadBytes' + """ + rows = http_request_specs_df.collect() + total_requests = len(rows) + logging.info(f"[HTTPClient] Starting to send {total_requests} HTTP request(s)") + + success_count = 0 + failure_count = 0 + + for idx, row in enumerate(rows, 1): + try: + logging.info(f"[HTTPClient] Sending request {idx}/{total_requests} to {row.endpoint}") + headers = json.loads(getattr(row, 'header', '{}')) + retry_wrapper = retry( + stop=stop_after_delay(self.max_retry_duration_sec), + wait=wait_exponential(multiplier=1, min=1, max=10), + reraise=True + ) + retry_wrapper(self._make_request_with_retry)(row.endpoint, headers, row.payloadBytes) + success_count += 1 + logging.info(f"[HTTPClient] Successfully sent request {idx}/{total_requests}") + except Exception as e: + failure_count += 1 + logging.error(f"[HTTPClient] Failed to send request {idx}/{total_requests}: {str(e)}") + continue # Continue with other requests regardless of success/failure + + logging.info(f"[HTTPClient] Completed sending requests: {success_count} succeeded, {failure_count} failed out of {total_requests} total") + + +# ================================================================================ +# CONVERSION LAYER +# ================================================================================ + +class DatadogMetricsConverter: + """Converter class to convert metrics to datadog format.""" + + def create_metric( + self, + metric_name: str, + metric_value: float, + tags: Dict[str, str], + timestamp: int, + additional_attributes: Optional[Dict[str, Any]] = None) -> str: + """Create a Datadog Gauge metric in the proper format. + + Args: + metric_name: Name of the metric (e.g., "pipeline.run.execution_time_seconds") + metric_value: Numeric value for the gauge metric + tags: Dictionary of tags (e.g., {"pipeline_id": "123", "phase": "execution"}) + timestamp: Unix timestamp for the metric + additional_attributes: Optional additional attributes to include + + Returns: + JSON string representing the Datadog metric + + Raises: + ValueError if the fields are of unsupported types. 
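+
+        Example (illustrative only, not taken from a real run):
+            {"metric": "pipeline.run.total_seconds", "type": 3,
+             "points": [{"timestamp": 1704067200, "value": 123.0}],
+             "tags": ["pipeline_id:123", "phase:total", "source:databricks"]}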
+ """ + + # Base metric structure for Datadog Gauge (type 3) + metric = { + "metric": metric_name, + "type": 3, # Gauge metric type + "points": [{"timestamp": timestamp, "value": metric_value}], + "tags": [f"{k}:{v}" for k, v in tags.items()] + } + + # Add additional attributes if provided + if additional_attributes: + metric["tags"].extend([f"{k}:{v}" for k, v in additional_attributes.items()]) + + # Enforce the schema + return create_valid_json_or_fail_with_error(metric, METRICS_SCHEMA) + + def create_http_requests_spec(self, df, num_rows_per_batch: int, headers: dict, endpoint: str): + """Create HTTP request spec DataFrame for metrics.""" + df_with_batch_id = df.withColumn("batch_id", + expr(f"int((row_number() over (order by 1) - 1) / {num_rows_per_batch})")) \ + .withColumn("metrics", regexp_replace(col("metrics"), "\n", "")) + return df_with_batch_id.groupBy("batch_id") \ + .agg(collect_list("metrics").alias("batch_metrics")) \ + .withColumn("payload", concat(lit('{"series": ['), + expr("concat_ws(',', batch_metrics)"), + lit(']}'))) \ + .withColumn("payloadBytes", col("payload").cast("binary")) \ + .withColumn("endpoint", lit(endpoint)) \ + .withColumn("header", lit(json.dumps(headers))) \ + .select("endpoint", "header", "payloadBytes") + + +class DatadogEventsConverter: + """Converter class to convert events to datadog format.""" + + + def create_event( + self, + title: str, + status: str, + tags: Dict[str, str], + timestamp: int, + additional_attributes: Optional[Dict[str, Any]] = None) -> str: + """ + Create a Datadog event in the proper format. + + Args: + title: The event title + status: The status of the event (e.g., "ok", "warn", "error") + tags: Dictionary of tags + timestamp: Unix timestamp for the event + additional_attributes: Optional additional attributes to include + + Returns: + JSON string representing the Datadog event + + Raises: + ValueError if the fields are of unsupported types. + """ + event = { + "data": { + "type": "event", + "attributes": { + "category": "alert", + "title": title, + "message": f"Event: {title}", + "timestamp": unix_to_iso(timestamp), + "tags": [f"{k}:{v}" for k, v in tags.items()], # Limit to 100 tags + "attributes": { + "status": status, + "custom": { + "source": SOURCE_NAME, + "service": SERVICE_NAME + } + } + } + } + } + + # Add additional attributes if provided + if additional_attributes: + event["data"]["attributes"]["attributes"]["custom"].update(additional_attributes) + + # Enforce the schema + return create_valid_json_or_fail_with_error(event, EVENTS_SCHEMA) + + def create_http_requests_spec(self, df, num_rows_per_batch: int, headers: dict, endpoint: str): + """Create HTTP request spec DataFrame for events.""" + return df \ + .withColumn("events", regexp_replace(col("events"), "\n", "")) \ + .withColumn("payloadBytes", col("events").cast("binary")) \ + .withColumn("endpoint", lit(endpoint)) \ + .withColumn("header", lit(json.dumps(headers))) \ + .select("endpoint", "header", "payloadBytes") + +class DatadogLogsConverter: + """Converter class to convert metrics to datadog format.""" + + def create_log( + self, + title: str, + status: str, + tags: Dict[str, str], + timestamp: int, + additional_attributes: Optional[Dict[str, Any]] = None) -> str: + """ + Create a Datadog log in the proper format. 
+ + Args: + title: The log message/title + status: The status of the log (e.g., "error", "info", "warning") + tags: Dictionary of tags + timestamp: Unix timestamp for the log + additional_attributes: Optional additional attributes to include + + Returns: + JSON string representing the Datadog log + + Raises: + ValueError if the fields are of unsupported types. + """ + + # Base log structure for Datadog + log = { + "message": title, + "ddsource": SOURCE_NAME, + "ddtags": ",".join([f"{k}:{v}" for k, v in tags.items()]), + "timestamp": timestamp, + "status": status, + "service": SERVICE_NAME + } + + # Add additional attributes if provided + if additional_attributes: + log.update(additional_attributes) + + # Enforce the schema + return create_valid_json_or_fail_with_error(log, LOGS_SCHEMA) + + def create_http_requests_spec(self, df, num_rows_per_batch: int, headers: dict, endpoint: str): + """Create HTTP request spec DataFrame for logs.""" + df_with_batch_id = df.withColumn("batch_id", + expr(f"int((row_number() over (order by 1) - 1) / {num_rows_per_batch})")) \ + .withColumn("logs", regexp_replace(col("logs"), "\n", "")) + return df_with_batch_id.groupBy("batch_id") \ + .agg(collect_list("logs").alias("batch_logs")) \ + .withColumn("payload", concat(lit('['), + expr("concat_ws(',', batch_logs)"), + lit(']'))) \ + .withColumn("payloadBytes", col("payload").cast("binary")) \ + .withColumn("endpoint", lit(endpoint)) \ + .withColumn("header", lit(json.dumps(headers))) \ + .select("endpoint", "header", "payloadBytes") + + +# ================================================================================ +# INFERENCE LAYER +# ================================================================================ + +def convert_row_to_error_log(row): + """Convert a row to error log format.""" + params = { + "title": str(getattr(row, "error_message", "")), + "status": "error", + "tags": { + "pipeline_id": getattr(row, 'pipeline_id', ''), + "pipeline_run_id": getattr(row, 'pipeline_run_id', ''), + "table_name": getattr(row, 'table_name', ''), + "flow_name": getattr(row, 'flow_name', ''), + "level": "error" + }, + "timestamp": timestamp_in_unix_milliseconds(row.event_timestamp), + "additional_attributes": { + "pipeline_run_link": getattr(row, "pipeline_run_link", None), + "error": getattr(row, "error_code", None), + } + } + return _log_converter.create_log(**params) + +def convert_row_to_table_metrics(row): + """Convert a row to table metrics format.""" + # Base tags for all metrics + base_tags = { + "pipeline_id": getattr(row, "pipeline_id", ""), + "pipeline_run_id": getattr(row, "pipeline_run_id", ""), + "table_name": getattr(row, "table_name", ""), + "flow_name": getattr(row, "flow_name", ""), + "source": SOURCE_NAME + } + + # Timestamp for all metrics + timestamp = timestamp_in_unix_milliseconds(row.event_timestamp) + + return [ + _metrics_converter.create_metric( + metric_name="dlt.table.throughput.upserted_rows", + metric_value=getattr(row, "num_upserted_rows", 0) or 0, + tags={**base_tags, "metric_type": "count"}, + timestamp=timestamp, + additional_attributes={} + ), + _metrics_converter.create_metric( + metric_name="dlt.table.throughput.deleted_rows", + metric_value=getattr(row, "num_deleted_rows", 0) or 0, + tags={**base_tags, "metric_type": "count"}, + timestamp=timestamp, + additional_attributes={} + ), + _metrics_converter.create_metric( + metric_name="dlt.table.throughput.output_rows", + metric_value=getattr(row, "num_output_rows", 0) or 0, + tags={**base_tags, "metric_type": "count"}, + 
timestamp=timestamp, + additional_attributes={} + ), + ] + +def convert_row_to_pipeline_status_event(row): + """Convert a row to pipeline status event format.""" + # Determine pipeline status for title + status_display = row.latest_state.upper() if row.latest_state else 'UNKNOWN' + pipeline_id = getattr(row, "pipeline_id", "") + + params = { + "title": f"Pipeline {pipeline_id} {status_display}", + "status": get_status(status_display), + "tags": { + "pipeline_id": pipeline_id, + "latest_run_id": getattr(row, "pipeline_run_id", ""), + "status": status_display.lower(), + "source": SOURCE_NAME, + "service": SERVICE_NAME + }, + "timestamp": timestamp_in_unix_milliseconds(row.updated_at), + "additional_attributes": { + "pipeline_link": getattr(row, "pipeline_link", None), + "pipeline_run_link": getattr(row, "pipeline_run_link", None), + "is_complete": getattr(row, "is_complete", None), + "running_start_time": getattr(row, "running_start_time", None), + "end_time": getattr(row, "end_time", None), + "updated_at": getattr(row, "updated_at", None) , + "latest_error_log_message": getattr(row, "latest_error_log_message", None), + "latest_error_message": getattr(row, "latest_error_message", None), + } + } + return _events_converter.create_event(**params) + +def convert_row_to_pipeline_metrics(row): + """Convert a row to pipeline metrics format.""" + def has_attr(obj, attr): + return hasattr(obj, attr) and getattr(obj, attr) is not None + + if not has_attr(row, "queued_time") or not has_attr(row, "create_time"): + return [] + + base_tags = { + "pipeline_id": getattr(row, "pipeline_id", ""), + "pipeline_run_id": getattr(row, "pipeline_run_id", ""), + "source": SOURCE_NAME + } + metrics = [] + timestamp = timestamp_in_unix_milliseconds(getattr(row, "create_time", None)) + + end_time = getattr(row, "end_time", None) or datetime.now(timezone.utc).replace(tzinfo=None) + + # Starting seconds: queued_time - create_time + starting_seconds = (row.queued_time - row.create_time).total_seconds() + metrics.append(_metrics_converter.create_metric( + metric_name="pipeline.run.starting_seconds", + metric_value=starting_seconds, + tags={**base_tags, "metric_type": "duration", "phase": "starting"}, + timestamp=timestamp + )) + + # Seconds waiting for resources: initialization_start_time - queued_time + if not has_attr(row, "initialization_start_time"): + return metrics + waiting_seconds = (row.initialization_start_time - row.queued_time).total_seconds() + metrics.append(_metrics_converter.create_metric( + metric_name="pipeline.run.waiting_for_resources_seconds", + metric_value=waiting_seconds, + tags={**base_tags, "metric_type": "duration", "phase": "waiting"}, + timestamp=timestamp + )) + + # Initialization seconds: running_start_time - initialization_start_time + if not has_attr(row, "running_start_time"): + return metrics + initialization_seconds = (row.running_start_time - row.initialization_start_time).total_seconds() + metrics.append(_metrics_converter.create_metric( + metric_name="pipeline.run.initialization_seconds", + metric_value=initialization_seconds, + tags={**base_tags, "metric_type": "duration", "phase": "initialization"}, + timestamp=timestamp + )) + + # Running seconds: end_time - running_start_time + running_seconds = (end_time - row.running_start_time).total_seconds() + metrics.append(_metrics_converter.create_metric( + metric_name="pipeline.run.running_seconds", + metric_value=running_seconds, + tags={**base_tags, "metric_type": "duration", "phase": "running"}, + timestamp=timestamp + )) + + # Total 
seconds: end_time - create_time + total_seconds = (end_time - row.create_time).total_seconds() + metrics.append(_metrics_converter.create_metric( + metric_name="pipeline.run.total_seconds", + metric_value=total_seconds, + tags={**base_tags, "metric_type": "duration", "phase": "total"}, + timestamp=timestamp + )) + + return metrics + +# ================================================================================ +# MAIN +# ================================================================================ + +# Source streams +event_logs_bronze = "event_logs_bronze" +pipeline_runs_status = "pipeline_runs_status" + + +http_client = None +def getClient(config): + """Global HTTP client getter.""" + global http_client + if http_client is None: + http_client = HTTPClient( + max_retry_duration_sec=config["max_retry_duration_sec"], + request_timeout_sec=config["request_timeout_sec"] + ) + return http_client + +def register_sink_for_pipeline_events(): + @dlt.foreach_batch_sink(name="send_pipeline_status_to_3p_monitoring") + def send_pipeline_status_to_3p_monitoring(batch_df, batch_id): + input_count = batch_df.count() + logging.info(f"[Pipeline Events] Processing batch {batch_id} with {input_count} rows") + + destination_format_udf = udf(convert_row_to_pipeline_status_event, StringType()) + events_df = batch_df.withColumn("events", destination_format_udf(struct("*"))).select("events").filter(col("events").isNotNull()).cache() + + events_count = events_df.count() + logging.info(f"[Pipeline Events] Converted {events_count} events from {input_count} input rows") + + if events_count == 0: + logging.info(f"[Pipeline Events] Skipping batch {batch_id} - no events to send") + return + + http_request_spec = _events_converter.create_http_requests_spec( + events_df, + _global_config["num_rows_per_batch"], + get_datadog_headers(_global_config["api_key"]), + _global_config["endpoints"]["events"] + ).cache() + + request_count = http_request_spec.count() + logging.info(f"[Pipeline Events] Sending {request_count} HTTP requests for batch {batch_id}") + getClient(_global_config).post(http_request_spec) + logging.info(f"[Pipeline Events] Completed batch {batch_id}") + + @dlt.append_flow(target="send_pipeline_status_to_3p_monitoring") + def send_pipeline_status_to_sink(): + return spark.readStream.table(f"{pipeline_runs_status}_cdf") + + +def register_sink_for_errors(): + @dlt.foreach_batch_sink(name="send_errors_to_3p_monitoring") + def send_errors_to_3p_monitoring(batch_df, batch_id): + input_count = batch_df.count() + logging.info(f"[Errors] Processing batch {batch_id} with {input_count} rows") + + destination_format_udf = udf(convert_row_to_error_log, StringType()) + logs_df = batch_df.withColumn("logs", destination_format_udf(struct("*"))).select("logs").filter(col("logs").isNotNull()).cache() + + logs_count = logs_df.count() + logging.info(f"[Errors] Converted {logs_count} error logs from {input_count} input rows") + + if logs_count == 0: + logging.info(f"[Errors] Skipping batch {batch_id} - no error logs to send") + return + + http_request_spec = _log_converter.create_http_requests_spec( + logs_df, + _global_config["num_rows_per_batch"], + get_datadog_headers(_global_config["api_key"]), + _global_config["endpoints"]["logs"] + ).cache() + + request_count = http_request_spec.count() + logging.info(f"[Errors] Sending {request_count} HTTP requests for batch {batch_id}") + getClient(_global_config).post(http_request_spec) + logging.info(f"[Errors] Completed batch {batch_id}") + + 
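+    # The append flow below streams the bronze event log table into the sink defined above,
+    # keeping only rows that carry an error_message or are logged at ERROR level.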
@dlt.append_flow(target="send_errors_to_3p_monitoring") + def send_errors_to_sink(): + return spark.readStream.option("skipChangeCommits", "true").table(event_logs_bronze).filter("error_message IS NOT NULL OR level = 'ERROR'") + +def register_sink_for_pipeline_metrics(): + @dlt.foreach_batch_sink(name="send_pipeline_metrics_to_3p_monitoring") + def send_pipeline_metrics_to_3p_monitoring(batch_df, batch_id): + input_count = batch_df.count() + logging.info(f"[Pipeline Metrics] Processing batch {batch_id} with {input_count} rows") + + destination_format_udf = udf(convert_row_to_pipeline_metrics, ArrayType(StringType())) + metrics_df = batch_df.withColumn("metrics_array", destination_format_udf(struct("*"))).select(explode("metrics_array").alias("metrics")).filter(col("metrics").isNotNull()).cache() + + metrics_count = metrics_df.count() + logging.info(f"[Pipeline Metrics] Converted {metrics_count} metrics from {input_count} input rows") + + if metrics_count == 0: + logging.info(f"[Pipeline Metrics] Skipping batch {batch_id} - no metrics to send") + return + + http_request_spec = _metrics_converter.create_http_requests_spec( + metrics_df, + _global_config["num_rows_per_batch"], + get_datadog_headers(_global_config["api_key"]), + _global_config["endpoints"]["metrics"] + ).cache() + + request_count = http_request_spec.count() + logging.info(f"[Pipeline Metrics] Sending {request_count} HTTP requests for batch {batch_id}") + getClient(_global_config).post(http_request_spec) + logging.info(f"[Pipeline Metrics] Completed batch {batch_id}") + + @dlt.append_flow(target="send_pipeline_metrics_to_3p_monitoring") + def send_pipeline_metrics_to_sink(): + return spark.readStream.table(f"{pipeline_runs_status}_cdf") + +def register_sink_for_table_metrics(): + @dlt.foreach_batch_sink(name="send_table_metrics_to_3p_monitoring") + def send_table_metrics_to_3p_monitoring(batch_df, batch_id): + input_count = batch_df.count() + logging.info(f"[Table Metrics] Processing batch {batch_id} with {input_count} rows") + + destination_format_udf = udf(convert_row_to_table_metrics, ArrayType(StringType())) + metrics_df = batch_df.withColumn("metrics_array", destination_format_udf(struct("*"))).select(explode("metrics_array").alias("metrics")).filter(col("metrics").isNotNull()).cache() + + metrics_count = metrics_df.count() + logging.info(f"[Table Metrics] Converted {metrics_count} metrics from {input_count} input rows") + + if metrics_count == 0: + logging.info(f"[Table Metrics] Skipping batch {batch_id} - no metrics to send") + return + + http_request_spec = _metrics_converter.create_http_requests_spec( + metrics_df, + _global_config["num_rows_per_batch"], + get_datadog_headers(_global_config["api_key"]), + _global_config["endpoints"]["metrics"] + ).cache() + + request_count = http_request_spec.count() + logging.info(f"[Table Metrics] Sending {request_count} HTTP requests for batch {batch_id}") + getClient(_global_config).post(http_request_spec) + logging.info(f"[Table Metrics] Completed batch {batch_id}") + + @dlt.append_flow(target="send_table_metrics_to_3p_monitoring") + def send_table_metrics_to_sink(): + return spark.readStream.option("skipChangeCommits", "true").table(event_logs_bronze) \ + .filter("table_name is not null AND details:flow_progress.metrics is not null AND event_type = 'flow_progress'") \ + .selectExpr( + "pipeline_id", + "pipeline_run_id", + "table_name", + "flow_name", + "event_timestamp", + "details:flow_progress.metrics.num_upserted_rows::bigint as num_upserted_rows", + 
"details:flow_progress.metrics.num_deleted_rows::bigint as num_deleted_rows", + "(details:flow_progress.metrics.num_upserted_rows::bigint + details:flow_progress.metrics.num_deleted_rows::bigint) as num_output_rows" + ) \ + .filter("num_upserted_rows is not null OR num_deleted_rows is not null OR num_output_rows is not null") + +# ================================================================================ +# MAIN INITIALIZATION +# ================================================================================ + +# Initialize global configuration and register sinks. +if getParam(spark.conf, "destination") == "datadog": + initialize_global_config(spark.conf) + register_sink_for_errors() + register_sink_for_pipeline_events() + register_sink_for_table_metrics() + register_sink_for_pipeline_metrics() \ No newline at end of file diff --git a/contrib/dbx_ingestion_monitoring/third_party_sinks/newrelic_sink.py b/contrib/dbx_ingestion_monitoring/third_party_sinks/newrelic_sink.py new file mode 100644 index 0000000..940a8e3 --- /dev/null +++ b/contrib/dbx_ingestion_monitoring/third_party_sinks/newrelic_sink.py @@ -0,0 +1,951 @@ +""" +NewRelic Sink for Monitoring ETL Pipeline. + +For configuration details, refer to README-third-party-monitoring.md +""" + +import json +import requests +import gzip +from typing import List, Dict, Any, Optional +from tenacity import retry, stop_after_delay, wait_exponential +from datetime import datetime, timezone +from pyspark.sql import SparkSession +from pyspark.sql.types import StringType, ArrayType +from pyspark.sql.functions import lit, col, collect_list, concat, expr, udf, struct, explode, regexp_replace +import dlt + +# Global Configuration. +SERVICE_NAME = "databricks-lakeflow-connect" +SOURCE_NAME = "databricks" +_global_config = None +_log_converter = None +_events_converter = None +_metrics_converter = None + +# Global Schemas +# Schema Validation enforces field types, trims oversized strings (maxLength), +# converts datetime objects to appropriate formats, validates required fields, +# and handles oneOf constraints for additionalProperties with type safety. 
+METRICS_SCHEMA = { + "type": "object", + "required": ["name", "value", "timestamp"], + "properties": { + "name": { + "type": "string", + "maxLength": 255 + }, + "value": { + "oneOf": [ + {"type": "number"}, + {"type": "object"} + ] + }, + "timestamp": {"type": "integer"}, + "interval.ms": { + "type": "integer", + "minimum": 1 + }, + "type": { + "type": "string", + "enum": ["gauge"] + }, + "attributes": { + "type": "object", + "additionalProperties": { + "oneOf": [ + {"type": "string"}, + {"type": "number"}, + {"type": "boolean"} + ] + } + } + }, + "additionalProperties": False +} + +LOGS_SCHEMA = { + "type": "object", + "required": ["timestamp", "message"], + "properties": { + "timestamp": {"type": "integer"}, + "message": {"type": "string"} + }, + "additionalProperties": True +} + +EVENTS_SCHEMA = { + "type": "object", + "required": ["timestamp"], + "maxProperties": 255, + "properties": { + "timestamp": {"type": "integer"} + }, + "additionalProperties": { + "oneOf": [ + { + "type": "string", + "maxLength": 4096 + }, + {"type": "number"}, + {"type": "boolean"} + ] + } +} + +# ================================================================================ +# UTILITIES +# ================================================================================ + +def get_newrelic_headers(api_key: str): + """Get headers for the NewRelic API.""" + return {"Content-Type": "application/json", "Api-key": api_key} + + +def initialize_global_config(spark_conf): + """Initialize global configuration from Spark configuration.""" + global _global_config, _log_converter, _events_converter, _metrics_converter + + _global_config = getThirdPartySinkConfigFromSparkConfig(spark_conf) + _log_converter = NewRelicLogsConverter() + _events_converter = NewRelicEventsConverter() + _metrics_converter = NewRelicMetricsConverter() + +def getParam(spark_conf, key: str, default=None): + value = spark_conf.get(key, default) + if value == "" or value is None: + return None + return value + +def getThirdPartySinkConfigFromSparkConfig(spark_conf): + """ + Extract and merge configuration from Spark configuration and secret scope. + + This function extracts configuration variables from Spark configuration and merges + them with key-value pairs from a secret scope (if provided) to build common_params. + Secret store values take precedence over Spark configuration values when both exist. + + Args: + spark_conf: Spark configuration object containing required parameters + + Returns: + dict: Merged configuration parameters with secrets taking precedence + + The function looks for a 'secrets_scope' parameter in Spark config. If found, + it will retrieve all secrets from that scope and merge them with the base + configuration, giving preference to secret values. 
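+    Example (illustrative): if the Spark configuration supplies api_key=A and the secret
+    scope also contains a secret named "api_key" with value B, the returned configuration
+    uses B.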
+ """ + destination = getParam(spark_conf, "destination") + if destination is None: + raise ValueError("Destination must be provided for third party sinks.") + + common_params = { + "destination": destination, + "num_rows_per_batch": int(spark_conf.get("num_rows_per_batch", "100")), + "max_retry_duration_sec": int(spark_conf.get("max_retry_duration_sec", "300")), + "request_timeout_sec": int(spark_conf.get("request_timeout_sec", "30")), + } + + api_key = getParam(spark_conf, "api_key") + + scope = getParam(spark_conf, "secrets_scope") + if scope is not None: + secrets = { + s.key: dbutils.secrets.get(scope=scope, key=s.key) + for s in dbutils.secrets.list(scope) + } + common_params.update(secrets) + if "api_key" in secrets: + api_key = secrets["api_key"] + + if api_key is None: + raise ValueError(f"API key is required for {destination} destination") + common_params["api_key"] = api_key + + host_name = getParam(spark_conf, "host_name") + account_id = getParam(spark_conf, "account_id") + + metrics_endpoint = getParam(spark_conf, "endpoints.metrics") + logs_endpoint = getParam(spark_conf, "endpoints.logs") + events_endpoint = getParam(spark_conf, "endpoints.events") + + if not all([metrics_endpoint, logs_endpoint, events_endpoint]): + if host_name is None: + raise ValueError( + "Either 'host_name' must be provided to auto-generate endpoints, " + "or all three endpoints (endpoints.metrics, endpoints.logs, endpoints.events) " + "must be explicitly configured." + ) + + if events_endpoint is None and account_id is None: + raise ValueError( + "For New Relic, 'account_id' is required to auto-generate events endpoint. " + "Either provide 'account_id' or explicitly configure 'endpoints.events'." + ) + + if metrics_endpoint is None: + metrics_endpoint = f"https://metric-api.{host_name}/metric/v1" + if logs_endpoint is None: + logs_endpoint = f"https://log-api.{host_name}/log/v1" + if events_endpoint is None: + events_endpoint = f"https://insights-collector.{host_name}/v1/accounts/{account_id}/events" + + common_params["endpoints"] = { + "metrics": metrics_endpoint, + "logs": logs_endpoint, + "events": events_endpoint, + } + + return common_params + + +def unix_to_iso(timestamp: int) -> str: + """Convert Unix timestamp in milliseconds/seconds to ISO format string.""" + ts = float(timestamp) + # If timestamp is unusually large, assume milliseconds + if ts > 1e12: + ts /= 1000 + dt = datetime.fromtimestamp(ts, tz=timezone.utc) + return dt.isoformat().replace("+00:00", "Z") + +def timestamp_in_unix_milliseconds(timestamp) -> int: + """Convert datetime to Unix timestamp in milliseconds.""" + if isinstance(timestamp, datetime): + return int(timestamp.timestamp() * 1000) + return int(timestamp) + +def get_status(status_display: str) -> str: + """Map pipeline status to appropriate status level.""" + status_lower = status_display.lower() + if status_lower in ['failed', 'error']: + return 'error' + elif status_lower in ['running', 'starting']: + return 'info' + elif status_lower in ['completed', 'success']: + return 'ok' + else: + return 'warn' + +def serialize_datetime(data): + if isinstance(data, dict): + return { + key: serialize_datetime(value) + for key, value in data.items() + } + elif isinstance(data, list): + return [serialize_datetime(item) for item in data] + elif isinstance(data, datetime): + return data.isoformat() + else: + return data + +def filter_null_fields(data): + if isinstance(data, dict): + return { + key: filter_null_fields(value) + for key, value in data.items() + if value is not None + } 
+ elif isinstance(data, list): + return [filter_null_fields(item) for item in data if item is not None] + else: + return data + +def enforce_schema(data, schema, path = "root"): + # Nothing to enforce. + if schema is None or data is None: + return data + + # Handle oneOf before checking for type + if "oneOf" in schema: + type_map = { + "string": str, + "number": (int, float), + "integer": int, + "boolean": bool, + "object": dict, + "array": list, + } + for sub_schema in schema["oneOf"]: + sub_type = sub_schema.get("type") + expected_python_type = type_map.get(sub_type) + if expected_python_type and isinstance(data, expected_python_type): + return enforce_schema(data, sub_schema, path) + raise ValueError(f"Value at {path} does not match any oneOf schema options") + + schema_type = schema.get("type") + if not schema_type: + raise ValueError(f"Failed to get type of the object at {path}.") + + # Handle array of types (e.g., ["string", "number", "boolean", "null"]) + if isinstance(schema_type, list): + # Check if data matches any of the allowed types + type_map = { + "string": str, + "number": (int, float), + "integer": int, + "boolean": bool, + "null": type(None), + } + for allowed_type in schema_type: + expected_python_type = type_map.get(allowed_type) + if expected_python_type and isinstance(data, expected_python_type): + # Use the matched type for further validation + schema_type = allowed_type + break + else: + raise ValueError(f"Value at {path} does not match any allowed types: {schema_type}") + + # Validate dictionary + if isinstance(data, dict): + if schema_type != "object": + raise ValueError(f"Expected object at {path}, got {type(data).__name__}") + props = schema.get("properties", {}) + required_keys = schema.get("required", []) + additional_properties = schema.get("additionalProperties", False) + + # Validate defined properties + for k, v in props.items(): + if k in data: + data[k] = enforce_schema(data[k], v, f"{path}.{k}") + elif k in required_keys: + raise ValueError(f"Missing required field '{k}' at {path}") + + # Handle additional properties + for k, v in data.items(): + if k not in props: # This is an additional property + if additional_properties is False: + raise ValueError(f"Additional property '{k}' not allowed at {path}") + elif additional_properties is True: + # Allow any additional property, no validation + pass + elif isinstance(additional_properties, dict): + # Handle oneOf for additional properties + if "oneOf" in additional_properties: + type_map = { + "string": str, + "number": (int, float), + "integer": int, + "boolean": bool, + } + + for sub_schema in additional_properties["oneOf"]: + expected_type = type_map.get(sub_schema.get("type")) + if expected_type and isinstance(v, expected_type): + data[k] = enforce_schema(v, sub_schema, f"{path}.{k}") + break + else: + raise ValueError( + f"Additional property '{k}' at {path} does not match any oneOf schema" + ) + else: + data[k] = enforce_schema(v, additional_properties, f"{path}.{k}") + + return data + + # Validate list + elif isinstance(data, list): + if schema_type != "array": + raise ValueError(f"Expected array at {path}, got {type(data).__name__}") + items_schema = schema.get("items", {}) + return [enforce_schema(item, items_schema, f"{path}[{i}]") for i, item in enumerate(data)] + + # Handle string + elif isinstance(data, str): + if schema_type != "string": + raise ValueError(f"Expected string at {path}, got {type(data).__name__}") + acceptable_values = schema.get("enum", []) + if acceptable_values and data not 
in acceptable_values: + raise ValueError(f"Invalid value at {path}: {data}. Allowed: {acceptable_values}") + max_length = schema.get("maxLength") + if max_length and len(data) > max_length: + return data[:max_length] + return data + + # Handle datetime + elif isinstance(data, datetime): + if schema_type == "string": + return data.isoformat() + elif schema_type == "integer": + return data.timestamp() + else: + raise ValueError(f"Cannot convert datetime to {schema_type}") + + # Handle integer + elif isinstance(data, int): + if schema_type == "integer" or schema_type == "number": + return data + elif schema_type == "string" and schema.get("format") == "date-time": + return unix_to_iso(data) + else: + raise ValueError(f"Cannot convert integer to {schema_type}") + + elif isinstance(data, float): + if schema_type != "number" and schema_type != "integer": + raise ValueError(f"Expected number at {path}, got {type(data).__name__}") + return data + elif isinstance(data, bool): + if schema_type != "boolean": + raise ValueError(f"Expected boolean at {path}, got {type(data).__name__}") + return data + return data + +def create_valid_json_or_fail_with_error(data, schema): + data = serialize_datetime(data) + data = filter_null_fields(data) + data = enforce_schema(data, schema) + return json.dumps(data) + +# ================================================================================ +# HTTP Layer +# ================================================================================ + +# Global session for connection pooling +session: Optional[requests.Session] = None + +class HTTPClient: + """ + HTTP client for batched POST requests using a persistent session. + + Input: Spark DataFrame with columns: + - endpoint (StringType): Target URL. + - header (StringType, JSON-encoded): HTTP headers. + - payload (binary data): Serialized request body. + """ + + def __init__(self, max_retry_duration_sec: int = 300, request_timeout_sec: int = 30): + """ + Initialize the HTTP client. + + Args: + max_retry_duration_sec: Maximum time in seconds to retry requests with exponential backoff + request_timeout_sec: Timeout in seconds for a single request + """ + self.max_retry_duration_sec = max_retry_duration_sec + self.request_timeout_sec = request_timeout_sec + + + def get_session(self) -> requests.Session: + """ + Get the global session instance. If not present, create a new one. + + Returns: + session: The global session instance + """ + global session + if session is None: + session = requests.Session() + return session + + def _make_request_with_retry(self, url: str, headers: Dict[str, str], payload: bytes): + """ + Make a POST request to the provided url. + + Args: + url: The endpoint URL + headers: Request headers + payload: Request payload + + Throws: + Exception: If the request fails and the retries are exhausted. 
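+
+        Note: the payload is gzip-compressed before sending and a "Content-Encoding: gzip"
+        header is added to the supplied headers.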
+ """ + # Compress payload + compressed_payload = gzip.compress(payload) + headers['Content-Encoding'] = 'gzip' + + response = None + try: + response = self.get_session().post( + url, + headers=headers, + data=compressed_payload, + timeout=self.request_timeout_sec + ) + response.raise_for_status() + print(f"Successfully sent request to URL: {url}, Payload: {payload.decode('utf-8')}, Response: {response.text}") + except Exception as e: + response_text = "No response" + if response is not None: + try: + response_text = str(response.json()) + except: + response_text = response.text if hasattr(response, 'text') else "Unable to read response" + print(f"Request failed for URL: {url}, headers: {str(headers)}, Payload: {payload.decode('utf-8')}, Error: {str(e)}, Response: {response_text}") + raise type(e)(f"Request failed for URL: {url}, headers: {str(headers)}, Payload: {payload.decode('utf-8')}, Error: {str(e)}, Response: {response_text}") from e + + def post(self, http_request_specs_df) -> None: + """ + Make POST requests for each row in the DataFrame. + Serially makes requests for each row in the DataFrame. + + Args: + http_request_specs_df: Spark DataFrame with columns 'endpoint', 'header', 'payloadBytes' + """ + + for row in http_request_specs_df.collect(): + try: + headers = json.loads(getattr(row, 'header', '{}')) + retry_wrapper = retry( + stop=stop_after_delay(self.max_retry_duration_sec), + wait=wait_exponential(multiplier=1, min=1, max=10), + reraise=True + ) + retry_wrapper(self._make_request_with_retry)(row.endpoint, headers, row.payloadBytes) + except Exception as e: + print(f"ERROR: {str(e)}") + continue # Continue with other requests regardless of success/failure + + +# ================================================================================ +# CONVERSION LAYER +# ================================================================================ + +class NewRelicMetricsConverter: + """Converter class to convert metrics to New Relic format.""" + + def create_metric( + self, + metric_name: str, + metric_value: float, + tags: Dict[str, str], + timestamp: int, + additional_attributes: Optional[Dict[str, Any]] = None) -> str: + """Create a New Relic Gauge metric in the proper format. + + Args: + metric_name: Name of the metric (e.g., "pipeline.run.execution_time_seconds") + metric_value: Numeric value for the gauge metric + tags: Dictionary of tags (e.g., {"pipeline_id": "123", "phase": "execution"}) + timestamp: Unix timestamp for the metric + additional_attributes: Optional additional attributes to include + + Returns: + JSON string representing the New Relic metric + + Raises: + ValueError if the fields are of unsupported types. 
+ """ + attributes = tags.copy() + + # Add additional attributes if provided + if additional_attributes: + attributes.update(additional_attributes) + + # Base metric structure for New Relic Gauge + metric = { + "name": metric_name, + "type": "gauge", + "value": metric_value, + "timestamp": timestamp, + "attributes": attributes + } + # Enforce the schema + return create_valid_json_or_fail_with_error(metric, METRICS_SCHEMA) + + def create_http_requests_spec(self, df, num_rows_per_batch: int, headers: dict, endpoint: str): + """Create HTTP request spec DataFrame for metrics.""" + df_with_batch_id = df.withColumn("batch_id", + expr(f"int((row_number() over (order by 1) - 1) / {num_rows_per_batch})")) \ + .withColumn("metrics", regexp_replace(col("metrics"), "\n", "")) + return df_with_batch_id.groupBy("batch_id") \ + .agg(collect_list("metrics").alias("batch_metrics")) \ + .withColumn("payload", concat(lit('['), + expr("concat_ws(',', batch_metrics)"), + lit(']'))) \ + .withColumn("payloadBytes", col("payload").cast("binary")) \ + .withColumn("endpoint", lit(endpoint)) \ + .withColumn("header", lit(json.dumps(headers))) \ + .select("endpoint", "header", "payloadBytes") + + +class NewRelicEventsConverter: + """Converter class to convert events to New Relic format.""" + + def create_event( + self, + title: str, + status: str, + tags: Dict[str, str], + timestamp: int, + additional_attributes: Optional[Dict[str, Any]] = None) -> str: + """ + Create a New Relic event in the proper format. + + Args: + title: The event title + status: The status of the event (e.g., "success", "error", "warning") + tags: Dictionary of tags + timestamp: Unix timestamp for the event + additional_attributes: Optional additional attributes to include + + Returns: + JSON string representing the New Relic event + + Raises: + ValueError if the fields are of unsupported types. + """ + event = { + "eventType": "DatabricksEvent", + "timestamp": timestamp, + "title": title, + "status": status, + "message": f"Event: {title}", + "source": SOURCE_NAME, + "service": SERVICE_NAME + } + event.update(tags) + + # Add additional attributes if provided + if additional_attributes: + event.update(additional_attributes) + # Enforce the schema + return create_valid_json_or_fail_with_error(event, EVENTS_SCHEMA) + + def create_http_requests_spec(self, df, num_rows_per_batch: int, headers: dict, endpoint: str): + """Create HTTP request spec DataFrame for events.""" + df_with_batch_id = df.withColumn("batch_id", + expr(f"int((row_number() over (order by 1) - 1) / {num_rows_per_batch})")) \ + .withColumn("events", regexp_replace(col("events"), "\n", "")) + return df_with_batch_id.groupBy("batch_id") \ + .agg(collect_list("events").alias("batch_events")) \ + .withColumn("payload", concat(lit('['), + expr("concat_ws(',', batch_events)"), + lit(']'))) \ + .withColumn("payloadBytes", col("payload").cast("binary")) \ + .withColumn("endpoint", lit(endpoint)) \ + .withColumn("header", lit(json.dumps(headers))) \ + .select("endpoint", "header", "payloadBytes") + +class NewRelicLogsConverter: + """Converter class to convert logs to New Relic format.""" + + def create_log( + self, + title: str, + status: str, + tags: Dict[str, str], + timestamp: int, + additional_attributes: Optional[Dict[str, Any]] = None) -> str: + """ + Create a New Relic log in the proper format. 
+ + Args: + title: The log message/title + status: The status of the log (e.g., "error", "info", "warning") + tags: Dictionary of tags + timestamp: Unix timestamp for the log + additional_attributes: Optional additional attributes to include + + Returns: + JSON string representing the New Relic log + + Raises: + ValueError if the fields are of unsupported types. + """ + # Base log structure for New Relic + log = { + "message": title, + "timestamp": timestamp, + "level": status.upper(), + "service": SERVICE_NAME, + "source": SOURCE_NAME + } + + # Add tag attributes + log.update(tags) + + # Add additional attributes if provided + if additional_attributes: + log.update(additional_attributes) + + # Enforce the schema + return create_valid_json_or_fail_with_error(log, LOGS_SCHEMA) + + def create_http_requests_spec(self, df, num_rows_per_batch: int, headers: dict, endpoint: str): + """Create HTTP request spec DataFrame for logs.""" + df_with_batch_id = df.withColumn("batch_id", + expr(f"int((row_number() over (order by 1) - 1) / {num_rows_per_batch})")) \ + .withColumn("logs", regexp_replace(col("logs"), "\n", "")) + return df_with_batch_id.groupBy("batch_id") \ + .agg(collect_list("logs").alias("batch_logs")) \ + .withColumn("payload", concat(lit('['), + expr("concat_ws(',', batch_logs)"), + lit(']'))) \ + .withColumn("payloadBytes", col("payload").cast("binary")) \ + .withColumn("endpoint", lit(endpoint)) \ + .withColumn("header", lit(json.dumps(headers))) \ + .select("endpoint", "header", "payloadBytes") + +# ================================================================================ +# INFERENCE LAYER +# ================================================================================ + +def convert_row_to_error_log(row): + """Convert a row to error log format.""" + params = { + "title": str(getattr(row, "message", "")), + "status": "error", + "tags": { + "pipeline_id": getattr(row, 'pipeline_id', ''), + "pipeline_run_id": getattr(row, 'pipeline_run_id', ''), + "table_name": getattr(row, 'table_name', ''), + "flow_name": getattr(row, 'flow_name', ''), + "level": "error" + }, + "timestamp": timestamp_in_unix_milliseconds(row.event_timestamp), + "additional_attributes": { + "pipeline_run_link": getattr(row, "pipeline_run_link", None), + "error": getattr(row, "error", None), + } + } + return _log_converter.create_log(**params) + +def convert_row_to_table_metrics(row): + """Convert a row to table metrics format.""" + # Base tags for all metrics + base_tags = { + "pipeline_id": getattr(row, "pipeline_id", ""), + "pipeline_run_id": getattr(row, "pipeline_run_id", ""), + "table_name": getattr(row, "table_name", ""), + "flow_name": getattr(row, "flow_name", ""), + "source": SOURCE_NAME + } + + # Timestamp for all metrics + timestamp = timestamp_in_unix_milliseconds(row.event_timestamp) + + return [ + _metrics_converter.create_metric( + metric_name="dlt.table.throughput.upserted_rows", + metric_value=getattr(row, "num_upserted_rows", 0) or 0, + tags={**base_tags, "metric_type": "count"}, + timestamp=timestamp, + additional_attributes={} + ), + _metrics_converter.create_metric( + metric_name="dlt.table.throughput.deleted_rows", + metric_value=getattr(row, "num_deleted_rows", 0) or 0, + tags={**base_tags, "metric_type": "count"}, + timestamp=timestamp, + additional_attributes={} + ), + _metrics_converter.create_metric( + metric_name="dlt.table.throughput.output_rows", + metric_value=getattr(row, "num_output_rows", 0) or 0, + tags={**base_tags, "metric_type": "count"}, + timestamp=timestamp, + 
additional_attributes={} + ), + ] + +def convert_row_to_pipeline_status_event(row): + """Convert a row to pipeline status event format.""" + # Determine pipeline status for title + status_display = row.latest_state.upper() if row.latest_state else 'UNKNOWN' + pipeline_id = getattr(row, "pipeline_id", "") + + params = { + "title": f"Pipeline {pipeline_id} {status_display}", + "status": get_status(status_display), + "tags": { + "pipeline_id": pipeline_id, + "latest_run_id": getattr(row, "pipeline_run_id", ""), + "status": status_display.lower(), + "source": SOURCE_NAME, + "service": SERVICE_NAME + }, + "timestamp": timestamp_in_unix_milliseconds(row.updated_at), + "additional_attributes": { + "pipeline_link": getattr(row, "pipeline_link", None), + "pipeline_run_link": getattr(row, "pipeline_run_link", None), + "is_complete": getattr(row, "is_complete", None), + "running_start_time": getattr(row, "running_start_time", None), + "end_time": getattr(row, "end_time", None), + "updated_at": getattr(row, "updated_at", None) , + "latest_error_log_message": getattr(row, "latest_error_log_message", None), + "latest_error_message": getattr(row, "latest_error_message", None), + } + } + return _events_converter.create_event(**params) + +def convert_row_to_pipeline_metrics(row): + """Convert a row to pipeline metrics format.""" + def has_attr(obj, attr): + return hasattr(obj, attr) and getattr(obj, attr) is not None + + if not has_attr(row, "queued_time") or not has_attr(row, "create_time"): + return [] + + base_tags = { + "pipeline_id": getattr(row, "pipeline_id", ""), + "pipeline_run_id": getattr(row, "pipeline_run_id", ""), + "source": SOURCE_NAME + } + metrics = [] + timestamp = timestamp_in_unix_milliseconds(getattr(row, "create_time", None)) + + end_time = getattr(row, "end_time", None) or datetime.now(timezone.utc) + + # Starting seconds: queued_time - create_time + starting_seconds = (row.queued_time - row.create_time).total_seconds() + metrics.append(_metrics_converter.create_metric( + metric_name="pipeline.run.starting_seconds", + metric_value=starting_seconds, + tags={**base_tags, "metric_type": "duration", "phase": "starting"}, + timestamp=timestamp + )) + + # Seconds waiting for resources: initialization_start_time - queued_time + if not has_attr(row, "initialization_start_time"): + return metrics + waiting_seconds = (row.initialization_start_time - row.queued_time).total_seconds() + metrics.append(_metrics_converter.create_metric( + metric_name="pipeline.run.waiting_for_resources_seconds", + metric_value=waiting_seconds, + tags={**base_tags, "metric_type": "duration", "phase": "waiting"}, + timestamp=timestamp + )) + + # Initialization seconds: running_start_time - initialization_start_time + if not has_attr(row, "running_start_time"): + return metrics + initialization_seconds = (row.running_start_time - row.initialization_start_time).total_seconds() + metrics.append(_metrics_converter.create_metric( + metric_name="pipeline.run.initialization_seconds", + metric_value=initialization_seconds, + tags={**base_tags, "metric_type": "duration", "phase": "initialization"}, + timestamp=timestamp + )) + + # Running seconds: end_time - running_start_time + running_seconds = (end_time - row.running_start_time).total_seconds() + metrics.append(_metrics_converter.create_metric( + metric_name="pipeline.run.running_seconds", + metric_value=running_seconds, + tags={**base_tags, "metric_type": "duration", "phase": "running"}, + timestamp=timestamp + )) + + # Total seconds: end_time - create_time + 
total_seconds = (end_time - row.create_time).total_seconds() + metrics.append(_metrics_converter.create_metric( + metric_name="pipeline.run.total_seconds", + metric_value=total_seconds, + tags={**base_tags, "metric_type": "duration", "phase": "total"}, + timestamp=timestamp + )) + + return metrics + +# ================================================================================ +# MAIN +# ================================================================================ + +# Source streams +event_logs_bronze = "event_logs_bronze" +pipeline_runs_status = "pipeline_runs_status" + + +http_client = None +def getClient(config): + """Global HTTP client getter.""" + global http_client + if http_client is None: + http_client = HTTPClient( + max_retry_duration_sec=config["max_retry_duration_sec"], + request_timeout_sec=config["request_timeout_sec"] + ) + return http_client + +def register_sink_for_pipeline_events(): + @dlt.foreach_batch_sink(name="send_pipeline_status_to_3p_monitoring") + def send_pipeline_status_to_3p_monitoring(batch_df, batch_id): + destination_format_udf = udf(convert_row_to_pipeline_status_event, StringType()) + events_df = batch_df.withColumn("events", destination_format_udf(struct("*"))).select("events").filter(col("events").isNotNull()) + http_request_spec = _events_converter.create_http_requests_spec( + events_df, + _global_config["num_rows_per_batch"], + get_newrelic_headers(_global_config["api_key"]), + _global_config["endpoints"]["events"] + ) + getClient(_global_config).post(http_request_spec) + + @dlt.append_flow(target="send_pipeline_status_to_3p_monitoring") + def send_pipeline_status_to_sink(): + return spark.readStream.table(f"{pipeline_runs_status}_cdf") + + +def register_sink_for_errors(): + @dlt.foreach_batch_sink(name="send_errors_to_3p_monitoring") + def send_errors_to_3p_monitoring(batch_df, batch_id): + destination_format_udf = udf(convert_row_to_error_log, StringType()) + logs_df = batch_df.withColumn("logs", destination_format_udf(struct("*"))).select("logs").filter(col("logs").isNotNull()) + http_request_spec = _log_converter.create_http_requests_spec( + logs_df, + _global_config["num_rows_per_batch"], + get_newrelic_headers(_global_config["api_key"]), + _global_config["endpoints"]["logs"] + ) + getClient(_global_config).post(http_request_spec) + + @dlt.append_flow(target="send_errors_to_3p_monitoring") + def send_errors_to_sink(): + return spark.readStream.option("skipChangeCommits", "true").table(event_logs_bronze).filter("error IS NOT NULL OR level = 'ERROR'") + +def register_sink_for_pipeline_metrics(): + @dlt.foreach_batch_sink(name="send_pipeline_metrics_to_3p_monitoring") + def send_pipeline_metrics_to_3p_monitoring(batch_df, batch_id): + destination_format_udf = udf(convert_row_to_pipeline_metrics, ArrayType(StringType())) + metrics_df = batch_df.withColumn("metrics_array", destination_format_udf(struct("*"))).select(explode("metrics_array").alias("metrics")).filter(col("metrics").isNotNull()) + http_request_spec = _metrics_converter.create_http_requests_spec( + metrics_df, + _global_config["num_rows_per_batch"], + get_newrelic_headers(_global_config["api_key"]), + _global_config["endpoints"]["metrics"] + ) + getClient(_global_config).post(http_request_spec) + + @dlt.append_flow(target="send_pipeline_metrics_to_3p_monitoring") + def send_pipeline_metrics_to_sink(): + return spark.readStream.table(f"{pipeline_runs_status}_cdf") + +def register_sink_for_table_metrics(): + @dlt.foreach_batch_sink(name="send_table_metrics_to_3p_monitoring") + def 
send_table_metrics_to_3p_monitoring(batch_df, batch_id): + destination_format_udf = udf(convert_row_to_table_metrics, ArrayType(StringType())) + metrics_df = batch_df.withColumn("metrics_array", destination_format_udf(struct("*"))).select(explode("metrics_array").alias("metrics")).filter(col("metrics").isNotNull()) + http_request_spec = _metrics_converter.create_http_requests_spec( + metrics_df, + _global_config["num_rows_per_batch"], + get_newrelic_headers(_global_config["api_key"]), + _global_config["endpoints"]["metrics"] + ) + getClient(_global_config).post(http_request_spec) + + @dlt.append_flow(target="send_table_metrics_to_3p_monitoring") + def send_table_metrics_to_sink(): + return spark.readStream.option("skipChangeCommits", "true").table(event_logs_bronze) \ + .filter("table_name is not null AND details:flow_progress.metrics is not null AND event_type = 'flow_progress'") \ + .selectExpr( + "pipeline_id", + "pipeline_run_id", + "table_name", + "flow_name", + "event_timestamp", + "details:flow_progress.metrics.num_upserted_rows::bigint as num_upserted_rows", + "details:flow_progress.metrics.num_deleted_rows::bigint as num_deleted_rows", + "(details:flow_progress.metrics.num_upserted_rows::bigint + details:flow_progress.metrics.num_deleted_rows::bigint) as num_output_rows" + ) \ + .filter("num_upserted_rows is not null OR num_deleted_rows is not null OR num_output_rows is not null") + +# ================================================================================ +# MAIN INITIALIZATION +# ================================================================================ + +# Initialize global configuration and register sinks. +if getParam(spark.conf, "destination") == "newrelic": + initialize_global_config(spark.conf) + register_sink_for_errors() + register_sink_for_pipeline_events() + register_sink_for_table_metrics() + register_sink_for_pipeline_metrics() \ No newline at end of file diff --git a/contrib/dbx_ingestion_monitoring/third_party_sinks/splunk_observability_sink.py b/contrib/dbx_ingestion_monitoring/third_party_sinks/splunk_observability_sink.py new file mode 100644 index 0000000..8f7a943 --- /dev/null +++ b/contrib/dbx_ingestion_monitoring/third_party_sinks/splunk_observability_sink.py @@ -0,0 +1,924 @@ +""" +Splunk Observability Cloud Sink for Monitoring ETL Pipeline. + +For configuration details, refer to README-third-party-monitoring.md +""" + +import json +import requests +import gzip +from typing import List, Dict, Any, Optional +from tenacity import retry, stop_after_delay, wait_exponential +from datetime import datetime, timezone +from pyspark.sql import SparkSession +from pyspark.sql.types import StringType, ArrayType +from pyspark.sql.functions import lit, col, collect_list, concat, expr, udf, struct, explode, regexp_replace +import dlt + +# Global Configuration. +SERVICE_NAME = "databricks-lakeflow-connect" +SOURCE_NAME = "databricks" +_global_config = None +_log_converter = None +_events_converter = None +_metrics_converter = None + +# Global Schemas +# Schema Validation enforces field types, trims oversized strings (maxLength), +# converts datetime objects to appropriate formats, validates required fields, +# and handles oneOf constraints for additionalProperties with type safety. + +METRICS_SCHEMA = { + "type": "object", + "required": ["metric", "value", "timestamp"], + "properties": { + "metric": { + "type": "string", + "description": "The name of the metric." + }, + "value": { + "type": "number", + "description": "The numeric value for the metric." 
+ }, + "timestamp": { + "type": "integer", + "description": "Unix timestamp in milliseconds." + }, + "dimensions": { + "type": "object", + "description": "Key-value pairs for metric dimensions/tags.", + "additionalProperties": { + "type": "string" + } + } + } +} + +EVENTS_SCHEMA = { + "type": "object", + "required": ["eventType", "category", "timestamp"], + "properties": { + "eventType": { + "type": "string", + "description": "The type of event." + }, + "category": { + "type": "string", + "enum": ["USER_DEFINED", "ALERT", "AUDIT", "JOB"], + "description": "The category of the event." + }, + "timestamp": { + "type": "integer", + "description": "Unix timestamp in milliseconds." + }, + "dimensions": { + "type": "object", + "description": "Key-value pairs for event dimensions.", + "additionalProperties": { + "type": "string" + } + }, + "properties": { + "type": "object", + "description": "Additional event properties.", + "additionalProperties": True + } + } +} + +# ================================================================================ +# UTILITIES +# ================================================================================ + +def get_signalfx_headers(access_token: str): + """Get headers for the SignalFx/Splunk Observability API.""" + return {"Content-Type": "application/json", "X-SF-TOKEN": access_token} + +def initialize_global_config(spark_conf): + """Initialize global configuration from Spark configuration.""" + global _global_config, _log_converter, _events_converter, _metrics_converter + + _global_config = getThirdPartySinkConfigFromSparkConfig(spark_conf) + _log_converter = SplunkLogsConverter() + _events_converter = SplunkEventsConverter() + _metrics_converter = SplunkMetricsConverter() + +def getParam(spark_conf, key: str, default=None): + value = spark_conf.get(key, default) + if value == "" or value is None: + return None + return value + +def getThirdPartySinkConfigFromSparkConfig(spark_conf): + """ + Extract and merge configuration from Spark configuration and secret scope. + + This function extracts configuration variables from Spark configuration and merges + them with key-value pairs from a secret scope (if provided) to build common_params. + Secret store values take precedence over Spark configuration values when both exist. + + Args: + spark_conf: Spark configuration object containing required parameters + + Returns: + dict: Merged configuration parameters with secrets taking precedence + + The function looks for a 'secrets_scope' parameter in Spark config. If found, + it will retrieve all secrets from that scope and merge them with the base + configuration, giving preference to secret values. + """ + destination = getParam(spark_conf, "destination") + if destination is None: + raise ValueError("Destination must be provided for third party sinks.") + + common_params = { + "destination": destination, + "num_rows_per_batch": int(spark_conf.get("num_rows_per_batch", "100")), + "max_retry_duration_sec": int(spark_conf.get("max_retry_duration_sec", "300")), + "request_timeout_sec": int(spark_conf.get("request_timeout_sec", "30")), + "splunk_access_token": getParam(spark_conf, "splunk_access_token"), + "host_name": getParam(spark_conf, "host_name") + } + + # Merge secrets from a scope if scope is provided. 
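+    # Secret values override what was read from the Spark conf above; for example, a secret
+    # named "splunk_access_token" in the scope replaces the conf-provided token.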
+ scope = getParam(spark_conf, "secrets_scope") + if scope is not None: + secrets = { + s.key: dbutils.secrets.get(scope=scope, key=s.key) + for s in dbutils.secrets.list(scope) + } + common_params.update(secrets) + + # Validate required credentials + if common_params['splunk_access_token'] is None: + raise ValueError(f"Splunk access token is required for {destination} destination") + + # Get endpoints (allow override) + metrics_endpoint = getParam(spark_conf, "endpoints.metrics") + logs_endpoint = getParam(spark_conf, "endpoints.logs") + events_endpoint = getParam(spark_conf, "endpoints.events") + + # Auto-generate endpoints if not provided + if not all([metrics_endpoint, logs_endpoint, events_endpoint]): + if common_params['host_name'] is None: + raise ValueError( + "Either 'host_name' must be provided to auto-generate SignalFx endpoints, " + "or all three endpoints (endpoints.metrics, endpoints.logs, endpoints.events) " + "must be explicitly configured." + ) + + # Auto-generate endpoints based on host_name + # Logs are sent as events to the same endpoint as events + if metrics_endpoint is None: + metrics_endpoint = f"https://{common_params['host_name']}/v2/datapoint" + if logs_endpoint is None: + logs_endpoint = f"https://{common_params['host_name']}/v2/event" + if events_endpoint is None: + events_endpoint = f"https://{common_params['host_name']}/v2/event" + + # Validate all endpoints are now set + if metrics_endpoint is None: + raise ValueError(f"Metrics endpoint is required for {destination} destination") + if logs_endpoint is None: + raise ValueError(f"Logs endpoint is required for {destination} destination") + if events_endpoint is None: + raise ValueError(f"Events endpoint is required for {destination} destination") + + common_params["endpoints"] = { + "metrics": metrics_endpoint, + "logs": logs_endpoint, + "events": events_endpoint, + } + + return common_params + + +def unix_to_iso(timestamp: int) -> str: + """Convert Unix timestamp in milliseconds/seconds to ISO format string.""" + ts = float(timestamp) + # If timestamp is unusually large, assume milliseconds + if ts > 1e12: + ts /= 1000 + dt = datetime.fromtimestamp(ts, tz=timezone.utc) + return dt.isoformat().replace("+00:00", "Z") + +def timestamp_in_unix_milliseconds(timestamp) -> int: + """Convert datetime to Unix timestamp in milliseconds.""" + if isinstance(timestamp, datetime): + return int(timestamp.timestamp() * 1000) + return int(timestamp) + +def timestamp_in_unix_seconds(timestamp) -> float: + """Convert datetime to Unix timestamp in seconds.""" + if isinstance(timestamp, datetime): + return timestamp.timestamp() + return float(timestamp) / 1000.0 + +def get_status(status_display: str) -> str: + """Map pipeline status to appropriate status level.""" + status_lower = status_display.lower() + if status_lower in ['failed', 'error']: + return 'error' + elif status_lower in ['running', 'starting']: + return 'info' + elif status_lower in ['completed', 'success']: + return 'ok' + else: + return 'warn' + +def serialize_datetime(data): + if isinstance(data, dict): + return { + key: serialize_datetime(value) + for key, value in data.items() + } + elif isinstance(data, list): + return [serialize_datetime(item) for item in data] + elif isinstance(data, datetime): + return data.isoformat() + else: + return data + +def filter_null_fields(data): + if isinstance(data, dict): + return { + key: filter_null_fields(value) + for key, value in data.items() + if value is not None + } + elif isinstance(data, list): + return 
[filter_null_fields(item) for item in data if item is not None] + else: + return data + +def enforce_schema(data, schema, path = "root"): + # Nothing to enforce. + if schema is None or data is None: + return data + + + schema_type = schema.get("type") + if not schema_type: + raise ValueError(f"Failed to get type of the object at {path}.") + + # Validate dictionary + if isinstance(data, dict): + if schema_type != "object": + raise ValueError(f"Expected object at {path}, got {type(data).__name__}") + props = schema.get("properties", {}) + required_keys = schema.get("required", []) + additional_properties = schema.get("additionalProperties", False) + + # Validate defined properties + for k, v in props.items(): + if k in data: + data[k] = enforce_schema(data[k], v, f"{path}.{k}") + elif k in required_keys: + raise ValueError(f"Missing required field '{k}' at {path}") + + # Handle additional properties + for k, v in data.items(): + if k not in props: # This is an additional property + if additional_properties is False: + raise ValueError(f"Additional property '{k}' not allowed at {path}") + elif additional_properties is True: + # Allow any additional property, no validation + pass + elif isinstance(additional_properties, dict): + # Handle oneOf for additional properties + if "oneOf" in additional_properties: + type_map = { + "string": str, + "number": (int, float), + "integer": int, + "boolean": bool, + } + + for sub_schema in additional_properties["oneOf"]: + expected_type = type_map.get(sub_schema.get("type")) + if expected_type and isinstance(v, expected_type): + data[k] = enforce_schema(v, sub_schema, f"{path}.{k}") + break + else: + raise ValueError( + f"Additional property '{k}' at {path} does not match any oneOf schema" + ) + else: + data[k] = enforce_schema(v, additional_properties, f"{path}.{k}") + + return data + + # Validate list + elif isinstance(data, list): + if schema_type != "array": + raise ValueError(f"Expected array at {path}, got {type(data).__name__}") + items_schema = schema.get("items", {}) + return [enforce_schema(item, items_schema, f"{path}[{i}]") for i, item in enumerate(data)] + + # Handle string + elif isinstance(data, str): + if schema_type != "string": + raise ValueError(f"Expected string at {path}, got {type(data).__name__}") + acceptable_values = schema.get("enum", []) + if acceptable_values and data not in acceptable_values: + raise ValueError(f"Invalid value at {path}: {data}. 
Allowed: {acceptable_values}") + max_length = schema.get("maxLength") + if max_length and len(data) > max_length: + return data[:max_length] + return data + + # Handle datetime + elif isinstance(data, datetime): + if schema_type == "string": + return data.isoformat() + elif schema_type == "integer": + return data.timestamp() + else: + raise ValueError(f"Cannot convert datetime to {schema_type}") + + # Handle integer + elif isinstance(data, int): + if schema_type == "integer": + return data + if schema_type == "number": + return float(data) + elif schema_type == "string" and schema.get("format") == "date-time": + return unix_to_iso(data) + else: + raise ValueError(f"Cannot convert integer to {schema_type}") + + elif isinstance(data, float): + if schema_type != "number": + raise ValueError(f"Expected number at {path}, got {type(data).__name__}") + return data + elif isinstance(data, bool): + if schema_type != "boolean": + raise ValueError(f"Expected boolean at {path}, got {type(data).__name__}") + return data + return data + +def create_valid_json_or_fail_with_error(data, schema): + data = serialize_datetime(data) + data = filter_null_fields(data) + data = enforce_schema(data, schema) + return json.dumps(data) + +# ================================================================================ +# HTTP Layer +# ================================================================================ + +# Global session for connection pooling +session: Optional[requests.Session] = None + +class HTTPClient: + """ + HTTP client for batched POST requests using a persistent session. + + Input: Spark DataFrame with columns: + - endpoint (StringType): Target URL. + - header (StringType, JSON-encoded): HTTP headers. + - payload (binary data): Serialized request body. + """ + + def __init__(self, max_retry_duration_sec: int = 300, request_timeout_sec: int = 30): + """ + Initialize the HTTP client. + + Args: + max_retry_duration_sec: Maximum time in seconds to retry requests with exponential backoff + request_timeout_sec: Timeout in seconds for a single request + """ + self.max_retry_duration_sec = max_retry_duration_sec + self.request_timeout_sec = request_timeout_sec + + + def get_session(self) -> requests.Session: + """ + Get the global session instance. If not present, create a new one. + + Returns: + session: The global session instance + """ + global session + if session is None: + session = requests.Session() + return session + + def _make_request_with_retry(self, url: str, headers: Dict[str, str], payload: bytes): + """ + Make a POST request to the provided url. + + Args: + url: The endpoint URL + headers: Request headers + payload: Request payload + + Throws: + Exception: If the request fails and the retries are exhausted. 
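+
+        Note: the payload is gzip-compressed before sending and a "Content-Encoding: gzip"
+        header is added to the supplied headers.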
+ """ + # Compress payload + compressed_payload = gzip.compress(payload) + headers['Content-Encoding'] = 'gzip' + + response = None + try: + response = self.get_session().post( + url, + headers=headers, + data=compressed_payload, + timeout=self.request_timeout_sec + ) + response.raise_for_status() + print(f"Successfully sent request to URL: {url}, Payload: {payload.decode('utf-8')}, Response: {response.text}") + except Exception as e: + response_text = "No response" + if response is not None: + try: + response_text = str(response.json()) + except: + response_text = response.text if hasattr(response, 'text') else "Unable to read response" + print(f"Request failed for URL: {url}, headers: {str(headers)}, Payload: {payload.decode('utf-8')}, Error: {str(e)}, Response: {response_text}") + raise type(e)(f"Request failed for URL: {url}, headers: {str(headers)}, Payload: {payload.decode('utf-8')}, Error: {str(e)}, Response: {response_text}") from e + + def post(self, http_request_specs_df) -> None: + """ + Make POST requests for each row in the DataFrame. + Serially makes requests for each row in the DataFrame. + + Args: + http_request_specs_df: Spark DataFrame with columns 'endpoint', 'header', 'payloadBytes' + """ + + for row in http_request_specs_df.collect(): + try: + headers = json.loads(getattr(row, 'header', '{}')) + retry_wrapper = retry( + stop=stop_after_delay(self.max_retry_duration_sec), + wait=wait_exponential(multiplier=1, min=1, max=10), + reraise=True + ) + retry_wrapper(self._make_request_with_retry)(row.endpoint, headers, row.payloadBytes) + except Exception as e: + print(f"ERROR: {str(e)}") + continue # Continue with other requests regardless of success/failure + + +# ================================================================================ +# CONVERSION LAYER +# ================================================================================ + +class SplunkMetricsConverter: + """Converter class to convert metrics to Splunk Observability format.""" + + def create_metric( + self, + metric_name: str, + metric_value: float, + tags: Dict[str, str], + timestamp: int, + additional_attributes: Optional[Dict[str, Any]] = None) -> str: + """Create a Splunk Observability metric in the proper format. + + Args: + metric_name: Name of the metric (e.g., "pipeline.run.execution_time_seconds") + metric_value: Numeric value for the gauge metric + tags: Dictionary of tags (e.g., {"pipeline_id": "123", "phase": "execution"}) + timestamp: Unix timestamp in milliseconds for the metric + additional_attributes: Optional additional attributes to include + + Returns: + JSON string representing the Splunk metric + + Raises: + ValueError if the fields are of unsupported types. 
+ """ + # Merge tags and additional attributes into dimensions + dimensions = tags.copy() + if additional_attributes: + dimensions.update(additional_attributes) + + # Base metric structure for Splunk Observability (gauge type) + metric = { + "metric": metric_name, + "value": metric_value, + "timestamp": timestamp, + "dimensions": dimensions + } + + # Enforce the schema + return create_valid_json_or_fail_with_error(metric, METRICS_SCHEMA) + + def create_http_requests_spec(self, df, num_rows_per_batch: int, headers: dict, endpoint: str): + """Create HTTP request spec DataFrame for metrics.""" + df_with_batch_id = df.withColumn("batch_id", + expr(f"int((row_number() over (order by 1) - 1) / {num_rows_per_batch})")) \ + .withColumn("metrics", regexp_replace(col("metrics"), "\n", "")) + return df_with_batch_id.groupBy("batch_id") \ + .agg(collect_list("metrics").alias("batch_metrics")) \ + .withColumn("payload", concat(lit('{"gauge": ['), + expr("concat_ws(',', batch_metrics)"), + lit(']}'))) \ + .withColumn("payloadBytes", col("payload").cast("binary")) \ + .withColumn("endpoint", lit(endpoint)) \ + .withColumn("header", lit(json.dumps(headers))) \ + .select("endpoint", "header", "payloadBytes") + + +class SplunkEventsConverter: + """Converter class to convert events to Splunk Observability format.""" + + + def create_event( + self, + title: str, + status: str, + tags: Dict[str, str], + timestamp: int, + additional_attributes: Optional[Dict[str, Any]] = None) -> str: + """ + Create a Splunk Observability event in the proper format. + + Args: + title: The event title + status: The status of the event (e.g., "ok", "warn", "error") + tags: Dictionary of tags + timestamp: Unix timestamp in milliseconds for the event + additional_attributes: Optional additional attributes to include + + Returns: + JSON string representing the Splunk event + + Raises: + ValueError if the fields are of unsupported types. + """ + event = { + "eventType": title, + "category": "USER_DEFINED", + "timestamp": timestamp, + "dimensions": tags, + "properties": { + "status": status, + "source": SOURCE_NAME, + "service": SERVICE_NAME + } + } + + # Add additional attributes if provided + if additional_attributes: + event["properties"].update(additional_attributes) + + # Enforce the schema + return create_valid_json_or_fail_with_error(event, EVENTS_SCHEMA) + + def create_http_requests_spec(self, df, num_rows_per_batch: int, headers: dict, endpoint: str): + """Create HTTP request spec DataFrame for events.""" + df_with_batch_id = df.withColumn("batch_id", + expr(f"int((row_number() over (order by 1) - 1) / {num_rows_per_batch})")) \ + .withColumn("events", regexp_replace(col("events"), "\n", "")) + return df_with_batch_id.groupBy("batch_id") \ + .agg(collect_list("events").alias("batch_events")) \ + .withColumn("payload", concat(lit('['), + expr("concat_ws(',', batch_events)"), + lit(']'))) \ + .withColumn("payloadBytes", col("payload").cast("binary")) \ + .withColumn("endpoint", lit(endpoint)) \ + .withColumn("header", lit(json.dumps(headers))) \ + .select("endpoint", "header", "payloadBytes") + +class SplunkLogsConverter: + """Converter class to convert logs to Splunk Observability events format.""" + + def create_log( + self, + title: str, + status: str, + tags: Dict[str, str], + timestamp: int, + additional_attributes: Optional[Dict[str, Any]] = None) -> str: + """ + Create a Splunk Observability event for log data. + + Logs are sent as events to eliminate the need for HEC and logs collector. 
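+        For example (illustrative), an error log becomes an event whose `eventType`
+        is the log message, whose `dimensions` are the tags, and whose
+        `properties.status` is "error".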
+ + Args: + title: The log message/title + status: The status of the log (e.g., "error", "info", "warning") + tags: Dictionary of tags + timestamp: Unix timestamp in milliseconds for the log + additional_attributes: Optional additional attributes to include + + Returns: + JSON string representing the Splunk Observability event + + Raises: + ValueError if the fields are of unsupported types. + """ + event = { + "eventType": title, + "category": "USER_DEFINED", + "timestamp": timestamp, + "dimensions": tags, + "properties": { + "status": status, + "source": SOURCE_NAME, + "service": SERVICE_NAME + } + } + + # Add additional attributes if provided + if additional_attributes: + event["properties"].update(additional_attributes) + + # Enforce the schema + return create_valid_json_or_fail_with_error(event, EVENTS_SCHEMA) + + def create_http_requests_spec(self, df, num_rows_per_batch: int, headers: dict, endpoint: str): + """Create HTTP request spec DataFrame for logs (sent as events).""" + df_with_batch_id = df.withColumn("batch_id", + expr(f"int((row_number() over (order by 1) - 1) / {num_rows_per_batch})")) \ + .withColumn("logs", regexp_replace(col("logs"), "\n", "")) + return df_with_batch_id.groupBy("batch_id") \ + .agg(collect_list("logs").alias("batch_logs")) \ + .withColumn("payload", concat(lit('['), + expr("concat_ws(',', batch_logs)"), + lit(']'))) \ + .withColumn("payloadBytes", col("payload").cast("binary")) \ + .withColumn("endpoint", lit(endpoint)) \ + .withColumn("header", lit(json.dumps(headers))) \ + .select("endpoint", "header", "payloadBytes") + + +# ================================================================================ +# INFERENCE LAYER +# ================================================================================ + +def convert_row_to_error_log(row): + """Convert a row to error log format.""" + params = { + "title": str(getattr(row, "message", "")), + "status": "error", + "tags": { + "pipeline_id": getattr(row, 'pipeline_id', ''), + "pipeline_run_id": getattr(row, 'pipeline_run_id', ''), + "table_name": getattr(row, 'table_name', ''), + "flow_name": getattr(row, 'flow_name', ''), + "level": "error" + }, + "timestamp": timestamp_in_unix_milliseconds(row.event_timestamp), + "additional_attributes": { + "pipeline_run_link": getattr(row, "pipeline_run_link", None), + "error": getattr(row, "error", None), + } + } + return _log_converter.create_log(**params) + +def convert_row_to_table_metrics(row): + """Convert a row to table metrics format.""" + # Base tags for all metrics + base_tags = { + "pipeline_id": getattr(row, "pipeline_id", ""), + "pipeline_run_id": getattr(row, "pipeline_run_id", ""), + "table_name": getattr(row, "table_name", ""), + "flow_name": getattr(row, "flow_name", ""), + "source": SOURCE_NAME + } + + # Timestamp for all metrics + timestamp = timestamp_in_unix_milliseconds(row.event_timestamp) + + return [ + _metrics_converter.create_metric( + metric_name="dlt.table.throughput.upserted_rows", + metric_value=getattr(row, "num_upserted_rows", 0) or 0, + tags={**base_tags, "metric_type": "count"}, + timestamp=timestamp, + additional_attributes={} + ), + _metrics_converter.create_metric( + metric_name="dlt.table.throughput.deleted_rows", + metric_value=getattr(row, "num_deleted_rows", 0) or 0, + tags={**base_tags, "metric_type": "count"}, + timestamp=timestamp, + additional_attributes={} + ), + _metrics_converter.create_metric( + metric_name="dlt.table.throughput.output_rows", + metric_value=getattr(row, "num_output_rows", 0) or 0, + 
tags={**base_tags, "metric_type": "count"}, + timestamp=timestamp, + additional_attributes={} + ), + ] + +def convert_row_to_pipeline_status_event(row): + """Convert a row to pipeline status event format.""" + # Determine pipeline status for title + status_display = row.latest_state.upper() if row.latest_state else 'UNKNOWN' + pipeline_id = getattr(row, "pipeline_id", "") + + params = { + "title": f"Pipeline {pipeline_id} {status_display}", + "status": get_status(status_display), + "tags": { + "pipeline_id": pipeline_id, + "latest_run_id": getattr(row, "pipeline_run_id", ""), + "status": status_display.lower(), + "source": SOURCE_NAME, + "service": SERVICE_NAME + }, + "timestamp": timestamp_in_unix_milliseconds(row.updated_at), + "additional_attributes": { + "pipeline_link": getattr(row, "pipeline_link", None), + "pipeline_run_link": getattr(row, "pipeline_run_link", None), + "is_complete": getattr(row, "is_complete", None), + "running_start_time": getattr(row, "running_start_time", None), + "end_time": getattr(row, "end_time", None), + "updated_at": getattr(row, "updated_at", None) , + "latest_error_log_message": getattr(row, "latest_error_log_message", None), + "latest_error_message": getattr(row, "latest_error_message", None), + } + } + return _events_converter.create_event(**params) + +def convert_row_to_pipeline_metrics(row): + """Convert a row to pipeline metrics format.""" + def has_attr(obj, attr): + return hasattr(obj, attr) and getattr(obj, attr) is not None + + if not has_attr(row, "queued_time") or not has_attr(row, "create_time"): + return [] + + base_tags = { + "pipeline_id": getattr(row, "pipeline_id", ""), + "pipeline_run_id": getattr(row, "pipeline_run_id", ""), + "source": SOURCE_NAME + } + metrics = [] + timestamp = timestamp_in_unix_milliseconds(getattr(row, "create_time", None)) + + end_time = getattr(row, "end_time", None) or datetime.now(timezone.utc) + + # Starting seconds: queued_time - create_time + starting_seconds = (row.queued_time - row.create_time).total_seconds() + metrics.append(_metrics_converter.create_metric( + metric_name="pipeline.run.starting_seconds", + metric_value=starting_seconds, + tags={**base_tags, "metric_type": "duration", "phase": "starting"}, + timestamp=timestamp + )) + + # Seconds waiting for resources: initialization_start_time - queued_time + if not has_attr(row, "initialization_start_time"): + return metrics + waiting_seconds = (row.initialization_start_time - row.queued_time).total_seconds() + metrics.append(_metrics_converter.create_metric( + metric_name="pipeline.run.waiting_for_resources_seconds", + metric_value=waiting_seconds, + tags={**base_tags, "metric_type": "duration", "phase": "waiting"}, + timestamp=timestamp + )) + + # Initialization seconds: running_start_time - initialization_start_time + if not has_attr(row, "running_start_time"): + return metrics + initialization_seconds = (row.running_start_time - row.initialization_start_time).total_seconds() + metrics.append(_metrics_converter.create_metric( + metric_name="pipeline.run.initialization_seconds", + metric_value=initialization_seconds, + tags={**base_tags, "metric_type": "duration", "phase": "initialization"}, + timestamp=timestamp + )) + + # Running seconds: end_time - running_start_time + running_seconds = (end_time - row.running_start_time).total_seconds() + metrics.append(_metrics_converter.create_metric( + metric_name="pipeline.run.running_seconds", + metric_value=running_seconds, + tags={**base_tags, "metric_type": "duration", "phase": "running"}, + 
timestamp=timestamp + )) + + # Total seconds: end_time - create_time + total_seconds = (end_time - row.create_time).total_seconds() + metrics.append(_metrics_converter.create_metric( + metric_name="pipeline.run.total_seconds", + metric_value=total_seconds, + tags={**base_tags, "metric_type": "duration", "phase": "total"}, + timestamp=timestamp + )) + + return metrics + +# ================================================================================ +# MAIN +# ================================================================================ + +# Source streams +event_logs_bronze = "event_logs_bronze" +pipeline_runs_status = "pipeline_runs_status" + + +http_client = None +def getClient(config): + """Global HTTP client getter.""" + global http_client + if http_client is None: + http_client = HTTPClient( + max_retry_duration_sec=config["max_retry_duration_sec"], + request_timeout_sec=config["request_timeout_sec"] + ) + return http_client + +def register_sink_for_pipeline_events(): + @dlt.foreach_batch_sink(name="send_pipeline_status_to_3p_monitoring") + def send_pipeline_status_to_3p_monitoring(batch_df, batch_id): + destination_format_udf = udf(convert_row_to_pipeline_status_event, StringType()) + events_df = batch_df.withColumn("events", destination_format_udf(struct("*"))).select("events").filter(col("events").isNotNull()) + http_request_spec = _events_converter.create_http_requests_spec( + events_df, + _global_config["num_rows_per_batch"], + get_signalfx_headers(_global_config["splunk_access_token"]), + _global_config["endpoints"]["events"] + ) + getClient(_global_config).post(http_request_spec) + + @dlt.append_flow(target="send_pipeline_status_to_3p_monitoring") + def send_pipeline_status_to_sink(): + return spark.readStream.table(f"{pipeline_runs_status}_cdf") + + +def register_sink_for_errors(): + @dlt.foreach_batch_sink(name="send_errors_to_3p_monitoring") + def send_errors_to_3p_monitoring(batch_df, batch_id): + destination_format_udf = udf(convert_row_to_error_log, StringType()) + logs_df = batch_df.withColumn("logs", destination_format_udf(struct("*"))).select("logs").filter(col("logs").isNotNull()) + http_request_spec = _log_converter.create_http_requests_spec( + logs_df, + _global_config["num_rows_per_batch"], + get_signalfx_headers(_global_config["splunk_access_token"]), + _global_config["endpoints"]["logs"] + ) + getClient(_global_config).post(http_request_spec) + + @dlt.append_flow(target="send_errors_to_3p_monitoring") + def send_errors_to_sink(): + return spark.readStream.option("skipChangeCommits", "true").table(event_logs_bronze).filter("error IS NOT NULL OR level = 'ERROR'") + +def register_sink_for_pipeline_metrics(): + @dlt.foreach_batch_sink(name="send_pipeline_metrics_to_3p_monitoring") + def send_pipeline_metrics_to_3p_monitoring(batch_df, batch_id): + # DataFrame conversion logic + destination_format_udf = udf(convert_row_to_pipeline_metrics, ArrayType(StringType())) + metrics_df = batch_df.withColumn("metrics_array", destination_format_udf(struct("*"))).select(explode("metrics_array").alias("metrics")).filter(col("metrics").isNotNull()) + http_request_spec = _metrics_converter.create_http_requests_spec( + metrics_df, + _global_config["num_rows_per_batch"], + get_signalfx_headers(_global_config["splunk_access_token"]), + _global_config["endpoints"]["metrics"] + ) + getClient(_global_config).post(http_request_spec) + + @dlt.append_flow(target="send_pipeline_metrics_to_3p_monitoring") + def send_pipeline_metrics_to_sink(): + return 
spark.readStream.table(f"{pipeline_runs_status}_cdf") + +def register_sink_for_table_metrics(): + @dlt.foreach_batch_sink(name="send_table_metrics_to_3p_monitoring") + def send_table_metrics_to_3p_monitoring(batch_df, batch_id): + destination_format_udf = udf(convert_row_to_table_metrics, ArrayType(StringType())) + metrics_df = batch_df.withColumn("metrics_array", destination_format_udf(struct("*"))).select(explode("metrics_array").alias("metrics")).filter(col("metrics").isNotNull()) + http_request_spec = _metrics_converter.create_http_requests_spec( + metrics_df, + _global_config["num_rows_per_batch"], + get_signalfx_headers(_global_config["splunk_access_token"]), + _global_config["endpoints"]["metrics"] + ) + getClient(_global_config).post(http_request_spec) + + @dlt.append_flow(target="send_table_metrics_to_3p_monitoring") + def send_table_metrics_to_sink(): + return spark.readStream.option("skipChangeCommits", "true").table(event_logs_bronze) \ + .filter("table_name is not null AND details:flow_progress.metrics is not null AND event_type = 'flow_progress'") \ + .selectExpr( + "pipeline_id", + "pipeline_run_id", + "table_name", + "flow_name", + "event_timestamp", + "details:flow_progress.metrics.num_upserted_rows::bigint as num_upserted_rows", + "details:flow_progress.metrics.num_deleted_rows::bigint as num_deleted_rows", + "(details:flow_progress.metrics.num_upserted_rows::bigint + details:flow_progress.metrics.num_deleted_rows::bigint) as num_output_rows" + ) \ + .filter("num_upserted_rows is not null OR num_deleted_rows is not null OR num_output_rows is not null") + +# ================================================================================ +# MAIN INITIALIZATION +# ================================================================================ + +# Initialize global configuration and register sinks. +if getParam(spark.conf, "destination") == "splunk_observability": + initialize_global_config(spark.conf) + register_sink_for_errors() + register_sink_for_pipeline_events() + register_sink_for_table_metrics() + register_sink_for_pipeline_metrics() diff --git a/contrib/dbx_ingestion_monitoring/vars/common.vars.yml b/contrib/dbx_ingestion_monitoring/vars/common.vars.yml new file mode 100644 index 0000000..4fe86ec --- /dev/null +++ b/contrib/dbx_ingestion_monitoring/vars/common.vars.yml @@ -0,0 +1,19 @@ + # Shared configuration variables + +variables: + monitoring_catalog: + description: The catalog where the monitoring tables are to be created + default: main + monitoring_schema: + description: The schema where the monitoring tables are to be created + default: dbx_ingestion_monitoring + warehouse_id: + description: > + (Optional) The ID of the warehouse to use for dashboards, alerts, etc. If not specified, + an attempt will be made to find a suitable one. + default: "" + dab_type: + description: > + A string that gets incorporated in names of shared resources to distinguish between + usage in different DABs. + default: SDP diff --git a/contrib/dbx_ingestion_monitoring/vars/import_event_logs.vars.yml b/contrib/dbx_ingestion_monitoring/vars/import_event_logs.vars.yml new file mode 100644 index 0000000..65048e9 --- /dev/null +++ b/contrib/dbx_ingestion_monitoring/vars/import_event_logs.vars.yml @@ -0,0 +1,35 @@ +# Configuration variables for the shared jobs for importing event logs +variables: + imported_event_logs_table_name: + description: The name of the default table to use for importing event logs by the `import_event_logs` job. 
+ default: imported_event_logs + imported_pipeline_ids: + description: > + A comma-separated list of CDC ingestion pipeline ids whose logs are to be imported. This should be used only for pipelines which are not + configured to store their event logs into a Delta table. For those that are configured, use the `monitored_pipeline_ids` variable. + default: "" + imported_pipeline_tags: + description: > + A semicolon-separated list of comma-separated tag[:value] pairs to filter pipelines for importing event logs. + Format: "tag1[:value1],tag2[:value2];tag3[:value3]" + - Semicolons (;) separate tag groups (OR logic between groups) + - Commas (,) separate tags within a group (ALL must match - AND logic) + - 'tag' is shorthand for 'tag:' (tag with empty value) + Example: "tier:T0;team:data,tier:T1" means (tier:T0) OR (team:data AND tier:T1) + This is an alternative to specifying pipeline IDs explicitly via `imported_pipeline_ids`. + If both are specified, pipelines matching either criteria will be included. + default: "" + import_event_log_schedule_state: + description: Enable (`UNPAUSED`) or disable (`PAUSED`) the periodic import of SDP event logs. + default: PAUSED + import_event_log_cron_schedule: + description: > + The cron schedule (see http://www.quartz-scheduler.org/documentation/quartz-2.3.0/tutorials/crontrigger.html) to use for importing SDP event + logs. Note that you also have to set `import_event_log_schedule_state` to `UNPAUSED` for this to take effect. The default is to run the + import hourly. + default: "0 15 0/1 * * ?" + notification_emails: + description: A list of e-mails to notify in case of failures + type: complex + default: + - ${workspace.current_user.userName} diff --git a/contrib/dbx_ingestion_monitoring/vars/pipeline_tags_index.vars.yml b/contrib/dbx_ingestion_monitoring/vars/pipeline_tags_index.vars.yml new file mode 100644 index 0000000..c3c5cd5 --- /dev/null +++ b/contrib/dbx_ingestion_monitoring/vars/pipeline_tags_index.vars.yml @@ -0,0 +1,35 @@ +# Configuration variables for the helper job that builds the pipeline tags index (see ../resources/build_pipeline_tags_index.job.yml) +variables: + pipeline_tags_index_enabled: + description: > + Whether to use the pipeline tags index table for tag-based pipeline discovery. If disabled, the system will fall back to + API-based discovery (which is slower but always up-to-date). + default: false + + pipeline_tags_index_schedule_state: + description: Enable (`UNPAUSED`) or disable (`PAUSED`) the periodic rebuild of the pipeline tags index. + default: PAUSED + + pipeline_tags_index_cron_schedule: + description: > + The cron schedule (see http://www.quartz-scheduler.org/documentation/quartz-2.3.0/tutorials/crontrigger.html) to use for rebuilding + the pipeline tags index. The default is to run once daily at 9:31 AM UTC. + default: "0 31 9 * * ?" + + pipeline_tags_index_max_age_hours: + description: > + Maximum age (in hours) of the pipeline tags index before it is considered stale. If the index is older than this, + the system will fall back to API-based discovery and log a warning. + default: 48 + + pipeline_tags_index_table_name: + description: > + The name of the table to use for the pipeline tags inverted index. This table maps tag:value pairs to lists of pipeline IDs. + The table is used to efficiently discover pipelines by tags without having to call the Databricks API for every pipeline. 
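+      For example (illustrative), the index entry for the key `tier:T0` lists the IDs of all pipelines
+      carrying that tag, so tag-based discovery becomes a single lookup instead of a per-pipeline API call.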
+ default: pipeline_tags_index + + pipeline_tags_index_api_fallback_enabled: + description: > + Whether to fall back to API-based pipeline discovery if the index table is unavailable or stale. + If false, an error will be raised when the index is unavailable. + default: true diff --git a/contrib/dbx_ingestion_monitoring/vars/post_deploy.vars.yml b/contrib/dbx_ingestion_monitoring/vars/post_deploy.vars.yml new file mode 100644 index 0000000..972ae67 --- /dev/null +++ b/contrib/dbx_ingestion_monitoring/vars/post_deploy.vars.yml @@ -0,0 +1,15 @@ +# Configuration variables for the `post_deploy` job + +variables: + main_dashboard_name: + description: > + (Optional) The name of the monitoring dashboard. Override this when using multiple deployment targets to avoid name conflicts. + If not set, the name of the dashboard template will be used. + default: Example Spark Declarative Pipelines Dashboard + main_dashboard_id: + description: > + (Optional) The dashboard id to publish the main dashboard to. If not specified, the name of the dashboards will be used to + find an existing dashboard with that name. If none is found, a new one will be created. + default: "" + main_dashboard_template_path: + description: (Required) The path to the main dashboard template. diff --git a/contrib/dbx_ingestion_monitoring/vars/third_party_sink.vars.yml b/contrib/dbx_ingestion_monitoring/vars/third_party_sink.vars.yml new file mode 100644 index 0000000..52768a8 --- /dev/null +++ b/contrib/dbx_ingestion_monitoring/vars/third_party_sink.vars.yml @@ -0,0 +1,86 @@ +variables: + third_party_destination: + description: > + Third-party monitoring destination (datadog, newrelic, azuremonitor, splunk_observability). Leave empty to disable third-party monitoring. + default: "" + third_party_host_name: + description: > + Host name for the third-party monitoring service. Used to auto-generate API endpoints. + For Datadog: datadoghq.com, datadoghq.eu, us5.datadoghq.com, ap2.datadoghq.com + For New Relic: newrelic.com, eu.newrelic.com + For Azure Monitor: ..ingest.monitor.azure.com + For Splunk Observability: ingest..signalfx.com (e.g., ingest.us0.signalfx.com) + default: "" + third_party_account_id: + description: > + Account ID for New Relic (required for New Relic events endpoint auto-generation). + Not required for other destinations. Leave empty if using explicit endpoint configuration. + default: "" + third_party_api_key: + description: > + API key for the third-party monitoring service (required when destination is specified). + default: "" + third_party_endpoints_metrics: + description: > + Endpoint URL for sending metrics to the third-party monitoring service. + default: "" + third_party_endpoints_logs: + description: > + Endpoint URL for sending logs to the third-party monitoring service. + default: "" + third_party_endpoints_events: + description: > + Endpoint URL for sending events to the third-party monitoring service. + default: "" + third_party_batch_size: + description: > + Batch size for sending telemetry data to third-party monitoring service. + default: 100 + third_party_max_retry_duration_sec: + description: > + Maximum retry duration in seconds for failed requests to third-party monitoring service. + default: 300 + third_party_request_timeout_sec: + description: > + Request timeout in seconds for third-party monitoring service API calls. + default: 30 + third_party_secrets_scope: + description: > + Databricks secret scope for storing sensitive information like API keys. 
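+      For example (illustrative; the secret key name is just an example), the Splunk Observability access token
+      could be stored in this scope via the Databricks CLI: `databricks secrets put-secret third-party-monitoring-secrets splunk_access_token`.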
+ See: https://docs.databricks.com/aws/en/security/secrets/ + default: "third-party-monitoring-secrets" + azure_client_id: + description: > + Azure service principal client ID for authentication (required when destination is azuremonitor). + default: "" + azure_client_secret: + description: > + Azure service principal client secret for authentication (required when destination is azuremonitor). + default: "" + azure_authorization_endpoint: + description: > + Azure AD token endpoint URL for obtaining access tokens (required when destination is azuremonitor). + Example: https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token + default: "" + azure_max_access_token_staleness: + description: > + Maximum time in seconds before refreshing the Azure access token. + default: 3300 + splunk_access_token: + description: > + Splunk Observability Cloud organization access token for metrics, events, and logs (required when destination is splunk_observability). + Logs are sent as events to eliminate the need for HEC setup. + Refer: https://docs.splunk.com/observability/en/admin/authentication/authentication-tokens/org-tokens.html + default: "" + azure_tenant_id: + description: > + Azure AD tenant ID for authentication (required when destination is azuremonitor). + Used to construct the authorization endpoint: https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token + Example: 12345678-1234-1234-1234-123456789012 + default: "" + azure_dcr_immutable_id: + description: > + Azure Data Collection Rule immutable ID (required when destination is azuremonitor). + Used to auto-generate data ingestion endpoints for metrics, logs, and events streams. + Example: dcr-1234567890abcdef1234567890abcdef + default: "" \ No newline at end of file