DOC-14077 Product Change- PR #148542 - sql: add probabilistic transaction tracing #19909


Open
wants to merge 3 commits into
base: main
37 changes: 36 additions & 1 deletion src/current/v25.3/query-behavior-troubleshooting.md
@@ -29,7 +29,7 @@ You can identify high-latency SQL statements on the [**Insights**]({% link {{ pa

You can also enable the [slow query log]({% link {{ page.version.version }}/logging-use-cases.md %}#sql_perf) to log all queries whose latency exceeds a configured threshold, as well as queries that perform a full table or index scan.

You can collect richer diagnostics of a high-latency statement by creating a [diagnostics bundle]({% link {{ page.version.version }}/ui-statements-page.md %}#diagnostics) when a statement fingerprint exceeds a certain latency.
You can collect richer diagnostics of a high-latency statement by creating a [diagnostics bundle]({% link {{ page.version.version }}/ui-statements-page.md %}#diagnostics) when a statement fingerprint exceeds a certain latency. Identify slow transactions in an active workload by [selectively logging traces of transactions](#log-traces-for-transactions) that exceed a configured latency threshold.

{{site.data.alerts.callout_info}}
{% include {{ page.version.version }}/prod-deployment/resolution-untuned-query.md %}
@@ -109,6 +109,41 @@ docker run -d --name jaeger \
-p 6831:6831/udp -p 16686:16686 jaegertracing/all-in-one:latest
~~~

### Log traces for transactions

CockroachDB allows you to trace [transactions]({% link {{ page.version.version }}/transactions.md %}) to help troubleshoot performance issues. [Tracing]({% link {{ page.version.version }}/show-trace.md %}#trace-description) is controlled through two cluster settings that govern when a transaction trace is captured and emitted.

#### Trace sampling and emission

To enable tracing for a subset of transactions and emit relevant traces to the [`SQL_EXEC` logging channel]({% link {{ page.version.version }}/logging-overview.md %}#logging-channels), configure the following cluster settings:

- {% include_cached new-in.html version="v25.3.0" %}[`sql.trace.txn.sample_rate`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-trace-txn-sample-rate): Specifies the probability (between `0.0` and `1.0`) that a given transaction will have tracing enabled. A value of `0.01` means that approximately 1% of transactions are traced. The default is `1`, which means 100% of transactions are sampled.
- [`sql.trace.txn.enable_threshold`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-trace-txn-enable-threshold): Specifies a duration threshold. A trace is emitted only if a sampled transaction's execution time exceeds this value. When set to `0` (the default), transaction tracing is disabled, regardless of the value of `sql.trace.txn.sample_rate`.

A trace is emitted to the logs only when both of the following conditions are met:

1. The transaction is selected based on the sampling probability.
1. Its execution duration exceeds the configured threshold.

This approach minimizes overhead by tracing a fraction of the workload and emitting traces only for potentially relevant transactions.
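
As a minimal sketch, you can check the current values of both settings before changing them. This assumes a SQL session whose user is allowed to view cluster settings; the statements only read the values and do not modify the cluster.

{% include_cached copy-clipboard.html %}
~~~ sql
-- Show the current sampling probability
SHOW CLUSTER SETTING sql.trace.txn.sample_rate;

-- Show the current latency threshold (0 means transaction tracing is disabled)
SHOW CLUSTER SETTING sql.trace.txn.enable_threshold;
~~~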

#### Configuration example

{% include_cached copy-clipboard.html %}
~~~ sql
-- Enable trace sampling at 1%
SET CLUSTER SETTING sql.trace.txn.sample_rate = 0.01;

-- Emit traces for sampled transactions that exceed 1s
SET CLUSTER SETTING sql.trace.txn.enable_threshold = '1s';
~~~

With this configuration, approximately 1% of transactions are traced, and only traced transactions that run longer than 1s have their traces written to the logs. In the `SQL_EXEC` log, a line similar to the following precedes the trace:

~~~
SQL txn took 2.004362083s, exceeding threshold of 1s:
~~~
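
To stop emitting transaction traces once troubleshooting is complete, one option is to return the settings to their defaults. The following is a sketch, assuming the settings were changed as in the preceding configuration example:

{% include_cached copy-clipboard.html %}
~~~ sql
-- Restore the default threshold of 0, which disables transaction tracing
RESET CLUSTER SETTING sql.trace.txn.enable_threshold;

-- Optionally restore the default sampling rate
RESET CLUSTER SETTING sql.trace.txn.sample_rate;
~~~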

<a id="query-is-always-slow"></a>

### Queries are always slow