Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[doc][2.25] Query diagnostics #25463

Merged
merged 11 commits into from
Jan 17, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -84,6 +84,18 @@ This view provides a list of wait events and their metadata. The columns of the
| client_node_ip | text | IP address of the client which sent the query to YSQL/YCQL. Null for background activities. |
| sample_weight | float | If in any sampling interval there are too many events, YugabyteDB only collects `ysql_yb_ash_sample_size` samples/events. Based on how many were sampled, weights are assigned to the collected events. <br><br>For example, if there are 200 events, but only 100 events are collected, each of the collected samples will have a weight of (200 / 100) = 2.0 |

To obtain the IP address and location of a node where a query is being executed, use the `top_level_node_id` from the active session history view in the following command:

```sql
SELECT * FROM pg_catalog.yb_servers() WHERE uuid = <top_level_node_id>;
```

``` output
host | port | num_connections | node_type | cloud | region | zone | public_ip | uuid
--------------+------+-----------------+-----------+-------+-----------+------------+-----------+----------------------------------
10.9.111.111 | 5433 | 0 | primary | aws | us-west-2 | us-west-2a | | 5cac7c86ba4e4f0e838bf180d75bcad5
```

## Wait events

List of wait events by the wait event components.
Expand Down
8 changes: 8 additions & 0 deletions docs/content/preview/explore/query-1-performance/_index.md
Original file line number Diff line number Diff line change
Expand Up @@ -73,3 +73,11 @@ Ensure that the threshold is high enough so that you don't flood the `postgres*l
{{<lead link="/preview/troubleshoot/nodes/check-logs/#yb-tserver-logs">}}
Learn more about [YB-TServer logs](/preview/troubleshoot/nodes/check-logs/#yb-tserver-logs).
{{</lead>}}

## Export query diagnostics

Capture and export detailed query diagnostic information across multiple dimensions to help identify and resolve database query problems.

{{<lead link="./query-diagnostics/">}}
To learn more, see [Query diagnostics](./query-diagnostics/).
{{</lead>}}
243 changes: 243 additions & 0 deletions docs/content/preview/explore/query-1-performance/query-diagnostics.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,243 @@
---
title: Query diagnostics
linkTitle: Query diagnostics
headerTitle: Query diagnostics
headcontent: Export diagnostic information for analysis
tags:
feature: tech-preview
menu:
preview:
identifier: query-diagnostics
parent: query-tuning
weight: 500
type: docs
---

Use query diagnostics to capture and export detailed diagnostic information across multiple dimensions, and use this information to identify and resolve database query problems.

Query diagnostics collects data about a particular query, over several executions of the query, over the duration of a specified diagnostics interval. It also manages in-flight requests during this interval, and transfers the collected data to the file system for storage and future analysis.

Query diagnostics collects the following information:

- Bind variables and constants
- PostgreSQL statement statistics
- Schema details
- Active Session History
- Explain plans

## Enable query diagnostics

Query diagnostics is {{<tags/feature/tp>}}. To use query diagnostics, enable and configure the following flags for each node of your cluster.

| Flag | Description |
| :--- | :---------- |
| allowed_preview_flags_csv | Set the value of this flag to include `ysql_yb_enable_query_diagnostics`. |
| ysql_yb_enable_query_diagnostics | Enable or disable query diagnostics. <br>Default: false. Changing this flag requires a VM restart. |
| yb_query_diagnostics_circular_buffer_size | Size (in KB) of query diagnostics circular buffer that stores statuses of bundles.<br>Default: 64. Changing this flag requires a VM restart. |

## Export diagnostics

Use the [pg_stat_statements](../pg-stat-statements/) view to obtain the `queryid` of a query of interest.

To initiate query diagnostics, you use the `yb_query_diagnostics()` function, providing the `queryid`.

For example:

```sql
SELECT yb_query_diagnostics(
query_id => 7317833532428971166,
explain_sample_rate => 5,
diagnostics_interval_sec => 120,
explain_analyze => true,
explain_dist => true,
bind_var_query_min_duration_ms => 10
);
```

```output
yb_query_diagnostics
/Users/($username)/yugabyte-data/node-1/disk-1/query-diagnostics/7317833532428971166/<random-number>
(1 row)
--------------------------------------------------------------------------------
```

This exports diagnostics for the query with the specified ID, for a duration of 120 seconds, and includes EXPLAIN(ANALYZE, DIST) plans for 5% of all queries observed during this time frame. It also dumps constants/bind variables for queries that took more than 10ms.

To cancel a running query diagnostic, use the `yb_cancel_query_diagnostics()` function.

For example:

```sql
SELECT yb_cancel_query_diagnostics(query_id => 7317833532428971166);
```

### Options

| Parameter | Description | Default |
| :--- | :--- | :--- |
| query_id | Required. A unique identifier for the query used to identify identical normalized queries. You can obtain query IDs using the pg_stat_statements view. This is also the same as the query_id in the yb_active_session_history. | |
| diagnostics_interval_sec | The duration for which the bundle will run, in seconds. | 300 |
| explain_sample_rate | You can export the output of the EXPLAIN command for a randomly selected percentage (1-100) of queries that are running during the diagnostics interval. | 1 |
| explain_analyze | Enhance the EXPLAIN plan output with planning and execution data. Note that this data is gathered during query execution itself, and the query is not re-executed. | false |
| explain_dist | Log distributed data in the EXPLAIN plan. explain_analyze must be set to true. | false |

### Check diagnostic status

Use the `yb_query_diagnostics_status` view to check the status of previously executed query diagnostic bundles.

| Column | Description |
| :----- | :---------- |
| query_id | Hash code to identify identical normalized queries. |
| start_time | Timestamp when the bundle was triggered. |
| diagnostics_interval_sec | The duration for which the bundle will run. |
| explain_params | Flags used to gather EXPLAIN plans. |
| bind_var_query_min_duration_ms | The minimum query duration for logging bind variables. |
| folder_path | File system path where the data for this bundle has been dumped. |
| state | The state of the diagnostic - Completed, In Progress, or Failed. |
| status | In case of a failure, a short description of the type of failure. |

## Output

When the specified diagnostic interval has elapsed, query diagnostics outputs the data to files in the following directory:

`pg_data/query_diagnostics/query_id/<random-number>/`

To locate your data directory, run `SHOW data_directory` in ysqlsh.

### Constants

Provides details of the executed bind variables when the specified query ID corresponds to a prepared statement. Otherwise, provides the constants embedded in the query. Additionally, it includes the time of each execution along with the bind variable. Using the `bind_var_query_min_duration_ms` parameter, you can log only those bind variables that take more than a specified duration.

Output file: constants_and_bind_variables.csv

For example:

```output
| var1 | var2 | var3 | query_time |
| 19146 | 9008 | 'text_1' | 12.323 |
```

### pg_stat_statements

Presents aggregated pg_stat_statements data spanning the diagnostics interval of query diagnostics for the specified query ID.

Output file: pg_stat_statements.csv

For more information, see [Get query statistics using pg_stat_statements](../pg-stat-statements/).

### Schema information

Provides information about the columns and indexes associated with unique tables mentioned in the query being diagnosed.

The output is the same as that obtained using the [\d+ meta-command](../../../api/ysqlsh-meta-commands/#d-s-pattern-patterns).

Output file: schema_details.txt

For example, the following query would generate schema details as follows:

```sql
SELECT
a,
b
FROM
test_table1
WHERE
b > 0;
```

```output
Table name: test_table1
- Table information:
|Table Name |Table Groupname |Colocated |
+-------------------+----------------+----------+
|public.test_table1 | |false |

- Columns:
|Column |Type |Collation |Nullable |Default |Storage |Stats Target |Description |
+-------+-----------------+----------+---------+--------+---------+-------------+------------+
|a |text | | | |extended | | |
|b |integer | | | |plain | | |
|c |double precision | | | |plain | | |
```

### Active session history

Outputs the data from the `yb_active_session_history` view, for the diagnostics interval. You can later import the table in SQL and query it to get data for only the specified query ID.

Output file: active_session_history.csv

For more information on active session history, see [Active Session History](../../observability/active-session-history/).

### Explain plans

Provides the output of the EXPLAIN command for a randomly selected subset of queries.

Normally, all data is written to the file system when the diagnositcs interval has expired. However, if the explain data being gathered exceeds the internal buffer size, query diagnostics will flush the data to the file system to make room for data from further queries.

Output file: explain_plan.txt

For example:

```output
QUERY PLAN:
--------------------------------------------------------------------------------
Insert on n (cost=0.00..0.01 rows=1 width=44) (actual time=9.114..9.114 rows=0 loops=1)
-> Result (cost=0.00..0.01 rows=1 width=44) (actual time=0.003..0.005 rows=1 loops=1)
Storage Table Write Requests: 1
Planning Time: 0.142 ms
Execution Time: 9.294 ms
Storage Read Requests: 0
Storage Rows Scanned: 0
Storage Write Requests: 1
Catalog Read Requests: 0
Catalog Write Requests: 0
Storage Flush Requests: 0
Storage Execution Time: 0.000 ms
Peak Memory Usage: 24 kB
(13 rows)
--------------------------------------------------------------------------------
Insert on n (cost=0.00..0.01 rows=1 width=44) (actual time=9.114..9.114 rows=0 loops=1)
-> Result (cost=0.00..0.01 rows=1 width=44) (actual time=0.003..0.005 rows=1 loops=1)
Storage Table Write Requests: 1
Planning Time: 0.142 ms
Execution Time: 9.294 ms
Storage Read Requests: 0
Storage Rows Scanned: 0
Storage Write Requests: 1
Catalog Read Requests: 0
Catalog Write Requests: 0
Storage Flush Requests: 0
Storage Execution Time: 0.000 ms
Peak Memory Usage: 24 kB
(13 rows)
...
--------------------------------------------------------------------------------
```

For more information about using EXPLAIN, see [Analyze queries with EXPLAIN](../explain-analyze/).

## Limitations

- Multiple diagnostic bundles can be triggered concurrently for different queries; however, only one bundle can run at a time for a single query ID.
- Query diagnostics is available per node and is not aggregated across the cluster.
- Schema details can export a maximum 10 unique schemas in the query.
- You cannot have more than 100 simultaneous query diagnostic operations.
- The feature is designed to minimize its impact on query latency. However, operations like EXPLAIN ANALYZE and I/O tasks may affect the data being analyzed.
- Memory consumption may increase as data is stored in memory before being flushed to disk. Memory consumption while accumulating data is as follows:

```output
num_queries_diagnosed * (mem_explain + mem_bind_var + mem_schema_details + mem_ash + mem_pg_stat_statements)
```

Where
- num_queries_diagnosed ≤ 100 bytes
- mem_explain ≤ 16 kb
- mem_bind_var ≤ 2 kb
- mem_pg_stat_statements ≤ 1 kb

- Folders generated by query diagnostics are not cleaned up automatically.
- If any query has more than 16384 chars in explain_plan, then it is not dumped. See [issue 23720](https://github.com/yugabyte/yugabyte-db/issues/23720).

## Learn more

- [Cost Based Optimizer](https://www.yugabyte.com/blog/yugabytedb-cost-based-optimizer/)