Commit 3d9b822

Merge branch 'main' into 'main'

Days 14 to 20. See merge request postgres-ai/postgresql-consulting/postgres-howtos!3
Originally from: [tweet](https://twitter.com/samokhvalov/status/1711575029006418218), [LinkedIn post](...).

---

# How to decide when a query is too slow and needs optimization

> I post a new PostgreSQL "howto" article every day. Join me in this
> journey – [subscribe](https://twitter.com/samokhvalov/), provide feedback, share!
"Slow" is a relative concept. In some cases, we might be happy with query latency 1 minute (or no?), while in other
11+
scenarios, even 1 ms might seem to be too slow.
Deciding when to apply optimization techniques is important for efficiency – as Donald Knuth famously stated in "The Art
of Computer Programming":

> The real problem is that programmers have spent far too much time worrying about efficiency in the wrong places and at
> the wrong times; premature optimization is the root of all evil (or at least most of it) in programming.
Below we assume that we work with OLTP or hybrid workloads and need to decide if a certain query is too slow and
requires optimization.

## How to conclude that a query is too slow
1. Do you have an OLTP case, an analytical one, or hybrid? For OLTP cases, requirements are stricter and dictated
   by human perception (see: [What is a slow SQL query?](https://postgres.ai/blog/20210909-what-is-a-slow-sql-query)),
   while for analytical needs, we can usually wait a minute or two – unless it's also user-facing. If it is, we probably
   consider 1 minute too slow. In this case, consider using column-store database systems (and the Postgres ecosystem
   has a new offering here: check out [@hydradatabase](https://twitter.com/hydradatabase)). For OLTP, the majority of
   user-facing queries should be below 100 ms – ideally, below 10 ms – so that the complete requests users make to your
   backends do not exceed 100-200 ms (each request can issue several SQL queries, depending on the case). Of course,
   non-user-facing queries such as those coming from background jobs, `pg_dump`, and so on can last longer – assuming
   that the principles below are met.
2. In the case of OLTP, the second question should be: is this query read-only, or does it change data (be it DDL or
   just data-writing DML – INSERT/UPDATE/DELETE)? If it changes data, in OLTP we shouldn't allow it to run longer than
   a second or two, unless we are 100% sure that this query won't block other queries for long. For massive writes,
   consider splitting them into batches so that each batch doesn't last longer than 1-2 seconds. For DDL, be careful
   with lock acquisition and lock chains (read these posts:
   [Common DB schema change mistakes](https://postgres.ai/blog/20220525-common-db-schema-change-mistakes#case-5-acquire-an-exclusive-lock--wait-in-transaction)
   and
   [Useful queries to analyze PostgreSQL lock trees (a.k.a. lock queues)](https://postgres.ai/blog/20211018-postgresql-lock-trees)).
3. If you're dealing with a read-only query, make sure it's not running for too long either – long-running transactions
   make Postgres hold old dead tuples for a long time (the "xmin horizon" is not advancing), so autovacuum cannot
   delete dead tuples that became dead after the start of our transaction. Aim to avoid transactions that last longer
   than one or a few hours (and if you absolutely need such long transactions, prefer running them at low-activity
   hours, when the XID is progressing slowly, and do not run them often).
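To check for such offenders, a minimal sketch of a `pg_stat_activity` query that lists transactions open for more than one hour (the 1-hour threshold is an assumption; adjust to your workload):

```sql
-- Sessions whose transactions started more than 1 hour ago
select
  pid,
  now() - xact_start as xact_duration,
  state,
  left(query, 100) as query
from pg_stat_activity
where xact_start < now() - interval '1 hour'
order by xact_start;
```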
48+
49+
4. Finally, even if a query is relatively fast – for instance, 10ms – it might still be considered too slow if its
50+
frequency is high. For example, 10ms query running 1,000 times per second (you can check it via
51+
`pg_stat_statements.calls`), then Postgres needs to spend 10 seconds *every* second to process this group of queries.
52+
In this case, if lowering down the frequency is hard, the query should be considered slow, and an optimization
53+
attempt needs to be performed, to reduce resource consumption (the goal here is to reduce
54+
`pg_stat_statements.total_exec_time` – see
55+
the [previous #PostgresMarathon posts about pgss](https://twitter.com/search?q=%23PostgresMarathon%20pg_stat_statements&src=typed_query&f=live)).
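To spot such query groups, a sketch along these lines can help (assuming the `pg_stat_statements` extension is installed; column names are per Postgres 13+, where `total_exec_time` replaced `total_time`):

```sql
-- Top-10 query groups by total execution time
select
  queryid,
  calls,
  round(mean_exec_time::numeric, 2) as mean_ms,
  round(total_exec_time::numeric, 2) as total_ms,
  left(query, 80) as query_sample
from pg_stat_statements
order by total_exec_time desc
limit 10;
```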
## Summary

- All user-facing queries that last longer than 100-200 ms should be considered slow. Good queries are those that are
  below 10 ms.
- It's OK for background processing queries to last longer. If they modify data and might block user-facing queries,
  they should not be allowed to last longer than 1-2 s.
- Be careful with DDL – make sure it doesn't cause massive writes (if it does, it should be split into batches as
  well), and use a low `lock_timeout` and retries to avoid blocking chains.
- Do not allow long-running transactions. Make sure the xmin horizon is progressing and autovacuum can remove dead
  tuples promptly – do not allow transactions that last too long (>1-2h).
- Optimize even fast (<100 ms) queries if the corresponding `pg_stat_statements.calls` and
  `pg_stat_statements.total_exec_time` are high.
Originally from: [tweet](https://twitter.com/samokhvalov/status/1711982079838507225), [LinkedIn post](...).

---

# How to monitor CREATE INDEX / REINDEX progress in Postgres 12+

> I post a new PostgreSQL "howto" article every day. Join me in this
> journey – [subscribe](https://twitter.com/samokhvalov/), provide feedback, share!
To monitor the progress of long-running index building or rebuilding, you can use this query:

```sql
select
  now(),
  query_start as started_at,
  now() - query_start as query_duration,
  format('[%s] %s', a.pid, a.query) as pid_and_query,
  index_relid::regclass as index_name,
  relid::regclass as table_name,
  (pg_size_pretty(pg_relation_size(relid))) as table_size,
  nullif(wait_event_type, '') || ': ' || wait_event as wait_type_and_event,
  phase,
  format(
    '%s (%s of %s)',
    coalesce((round(100 * blocks_done::numeric / nullif(blocks_total, 0), 2))::text || '%', 'N/A'),
    coalesce(blocks_done::text, '?'),
    coalesce(blocks_total::text, '?')
  ) as blocks_progress,
  format(
    '%s (%s of %s)',
    coalesce((round(100 * tuples_done::numeric / nullif(tuples_total, 0), 2))::text || '%', 'N/A'),
    coalesce(tuples_done::text, '?'),
    coalesce(tuples_total::text, '?')
  ) as tuples_progress,
  current_locker_pid,
  (select nullif(left(query, 150), '') || '...' from pg_stat_activity a where a.pid = current_locker_pid) as current_locker_query,
  format(
    '%s (%s of %s)',
    coalesce((round(100 * lockers_done::numeric / nullif(lockers_total, 0), 2))::text || '%', 'N/A'),
    coalesce(lockers_done::text, '?'),
    coalesce(lockers_total::text, '?')
  ) as lockers_progress,
  format(
    '%s (%s of %s)',
    coalesce((round(100 * partitions_done::numeric / nullif(partitions_total, 0), 2))::text || '%', 'N/A'),
    coalesce(partitions_done::text, '?'),
    coalesce(partitions_total::text, '?')
  ) as partitions_progress,
  (
    select
      format(
        '%s (%s of %s)',
        coalesce((round(100 * n_dead_tup::numeric / nullif(reltuples::numeric, 0), 2))::text || '%', 'N/A'),
        coalesce(n_dead_tup::text, '?'),
        coalesce(reltuples::int8::text, '?')
      )
    from pg_stat_all_tables t, pg_class tc
    where t.relid = p.relid and tc.oid = p.relid
  ) as table_dead_tuples
from pg_stat_progress_create_index p
left join pg_stat_activity a on a.pid = p.pid
order by p.index_relid
; -- in psql, use "\watch 5" instead of semicolon
```
The same query, in [a better formatted form](https://gitlab.com/-/snippets/2138417).

How this query works:

1. The basis of it is `pg_stat_progress_create_index`,
   [added in Postgres 12](https://postgresql.org/docs/current/progress-reporting.html#CREATE-INDEX-PROGRESS-REPORTING).
2. The documentation also has a list of `CREATE INDEX` phases. As you can see from the table provided in the docs, the
   advanced variants, `CREATE INDEX CONCURRENTLY` / `REINDEX CONCURRENTLY` (a.k.a. CIC and RC), which take longer but
   act in a non-blocking fashion suitable for loaded production systems, have more phases. The current phase is
   presented in the "phase" column of the output.
3. The index name (a temporary one in the case of CIC/RC) and table name are presented (using the useful trick of
   converting OIDs to names – note, e.g., `index_relid::regclass as index_name`). Additionally, the table size is
   shown, which is essential for forming expectations of overall duration – the bigger the table, the longer the index
   creation is going to take.
4. `pg_stat_activity` (`pgsa`) provides a lot of additional useful information:
   - the PID of the Postgres backend
   - the actual SQL query used
   - the moment when the work started (`query_start`), allowing us to understand the elapsed time (`query_duration`)
   - `wait_event_type` & `wait_event` to understand what the process is currently waiting on
   - it is also used (in a separate sub-query) to get information about the session that blocks our process, when such
     an event occurs (`current_locker_pid`, `current_locker_query`)
5. The function `format(...)` is very useful for consolidating data in a convenient form without having to worry about
   `NULL`s, which would be a problem if we used regular concatenation without `coalesce(...)`.

6. However, in certain cases we use `coalesce(...)` just to put special symbols in if the value is missing
   (`IS NULL`) – e.g., "?" or "N/A".

7. Another interesting trick is the combination of `coalesce(...)` and `nullif(...)`. The latter allows us to avoid
   division-by-zero errors (replacing `0` with `NULL`, making the division result also `NULL`), while the former,
   again, is used to substitute the resulting `NULL` with some non-empty value (in this case, 'N/A').
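A minimal standalone illustration of this `coalesce` + `nullif` pattern, with a deliberately zero denominator:

```sql
-- nullif(0, 0) turns the denominator into NULL, the division then yields NULL,
-- and coalesce replaces the resulting NULL with 'N/A':
select coalesce(
  (round(100 * 5::numeric / nullif(0, 0), 2))::text || '%',
  'N/A'
);  -- returns 'N/A'
```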
When executed in `psql`, it can be convenient to use `\watch [seconds]` to run this report in a loop and observe the
progress live:

![tracking the progress of index building/rebuilding](files/0015_reindex.gif)
Originally from: [tweet](https://twitter.com/samokhvalov/status/1712342522314572064), [LinkedIn post](...).

---

# How to get into trouble using some Postgres features

> I post a new PostgreSQL "howto" article every day. Join me in this
> journey – [subscribe](https://twitter.com/samokhvalov/), provide feedback, share!
Today we have quite entertaining material, but knowing (and avoiding) these things can save you time and effort.

## NULLs

`NULL`s, while being very common, are the most popular way to get into trouble when using SQL in general, and Postgres
is no exception.
For example, one may forget that concatenation (`||`), arithmetic operations (`*`, `/`, `+`, `-`), and traditional
comparison operators (`=`, `<`, `>`, `<=`, `>=`, `<>`) are all not NULL-safe, and later be very surprised that the
result is lost.

It especially hurts when you're building a startup and some important business logic depends on a query that doesn't
handle `NULL`s properly, leading to loss of users, money, or time (or all of those):
```
nik=# \pset null ∅
Null display is "∅".

nik=# select null + 1;
 ?column?
----------
        ∅
(1 row)
```
`NULL`s can be really dangerous, and even experienced engineers continue to bump into issues when working with them.
Some useful materials to educate yourself:

- [NULLs: the good, the bad, the ugly, and the unknown](https://postgres.fm/episodes/nulls-the-good-the-bad-the-ugly-and-the-unknown)
  (podcast)
- [What is the deal with NULLs?](http://thoughts.davisjeff.com/2009/08/02/what-is-the-deal-with-nulls/)
A couple of tips on how to make your code NULL-safe:

- Consider using expressions like `COALESCE(val, 0)` to replace `NULL`s with some value (usually `0` or `''`).
- For comparison, instead of `=` or `<>`, use `IS [NOT] DISTINCT FROM` (check out the `EXPLAIN` plan though).
- Instead of concatenation, use: `format('%s %s', var1, var2)`.
- Don't use `WHERE NOT IN (SELECT ...)` – use `NOT EXISTS` instead
  (see this [JOOQ blog post](https://jooq.org/doc/latest/manual/reference/dont-do-this/dont-do-this-sql-not-in/)).
- Just be careful. `NULL`s are treacherous.
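A few minimal illustrations of these NULL-safe patterns:

```sql
select coalesce(null::int, 0);          -- returns 0 instead of NULL
select null is distinct from 1;         -- returns true (NULL-safe comparison)
select format('%s and %s', 'a', null);  -- returns 'a and ' – the result is not lost
```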
## Subtransactions under heavy loads

If you aim to grow to dozens or hundreds of thousands of TPS and want to have various issues, use subtransactions.
Probably, you use them implicitly – e.g., if you use Django, Rails, or `BEGIN/EXCEPTION` blocks in PL/pgSQL.

Why you might want to get rid of subtransactions completely:
[PostgreSQL Subtransactions Considered Harmful](https://postgres.ai/blog/20210831-postgresql-subtransactions-considered-harmful)
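To see where subtransactions sneak in implicitly, here is a small sketch of a PL/pgSQL `BEGIN/EXCEPTION` block – each entry into such a block opens a subtransaction under the hood (the table name `t1` is hypothetical):

```sql
do $$
begin
  begin  -- this inner block with an EXCEPTION clause implicitly creates a subtransaction
    insert into t1 values (1);
  exception when others then
    raise notice 'swallowed error: %', sqlerrm;
  end;
end
$$;
```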
## int4 PK

Zero-downtime conversion of an `int4` (a.k.a. int, a.k.a. integer) PK to `int8` when the table has 1B rows requires a
lot of effort. Meanwhile, a table `(id int4, created_at timestamptz)` is going to occupy the same disk space as
`(id int8, created_at timestamptz)` due to [alignment padding](https://stackoverflow.com/a/7431468/459391).
## (Exotic) SELECT INTO is not what you think it is

One day I was debugging a PL/pgSQL function, and copy-pasted a query like this to `psql`:

```
nik=# select * into var from t1 limit 1;
SELECT 1
```

It worked! This was a huge surprise – in the SQL context (not PL/pgSQL),
[SELECT INTO](https://postgresql.org/docs/current/sql-selectinto.html) is a DDL command that creates a table and
inserts data into it (shouldn't this be deprecated already?).
## Thinking that "transactional DDL" is easy

Yes, Postgres has "transactional DDL" and you can benefit from it a lot – until you cannot. Under load, you cannot rely
on it – instead, you need to start using zero-downtime methodologies and avoid mistakes
(read: [common db schema change mistakes](https://postgres.ai/blog/20220525-common-db-schema-change-mistakes)), and
rely on "non-transactional" DDL such as `CREATE INDEX CONCURRENTLY`, assuming that some attempts might fail, after
which cleanup is needed before retrying.

A big problem with DDL deployment under load is that, by default, you can experience downtime while attempting to
deploy a very light schema change – unless you implement logic with a low `lock_timeout` and retries
(see: [zero-downtime postgres schema migrations lock timeout and retries](https://postgres.ai/blog/20210923-zero-downtime-postgres-schema-migrations-lock-timeout-and-retries)).
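A sketch of the idea (the table and column names are hypothetical; the retry loop itself usually lives in the migration tool or application code):

```sql
-- Fail fast instead of queueing behind long-running transactions; on failure
-- (SQLSTATE 55P03, lock_not_available), the caller waits briefly and retries.
set lock_timeout to '100ms';
alter table t1 add column new_col int8;
```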
## DELETE a lot of rows with one command

This is a good way to get into trouble: issue a `DELETE` of millions of rows and wait. If the `checkpointer` is not
tuned (e.g., `max_wal_size = 1GB`), if tuples are deleted via an `IndexScan` access path (meaning the process of making
pages dirty is quite "random"), and disk IO is throttled, this may put your system down. And even if it survives the
stress, you'll get:

- risks of locking issues (the `DELETE` blocking some writes issued by other users),
- a large number of dead tuples produced, to be converted to bloat later by `autovacuum`.

What to do:

- split the work into batches,
- if a massive write is inevitable, consider raising `max_wal_size` temporarily, which doesn't require a restart
  (however, this potentially increases recovery time if the server crashes during this procedure).

Read [common db schema change mistakes](https://postgres.ai/blog/20220525-common-db-schema-change-mistakes#case-4-unlimited-massive-change).
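The batching approach can be sketched like this (the table `t1` and the retention condition are hypothetical; the loop runs on the application side until zero rows are affected):

```sql
-- One batch: delete up to 5,000 of the oldest matching rows
with batch as (
  select id
  from t1
  where created_at < now() - interval '1 year'
  limit 5000
)
delete from t1
where id in (select id from batch);
```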
## Other "Don't do" articles

- [Depesz: Don’t do these things in PostgreSQL](https://depesz.com/2020/01/28/dont-do-these-things-in-postgresql/)
- [PostgreSQL Wiki: Don't Do This](https://wiki.postgresql.org/wiki/Don't_Do_This)
- [JOOQ: Don't do this](https://jooq.org/doc/latest/manual/reference/dont-do-this/)
