Complete Block Check: PostgreSQL Storage Benchmark

Complete Block Check (CBC) stresses multiple major parts of block I/O in the database using synthetic test data. It was developed against PostgreSQL 15 and uses CREATE TABLE AS, VACUUM, CREATE INDEX, and CLUSTER. Two simple but difficult to execute queries are run along the way, initially with a Sequential Scan. By the end one query should use a quick indexed lookup.

To learn about the individual statements of CBC, how and why they work, see the CBC Tutorial in this same directory.

Background

I’ve evaluated many benchmark ideas that stress storage before, usually using pgbench to generate the test database and workload. When I was working on loading the Open Street Map data, I noticed that CREATE TABLE AS is really fast now, fast enough to be a load generator if you have a wide table to work with. And there is a wide table in the system views! pg_settings has a lot of columns of all sorts of data types, a lot more variation than the trivial pgbench tables.

Example collector output

Here's a lightly formatted example via the collector script. Hardware is AMD 7700X processor, 128GB RAM, SK41 2TB SSD; times are in milliseconds, sizes in bytes, disk rates in MB/s:

SERVER        siren
CREATE_TABLE  33116.420
VACUUM        9293.740
CREATE_INDEX  36316.109
SELECT_u1     832.447
SELECT_b      209.221
SELECT_r      2443.925
CLUSTER       87918.913
SELECT_u2     81.555
SELECT_c      80.997
SIZE_T        21837168640
SIZE_I        636223488
READ_RATE     2277.521
WRITE_RATE    2279.278

The READ/WRITE_RATE figures come from the VACUUM command and only apply to that line. During benchmark development, that command was frequently the highest rate step. Note that this server was configured for performance rather than reliability, with WAL and all nornal sync work disabled. In a default configuration vacuum work will usually hit a WAL bottleneck before it stresses CPU and disk to the extent this sample does.

psql driver

You can run a CBC in command line psql like this:

psql -d $DB -v scale=$SIZEDB -ef cbc-psql.sql > $OUTBASE.log 2>&1

For testing in a browser psql like the Crunchy PostgreSQL Playground, you just have to put the scale number in manually instead:

\set SIZEGB 1

CLI collector driver

The program includes a standalone results collector script named cbc that runs the benchmark and saves to a text file in results/ directory.

Inputs:

Database size in GB. Defaults to 20.

Output is in the key/value style, shown as an example above.

pgbench driver

The workload runs in pgbench utilizing its scale setting. To make this compatible with using pgbench-tools to monitor each step, they are split the pieces up into individual files, also allowing execution by multiple clients. Currently that includes:

cbc-t00-ctas.sql	cbc-t01-vacuum.sql	cbc-t02-index.sql
cbc-t03-select-max.sql	cbc-t04-selectr.sql	cbc-t05-cluster.sql

The full series of pgbench tests have been packed into a workload script in the source tree at wl/cbc that includes multiple client counts for the SELECT statements.

Here's a manual run of the benchmark steps using pgbench to monitor each step:

createdb cbc
psql -d cbc -c "DROP TABLE settings_loop"
pgbench -n -t 1 -c 1 -s 20 -f cbc-t00-ctas.sql cbc
pgbench -n -t 1 -c 1 -s 20 -f cbc-t01-vacuum.sql cbc
pgbench -n -t 1 -c 1 -s 20 -f cbc-t02-index.sql cbc
pgbench -n -t 1 -c 1 -s 20 -f cbc-t03-select-max.sql cbc
pgbench -n -t 1 -c 1 -s 20 -f cbc-t03-select-max.sql cbc
pgbench -n -t 1 -c 1 -s 20 -f cbc-t04-selectr.sql cbc
pgbench -n -t 1 -c 1 -s 20 -f cbc-t05-cluster.sql cbc
pgbench -n -t 1 -c 1 -s 20 -f cbc-t03-select-max.sql cbc
pgbench -n -t 1 -c 1 -s 20 -f cbc-t03-select-max.sql cbc

Many of these steps can only run once. The main SELECT query can of course be run in parallel. This example starts 4 clients running it for 120 seconds, same as the automated workload because that's usually enough to observe heat issues:

pgbench -n -T 120 -c 4 -s 20 -f cbc-t03-select-max.sql cbc

Note that while the other methods of running CBC allow fractions of a GB for the size setting, pgbench scales must be integers of 1 or larger. To test <1GB configurations with pgbench you'd need to adjust the way the input scale is handled.

Automated workload driver

A pgbench CBC workload driver at wl/cbc is available to make it easier to run every step with individual metrics collection. pgbench-tools workloads generate a series of configuration files, run each test with pgbench, and save the results into a results database for analysis. A sample report at reports/cbc-best.sql shows the results with associated disk metrics. On the Mac this includes only disk0 throughput.

The workload version of CBC also includes a second copy of the cbc-t03-select-max script labeled to distinguish the runs expected to use an index. The test itself is identical, the script name is simply cbc-t06-iselect-max instead for identification and sorting.

Storage throughput validation

To prove that CBC is effective at stressing system storage, four test systems were used:

MacBook Pro 16: M1 Pro CPU, 16GB RAM, 1TB SSD
MacBook Air 15: M2 CPU, 8GB RAM, 512GB SSD
AMD Ryzen 7700X: 128GB RAM, 2TB SK41 SSD
AMD Ryzen 5950X: 128GB RAM, 2TB Inland Performance PCI 4.0 SSD

Tests tried a range of sizes, and as usual for benchmarking like this 3-4X RAM was enough to reach maximum speed in most tests.

CPU	RAM	DB GB	Script	Metric	Best	Max	% Max
M1 Pro	16	32	t00-ctas	R+W	5107	5263	97%
M2 Air	8	20	t00-ctas	R+W	2556	3238	79%
R9 5950X	128	384	t04-selectr	read	4709	5000	94%
R9 5950X	128	256	t00-ctas	write	4072	4300	95%
R7 7700X	128	384	t04-selectr	read	6240	7000	89%
R7 7700X	128	384	t00-ctas	write	6529	6500	100%

On the Mac, disk I/O data collected with iostat groups read and write operations into one total. In theory then, the M1 Pro capable of 5263MB/s writes and slightly higher reads has a true theoretical max combined value of over 10GB/s.

Here though, max values for the Mac systems only reflect the one I/O operation that the tests involved exercise. For example, t00-ctas only generates write traffic. The max possible on that test is the system's best write speed, not the max possible for the combined R+W metric being measured.

Some other systems were lightly tested as well, and only the systems with strong thermal activity throttling failed to reach the 90% goal development aimed at. The passive cooling of the Mac M2 Air for example, it's extremely hard to reach maximum I/O rates within its thermal constraints.

Credits

cbc was developed by Greg Smith as part of the PostgreSQL regression testing effort at Crunchy Data. The scripts are being initially released as part of the pgbench-tools repository.

Dedication

Greg's benchmark lab names its many servers after favorite songs. The AMD AM5 7700X system used for this development was named for "Siren Song" on the 1993 Alan Parsons Album Try Anything Once, written by Alan Parsons Project guitarist Ian Bairnson with composer Frank Musker. Ian passed away in April of 2023 just as cbc validation started on siren. Ian was Greg's friend and he went out at the top! Ian played on Kate Bush's first hit, and her "Running Up That Hill (A Deal with God)" (with APP drummer Stuart Elliott) featured prominently in Season 4 of the hit show Stranger Things. That brought the song smashing back to the radio internationally and filled Ian's last year with press coverage.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

Complete Block Check: PostgreSQL Storage Benchmark

Background

Example collector output

psql driver

CLI collector driver

pgbench driver

Automated workload driver

Storage throughput validation

Credits

Dedication

Files

README.md

Latest commit

History

README.md

File metadata and controls

Complete Block Check: PostgreSQL Storage Benchmark

Background

Example collector output

psql driver

CLI collector driver

pgbench driver

Automated workload driver

Storage throughput validation

Credits

Dedication