-
Notifications
You must be signed in to change notification settings - Fork 184
Description
At my company we use pg_repack primarily as a tool to delete obsolete historical data or for compliance reasons (e.g. GDPR). Running repack kills two birds with one stone: remove bloat but also remove old data where a long-running DELETE is impractical due to the maintenance window it would entail due to full table locks.
Our current implementation requires no C level changes to the extension, just patching the PL/PGSQL code, using the migration below. Essentially we add a table repack.filters
that has a WHERE clause that specified what rows to keep, and optionall a repack.joins
table with JOIN clauses if we need information from other tables to make the decision. If there is no matching row in those tables, the existing pg_repack behavior (copy everything) holds.
I think this is a fairly valuable feature and use case for pg_repack, and if there is interest, I'd be happy to package it as a PR for inclusion upstream.
CREATE TABLE IF NOT EXISTS repack.filters (
tablename TEXT PRIMARY KEY,
clause TEXT
);
CREATE TABLE IF NOT EXISTS repack.joins (
tablename TEXT PRIMARY KEY,
joins TEXT NOT NULL
);
DO $$
DECLARE
_v TEXT;
_vn TEXT;
BEGIN
UPDATE pg_proc
SET prosrc = REPLACE(prosrc, 'quote_ident', '''T.''||quote_ident')
WHERE proname='get_columns_for_create_as'
AND pronamespace=(SELECT oid FROM pg_namespace WHERE nspname='repack');
SELECT definition INTO _v
FROM pg_views WHERE schemaname='repack' AND viewname='tables';
IF _v ~ 'repack.filters' THEN
RAISE EXCEPTION 'repack filters patch already applied';
END IF;
EXECUTE FORMAT('CREATE OR REPLACE VIEW repack.tables AS %s', _v);
_vn := REPLACE(_v, 'LEFT JOIN pg_class t ON ((r.reltoastrelid = t.oid))',
'LEFT JOIN pg_class T ON R.reltoastrelid = T.oid
LEFT JOIN repack.filters F ON R.relname=F.tablename
LEFT JOIN repack.joins J ON R.relname=J.tablename');
IF _vn = _v THEN
RAISE EXCEPTION 'repack.table view definition changed, cannot patch 1';
END IF;
_v := _vn;
_vn := REPLACE(_vn,
' AS copy_data',
' || '' T'' || COALESCE(J.joins, '' '') || COALESCE('' WHERE '' || F.clause || '' '', '''') AS copy_data');
IF _vn = _v THEN
RAISE EXCEPTION 'repack.table view definition changed, cannot patch 2';
END IF;
_v := _vn;
_vn := REPLACE(_vn,
' AS create_table_2',
' || '' T'' AS create_table_2');
IF _vn = _v THEN
RAISE EXCEPTION 'repack.table view definition changed, cannot patch 3';
END IF;
EXECUTE FORMAT('CREATE OR REPLACE VIEW repack.tables AS %s', _vn);
END;
$$;
So for instance if you have the table definitions:
CREATE TABLE contract(
id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
expiration DATE NULL
);
CREATE TABLE employee(
id SERIAL PRIMARY KEY
name TEXT NOT NULL
contract INT NOT NULL REFERENCES contract(id)
);
And you want to repack out employees with expired contracts, you would:
INSERT INTO repack.filters (tablename, clause)
VALUES ('employee', 'COALESCE(contract.expiration, CURRENT_DATE) >= CURRENT_DATE');
INSERT INTO repack.joins (tablename, joins)
VALUES ('employee', 'JOIN contract ON employee.contract=contract.id');