Skip to content

Using repack to prune old data #279

@fazalmajid

Description

@fazalmajid

At my company we use pg_repack primarily as a tool to delete obsolete historical data or for compliance reasons (e.g. GDPR). Running repack kills two birds with one stone: remove bloat but also remove old data where a long-running DELETE is impractical due to the maintenance window it would entail due to full table locks.

Our current implementation requires no C level changes to the extension, just patching the PL/PGSQL code, using the migration below. Essentially we add a table repack.filters that has a WHERE clause that specified what rows to keep, and optionall a repack.joins table with JOIN clauses if we need information from other tables to make the decision. If there is no matching row in those tables, the existing pg_repack behavior (copy everything) holds.

I think this is a fairly valuable feature and use case for pg_repack, and if there is interest, I'd be happy to package it as a PR for inclusion upstream.

CREATE TABLE IF NOT EXISTS repack.filters (
  tablename TEXT PRIMARY KEY,
  clause    TEXT
);

CREATE TABLE IF NOT EXISTS repack.joins (
  tablename TEXT PRIMARY KEY,
  joins     TEXT NOT NULL
);

DO $$
DECLARE
  _v TEXT;
  _vn TEXT;
BEGIN
  UPDATE pg_proc
  SET prosrc = REPLACE(prosrc, 'quote_ident', '''T.''||quote_ident')
  WHERE proname='get_columns_for_create_as'
    AND pronamespace=(SELECT oid FROM pg_namespace WHERE nspname='repack');
    
  SELECT definition INTO _v
  FROM pg_views WHERE schemaname='repack' AND viewname='tables';
  IF _v ~ 'repack.filters' THEN
    RAISE EXCEPTION 'repack filters patch already applied';
  END IF;
  EXECUTE FORMAT('CREATE OR REPLACE VIEW repack.tables AS %s', _v);
  _vn := REPLACE(_v, 'LEFT JOIN pg_class t ON ((r.reltoastrelid = t.oid))',
                 'LEFT JOIN pg_class T ON R.reltoastrelid = T.oid
     LEFT JOIN repack.filters F ON R.relname=F.tablename
     LEFT JOIN repack.joins J ON R.relname=J.tablename');
  IF _vn = _v THEN
    RAISE EXCEPTION 'repack.table view definition changed, cannot patch 1';
  END IF;
  _v := _vn;
  _vn := REPLACE(_vn,
    ' AS copy_data',
    ' || '' T'' || COALESCE(J.joins, '' '') || COALESCE('' WHERE '' || F.clause || '' '', '''') AS copy_data');
  IF _vn = _v THEN
    RAISE EXCEPTION 'repack.table view definition changed, cannot patch 2';
  END IF;
  _v := _vn;
  _vn := REPLACE(_vn,
    ' AS create_table_2',
    ' || '' T'' AS create_table_2');
  IF _vn = _v THEN
    RAISE EXCEPTION 'repack.table view definition changed, cannot patch 3';
  END IF;
  EXECUTE FORMAT('CREATE OR REPLACE VIEW repack.tables AS %s', _vn);
END;   
$$;

So for instance if you have the table definitions:

CREATE TABLE contract(
  id SERIAL PRIMARY KEY,
  name TEXT NOT NULL,
  expiration DATE NULL
);
CREATE TABLE employee(
  id SERIAL PRIMARY KEY
  name TEXT NOT NULL
  contract INT NOT NULL REFERENCES contract(id)
);

And you want to repack out employees with expired contracts, you would:

INSERT INTO repack.filters (tablename, clause)
VALUES ('employee', 'COALESCE(contract.expiration, CURRENT_DATE) >= CURRENT_DATE');
INSERT INTO repack.joins (tablename, joins)
VALUES ('employee', 'JOIN contract ON employee.contract=contract.id');

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions