Memory leak across rescans: scan's batch_cxt is never reset, accumulates parameter allocations for the whole query #273

@iskakaushik

Summary

In a plan where a foreign scan is rescanned repeatedly (typically a nested-loop join with a parameterized inner foreign scan), every rescan leaks the remote-parameter conversion allocations into a memory context that is not reset at any point during the query. Memory use grows in proportion to the number of rescans within a single query and is released only at executor shutdown.

All references below are pinned to commit 94256f0.

Code walkthrough

  1. ReScanForeignScan is aliased to the end-scan callback (src/fdw.c:3299-3300):
    routine->ReScanForeignScan = clickhouseEndForeignScan;
    routine->EndForeignScan = clickhouseEndForeignScan;
  2. clickhouseEndForeignScan (src/fdw.c:1174-1185) deletes only the cursor's own context and nulls the cursor:
    if (fsstate && fsstate->ch_cursor)
    {
        MemoryContextDelete(fsstate->ch_cursor->memcxt);
        fsstate->ch_cursor = NULL;
    }
  3. fsstate->batch_cxt is created in clickhouseBeginForeignScan as a child of estate->es_query_cxt, i.e. query lifetime (src/fdw.c:969-971).
  4. On the next clickhouseIterateForeignScan after a rescan, ch_cursor == NULL, so the scan re-issues the remote query with batch_cxt as the current context (src/fdw.c:1116-1152). While batch_cxt is current, process_query_params (src/fdw.c:1646-1675) evaluates each parameter expression and converts it via chfdw_datum_to_ch_literal (src/pglink.c:664-729). That conversion allocates: for numerics, a psprintf result; for text-like types, an OidOutputFunctionCall result (never freed) plus the palloc(len * 2 + 1) buffer from ch_escape_string (src/pglink.c:1523-1530); for arrays, a makeStringInfo buffer (src/deparse.c:2226-2235).
  5. batch_cxt is never reset or deleted anywhere: the only references in the tree are its declaration (src/fdw.c:134), creation (src/fdw.c:969), and the switch in IterateForeignScan (src/fdw.c:1116). There is no MemoryContextReset(fsstate->batch_cxt) in the repository.
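To make the palloc(len * 2 + 1) sizing in step 4 concrete, here is a minimal standalone sketch of a worst-case escape buffer. It is not the extension's ch_escape_string: the function name toy_escape_string, the use of malloc instead of palloc, and the choice of escaped characters (single quote and backslash) are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Toy stand-in for an escaping helper; NOT the extension's ch_escape_string.
 * Assumption: single quotes and backslashes are escaped with a backslash.
 * In the worst case every input byte doubles, so the buffer is sized at
 * len * 2 + 1 bytes (the extra byte holds the terminating NUL).
 * The caller owns the result and must free() it.
 */
char *
toy_escape_string(const char *in)
{
    size_t  len = strlen(in);
    char   *out = malloc(len * 2 + 1);
    char   *p = out;

    assert(out != NULL);
    for (size_t i = 0; i < len; i++)
    {
        if (in[i] == '\'' || in[i] == '\\')
            *p++ = '\\';        /* emit the escape prefix */
        *p++ = in[i];           /* then the original byte */
    }
    *p = '\0';
    return out;
}
```

Under these assumptions, a text parameter of length len costs up to 2 * len + 1 bytes for the escaped copy, on top of the output-function result, and in the scenario described here both are allocated again on every rescan.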

Notably, the comment at src/fdw.c:1120-1124 says the conversions are done "in the short-lived per-tuple context, so as not to cause a memory leak over repeated scans" — but the code is actually running in batch_cxt at that point; the switch to econtext->ecxt_per_tuple_memory that postgres_fdw performs around its equivalent of process_query_params is absent. postgres_fdw also resets its batch_cxt on every fetch; pg_clickhouse never does.

Why it accumulates

The cursor and its response data are correctly freed per rescan via ch_cursor->memcxt (a separate context under PortalContext, deleted in clickhouseEndForeignScan). The parameter literal strings, however, live in batch_cxt, whose parent is es_query_cxt. Each rescan overwrites the param_values pointers with freshly palloc'd strings, orphaning the previous ones inside batch_cxt. Because batch_cxt is never reset or deleted, every rescan's allocations persist until the executor tears down es_query_cxt at the end of the query.
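The lifetime mismatch can be illustrated with a small standalone model. This is a sketch, not PostgreSQL code: ToyContext only counts bytes, and toy_alloc, toy_reset, simulate_rescans, and the 4096-byte response size are hypothetical names and values chosen for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a memory context: it only counts bytes, it owns no memory. */
typedef struct ToyContext
{
    size_t bytes_in_use;
} ToyContext;

void toy_alloc(ToyContext *cxt, size_t size) { cxt->bytes_in_use += size; }
void toy_reset(ToyContext *cxt) { cxt->bytes_in_use = 0; }

/*
 * Simulate `rescans` rescans of a parameterized scan. Each rescan converts
 * the parameters again, allocating `literal_bytes` in the batch context.
 * reset_each_rescan = 0 models the behavior described in this issue;
 * reset_each_rescan = 1 models a batch context that is reset before the
 * query is re-issued.
 * Returns the bytes still held in the batch context when the loop ends.
 */
size_t
simulate_rescans(size_t rescans, size_t literal_bytes, int reset_each_rescan)
{
    ToyContext batch_cxt = {0};
    ToyContext cursor_cxt = {0};

    for (size_t i = 0; i < rescans; i++)
    {
        /* The rescan callback frees only the cursor's context. */
        toy_reset(&cursor_cxt);
        if (reset_each_rescan)
            toy_reset(&batch_cxt);

        /* The next iterate call converts parameters and re-issues the query. */
        toy_alloc(&batch_cxt, literal_bytes);
        toy_alloc(&cursor_cxt, 4096);   /* arbitrary response size */
    }

    /* The cursor context never holds more than one response. */
    assert(cursor_cxt.bytes_in_use <= 4096);
    return batch_cxt.bytes_in_use;
}
```

In this model, simulate_rescans(1000, 64, 0) leaves 64000 bytes in the batch context, while simulate_rescans(1000, 64, 1) leaves 64: the cursor context stays bounded either way, and only the batch context grows with the rescan count.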

Observable impact

Backend memory grows monotonically during query execution, proportional to (number of rescans) × (per-parameter literal size) — roughly tens of bytes per integer parameter up to kilobytes per text/array parameter, per rescan. With nested-loop plans driving millions of rescans of a parameterized inner foreign scan, this reaches hundreds of MB to multiple GB within one query; in production this was observed as part of a multi-GB committed-memory runaway under join-heavy workloads. Memory is returned only when the query completes.
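As a rough back-of-the-envelope check of the proportionality above, the following sketch multiplies the three factors. The function name estimate_leaked_bytes and the sample inputs are illustrative assumptions, not measurements, and the estimate ignores allocator chunk headers and size rounding, so real usage would be somewhat higher.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Lower-bound estimate of bytes left in the never-reset context:
 * rescans * parameters per rescan * bytes per converted literal.
 * Allocator overhead (chunk headers, size rounding) is not modeled.
 */
uint64_t
estimate_leaked_bytes(uint64_t rescans,
                      uint64_t params_per_rescan,
                      uint64_t bytes_per_literal)
{
    /* Guard against uint64_t overflow in the product below. */
    assert(params_per_rescan == 0 || bytes_per_literal == 0 ||
           rescans <= UINT64_MAX / params_per_rescan / bytes_per_literal);

    return rescans * params_per_rescan * bytes_per_literal;
}
```

With assumed inputs of 5,000,000 rescans, one parameter, and 64 bytes per literal, the estimate is 320,000,000 bytes (about 320 MB); with 2,000,000 rescans and a 2048-byte text literal it is 4,096,000,000 bytes (about 4 GB), which is consistent with the range described above.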
