@@ -101,15 +101,39 @@ Periodically, a polling mechanism processes this deferred-free list:
101101
102102To reduce memory contention from frequent updates to the global ` wr_seq ` , its
103103advancement is sometimes deferred. Instead of incrementing ` wr_seq ` on every
104- reclamation request, each thread tracks its number of deferrals locally. Once
105- the deferral count reaches a limit (QSBR_DEFERRED_LIMIT, currently 10), the
106- thread advances the global ` wr_seq ` and resets its local count.
107-
108- When an object is added to the deferred-free list, its qsbr_goal is set to
109- ` wr_seq ` + 2. By setting the goal to the next sequence value, we ensure it's safe
110- to defer the global counter advancement. This optimization improves runtime
111- speed but may increase peak memory usage by slightly delaying when memory can
112- be reclaimed.
104+ reclamation request, the object's qsbr_goal is set to ` wr_seq ` + 2 (the value
105+ the counter *would* take on its next advance) without actually advancing the
106+ global counter. This is safe because the goal still corresponds to a future
107+ sequence value that no thread has yet observed as quiescent.
108+
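The difference between computing a goal and advancing the counter can be sketched in a few lines of C11. This is a minimal illustration, not the actual implementation: `sketch_shared`, `sketch_next_goal`, `sketch_advance`, and `SKETCH_INCR` are hypothetical names, and the step of 2 is taken from the "`wr_seq` + 2" wording above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed step between sequence values, per "wr_seq + 2" in the text. */
#define SKETCH_INCR 2

/* Hypothetical stand-in for the shared QSBR state. */
typedef struct {
    _Atomic uint64_t wr_seq;
} sketch_shared;

/* Deferred path: return the value the counter would take on its next
 * advance, leaving the shared counter untouched (a read, no write). */
uint64_t
sketch_next_goal(sketch_shared *shared)
{
    assert(shared != NULL);
    return atomic_load(&shared->wr_seq) + SKETCH_INCR;
}

/* Eager path: actually advance the shared counter and return its new
 * value. This is the contended write the deferral tries to avoid. */
uint64_t
sketch_advance(sketch_shared *shared)
{
    assert(shared != NULL);
    return atomic_fetch_add(&shared->wr_seq, SKETCH_INCR) + SKETCH_INCR;
}
```

In this sketch both paths hand back the same goal for a given counter state; only the eager path performs the shared write.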
109+ Whether to actually advance ` wr_seq ` is decided per request, based on how
110+ much memory and how many items the calling thread has already deferred since
111+ its last advance:
112+
113+ * For deferred object frees (` _PyMem_FreeDelayed ` ), the thread tracks both a
114+ count (` deferred_count ` ) and an estimate of the held memory
115+ (` deferred_memory ` ). The global ` wr_seq ` is advanced when the freed block
116+ is larger than ` QSBR_FREE_MEM_LIMIT ` (1 MiB), when the accumulated deferred
117+ memory exceeds that limit, or when the count exceeds ` QSBR_DEFERRED_LIMIT `
118+ (127, sized so a chunk of work items is processed before it overflows).
119+ Crossing any of these thresholds also sets a per-thread ` should_process `
120+ flag, signalling that the deferred-free list should be drained.
121+
122+ * For mimalloc pages held by QSBR, the thread tracks ` deferred_page_memory `
123+ and advances ` wr_seq ` when either the individual page or the accumulated
124+ page memory exceeds ` QSBR_PAGE_MEM_LIMIT ` (4096 * 20 bytes, i.e. 80 KiB). Advancing
125+ promptly here matters because a held page cannot be reused for a different
126+ size class or by a different thread.
127+
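The two threshold checks described in the list above might look roughly like the following. This is a hedged sketch rather than the code from the change itself: the `sketch_` names and the struct layout are hypothetical, the limits are copied from the text, and resetting the per-thread totals after an advance is an assumption based on the phrase "since its last advance".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Limits as stated in the text; the SKETCH_ names are hypothetical. */
#define SKETCH_FREE_MEM_LIMIT (1024 * 1024)   /* 1 MiB */
#define SKETCH_DEFERRED_LIMIT 127
#define SKETCH_PAGE_MEM_LIMIT (4096 * 20)     /* 80 KiB */

/* Hypothetical per-thread bookkeeping mirroring the fields named above. */
typedef struct {
    size_t deferred_count;
    size_t deferred_memory;
    size_t deferred_page_memory;
    bool should_process;
} sketch_thread;

/* Decide whether a deferred object free should advance wr_seq. */
bool
sketch_should_advance_for_free(sketch_thread *ts, size_t size)
{
    assert(ts != NULL);
    if (size > SKETCH_FREE_MEM_LIMIT) {
        /* A single large block is enough to trigger an advance. */
        ts->should_process = true;
        return true;
    }
    ts->deferred_count++;
    ts->deferred_memory += size;
    if (ts->deferred_count > SKETCH_DEFERRED_LIMIT ||
        ts->deferred_memory > SKETCH_FREE_MEM_LIMIT) {
        /* Assumption: totals restart after each advance. */
        ts->deferred_count = 0;
        ts->deferred_memory = 0;
        ts->should_process = true;
        return true;
    }
    return false;
}

/* Decide whether a held mimalloc page should advance wr_seq. */
bool
sketch_should_advance_for_page(sketch_thread *ts, size_t size)
{
    assert(ts != NULL);
    if (size > SKETCH_PAGE_MEM_LIMIT) {
        return true;
    }
    ts->deferred_page_memory += size;
    if (ts->deferred_page_memory > SKETCH_PAGE_MEM_LIMIT) {
        ts->deferred_page_memory = 0;  /* Assumption: reset on advance. */
        return true;
    }
    return false;
}
```

With these limits, a thread freeing many small blocks advances on its 128th deferral, while a single block over 1 MiB advances immediately without touching the running totals.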
128+ Processing of the deferred-free list normally happens from the eval breaker
129+ (rather than from inside ` _PyMem_FreeDelayed ` ), which gives the global
130+ ` rd_seq ` a better chance to have advanced far enough that items can actually
131+ be freed. ` _PyMem_ProcessDelayed ` is still called from the free path as a
132+ safety valve when a work-item chunk fills up.
133+
134+ This optimization improves runtime speed but may increase peak memory usage
135+ by slightly delaying when memory can be reclaimed; the size-based thresholds
136+ above bound that extra memory.
113137
114138
115139## Limitations