Sidekiq worker processes exhibit unbounded memory growth in production. RSS increases monotonically across sync cycles and does not recover between jobs, eventually triggering OOM kills.
Instrumentation Added
Two pieces of instrumentation were added to aid reproduction and diagnosis. Both should be kept for ongoing monitoring, but the flamegraph export should remain opt-in in production.
1. lib/sidekiq/memory_profiling_middleware.rb (new file)
A Sidekiq server middleware that logs GC heap stats and object type deltas after every job:
[MemoryProfiling] job=Sidekiq::ActiveJob::Wrapper queue=high_priority
heap_live_delta=488847 objects_allocated=774155 gc_runs=1
T_OBJECT=+111069 T_STRING=+124374 T_ARRAY=+52819 T_HASH=+37179 T_DATA=+95859
Registered in config/initializers/sidekiq.rb.
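For readers who have not opened the file, a minimal sketch of what such a middleware might look like follows. It is a reconstruction from the log format above, not the actual implementation: the class name, the `sink` parameter, and the forced `GC.start` before the second snapshot are all assumptions.

```ruby
# Hypothetical reconstruction; the real middleware file is not shown in this report.
class MemoryProfilingMiddleware
  TRACKED_TYPES = %i[T_OBJECT T_STRING T_ARRAY T_HASH T_DATA].freeze

  # `sink` receives each formatted log line; in Sidekiq it could wrap Sidekiq.logger.info.
  def initialize(sink: ->(line) { puts line })
    @sink = sink
  end

  # Sidekiq server middleware signature: job instance, job payload hash, queue name.
  def call(_job_instance, job_payload, queue)
    before = snapshot
    yield
  ensure
    if before
      GC.start # assumption: collect first so the deltas approximate retained objects
      @sink.call(format_line(job_payload, queue, before, snapshot))
    end
  end

  private

  # Captures the GC counters and per-type object counts used for the deltas.
  def snapshot
    stat = GC.stat
    {
      heap_live: stat[:heap_live_slots],
      allocated: stat[:total_allocated_objects],
      gc_runs: stat[:count],
      types: ObjectSpace.count_objects
    }
  end

  def format_line(job_payload, queue, before, after)
    type_deltas = TRACKED_TYPES.map do |type|
      format("%s=%+d", type, after[:types].fetch(type, 0) - before[:types].fetch(type, 0))
    end
    scalar_deltas = %i[heap_live allocated gc_runs].map { |key| after[key] - before[key] }
    format(
      "[MemoryProfiling] job=%s queue=%s heap_live_delta=%d objects_allocated=%d gc_runs=%d %s",
      job_payload["class"], queue, *scalar_deltas, type_deltas.join(" ")
    )
  end
end
```

Registration would typically go through `config.server_middleware { |chain| chain.add MemoryProfilingMiddleware }` inside `Sidekiq.configure_server`. Because `GC.stat` and `ObjectSpace.count_objects` are process-wide, the deltas include allocations from other threads when concurrency is greater than 1.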
2. Process-wide StackProf flamegraph export (in config/initializers/sidekiq.rb)
When SIDEKIQ_FLAMEGRAPH=1 is set, starts a stackprof object-allocation profile on worker boot and writes a .dump file to SIDEKIQ_FLAMEGRAPH_DIR (default /tmp) on graceful shutdown:
[MemoryProfiling] Process profile written: /tmp/stackprof_process_88444_1234567890.dump
Convert with:
bundle exec stackprof --json /tmp/stackprof_process_*.dump > flame.json
# upload flame.json to speedscope.app → Sandwich view, sort by Self
The stackprof gem was moved out of the development group in the Gemfile so that it is available in production.
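The opt-in behaviour described above could be sketched roughly as follows. This is an illustration, not the code in the initializer: the `ProcessProfile` module is hypothetical, the file name pattern is inferred from the sample log line, the default interval of 1 is an assumption, and `raw: true` is assumed to be needed for flamegraph output. The profiler is passed in as an argument so the helper can be exercised without the stackprof gem loaded.

```ruby
# Hypothetical helper; the real initializer may be organized differently.
module ProcessProfile
  module_function

  # Starts an object-allocation profile only when SIDEKIQ_FLAMEGRAPH=1.
  # `profiler` is expected to respond to start/stop/results like StackProf.
  def start(profiler, env: ENV)
    return false unless env["SIDEKIQ_FLAMEGRAPH"] == "1"

    # Assumption: the interval defaults to 1 (sample every allocation).
    interval = Integer(env.fetch("SIDEKIQ_FLAMEGRAPH_INTERVAL", "1"))
    profiler.start(mode: :object, interval: interval, raw: true)
    true
  end

  # Stops the profile, writes the dump, and returns the path that was written.
  def stop(profiler, env: ENV, pid: Process.pid, now: Time.now)
    profiler.stop
    dir = env.fetch("SIDEKIQ_FLAMEGRAPH_DIR", "/tmp")
    path = File.join(dir, "stackprof_process_#{pid}_#{now.to_i}.dump")
    profiler.results(path)
    path
  end
end

# Wiring sketch (needs the sidekiq and stackprof gems, so it is left as a comment):
#
#   Sidekiq.configure_server do |config|
#     config.on(:startup) { ProcessProfile.start(StackProf) }
#     config.on(:shutdown) do
#       if StackProf.running?
#         path = ProcessProfile.stop(StackProf)
#         Sidekiq.logger.info("[MemoryProfiling] Process profile written: #{path}")
#       end
#     end
#   end
```

Because the dump is written from the shutdown hook, a worker that is OOM-killed never produces one, which is why the reproduction steps rely on a graceful shutdown.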
Observed Behavior
Per-job middleware shows each SyncJob retaining ~488k objects after GC:
heap_live_delta=488847 objects_allocated=774155 gc_runs=1
T_OBJECT=+111069 T_STRING=+124374 T_ARRAY=+52819 T_HASH=+37179 T_DATA=+95859
With Sidekiq concurrency of 3, a single sync cycle retains roughly 1.47M objects (3 × 488,847). Workers are killed by OOM before completing their queue.
Investigation
Object allocation flamegraph (Sandwich view, sorted by Self) identified:
| Method | Self allocations | % |
| --- | --- | --- |
| Class#new | 63,163 | 12% |
| Kernel#dup | 47,289 | 8.7% |
| Kernel#BigDecimal | 42,569 | 7.8% |
| BigDecimal#round | 31,273 | 5.7% |
| PG::Result#each | 21,592 | 4.0% |
| ActiveModel::AttributeAssignment#_assign_attribute | 21,420 | 3.9% |
| ActiveRecord::AttributeAssignment#_assign_attributes | 21,405 | 3.9% |
| Balance::Materializer#persist_balances | 21,202 | 3.9% |
| Balance::SyncCache#get_entries | 3,199 | 0.6% |
Date#upto accounts for 45% of total allocations as the call tree root.
Suspected Root Cause
Balance::Materializer#persist_balances and Holding::Materializer#persist_holdings instantiate full AR models (Balance.new(...), Holding.new(...)) for every date in the calculation range, then immediately call .attributes.slice(...) to serialize them for upsert_all — incurring Class#new, Kernel#dup, and the full AR attribute machinery per row with no benefit.
Balance::SyncCache#converted_entries calls e.dup on every Entry AR model for FX conversion (secondary issue).
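To make the first point concrete, here is a sketch of building upsert-ready rows as plain hashes, one per date, without going through a model instance. It is not the project's code and has not been verified as a fix: the `balance_rows` helper and the column names (`account_id`, `date`, `balance`, `currency`) are hypothetical stand-ins, since the real schema is not shown in this report.

```ruby
require "date"
require "bigdecimal"

# Hypothetical helper: yields each date to the caller's block to obtain the
# balance, and returns an array of attribute hashes suitable for upsert_all.
# Column names are illustrative only.
def balance_rows(account_id:, currency:, start_date:, end_date:)
  start_date.upto(end_date).map do |date|
    {
      account_id: account_id,
      date: date,
      balance: yield(date), # computed value for this date
      currency: currency
    }
  end
end

rows = balance_rows(
  account_id: 1,
  currency: "USD",
  start_date: Date.new(2024, 1, 1),
  end_date: Date.new(2024, 1, 3)
) { |date| BigDecimal("100") + date.day }

# In the application these rows would be passed straight to upsert_all, e.g.
#   Balance.upsert_all(rows, unique_by: %i[account_id date currency])
# where the unique_by columns are again an assumption.
```

Each row here costs one Hash plus its values, rather than a model instance, its attribute set, and the extra hashes produced by `.attributes.slice(...)`. Whether this accounts for the retained objects, as opposed to just the allocation volume, would still need to be confirmed with the per-job middleware.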
Files to Investigate
- app/models/balance/materializer.rb — persist_balances
- app/models/balance/base_calculator.rb — build_balance
- app/models/holding/materializer.rb — persist_holdings
- app/models/holding/forward_calculator.rb / reverse_calculator.rb — build_holdings
- app/models/balance/sync_cache.rb — converted_entries
Steps to Reproduce
SIDEKIQ_FLAMEGRAPH=1 SIDEKIQ_FLAMEGRAPH_INTERVAL=50 bin/dev
# trigger a sync, then Ctrl+C for graceful shutdown
bundle exec stackprof --json /tmp/stackprof_process_*.dump > flame.json
# upload flame.json to speedscope.app → Sandwich view, sort by Self