Using neutrino gun events to probe memory growth #46598
Comments
cms-bot internal usage |
A new Issue was created by @Dr15Jones. @Dr15Jones, @antoniovilela, @makortel, @mandrenguyen, @rappoccio, @sextonkennedy, @smuzaffar can you please review it and eventually sign/assign? Thanks. cms-bot commands are listed here |
assign reconstruction, dqm |
New categories assigned: reconstruction,dqm @antoniovagnerini,@jfernan2,@mandrenguyen,@rseidita you have been requested to review this Pull request/Issue and eventually sign? Thanks |
Thread scaling in the job is hampered by the following
NOTE: the NANOEDMAODSIMoutput and SKIMStreamLogError had not yet been replaced with AsciiOutputModule at the time I made the measurement. |
My intent is to write a |
From the PeriodicAllocMonitor information, we do 1.1 million allocation requests each event. Some of that will be from copying the cached input data but the rest is just churn. Revised: When I did ask for each data product from each event, it took 265 seconds to process 100k events (about 380 events/s) and averaged 3400 allocations per event. I ran the PeriodicAllocMonitor for both of those jobs (which does slow them down some) and there is no sign of any memory increase. |
See #46603 |
So I used a script to generate a configuration that matches the step3 for module and path dependencies but with each producer replaced with the trivial
Running over 70,000 events using 1 thread, I see that the job was doing 9,000 allocations per event with an average amount allocated per event of 270kB (we had 2000 modules with each module doing at least 2
Running over 140,000 events using 8 threads gave the same allocation count and memory per event and a live allocation of 83.3MB. So the framework only accounts for about 1% of the number of allocations being done per event. |
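The figures quoted so far can be cross-checked with a few lines of arithmetic. This is only a sketch: the variable names are made up for illustration, and "270kB" is assumed to mean 270,000 bytes.

```python
# Figures quoted in this thread; variable names are illustrative only.
full_job_allocs_per_event = 1_100_000    # full step3 job, from PeriodicAllocMonitor
source_only_events = 100_000             # job that reads every data product
source_only_seconds = 265
framework_allocs_per_event = 9_000       # trivial-producer configuration
framework_bytes_per_event = 270_000      # "270kB", assuming 1 kB = 1000 bytes

# Event throughput of the source-only job.
throughput = source_only_events / source_only_seconds

# Share of the full job's allocations attributable to the framework skeleton.
framework_share = framework_allocs_per_event / full_job_allocs_per_event

# Mean size of one allocation in the framework-only job.
bytes_per_alloc = framework_bytes_per_event / framework_allocs_per_event

print(f"source-only throughput: {throughput:.0f} events/s")
print(f"framework share of allocations: {framework_share:.1%}")
print(f"mean framework allocation size: {bytes_per_alloc:.0f} bytes")
```

Under those assumptions the framework share comes out slightly below the rounded 1% quoted above, and the framework-only allocations are small on average (tens of bytes each).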
I used the fixed ModuleAllocMonitor service to look at the job for modules that do lots of allocations during event processing. Here are the largest
cmssw/DataFormats/PatCandidates/src/TriggerObjectStandAlone.cc Lines 389 to 394 in aca061d
and the reason only 6 of 8 are affected is that those 6 streams hit that code section at about the same time, so they are all trying to fill in that thread-safe global.
All subsequent modules allocate less than 50k times each event. |
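The "6 of 8 streams" effect described above can be illustrated with a toy model. This is not the CMSSW code in TriggerObjectStandAlone.cc; it is a hypothetical Python sketch in which `build_payload`, `stream`, and the barrier are all invented for illustration. The barrier artificially forces six threads to see the shared global as empty before any of them fills it, so each of them does the allocation-heavy work, while two later threads find it already filled.

```python
import threading

cache = None                  # shared, lazily filled global (hypothetical)
cache_lock = threading.Lock()
fill_count = 0                # number of threads that built the payload
count_lock = threading.Lock()


def build_payload():
    # Stand-in for the allocation-heavy work needed to fill the global.
    return {f"name{i}": i for i in range(1000)}


def stream(barrier=None):
    global cache, fill_count
    needs_fill = cache is None        # check before anyone has filled it
    if barrier is not None:
        barrier.wait()                # force the early streams to overlap
    if needs_fill:
        payload = build_payload()     # every overlapping stream pays this cost
        with count_lock:
            fill_count += 1
        with cache_lock:
            if cache is None:
                cache = payload       # first writer wins, the rest discard


# Six streams arrive together; two arrive after the global is filled.
early_barrier = threading.Barrier(6)
early = [threading.Thread(target=stream, args=(early_barrier,)) for _ in range(6)]
for t in early:
    t.start()
for t in early:
    t.join()

late = [threading.Thread(target=stream) for _ in range(2)]
for t in late:
    t.start()
for t in late:
    t.join()

print(f"streams that did the expensive fill: {fill_count} of 8")
```

In this model only one payload is kept, so the work done by the other five overlapping threads is pure allocation churn, which is the pattern a per-stream allocation count would expose.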
This one already has an open issue #42526
This one has an open issue as well #46498 |
See #46628 which decreases |
For my own education, can you explain the mechanics behind obtaining these numbers?
cmsrel CMSSW_14_2_X_2024-11-07-2300
cd CMSSW_14_2_X_2024-11-07-2300/src/
cmsenv
runTheMatrix.py -l 12861.0 --maxSteps=2
cd 12861.0_NuGun+2024
cmsDriver.py step3 -s RAW2DIGI,L1Reco,RECO,RECOSIM,SKIM:LogError+LogErrorMonitor,PAT,NANO,DQM:@standardDQM+@ExtraHLT+@miniAODDQM+@nanoAODDQM --conditions auto:phase1_2024_realistic --datatier RECO,MINIAODSIM,NANOAODSIM,DQMIO -n 10 --eventcontent RECOSIM,MINIAODSIM,NANOEDMAODSIM,DQM --geometry DB:Extended --era Run3_2024 --customise Validation/Performance/TimeMemorySummary.customiseWithTimeMemorySummary --filein file:step2.root --fileout file:step3.root --no_exec
echo 'process.add_(cms.Service("ModuleAllocMonitor", fileName = cms.untracked.string("moduleAlloc.log")))' >> step3_RAW2DIGI_L1Reco_RECO_RECOSIM_SKIM_PAT_NANO_DQM.py
echo 'process.ModuleAllocMonitor.moduleNames = cms.untracked.vstring(["HLTSiStripMonitorTrack"])' >> step3_RAW2DIGI_L1Reco_RECO_RECOSIM_SKIM_PAT_NANO_DQM.py
setenv LD_PRELOAD "libPerfToolsAllocMonitorPreload.so libPerfToolsMaxMemoryPreload.so"
cmsRun step3_RAW2DIGI_L1Reco_RECO_RECOSIM_SKIM_PAT_NANO_DQM.py
wget https://raw.githubusercontent.com/cms-sw/cmssw/refs/heads/master/PerfTools/AllocMonitor/scripts/edmModuleAllocMonitorAnalyze.py .
python3 edmModuleAllocMonitorAnalyze.py moduleAlloc.log
(notice that
That gives me:
how should I read these numbers? |
@mmusich you actually load two different monitors, one a Service ( The data obtained by the |
The 85k from |
I started from workflow 12861.0 (which is a neutrino gun) and then modified step3 (reconstruction) by
- replacing PoolSource with RepeatingCachedRootSource. This allows the job to run an infinite number of events without any memory growth from the source (as all events are already cached in memory).
- replacing PoolOutputModule with AsciiOutputModule, both using identical outputCommands and with verbosity turned off. This avoids problems with the output file sizes growing too large and makes the jobs run faster.
- adding the Service SimpleMemoryCheck with periodic (every 10 seconds) sampling on and/or the Service PeriodicAllocMonitor (sampling every 10 seconds).

The full step3 cmsDriver.py command I started from was