Update workflows to monitor in PR profiling #2282

Merged
merged 2 commits into from
Jul 16, 2024


jfernan2
Contributor

@jfernan2 jfernan2 commented Jul 4, 2024

Changed to monitor wf 29834.21 (D110 upgrade) and 12634.21 (Run3 2023)

@cmsbuild
Contributor

cmsbuild commented Jul 4, 2024

A new Pull Request was created by @jfernan2 for branch master.

@aandvalenzuela, @cmsbuild, @iarspider, @smuzaffar can you please review it and eventually sign? Thanks.
@antoniovilela, @rappoccio, @sextonkennedy you are the release manager for this.
cms-bot commands are listed here

@cmsbuild
Contributor

cmsbuild commented Jul 4, 2024

cms-bot internal usage

@srimanob
Contributor

srimanob commented Jul 4, 2024

Hi @jfernan2
Should you use 29834.21 instead of 29634.21? It is the PU workflow.

@cmsbuild
Contributor

cmsbuild commented Jul 4, 2024

Pull request #2282 was updated.

@jfernan2
Contributor Author

jfernan2 commented Jul 4, 2024

Correct, @srimanob. I have fixed it.
Thanks!

@smuzaffar
Contributor

enable profiling

@smuzaffar
Contributor

please test

@cmsbuild
Contributor

cmsbuild commented Jul 4, 2024

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-7a973e/40235/summary.html
COMMIT: 743195f
CMSSW: CMSSW_14_1_X_2024-07-04-1100/el8_amd64_gcc12
Additional Tests: PROFILING
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cms-bot/2282/40235/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • You potentially removed 2 lines from the logs
  • Reco comparison results: 4 differences found in the comparisons
  • DQMHistoTests: Total files compared: 48
  • DQMHistoTests: Total histograms compared: 3345088
  • DQMHistoTests: Total failures: 3
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3345065
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 47 files compared)
  • Checked 202 log files, 165 edm output root files, 48 DQM output files
  • TriggerResults: no differences found

@smuzaffar
Contributor

@jfernan2, it looks like 12634.21 is not a valid workflow (at least it is not in 14.1.X). Or do we need to pass an extra option to runTheMatrix.py to run it?

@srimanob
Contributor

srimanob commented Jul 4, 2024

@jfernan2, it looks like 12634.21 is not a valid workflow (at least it is not in 14.1.X). Or do we need to pass an extra option to runTheMatrix.py to run it?

Hi @smuzaffar
It will need relvals_opt = --what upgrade, as the workflow has not been defined in https://github.com/cms-sw/cmssw/blob/master/Configuration/PyReleaseValidation/python/relval_2017.py yet.

@srimanob
Contributor

srimanob commented Jul 4, 2024

I created the following PR, since the workflow should be defined in relval_2017:
cms-sw/cmssw#45381

@smuzaffar
Contributor

So we need to provide a way to instruct the bot how to run workflows that are not active by default in runTheMatrix. We can add a variable, e.g. PROFILING_OPTS="-w upgrade,standard", in https://github.com/cms-sw/cms-bot/blob/master/cmssw-pr-test-config so that the bot can use it when running runTheMatrix.
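A rough sketch of how such a variable could be consumed, in case it helps the discussion. Only `PROFILING_OPTS`, the value `-w upgrade,standard`, and the two workflow numbers come from this thread; the helper names `read_profiling_opts` and `build_matrix_command`, and the assumption that the config is a plain `KEY="value"` file, are hypothetical and not taken from cms-bot.

```python
import shlex


def read_profiling_opts(config_text, key="PROFILING_OPTS"):
    """Return the option tokens stored under `key`, or [] if the key is absent.

    Assumes (hypothetically) a shell-style KEY="value" config file.
    """
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == key:
            # First pass removes the surrounding quotes, second pass tokenizes.
            unquoted = " ".join(shlex.split(value))
            return shlex.split(unquoted)
    return []


def build_matrix_command(workflows, extra_opts):
    """Build a runTheMatrix.py argument list: -l takes the workflow list."""
    return ["runTheMatrix.py", "-l", ",".join(workflows)] + list(extra_opts)


config = 'PROFILING_OPTS="-w upgrade,standard"\n'
command = build_matrix_command(["12634.21", "29834.21"], read_profiling_opts(config))
print(" ".join(command))
```

If the variable is missing, the option list is empty and the command falls back to the default set of active workflows, so existing configurations would keep behaving as before.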

@srimanob
Contributor

srimanob commented Jul 4, 2024

So we need to provide a way to instruct the bot how to run workflows that are not active by default in runTheMatrix. We can add a variable, e.g. PROFILING_OPTS="-w upgrade,standard", in https://github.com/cms-sw/cms-bot/blob/master/cmssw-pr-test-config so that the bot can use it when running runTheMatrix.

That would be useful so that we can handle upgrade workflows. Thanks @smuzaffar
Since the workflows we need should be in relval_2017 anyway, maybe we can merge cms-sw/cmssw#45381 (after PR tests), followed by this PR.

@jfernan2
Contributor Author

jfernan2 commented Jul 16, 2024

Dear @srimanob and @smuzaffar,
Now that cms-sw/cmssw#45381 has been merged, could we revive this PR? Thanks

@smuzaffar
Contributor

please test

@smuzaffar
Contributor

+externals
looks good

@smuzaffar smuzaffar merged commit 8a2245e into cms-sw:master Jul 16, 2024
11 of 12 checks passed
@cmsbuild
Contributor

This pull request is fully signed and it will be integrated in one of the next master IBs after it passes the integration tests. This pull request will now be reviewed by the release team before it's merged. @sextonkennedy, @rappoccio, @antoniovilela, @mandrenguyen (and backports should be raised in the release meeting by the corresponding L2)

@cmsbuild
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-7a973e/40434/summary.html
COMMIT: 743195f
CMSSW: CMSSW_14_1_X_2024-07-16-1100/el8_amd64_gcc12
Additional Tests: PROFILING
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cms-bot/2282/40434/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • You potentially added 3 lines to the logs
  • Reco comparison results: 2 differences found in the comparisons
  • DQMHistoTests: Total files compared: 48
  • DQMHistoTests: Total histograms compared: 3345094
  • DQMHistoTests: Total failures: 3
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3345071
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 47 files compared)
  • Checked 202 log files, 165 edm output root files, 48 DQM output files
  • TriggerResults: no differences found
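For anyone comparing the two test summaries by hand, a small sanity check of the DQMHistoTests tallies. The numbers are copied from the two cmsbuild reports in this thread; the function name `tallies_consistent` and the assumption that nulls and missing objects count toward the total are illustrative only (both are 0 here, so they do not affect the result).

```python
def tallies_consistent(total, successes, failures, skipped, nulls=0, missing=0):
    """Check that the per-category counts add up to the reported total."""
    return successes + failures + skipped + nulls + missing == total


# Counts as reported by cmsbuild for the two test runs of this PR.
runs = {
    "CMSSW_14_1_X_2024-07-04-1100": dict(total=3345088, successes=3345065, failures=3, skipped=20),
    "CMSSW_14_1_X_2024-07-16-1100": dict(total=3345094, successes=3345071, failures=3, skipped=20),
}

for release, counts in runs.items():
    print(release, tallies_consistent(**counts))

# Difference in the number of histograms compared between the two runs.
totals = [counts["total"] for counts in runs.values()]
print(totals[1] - totals[0])
```

Both runs report the same 3 failures and 20 skipped histograms; only the total and the number of successes change between them.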

@jfernan2
Contributor Author

@smuzaffar, after the inclusion of this PR I thought we would have the profiling comparison in PR tests, but it looks like something else is missing. See, for example, the latest trial:
cms-sw/cmssw#45333
Profiling results for 12634.21 and 29834.21 are there, but the comparison is empty:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-d1a85e/40514/summary.html
I fail to see why, since the logs apparently show everything is OK:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-d1a85e/40514/testsResults/profiling.txt
Do you have any hint?
Thanks

@smuzaffar
Contributor

@jfernan2, although the PR and baseline jobs were run for these workflows, I think we need DQM*.root files for the comparison. Workflows 12634.21 and 29834.21 do not generate such output, which is why the comparison was not run.
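To make the hypothesis above concrete, a minimal sketch of the kind of check that would skip the comparison when either side lacks DQM output. Only the `DQM*.root` pattern comes from this comment; the function names, the directory layout, and the file names in the demo are made up for illustration, and this is not the actual cms-bot logic.

```python
import tempfile
from pathlib import Path


def has_dqm_output(workflow_dir):
    """True if the directory holds at least one file matching DQM*.root."""
    return any(Path(workflow_dir).glob("DQM*.root"))


def can_compare(pr_dir, baseline_dir):
    """A comparison needs DQM output on both the PR side and the baseline side."""
    return has_dqm_output(pr_dir) and has_dqm_output(baseline_dir)


# Self-contained demo with throwaway directories standing in for job outputs.
with tempfile.TemporaryDirectory() as pr, tempfile.TemporaryDirectory() as base:
    (Path(pr) / "step3.root").touch()          # output without any DQM file
    (Path(base) / "DQM_example.root").touch()  # hypothetical DQM file name
    print(can_compare(pr, base))
```

In the demo the PR directory has no `DQM*.root` file, so the check reports that no comparison can be made even though both jobs produced output.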

@smuzaffar
Contributor

Maybe @gartung knows why the profiling comparison is empty.

@makortel
Contributor

Maybe @gartung knows why the profiling comparison is empty.

@gartung is on vacation until August 9.

@gartung
Member

gartung commented Jul 23, 2024

@gartung
Member

gartung commented Jul 23, 2024

@jfernan2
Contributor Author

Thanks @gartung
I am trying one last time, since it worked in #1912

@jfernan2
Contributor Author

jfernan2 commented Jul 24, 2024

@gartung, the same two workflows, 29834.21 and 12634.21, run fine with igprof in Jenkins profiling [1].

I believe the difference is in how igprof is called with -j JobReport:

igprof -pp -d -t cmsRun -z -o ./igprofCPU_step3.gz -- cmsRun step3_igprof.py -j step3_igprof_cpu_JobReport.xml >& step3_igprof_cpu.log -> crashes

igprof -d -pp -z -o step3_igprofCPU.gz -t cmsRun cmsRun step3_igprof.py -> runs fine

Somehow it was removed for igprof here [2]. Since those XML files do not seem to be used anywhere, I am proposing the following PR, if you agree:

cms-cmpwg/profiling#8

[1] https://cmssdt.cern.ch/jenkins/job/release-run-reco-profiling/533/console
https://cmssdt.cern.ch/jenkins/job/release-run-reco-profiling/538/console
[2]

# strip the JobReport from the step command
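A minimal sketch of what stripping the job report option from a step command could look like, assuming the step command is a single cmsRun command line as in the examples above. The function name `strip_job_report` is hypothetical and this is not the code from the profiling scripts or from cms-cmpwg/profiling#8; it only illustrates removing `-j <file>` and leaving everything else untouched.

```python
import shlex


def strip_job_report(command):
    """Drop `-j <file>` (the framework job report option) from a cmsRun command."""
    kept = []
    skip_next = False
    for token in shlex.split(command):
        if skip_next:
            # This token is the file name that followed -j.
            skip_next = False
            continue
        if token == "-j":
            skip_next = True
            continue
        kept.append(token)
    return " ".join(shlex.quote(token) for token in kept)


step = "cmsRun step3_igprof.py -j step3_igprof_cpu_JobReport.xml"
print(strip_job_report(step))
```

A command without `-j` passes through unchanged, so the helper can be applied to every step regardless of how it was generated.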

@makortel
Contributor

I believe the difference is in how igprof is called with -j JobReport

I'd find it very strange if the framework job report would be causing segfaults under IgProf, but who knows.

@jfernan2
Contributor Author

Me too, @makortel, but I have just repeated the test in Jenkins successfully for both workflows, using igprof and no JobReport output.

@jfernan2
Contributor Author

jfernan2 commented Sep 2, 2024

@smuzaffar, igprof is still giving problems. However, the baseline does not seem to be running igprof for the two workflows in question (12634.21 and 29834.21), so we are missing the reference anyway. See, for example, this recent trial:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/ib-baseline-tests/CMSSW_14_2_X_2024-09-01-2300/el8_amd64_gcc12/-GenuineIntel/matrix-results/

Any idea? Thanks

6 participants