… for background work items
Codecov Report
@@ Coverage Diff @@
## main #412 +/- ##
==========================================
- Coverage 29.07% 28.82% -0.26%
==========================================
Files 194 196 +2
Lines 40229 40667 +438
Branches 14464 14650 +186
==========================================
+ Hits 11697 11721 +24
- Misses 27723 28013 +290
- Partials 809 933 +124
…e command line is parsed
…non-background work thread
This adds a binary profiling system for Realm's background work items, producing .bin files in the RBWP (Realm Background Work Profile) format that can be loaded by Legion Prof alongside normal profiling logs.
Motivation
Background worker threads perform significant work (DMA transfers, active message handling, GPU event reaping, dependent partitioning) that was previously invisible to profilers. This makes it difficult to diagnose performance issues caused by background work contention or to understand how background work interacts with application-level tasks.
Profiling levels
operations (individual AM handler invocations, per-XferDes progress_xd() calls, dependent partitioning operations, GPU event reap operations).
Command-line flags:
-ll:bgworkprofile: Profiling level (0=off, 1=coarse, 2=fine). Default: 0
-ll:bgworkprofile_logfile: Output file path (% replaced with node ID). Default: bgwork_profile_%.bin
-ll:bgworkprofile_bufsize: Max in-memory buffer before streaming to disk (0=unlimited). Default: 1024
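As an illustration of the % substitution in the log file path, here is a minimal sketch. The helper name expand_logfile_pattern is hypothetical and is not taken from this PR, and replacing every % (rather than only the first) is an assumption.

```cpp
#include <string>

// Hypothetical helper (not from the PR): builds the per-node output path by
// substituting the node ID for each '%' in the pattern.
// Assumption: every '%' is replaced, not only the first one.
std::string expand_logfile_pattern(const std::string& pattern, int node_id) {
  std::string out;
  for (char c : pattern) {
    if (c == '%')
      out += std::to_string(node_id);
    else
      out.push_back(c);
  }
  return out;
}
```

With the default pattern, node 3 would write to bgwork_profile_3.bin under this sketch.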
Binary file format (RBWP)
A compact binary format with:
The streaming design allows bounded memory usage for long-running applications. Data blocks are buffered in memory and flushed to disk when the buffer exceeds the configured size. Descriptor tables are written at shutdown to ensure all dynamically-registered items are included.
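The buffering behavior described above could look roughly like the following sketch. The class StreamingBuffer and its methods are hypothetical names, not the PR's actual implementation, and the threshold is in bytes here only for illustration, since the unit of -ll:bgworkprofile_bufsize is not stated in this description.

```cpp
#include <cstddef>
#include <cstdint>
#include <ostream>
#include <sstream>  // std::ostringstream is a convenient in-memory sink for tests
#include <vector>

// Hypothetical sketch of bounded-memory streaming; not Realm's actual class.
class StreamingBuffer {
public:
  // max_bytes == 0 means "unlimited": nothing is flushed until finalize().
  StreamingBuffer(std::ostream& sink, std::size_t max_bytes)
    : sink_(sink), max_bytes_(max_bytes) {}

  // Append one data block; flush to the sink once the buffer exceeds the limit.
  void append(const void* data, std::size_t len) {
    const std::uint8_t* p = static_cast<const std::uint8_t*>(data);
    buf_.insert(buf_.end(), p, p + len);
    if (max_bytes_ != 0 && buf_.size() > max_bytes_)
      flush();
  }

  // Write all buffered bytes to the sink and empty the buffer.
  void flush() {
    if (buf_.empty()) return;
    sink_.write(reinterpret_cast<const char*>(buf_.data()),
                static_cast<std::streamsize>(buf_.size()));
    buf_.clear();
  }

  // At shutdown: flush remaining data, then write the descriptor table last
  // so that items registered late in the run are still included.
  void finalize(const std::vector<std::uint8_t>& descriptor_table) {
    flush();
    sink_.write(reinterpret_cast<const char*>(descriptor_table.data()),
                static_cast<std::streamsize>(descriptor_table.size()));
  }

  std::size_t buffered() const { return buf_.size(); }

private:
  std::ostream& sink_;
  std::size_t max_bytes_;
  std::vector<std::uint8_t> buf_;
};
```

Writing the descriptor table only in finalize() mirrors the stated design choice: data blocks can leave memory early, while descriptors wait until every dynamically-registered item is known.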
GPU kernel timing
Background work items that launch GPU compute kernels (batch affine copies, transpose kernels, fill kernels, reduction kernels) are instrumented to capture host-side timing of when those kernels execute on the GPU. This uses GPUCompletionNotification callbacks on existing (untimed) CUDA/HIP events — a start notification is placed on the stream before kernel submission and an end notification after, with host-side timestamps recorded when each event is reaped. The resulting GPU_WORK records carry the GPU processor ID so profilers can place them on the correct GPU device timeline.
Only compute kernel launches are instrumented, not cuMemcpyAsync/hipMemcpyAsync calls, since those run on copy engine hardware and are already represented by copy channels in the profiler.
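The start/end pairing can be pictured with the simplified sketch below. It does not use the real GPUCompletionNotification interface or any CUDA/HIP calls; KernelTimer, GpuWorkRecord, and the two callback names are hypothetical, and timestamps are passed in explicitly to stand in for the host clock read at reap time.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical record layout; the actual GPU_WORK record fields may differ.
struct GpuWorkRecord {
  std::uint64_t gpu_proc_id;  // lets a profiler pick the GPU device timeline
  long long start_ns;         // host time when the start event was reaped
  long long stop_ns;          // host time when the end event was reaped
};

// Hypothetical stand-in for a pair of completion notifications placed on a
// stream before and after a kernel launch.
class KernelTimer {
public:
  KernelTimer(std::uint64_t gpu_proc_id, std::vector<GpuWorkRecord>& out)
    : gpu_proc_id_(gpu_proc_id), out_(out) {}

  // Called when the event enqueued before the kernel is reaped.
  void on_start_reaped(long long host_now_ns) { start_ns_ = host_now_ns; }

  // Called when the event enqueued after the kernel is reaped; emits a record.
  void on_end_reaped(long long host_now_ns) {
    out_.push_back(GpuWorkRecord{gpu_proc_id_, start_ns_, host_now_ns});
  }

private:
  std::uint64_t gpu_proc_id_;
  std::vector<GpuWorkRecord>& out_;
  long long start_ns_ = 0;
};
```

Because both timestamps are taken on the host when events are reaped, the interval approximates kernel execution time rather than measuring it on the device.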
Implementation details
New files:
Modified files:
GPUfillXferDes::progress_xd(), GPUreduceXferDes::progress_xd()
Design decisions
Usage
./my_app -ll:bgworkprofile 2 -ll:bgworkprofile_logfile bgwork_%.bin -ll:bgworkprofile_bufsize 512
The % is replaced with the node ID for multi-node runs. The resulting .bin files are passed to Legion Prof alongside normal log files. Format detection is based on the RBWP header magic, not filenames.
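Magic-based detection can be sketched as follows. This is illustrative only: the function looks_like_rbwp is hypothetical, it is not Legion Prof's actual loader, and it assumes the file begins with the four ASCII bytes "RBWP", which this description implies but does not spell out.

```cpp
#include <cstring>
#include <istream>
#include <sstream>  // std::istringstream is handy for exercising the check
#include <string>

// Hypothetical check, assuming the header starts with the ASCII bytes "RBWP".
// The file name is never consulted, only the leading bytes of the stream.
bool looks_like_rbwp(std::istream& in) {
  char magic[4];
  in.read(magic, 4);
  return in.gcount() == 4 && std::memcmp(magic, "RBWP", 4) == 0;
}
```

A loader built this way can accept a mixed list of inputs and route each one by content, so renaming a profile file does not change how it is interpreted.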