Output Amount of Memory Touched #605
base: develop
Memory accesses are now split into four categories: read, written, modify-written, and atomic modify-written. Each memory address touched should appear in exactly one of these categories per loop nest/kernel launch. Continue to output Bytes, the total memory traffic, calculated as (read + write + 2*(modify write + atomic modify write)). Also output BytesTouched, the amount of memory used, calculated as the sum of the four categories. These numbers are idealized; real memory traffic may be higher if perfect caching is not achieved. They are also only estimates for some kernels, because of conditionals that depend on random numbers or because the implementation is complex.
Add the number of bytes modify-written per kernel. For each line that sets the bytes read, written, modify-written, or atomic modify-written, label which variable(s) are accessed.
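As a rough illustration of the Bytes and BytesTouched formulas, here is a minimal Python sketch. The `ByteCounts` class, its field names, and the example kernel (`y[i] += a * x[i]` over `n` doubles, plus atomic adds into a single double `sum`) are hypothetical and are not the RAJAPerf API; an 8-byte double is assumed. The per-variable comments mirror the labeling convention described above.

```python
from dataclasses import dataclass

DOUBLE = 8  # assumed sizeof(double) in bytes


@dataclass
class ByteCounts:
    """Hypothetical per-repetition byte counts for one kernel launch.

    Each touched address belongs to exactly one category.
    Field names are illustrative, not the RAJAPerf API.
    """
    read: int = 0                   # read only
    written: int = 0                # written only
    modify_written: int = 0         # read, then written back
    atomic_modify_written: int = 0  # read and written back atomically

    def bytes_per_rep(self) -> int:
        # Total traffic: modify-written bytes are loaded once and stored once.
        return (self.read + self.written
                + 2 * (self.modify_written + self.atomic_modify_written))

    def bytes_touched_per_rep(self) -> int:
        # Footprint: categories are disjoint, so a plain sum counts each byte once.
        return (self.read + self.written
                + self.modify_written + self.atomic_modify_written)


def daxpy_with_sum(n: int) -> ByteCounts:
    """Hypothetical kernel: y[i] += a * x[i] over n doubles, atomic adds into *sum."""
    return ByteCounts(
        read=1 * DOUBLE * n,               # x
        written=0,                         # (none)
        modify_written=1 * DOUBLE * n,     # y
        atomic_modify_written=1 * DOUBLE,  # sum
    )


c = daxpy_with_sum(1000)
print(c.bytes_per_rep(), c.bytes_touched_per_rep())  # 24016 16008
```

Because the atomic target is a single address, it contributes 8 bytes per launch however many times it is updated, which follows from each address appearing in only one category.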
rhornung67 left a comment
Some text needs to be made clearer, maybe some details added, I think.
Undo the separation of the read-modify-write count from the read and write counts, so that the BytesRead/rep count includes both "read only" and "read, modified, and written" bytes.
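A quick arithmetic check of what merging the counts implies, written as a Python sketch. It assumes (this is not stated in the PR) that BytesWritten/rep, like BytesRead/rep, also includes the modify-written bytes, and that atomic modify-written bytes stay in their own count. The sample numbers are for a hypothetical kernel `y[i] += a * x[i]` over 1000 doubles with one atomically updated double; `merged_view` is a hypothetical helper.

```python
def merged_view(read, written, modify_written, atomic_modify_written):
    """Fold disjoint byte categories into merged read/written totals.

    ASSUMPTION: BytesWritten includes modify-written bytes just as
    BytesRead does, and atomic modify-written bytes stay separate.
    """
    bytes_read = read + modify_written        # read only + read-modify-written
    bytes_written = written + modify_written  # write only + read-modify-written
    return bytes_read, bytes_written, atomic_modify_written


# Hypothetical kernel: x read (8000 B), y modify-written (8000 B), one atomic double.
read, written, modify, atomic = 8000, 0, 8000, 8
bytes_read, bytes_written, bytes_atomic = merged_view(read, written, modify, atomic)

# Total traffic matches read + write + 2*(modify + atomic).
total = bytes_read + bytes_written + 2 * bytes_atomic
assert total == read + written + 2 * (modify + atomic)

# The footprint needs the modify-written count, since it appears in both totals.
touched = bytes_read + bytes_written - modify + bytes_atomic
assert touched == read + written + modify + atomic

print(total, touched)  # 24016 16008
```

Under those assumptions the total traffic is the same in either reporting scheme, but BytesTouched cannot be recovered from BytesRead and BytesWritten alone; the modify-written count is still needed so those bytes are not counted twice.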
FLOP count. So these numbers are rough estimates. For actual FLOP counts,
a performance analysis tool should be used.
* **BytesTouched/rep** -- Total number of bytes accessed for each repetition
  of kernel. This is a best case scenario for the amount of cache needed to
Byte in DRAM or HBM, right?
Yes, each byte is counted once even if it's accessed multiple times or in multiple ways. However, this doesn't count things we expect to be on the stack, such as what is captured in the lambda.
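To make "each byte is counted once" concrete, the Python sketch below counts distinct elements from a list of accesses. The `touched_bytes` and `stencil_accesses` helpers and the 3-point stencil `y[i] = x[i-1] + x[i] + x[i+1]` are hypothetical; 8-byte elements are assumed, and stack data such as lambda captures is simply left out of the access list.

```python
def touched_bytes(accesses, elem_size=8):
    """Count distinct bytes touched, given (array_name, index) accesses.

    Repeated or mixed (read/write) accesses to one element count once.
    Stack data such as lambda captures is assumed to be absent from `accesses`.
    """
    return len(set(accesses)) * elem_size


def stencil_accesses(n):
    """Accesses of y[i] = x[i-1] + x[i] + x[i+1] for i in 1..n (hypothetical kernel)."""
    for i in range(1, n + 1):
        yield ("x", i - 1)
        yield ("x", i)
        yield ("x", i + 1)
        yield ("y", i)


n = 100
accesses = list(stencil_accesses(n))
print(len(accesses) * 8, touched_bytes(accesses))  # 3200 1616
```

Each interior `x` element is read three times, yet contributes its 8 bytes only once, so the touched footprint (1616 bytes) is about half of the 3200 bytes of raw accesses.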
the value of zero is reported.

.. note:: The Bytes*/rep and FLOPs/rep counts are estimates for kernels
          involving randomness or difficult-to-count algorithms.
Which kernels exhibit randomness? It may be nice to list the kernels that do not have the best estimates.
@artv3 I think the plan is to have that sort of information appear in the output files. Putting kernel-specific information in the user documentation creates extra maintenance burden.
There is not currently an attribute for that, but it can be added fairly easily.
Summary
Add a count for the amount of memory touched. This is useful as an estimate of the memory used by a kernel and as a best case for the amount of cache it uses.
This is implemented by counting the bytes modify written separately from bytes read and bytes written.