Output Amount of Memory Touched #605
base: develop
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -80,21 +80,33 @@ Information reported in the file for each kernel is: | |
| * **Kernels/rep** -- total number of loop structures run (or GPU kernels | ||
| launched) in each kernel repetition. | ||
| * **Bytes/rep** -- Total number of bytes read from and written to memory for | ||
| each repetition of kernel. | ||
| each repetition of kernel. This is a best case scenario of the total traffic | ||
| to and from memory assuming perfect cache reuse and ignoring partial usage | ||
| of data in some memory transactions. | ||
| * **FLOPs/rep** -- Total number of floating point operations executed for | ||
| each repetition of kernel. Currently, we count arithmetic operations | ||
| (+, -, *, /) and functions, such as exp, sin, etc. as one FLOP. We do not | ||
| currently count operations like abs and comparisons (<, >, etc.) in the | ||
| FLOP count. So these numbers are rough estimates. For actual FLOP counts, | ||
| a performance analysis tool should be used. | ||
| * **BytesTouched/rep** -- Total number of bytes accessed for each repetition | ||
| of kernel. This is a best case scenario for the amount of cache needed to | ||
| fit all of the data used by the kernel ignoring partial usage of some cache | ||
| lines. | ||
| * **BytesRead/rep** -- Total number of bytes read from memory for | ||
| each repetition of kernel. | ||
| * **BytesWritten/rep** -- Total number of bytes written to memory for | ||
| each repetition of kernel. | ||
| * **BytesModifyWritten/rep** -- Total number of bytes modified for each | ||
| repetition of kernel. The intersection of bytes in both ``BytesRead/rep`` | ||
| and ``BytesWritten/rep``. | ||
| * **BytesAtomicModifyWritten/rep** -- Total number of bytes modified by | ||
| atomic operations in a kernel. If a kernel contains no atomic operations, | ||
| the value of zero is reported. | ||
|
|
||
| .. note:: The Bytes*/rep and FLOPs/rep counts are estimates for kernels | ||
| involving randomness or difficult-to-count algorithms. | ||
|
Member
Which kernels exhibit randomness? It may be nice to list the kernels which do not have the best estimates.
Member
@artv3 I think the plan is to have that sort of information appear in the output files. Putting kernel-specific information in the user documentation creates extra maintenance burden.
Member
Author
There is not currently an attribute for that, but it can be added fairly easily.
||
|
|
||
| .. _output_probsize-label: | ||
|
|
||
| ============================ | ||
|
|
||
byte in DRAM or HBM right?
Yes, each byte is counted once even if it's accessed multiple times or in multiple ways. However, this doesn't count things we expect to be on the stack, like what is captured in the lambda.
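To make the "each byte is counted once" rule concrete, here is a minimal sketch of how the byte metrics in the diff relate to each other for a DAXPY-style update `y[i] += a * x[i]`. This is not the project's implementation: `byte_metrics`, `daxpy_accesses`, and `ELEM_BYTES` are hypothetical names, and reading "read" and "written" as sets of unique elements is an assumption based on the wording in the diff and this thread. The scalar `a` is left out because it would be captured in the lambda.

```python
ELEM_BYTES = 8  # assumption: each array element is a double-precision value


def byte_metrics(reads, writes, elem_bytes=ELEM_BYTES):
    """Count unique bytes per repetition.

    reads / writes: iterables of (array_name, index) pairs accessed in one
    repetition. Duplicates collapse, so a byte is counted once no matter how
    many times it is accessed.
    """
    read_set = set(reads)
    write_set = set(writes)
    return {
        "BytesRead/rep": len(read_set) * elem_bytes,
        "BytesWritten/rep": len(write_set) * elem_bytes,
        # Intersection: bytes that are both read and written.
        "BytesModifyWritten/rep": len(read_set & write_set) * elem_bytes,
        # Union: every distinct byte the kernel accesses.
        "BytesTouched/rep": len(read_set | write_set) * elem_bytes,
    }


def daxpy_accesses(n):
    """Hypothetical access pattern for y[i] += a * x[i] over n elements."""
    reads = [("x", i) for i in range(n)] + [("y", i) for i in range(n)]
    writes = [("y", i) for i in range(n)]
    return reads, writes


metrics = byte_metrics(*daxpy_accesses(1000))
for name, value in metrics.items():
    print(f"{name}: {value}")
```

Under these assumptions, `BytesTouched/rep` covers `x` and `y` once each, even though `y` is both read and written.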