Fix read/reset for topdown component #504

daboehme · 2025-11-04T23:24:59Z

Pull Request Description

I found that the topdown component does not handle PAPI_reset and PAPI_read correctly. This PR fixes that.

I'm developing Caliper, an instrumentation-based profiler. We call PAPI_read and PAPI_reset at each instrumentation point so that we can properly associate events with the instrumented code region. I noticed that the reset did not appear to work for the topdown events, and looking at the code I found that it was indeed a no-op. I managed to restructure the code so that reset works now, but I'm no expert so please check if I missed anything. Here's what I did:

Before, _topdown_start resetted the counters and _topdown_reset was a no-op. Also, _topdown_read would just call _topdown_stop, which in turn would close the file descriptors necessary to reset the counters.
Now, the counter reading code is in _topdown_read, and _topdown_stop calls _topdown_read (and closes the FD's).
The reset code is now in _topdown_reset, and _topdown_start calls _topdown_reset.

That does seem to do the trick for me. Here's example output with a code that alternates STREAM triad and matrix multiply in a loop, which have wildly different topdown stats. Before, the events would not reset and the topdown stats just converge to an overall program average:

iter kernel     retiring  bad_spec fe_bound  be_bound
   0 triad      21.176500 3.137260 22.352900 53.725500
   0 matmul_f64 76.862700 5.882350 12.156900  5.490200
   1 triad      70.588200 5.490200 11.372500 12.941200
   1 matmul_f64 75.686300 5.490200 11.372500  7.450980
   2 triad      72.549000 5.098040 11.372500 10.980400
   2 matmul_f64 75.294100 5.490200 11.372500  8.235290
   3 triad      73.333300 5.098040 11.372500 10.588200
   3 matmul_f64 75.294100 5.098040 11.372500  8.627450
   4 triad      73.725500 5.098040 11.372500 10.196100
   4 matmul_f64 75.294100 5.098040 11.372500  8.627450
   5 triad      73.725500 5.098040 11.372500 10.196100
   5 matmul_f64 74.902000 5.098040 11.372500  8.627450
   6 triad      74.117600 5.098040 11.372500  9.803920
   6 matmul_f64 74.902000 5.098040 11.372500  9.019610
   7 triad      74.117600 5.098040 11.372500  9.803920
   7 matmul_f64 74.902000 5.098040 11.372500  9.019610

With the changes, we now get distinct and reasonable topdown stats for each kernel:

iter kernel     retiring  bad_spec fe_bound  be_bound
   0 triad      20.784300 3.137260 21.960800 54.117600
   0 matmul_f64 81.568600 5.882350 11.764700  1.176470
   1 triad      16.470600 0.000000  9.411760 74.117600
   1 matmul_f64 81.960800 5.490200 11.764700  1.176470
   2 triad      17.254900 0.000000  9.803920 72.941200
   2 matmul_f64 81.960800 5.490200 11.764700  1.176470
   3 triad      17.647100 0.000000 10.196100 72.156900
   3 matmul_f64 81.960800 5.490200 11.764700  1.176470
   4 triad      17.647100 0.000000 11.372500 70.980400
   4 matmul_f64 81.568600 5.490200 12.156900  1.176470
   5 triad      17.647100 0.000000  9.411760 72.941200
   5 matmul_f64 81.568600 5.098040 12.156900  1.176470
   6 triad      17.647100 0.000000 10.196100 72.156900
   6 matmul_f64 81.960800 5.490200 11.764700  1.176470
   7 triad      17.647100 0.000000  9.803920 72.941200
   7 matmul_f64 81.960800 5.490200 11.764700  1.176470

We really want to have individual topdown values for each instrumented code region, so it would be great to get the reset fixed. We're aware that the topdown values can be unreliable for very short regions but that's something we can live with.

I'd appreciate it if we can get this in.

Author Checklist

Description
Why this PR exists. Reference all relevant information, including background, issues, test failures, etc
Commits
Commits are self contained and only do one thing
Commits have a header of the form: module: short description
Commits have a body (whenever relevant) containing a detailed description of the addressed problem and its solution
Tests
The PR needs to pass all the tests

Treece-Burgess · 2025-11-06T19:25:28Z

Hello @daboehme, thank you for submitting this PR. I will take a look through it.

Treece-Burgess · 2025-11-10T22:03:28Z

Hello @daboehme, as an update I have looked through your PR. One issue you mentioned is that _topdown_read calls _topdown_stop which closes the file descriptors. This is true and your PR resolved this behavior.

However, in the case of the following workflow:

// First start - stop workflow
PAPI_start()
//some work done
PAPI_stop()

// Second start - stop workflow
PAPI_start()
//some work done
PAPI_stop()

You would still run into the file descriptors being closed in the second start - stop workflow and the call to ioctl in _topdown_reset would fail if the error code was checked.

I am currently working on a fix for this along with a few other changes for the component. I am going to leave this PR up until I have completed the changes necessary and submit a new PR. I will also give you credit in the commit history.

Fix read/reset for topdown component

8a60759

dbarry9 assigned dbarry9 and unassigned dbarry9 Nov 4, 2025

Treece-Burgess mentioned this pull request Nov 11, 2025

topdown: Restructure code for error checking with sys call, make _topdown_reset actually reset, and avoid closing file descriptors when calling PAPI_read #509

Open

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix read/reset for topdown component #504

Fix read/reset for topdown component #504

Uh oh!

daboehme commented Nov 4, 2025 •

edited

Loading

Uh oh!

Treece-Burgess commented Nov 6, 2025

Uh oh!

Treece-Burgess commented Nov 10, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Fix read/reset for topdown component #504

Are you sure you want to change the base?

Fix read/reset for topdown component #504

Uh oh!

Conversation

daboehme commented Nov 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Pull Request Description

Author Checklist

Uh oh!

Treece-Burgess commented Nov 6, 2025

Uh oh!

Treece-Burgess commented Nov 10, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

daboehme commented Nov 4, 2025 •

edited

Loading