Skip to content

Conversation

@daboehme
Copy link

@daboehme daboehme commented Nov 4, 2025

Pull Request Description

I found that the topdown component does not handle PAPI_reset and PAPI_read correctly. This PR fixes that.

I'm developing Caliper, an instrumentation-based profiler. We call PAPI_read and PAPI_reset at each instrumentation point so that we can properly associate events with the instrumented code region. I noticed that the reset did not appear to work for the topdown events, and looking at the code I found that it was indeed a no-op. I managed to restructure the code so that reset works now, but I'm no expert so please check if I missed anything. Here's what I did:

  • Before, _topdown_start resetted the counters and _topdown_reset was a no-op. Also, _topdown_read would just call _topdown_stop, which in turn would close the file descriptors necessary to reset the counters.
  • Now, the counter reading code is in _topdown_read, and _topdown_stop calls _topdown_read (and closes the FD's).
  • The reset code is now in _topdown_reset, and _topdown_start calls _topdown_reset.

That does seem to do the trick for me. Here's example output with a code that alternates STREAM triad and matrix multiply in a loop, which have wildly different topdown stats. Before, the events would not reset and the topdown stats just converge to an overall program average:

iter kernel     retiring  bad_spec fe_bound  be_bound
   0 triad      21.176500 3.137260 22.352900 53.725500
   0 matmul_f64 76.862700 5.882350 12.156900  5.490200
   1 triad      70.588200 5.490200 11.372500 12.941200
   1 matmul_f64 75.686300 5.490200 11.372500  7.450980
   2 triad      72.549000 5.098040 11.372500 10.980400
   2 matmul_f64 75.294100 5.490200 11.372500  8.235290
   3 triad      73.333300 5.098040 11.372500 10.588200
   3 matmul_f64 75.294100 5.098040 11.372500  8.627450
   4 triad      73.725500 5.098040 11.372500 10.196100
   4 matmul_f64 75.294100 5.098040 11.372500  8.627450
   5 triad      73.725500 5.098040 11.372500 10.196100
   5 matmul_f64 74.902000 5.098040 11.372500  8.627450
   6 triad      74.117600 5.098040 11.372500  9.803920
   6 matmul_f64 74.902000 5.098040 11.372500  9.019610
   7 triad      74.117600 5.098040 11.372500  9.803920
   7 matmul_f64 74.902000 5.098040 11.372500  9.019610

With the changes, we now get distinct and reasonable topdown stats for each kernel:

iter kernel     retiring  bad_spec fe_bound  be_bound
   0 triad      20.784300 3.137260 21.960800 54.117600
   0 matmul_f64 81.568600 5.882350 11.764700  1.176470
   1 triad      16.470600 0.000000  9.411760 74.117600
   1 matmul_f64 81.960800 5.490200 11.764700  1.176470
   2 triad      17.254900 0.000000  9.803920 72.941200
   2 matmul_f64 81.960800 5.490200 11.764700  1.176470
   3 triad      17.647100 0.000000 10.196100 72.156900
   3 matmul_f64 81.960800 5.490200 11.764700  1.176470
   4 triad      17.647100 0.000000 11.372500 70.980400
   4 matmul_f64 81.568600 5.490200 12.156900  1.176470
   5 triad      17.647100 0.000000  9.411760 72.941200
   5 matmul_f64 81.568600 5.098040 12.156900  1.176470
   6 triad      17.647100 0.000000 10.196100 72.156900
   6 matmul_f64 81.960800 5.490200 11.764700  1.176470
   7 triad      17.647100 0.000000  9.803920 72.941200
   7 matmul_f64 81.960800 5.490200 11.764700  1.176470

We really want to have individual topdown values for each instrumented code region, so it would be great to get the reset fixed. We're aware that the topdown values can be unreliable for very short regions but that's something we can live with.

I'd appreciate it if we can get this in.

Author Checklist

  • Description
    Why this PR exists. Reference all relevant information, including background, issues, test failures, etc
  • Commits
    Commits are self contained and only do one thing
    Commits have a header of the form: module: short description
    Commits have a body (whenever relevant) containing a detailed description of the addressed problem and its solution
  • Tests
    The PR needs to pass all the tests

@dbarry9 dbarry9 assigned dbarry9 and unassigned dbarry9 Nov 4, 2025
@Treece-Burgess
Copy link
Contributor

Hello @daboehme, thank you for submitting this PR. I will take a look through it.

@Treece-Burgess
Copy link
Contributor

Hello @daboehme, as an update I have looked through your PR. One issue you mentioned is that _topdown_read calls _topdown_stop which closes the file descriptors. This is true and your PR resolved this behavior.

However, in the case of the following workflow:

// First start - stop workflow
PAPI_start()
//some work done
PAPI_stop()

// Second start - stop workflow
PAPI_start()
//some work done
PAPI_stop()

You would still run into the file descriptors being closed in the second start - stop workflow and the call to ioctl in _topdown_reset would fail if the error code was checked.

I am currently working on a fix for this along with a few other changes for the component. I am going to leave this PR up until I have completed the changes necessary and submit a new PR. I will also give you credit in the commit history.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants