perf event collection overhaul #55

Open
brayniac opened this issue Oct 10, 2019 · 2 comments
@brayniac
Contributor

The perf events sampler needs more advanced scheduling. We may want to limit the number of event counters to avoid multiplexing on the PMUs. Extending support to per-cgroup collection, as requested in #19, requires that we do something clever to avoid performance penalties. At the same time, we want to enforce that things like Cycles and Instructions are sampled across the same intervals so that CPI calculations remain valid.
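
One way to picture the scheduling constraint is a planner that packs events into rounds under a fixed counter budget while never splitting events that must share an interval. This is only a sketch: `EventGroup`, `plan_rounds`, and the `budget` parameter are hypothetical names, and the real number of usable counters depends on the CPU and on what else is using the PMU.

```rust
/// Events that must be counted over the same interval
/// (for example cycles and instructions, so CPI stays valid).
/// Hypothetical type, used only for illustration.
#[derive(Debug, Clone, PartialEq)]
pub struct EventGroup {
    pub events: Vec<&'static str>,
}

/// Packs groups into rounds using first-fit, so that no round needs
/// more than `budget` counters. Groups are never split across rounds.
/// Returns `None` if a single group needs more counters than `budget`.
pub fn plan_rounds(groups: &[EventGroup], budget: usize) -> Option<Vec<Vec<EventGroup>>> {
    // Each round tracks how many counters it already uses.
    let mut rounds: Vec<(usize, Vec<EventGroup>)> = Vec::new();

    for group in groups {
        let need = group.events.len();
        if need == 0 {
            continue; // nothing to schedule
        }
        if need > budget {
            return None; // this group could never run without multiplexing
        }
        // First fit: reuse the earliest round with enough free counters.
        let slot = rounds.iter().position(|(used, _)| used + need <= budget);
        match slot {
            Some(i) => {
                rounds[i].0 += need;
                rounds[i].1.push(group.clone());
            }
            None => rounds.push((need, vec![group.clone()])),
        }
    }

    Some(rounds.into_iter().map(|(_, members)| members).collect())
}
```

With a budget of 4, three two-event groups end up in two rounds, and the cycles/instructions pair always lands in the same round. First-fit is just the simplest packing choice here, not a claim about what the sampler should ultimately use.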

@brayniac brayniac added the enhancement New feature or request label Oct 10, 2019
@brayniac brayniac added this to the rezolus 2.0 milestone Oct 10, 2019
@brayniac brayniac self-assigned this Oct 10, 2019
@sargun

sargun commented Apr 25, 2020

With regard to sampling cycles and instructions across the same interval, wouldn't putting them in the same perf event group solve that problem, so that the kernel schedules them together?
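
For reference, a group read returns all members in one buffer, which is what makes the counts comparable. The sketch below parses such a buffer and derives CPI. It assumes the group was opened with cycles as the leader and instructions as the only sibling, and with `read_format` set to `PERF_FORMAT_GROUP | PERF_FORMAT_TOTAL_TIME_ENABLED | PERF_FORMAT_TOTAL_TIME_RUNNING` (no `PERF_FORMAT_ID`). `GroupReading`, `parse_group_read`, and `cpi` are hypothetical names.

```rust
/// One read of a perf event group (hypothetical helper type).
#[derive(Debug, PartialEq)]
pub struct GroupReading {
    pub time_enabled: u64,
    pub time_running: u64,
    /// Counter values, leader first, then siblings.
    pub values: Vec<u64>,
}

/// Parses a buffer laid out as native-endian u64 words:
/// nr, time_enabled, time_running, then `nr` counter values.
/// Returns `None` if the buffer is too short for the advertised count.
pub fn parse_group_read(buf: &[u8]) -> Option<GroupReading> {
    let mut words = buf.chunks_exact(8).map(|chunk| {
        let mut bytes = [0u8; 8];
        bytes.copy_from_slice(chunk);
        u64::from_ne_bytes(bytes)
    });

    let nr = words.next()? as usize;
    let time_enabled = words.next()?;
    let time_running = words.next()?;
    let values: Vec<u64> = words.take(nr).collect();
    if values.len() != nr {
        return None; // truncated read
    }
    Some(GroupReading { time_enabled, time_running, values })
}

/// Cycles per instruction, assuming values are [cycles, instructions].
/// Returns `None` for any other group shape or when no instructions retired.
pub fn cpi(reading: &GroupReading) -> Option<f64> {
    match reading.values.as_slice() {
        [cycles, instructions] if *instructions > 0 => {
            Some(*cycles as f64 / *instructions as f64)
        }
        _ => None,
    }
}
```

Because both values come from the same read and share `time_enabled` and `time_running`, the ratio stays meaningful even if the group itself was multiplexed off the PMU for part of the interval.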

@brayniac
Contributor Author

@sargun - there's more to this than that. We might want to change the config to allow for explicit collection of CPI instead of tracking cycles and instructions separately. This would allow us to provide histograms of CPI from sub-minute intervals across each minute.
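
A rough sketch of what explicit CPI collection could look like: take cumulative (cycles, instructions) snapshots at sub-minute intervals, turn consecutive pairs into per-interval CPI values, and summarize the minute with percentiles. `Snapshot`, `interval_cpi`, and `percentile` are hypothetical names, and the sorted-vector, nearest-rank percentile here is a stand-in for a real histogram type.

```rust
/// Cumulative counter values captured at one instant (hypothetical type).
#[derive(Debug, Clone, Copy)]
pub struct Snapshot {
    pub cycles: u64,
    pub instructions: u64,
}

/// CPI for each pair of consecutive snapshots.
/// Intervals with no retired instructions are skipped.
pub fn interval_cpi(snapshots: &[Snapshot]) -> Vec<f64> {
    snapshots
        .windows(2)
        .filter_map(|w| {
            // wrapping_sub tolerates counter wraparound between snapshots.
            let cycles = w[1].cycles.wrapping_sub(w[0].cycles);
            let instructions = w[1].instructions.wrapping_sub(w[0].instructions);
            if instructions == 0 {
                None
            } else {
                Some(cycles as f64 / instructions as f64)
            }
        })
        .collect()
}

/// Nearest-rank percentile over the samples, with `p` in 0..=100.
/// Returns `None` for empty input or an out-of-range `p`.
pub fn percentile(samples: &[f64], p: f64) -> Option<f64> {
    if samples.is_empty() || !(0.0..=100.0).contains(&p) {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let rank = ((p / 100.0) * sorted.len() as f64).ceil() as usize;
    Some(sorted[rank.max(1) - 1])
}
```

The point of computing CPI per interval, rather than dividing minute-level totals, is that the distribution survives: a single slow interval shows up in the upper percentiles instead of being averaged away.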

The other aspect of this is how we can collect perf events with low impact on production workloads. We've seen high impact when collecting per-cgroup, even for a single event, and some impact from collecting many host-level counters with the perf subsystem handling multiplexing. I haven't dug into the perf multiplexing code yet to see if there's anything we could do better in terms of mapping events to PMUs. Assuming that's already optimized, we probably need to limit the duty cycle of collection for each event and essentially do sampling instead of counting.
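
To make the duty-cycle idea concrete, here is a minimal sketch: an event is only enabled for a fraction of each period, and the observed count is scaled up to the full window. `DutyCycle` and `extrapolate` are hypothetical names. The scaling assumes the event rate during the unobserved time matches the observed rate, which is the accuracy we would be trading for lower overhead.

```rust
use std::time::Duration;

/// How long an event is enabled within each period (hypothetical type).
pub struct DutyCycle {
    pub period: Duration,
    /// Fraction of the period the event is enabled, clamped to 0.0..=1.0.
    pub fraction: f64,
}

impl DutyCycle {
    /// Time per period during which the event is actually counting.
    pub fn on_time(&self) -> Duration {
        // Treat NaN or infinite fractions as "never enabled".
        let fraction = if self.fraction.is_finite() {
            self.fraction.clamp(0.0, 1.0)
        } else {
            0.0
        };
        self.period.mul_f64(fraction)
    }
}

/// Scales a count observed over `observed` up to the whole `total` window.
/// Returns `None` if nothing was observed or the inputs are inconsistent.
pub fn extrapolate(count: u64, observed: Duration, total: Duration) -> Option<f64> {
    if observed.is_zero() || observed > total {
        return None;
    }
    Some(count as f64 * total.as_secs_f64() / observed.as_secs_f64())
}
```

For example, a 25% duty cycle over a one-second period counts for 250 ms, and 500 events seen in that time extrapolate to 2000 for the full second. This is the same kind of scaling that `time_enabled / time_running` supports when the kernel multiplexes events; the difference is that the sampler would choose the on-time itself.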


2 participants