Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[multicapture 3/3] Add multi-capture capabilities to capture + merge utility #825

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

Arpafaucon
Copy link
Contributor

@Arpafaucon Arpafaucon commented Jul 12, 2024

suggestion towards #471

Description

While a network-level multiplex seems the most transparent way to implement multiprocess support, @cipharius and I have seen that it requires an advanced and very precise knowledge of the packets in transit. Thus there is an inherent fragility to it (cf #766 (comment)).

Here, I wanted to propose a "dumber" and more stable approach that does not have to know a lot about the internals. My goal by doing so was to provide a first step to support multi-process workflows, with a minimal amount of maintenance.
In the limited amount of work time I got, this looked like a more reasonable approach to explore first.

The general approach

  • multi-capture is a "generalized" version of tracy-capture that listens to UDP discovery packets. Its job is to run one "capture" per discovered process. In more details, this boils down to creating multiple Worker instances pointing to the right addresses, and let them do their job.
  • merge can combine multiple trace files into a single one. Its strategy is
    • for each input file
      • ask a Worker to read it (like we do when opening the profiler GUI)
      • convert its internal data to lists of ImportEventTimeline, ImportEventMessages...
    • merge all these event lists
    • spawn a Worker to import those merged lists (that's the constructor we use for import-chrome/import-fuchsia
    • ask this worker to save its data into an output file

Pros

  • Both tools are very simple: they don't use Worker internals, nor do assumptions about the way things are stored / exchanged on the network
  • ...and as a consequence I expect them to be easy to maintain in the long term
  • there isn't any performance impact for people working in single-process mode
  • on its own, multicapture is useful too: it helps dealing with the fact that multiple tracy-enabled processes might be living at the same time, and the user might not know which one the standard capture will talk too.

Cons:

  1. we don't support live visualization, only capture
  2. we lose information in the process: frame images, lock events, system stack frames...
  3. the export-merge-import workflow is very inefficient w.r.t to what it could be. With the current approach,
  4. It's a 2-step process, and thus a bit cumbersome

While (1) is definitely a consequence of the design, I think (4) is totally fixable. We can work gradually work to improve the state of (2) and (3) by iterating on the existing import interface (that could benefit the import-chrome/fuchsia utilities too) and on the export process. It might be even possible to directly merge multiple Workers, but I haven't looked into that.

I am very aware that this is NOT the design path @wolfpld suggested when discussing multi-process support; so I wanted to show you the code as soon as I felt it worthy of your time.

As mentioned in the pros, it might well also be that multicapture is a sufficient solution on its own for some use cases. For my individual experience, this would already solve most of the pain when profiling a library used by several programs running together - but I can't presume whether this is a frequent pain point.

How to reproduce / test

Capture time

  • simply run tracy-multicapture with a prefix name
./tracy-multicapture -o test
tracy-multi-capture-reenc.webm

Merge

  • give the list of input traces to tracy-merge
# or you can use a glob pattern `test.*.tracy` to select them all
./tracy-merge -o merged-output.tracy test.449505.tracy test.449456.tracy

reading test.449505.tracy
Waiting for source locations
tracy-test @ 2024-07-12 17:53:23 (tracy-test)
- 22 events
- 2776 messages
- 4 plots
reading test.449456.tracy
Waiting for source locations
tracy-test @ 2024-07-12 17:53:22 (tracy-test)
- 22 events
- 1038 messages
- 4 plots
Writing merged-output.tracy
done

Visualize

./tracy-profiler merged-output.tracy

maim_2024-07-12T18-35-14+02-00

@wolfpld
Copy link
Owner

wolfpld commented Jul 13, 2024

multi-capture is a "generalized" version of tracy-capture that listens to UDP discovery packets. Its job is to run one "capture" per discovered process. In more details, this boils down to creating multiple Worker instances pointing to the right addresses, and let them do their job.

we don't support live visualization, only capture

It may be worth considering if this can be done within the capture utility, as a step towards multiple simultaneous traces, like requested in #203.

I am very aware that this is NOT the design path @wolfpld suggested when discussing multi-process support; so I wanted to show you the code as soon as I felt it worthy of your time.

It's ok.

@Arpafaucon
Copy link
Contributor Author

It may be worth considering if this can be done within the capture utility

I can have a look into that

@Arpafaucon
Copy link
Contributor Author

Arpafaucon commented Jul 14, 2024

It may be worth considering if this can be done within the capture utility

Some personal notes after a short glance:

➕ we can definitely do better in multicapture to emulate the capture experience

  • reuse the same progress line (with Tx/Rx, elapsed time infos)
  • print failures when they happen

➖ but I fear we won't be able to seamlessly integrate multi-capture

  • minor, but I don't think we can keep the live-update print with multiple workers (we would need to go back to previous lines, that's not portable accross the supported platforms, is it?)
  • there'll be some work in the CLI interface to properly convey the options available in multi-mode
    • address and port don't really make sense in multi-capture (though we could maybe later filter discovered clients by their IP address / ports)
    • memlimit is easy to implement by worker, quite harder if the user expects to control the combined memory footprint of all of them
    • for the output file(s)
      • in multi mode, maybe we can consider -o as the common filename prefix, with the special case that -o name.tracy means the set of files name.<pid>.tracy
      • what do you recommend to do if we see that some prefix.<number>.tracy exist and the user has not given -f? I'd recommend raising an error and exiting, but that might be overly restrictive (we'll only have a collision if we profile a process that has exactly this number, this is probably rare ... but we won't know until runtime).

Next steps;

  • I'll move functions that can be used in both utilities in a shared header, to at the very least minimize duplication
  • I'll have a go implementing the missing features in multi-capture so it can act as a drop-in replacement

I'll push that further to the code and see how it goes

@wolfpld
Copy link
Owner

wolfpld commented Jul 16, 2024

minor, but I don't think we can keep the live-update print with multiple workers (we would need to go back to previous lines, that's not portable accross the supported platforms, is it?)

I have no idea, but there are programs that are capable of doing that. Maybe the readline library would be of use here?

there'll be some work in the CLI interface to properly convey the options available in multi-mode

Valid points, but not show-stoppers. It may make sense to split the available options into three categories:

  1. Common options.
  2. Single capture options.
  3. Multi capture options.

@Arpafaucon
Copy link
Contributor Author

I have no idea, but there are programs that are capable of doing that. Maybe the readline library would be of use here?

For me that was just about editing a single line (if you indeed mean https://tiswww.case.edu/php/chet/readline/rltop.html)

Since then I've found this SO thread - I'll have to test it but that might do the trick for linux at least : https://stackoverflow.com/a/11474509/7002298

@Arpafaucon
Copy link
Contributor Author

Arpafaucon commented Jul 17, 2024

I updated the branch with an updated multicapture utility that embeds the single capture mode (the end goal being that we merge this new file as capture.cpp, but for the time being it helps to have the reference source untouched)

General refactoring

  • I have extracted some functions/constants from the code here and there to make them available without duplication. You can see the diff here: 3d235b8. If you prefer, I can spin up a dedicated PR for this diff, there should not be any functional impact.
  • Within the capture/ folder. I moved some common utilities to stand-alone functions in lib.cpp to reduce duplication: handshake message check, instrumentation failure print, and (not yet integrated into multicapture mode) live update print. Again, the goal being there to avoid any functional impacts.

Integration of multi-capture in tracy-capture

Valid points, but not show-stoppers. It may make sense to split the available options into three categories:

I have a working argument parser that will dispatch then to runCaptureSingle or runCaptureMulti.

  • I added support for long options on the way
  • Here is the help message, do you think it clear enough ?
❯ ./tracy-multicapture
Usage: multicapture -o <output/prefix> [--multi] [-a address] [-p port] [-s seconds] [-m percent]
Detect and capture multiple tracy clients from their UDP broadcast,
 and store the output in files named '<PREFIX>.<client_PID>.tracy'.

Options (a SINGLE tag indicates the option is only accessible in single capture mode):
  -o/--output <str>        Output file path (in MULTI mode, it is interpreted as a prefix)
  -f/--force               Overwrite existing files
  -M/--multi               Enable multi-capture mode
  -v/--verbose             Verbose output
  -a/--address <ipv4 str>  [SINGLE] Target IP address
  -p/--port <int>          [SINGLE] Target port
  -s/--stop-after <float>  [SINGLE] Stop profiling after this duration (in seconds)
  -m/--memlimit <float>    [SINGLE] Set a memory limit (in % of the total RAM)

In single-capture mode, 'capture' directly connects to the TCP data stream at address:port .
In multi-capture mode, profiled targets are detected from UDP broadcast packets;
capture stops once all detected targets disconnect.
  • there are some options we could make use of in multicapture, but I suggest to defer that to a later MR

Next steps

Still to do:

  • live print in multi-capture
  • spin up a PR for a minor addition in tracy-test: the ability to trigger some failures to check the profiler's response to them (commit: 59b2f2e) -> [test] Trigger failures via USR1/USR2 #833
  • there are build failures in the CI for MacOS, to investigate

And to recap the things I'd welcome your input on:

  • I'd feel more confident if we adopt this "embedding" strategy (=limit as much as possible the changes in the single-capture mode) as a first step to add support for multicapture. This way, we're sure to avoid breaking anything on the single-capture use-case, while being able to gradually refactor and merge the logic. WDYT ?
  • OK with the CLI interface + help message ?
  • OK with the refactoring commit (notably in profiler/main.cpp). Do you prefer a separate PR ?
  • maybe the merger part can go to a separate PR ?

EDIT: updated the link to commit after rebasing to master

@Arpafaucon
Copy link
Contributor Author

Quite happy about this live update, what do you think ?

tracy-multi-capture-liveupdate.webm

@wolfpld
Copy link
Owner

wolfpld commented Jul 17, 2024

Quite happy about this live update, what do you think ?

It seems to be a bit busy due to what I assume is debug info. Can you show a version that just adds a new line when another connection is established?

@Arpafaucon
Copy link
Contributor Author

Quite happy about this live update, what do you think ?

It seems to be a bit busy due to what I assume is debug info. Can you show a version that just adds a new line when another connection is established?

tracy-multi-capture-liveupdate-clean.webm

@wolfpld
Copy link
Owner

wolfpld commented Jul 17, 2024

Looks great!

@Arpafaucon
Copy link
Contributor Author

Arpafaucon commented Jul 17, 2024

If an error occurs (cf other MR ^^), we get this

maim_2024-07-17T22-00-11+02-00

It seems to be a bit busy due to what I assume is debug info. Can you show a version that just adds a new line when another connection is established?

I've now hidden most of the relevant (at least I think) additional infos behind --verbose

@Arpafaucon
Copy link
Contributor Author

Marking the MR as ready for review

Regarding merge strategy, I can spin up smaller MRs with more atomic changes if you want to review things in multiple steps. It would look like

  • (A) refactor existing code to extract common functions / constants
  • (B) support for multi-process capture in capture
  • (C) tracy-merge utility

@wolfpld
Copy link
Owner

wolfpld commented Jul 18, 2024

It seems there are some commits here which are intended as fixups in a rebase.

@Arpafaucon
Copy link
Contributor Author

It seems there are some commits here which are intended as fixups in a rebase.

not sure what you mean by that; in any case I rebased and cleaned up the branch history to clarify the separation A/B/C as mentioned in #825 (comment)

* 26df5524 (HEAD -> multicapture-merge) [multicapture] New binary, move common functions to lib.cpp
* 9111933e [capture] Refactor: extract common functions
* 11bea71b [merger] multiprocess/ folder, with tracy-merge binary
* 59a92210 [refactor] Extract useful functions/constants for capture

@Arpafaucon
Copy link
Contributor Author

I extracted the non-functional refactors into #837 to simplify the review, and make sure those changes are not impacting anything. That may be a good starting point to merge the changes, and will reduce the diff here.

Same for #838 for the merge utility that has no interaction with the changes in capture

@Arpafaucon Arpafaucon changed the title Alternative way to profile multiple processes: multicapture + merge [multicapture 3/3] Add multi-capture capabilities to capture + merge utility Jul 19, 2024
@Arpafaucon
Copy link
Contributor Author

Note: if you're interested, I can also spin up a dedicated PR for the internal refactor in capture.cpp (commit 9111933). This way, the diff for the actual functional changes of multi-capture will be much clearer

@wolfpld
Copy link
Owner

wolfpld commented Aug 8, 2024

Please run the changes you are proposing through clang-format. At this point it's hard to review.

@Arpafaucon
Copy link
Contributor Author

Please run the changes you are proposing through clang-format. At this point it's hard to review.

Sure ! Just did it on this MR.

To clarify, do you intend to review separately #837 and #838, in which case I'll continue to keep them up-to-date, or will you tackle the whole change as a whole ? I'm was trying to save you time by splitting the work this way, but I'm not sure it's actually helping you - please clarify what you prefer :)

@Arpafaucon
Copy link
Contributor Author

My suggestion would be to address #837 and #838, and then I can spin up a finishing MR that will tidy things up (without the intermediary lib.cpp, and actually replacing capture.cpp with the content of multicapturec.pp). But you prefer another way, please tell me so I can adapt

@wolfpld
Copy link
Owner

wolfpld commented Aug 8, 2024

Yes, please do the other PRs too.

@wolfpld wolfpld force-pushed the master branch 4 times, most recently from 2511616 to aa451b4 Compare September 20, 2024 19:28
@afaure42
Copy link

afaure42 commented Nov 8, 2024

Hi ( somewhat new to github dont hesitate to direct me to the correct place for this if it's not here )
The tracy-merge util seems to delete some data on the traces.

  • Lock
  • ZoneText

are not visible after a merge

This code when captured and launched on the profiler gives this trace

#include <mutex>
#include <iostream>
#include "Tracy.hpp"

constexpr const char * const hello = "Hello, Tracy!";
constexpr const char * const mutex_name = "myMutex";
int main()
{
	ZoneScoped;
	ZoneText(hello, strlen(hello));
	TracyLockableN (std::mutex, myMutex, mutex_name);

	myMutex.lock();
	sleep(1);
	myMutex.unlock();
}

Screenshot from 2024-11-08 15-02-45

however after running it through the tracy-merge util the lock as well as CPU usage and ZoneText all disappear

Screenshot from 2024-11-08 15-04-00

@Arpafaucon
Copy link
Contributor Author

Hi @afaure42 ! first of all, thanks for taking the time to test this branch :)

Sadly, the limitations you mentioned are expected - cf the first message of the MR -> 2nd limitation

Cons:

  1. we don't support live visualization, only capture
  2. we lose information in the process: frame images, lock events, system stack frames...
  3. the export-merge-import workflow is very inefficient w.r.t to what it could be. With the current approach,
  4. It's a 2-step process, and thus a bit cumbersome

I'm not saying it's impossible to do, but this would require more work. IIRC (I did this MR in summer), including those infos would require some reworks in the export/import process I'm using for this feature (we cannot yet import those infos via the existing TracyWorker import constructor).

@afaure42
Copy link

Thanks for your answer @Arpafaucon !

I understand the limitations, i'll look into it maybe i can give it a try !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants