Commit a8e24ed
trigger profiling on abort
Summary:
Record the profile trace if the training process receives SIGABRT, e.g. when the Process Group watchdog aborts the process.

1 parent e3bb189
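The diff contents of this commit were not captured here, so the following is only a minimal sketch of the idea, assuming a Python training script: register a `SIGABRT` handler that dumps the profile trace before the process dies. `install_abort_hook` and its `dump_trace` and `reraise` parameters are hypothetical names, not torchtitan APIs. Python runs signal handlers on the main thread between bytecodes, so this sketch covers SIGABRT delivered as a signal (for example `kill -ABRT <pid>`). Whether it fires for an abort raised inside native code is not guaranteed.

```python
import signal
from types import FrameType
from typing import Callable, Optional


def install_abort_hook(dump_trace: Callable[[], None], reraise: bool = True) -> None:
    """Hypothetical helper: call `dump_trace` when the process receives SIGABRT.

    Must be called from the main thread, because `signal.signal` requires it.
    """

    def _on_sigabrt(signum: int, frame: Optional[FrameType]) -> None:
        try:
            # e.g. stop the profiler and export the trace collected so far
            dump_trace()
        finally:
            # Restore the default action so the next SIGABRT terminates the process.
            signal.signal(signal.SIGABRT, signal.SIG_DFL)
            if reraise:
                # Re-deliver SIGABRT so the process still exits as an abort.
                signal.raise_signal(signal.SIGABRT)

    signal.signal(signal.SIGABRT, _on_sigabrt)
```

With `torch.profiler`, `dump_trace` might stop the active `torch.profiler.profile` object and call its `export_chrome_trace(path)` method. Restoring `SIG_DFL` and re-raising keeps the exit an abort, so whatever relies on the watchdog killing the process still sees the same termination.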
3 files changed, +38 -20 lines:
- torchtitan
- experiments/forge
- tools
Diff hunks (line contents not captured; original -> new line numbers):
- File 1: one hunk, lines 281-292 -> 281-293 (+6 -5)
- File 2: two hunks, lines 18-24 -> 18-23 and 68-87 -> 67-86 (+6 -7)
- File 3: six hunks, lines 4-11 -> 4-13, 32-39 -> 34-45, 572-584 -> 578-591, 602-607 -> 609-623, 625-632 -> 641-648, 684-693 -> 700-711 (+26 -8)