Commit 88bb526
trigger profiling on abort
Summary:
Record the profile trace if the training process receives SIGABRT, e.g. when the Process Group watchdog aborts the process.
1 parent 7e6afe5, commit 88bb526
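The behavior described in the summary can be illustrated with a minimal, standard-library-only sketch. It is not the code from this commit: `install_abort_handler`, `export_trace`, `step_durations`, and the JSON trace file are hypothetical stand-ins for whatever profiler export the training loop actually uses. The sketch assumes it runs in the main thread, because Python only allows signal handlers to be installed there.

```python
import json
import os
import signal
import tempfile
import time


def install_abort_handler(export_trace):
    """Call `export_trace()` once when this process receives SIGABRT."""
    previous = signal.getsignal(signal.SIGABRT)
    if previous is None:  # a handler was installed outside of Python
        previous = signal.SIG_DFL

    def _on_abort(signum, frame):
        # Restore the earlier handler first so a repeated abort is not
        # intercepted a second time.
        signal.signal(signal.SIGABRT, previous)
        export_trace()

    signal.signal(signal.SIGABRT, _on_abort)


# Hypothetical stand-in for a profiler: per-step wall-clock durations.
step_durations = []
trace_path = os.path.join(tempfile.mkdtemp(), "abort_trace.json")


def export_trace():
    # Write whatever has been recorded so far to disk.
    with open(trace_path, "w") as f:
        json.dump({"step_durations_s": step_durations}, f)


install_abort_handler(export_trace)

for _ in range(3):  # stand-in for training steps
    start = time.perf_counter()
    sum(i * i for i in range(1000))
    step_durations.append(time.perf_counter() - start)

# Simulate the abort; the Python-level handler runs in the main thread.
signal.raise_signal(signal.SIGABRT)

with open(trace_path) as f:
    recorded = json.load(f)
```

A real handler would usually terminate the process after exporting. Python-level handlers only run in the main thread between bytecode instructions, so this sketch does not cover every way a native abort can be delivered.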
File tree
3 files changed, +38 -20 lines changed
- torchtitan
- experiments/forge
- tools
File 1 of 3: +6 -5

| Hunk | Original lines | New lines | Added | Removed |
|---|---|---|---|---|
| 1 | 281-292 | 281-293 | 6 | 5 |
File 2 of 3: +6 -7

| Hunk | Original lines | New lines | Added | Removed |
|---|---|---|---|---|
| 1 | 18-24 | 18-23 | 0 | 1 |
| 2 | 68-87 | 67-86 | 6 | 6 |
File 3 of 3: +26 -8

| Hunk | Original lines | New lines | Added | Removed |
|---|---|---|---|---|
| 1 | 4-11 | 4-13 | 2 | 0 |
| 2 | 32-39 | 34-45 | 4 | 0 |
| 3 | 580-592 | 586-599 | 7 | 6 |
| 4 | 610-615 | 617-631 | 9 | 0 |
| 5 | 633-640 | 649-656 | 2 | 2 |
| 6 | 692-701 | 708-719 | 2 | 0 |