Commit ba6c981
trigger profiling on abort
Summary:
Record the profile trace if the training process receives SIGABRT, e.g. when the Process Group watchdog aborts the process.
1 parent 9597b87, commit ba6c981
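The idea in the summary above can be sketched with Python's standard `signal` module. This is a minimal illustration, not the code from this commit: `install_abort_hook` and the `on_abort` callback are hypothetical names, and the callback stands in for whatever exports the profiler trace in the real change.

```python
import signal


def install_abort_hook(on_abort):
    """Run on_abort() when the process receives SIGABRT.

    Hypothetical helper for illustration only. Returns the previously
    installed handler so the caller can restore it later.
    Must be called from the main thread (a requirement of signal.signal).
    """

    def _handler(signum, frame):
        # In a training job this is where the profiler trace would be exported.
        on_abort()

    return signal.signal(signal.SIGABRT, _handler)


# Usage: simulate an abort and check that the hook fired.
calls = []
previous = install_abort_hook(lambda: calls.append("trace exported"))
signal.raise_signal(signal.SIGABRT)  # deliver SIGABRT to this process
signal.signal(signal.SIGABRT, previous)  # restore the earlier handler
```

One caveat with this approach: Python runs signal handlers in the main thread only once the interpreter regains control, so when native code calls `abort()` directly the process may terminate before a Python-level handler gets a chance to run. The sketch only demonstrates the hook mechanism for a signal delivered while Python code is executing.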
File tree
3 files changed, +39 -20 lines changed
- torchtitan
- experiments/forge
- tools