Autotp training #6922
Open: inkcherry wants to merge 84 commits into microsoft:master from inkcherry:autotp_training
Commits
674a873  auto tp training (inkcherry)
a2e4c47  update parallel_states (inkcherry)
f4eb142  Merge branch 'master' into HEAD (inkcherry)
dd081ed  WA skips assertions, the loss remains exactly consistent with the low… (inkcherry)
cdaed2f  save/load ckpt & save/load hf model basic POC (inkcherry)
9aad0e7  finish all the basic functionalities (inkcherry)
2bb11fd  update (inkcherry)
e75c1c2  use groups for parallel_states (inkcherry)
840a5f2  enable bwd allreduce, enable scale loss by gas (inkcherry)
60bd6ab  add dataloader check (inkcherry)
9266383  refactor autoTP step1 (inkcherry)
07174a9  rm parallel_states (inkcherry)
ee6323e  refactor autoTP step2 (inkcherry)
6461b84  update ut step1 (inkcherry)
4d73011  update (inkcherry)
c79c3bb  add uts (inkcherry)
97e659c  finished all ut code base (inkcherry)
a15905b  addllr scheduler test (inkcherry)
e9802b0  refine ut (inkcherry)
88b8acf  fix bcast_objlist (inkcherry)
868be0b  refine layers.py (inkcherry)
3788e07  refine gather (inkcherry)
27b24f6  pass codegen350M +TP2 ut (inkcherry)
3d7b89f  add mode choice (inkcherry)
47a6b0b  fix chatglm (inkcherry)
3a23997  fix chatglm2 with transformers=4.40 version (inkcherry)
e3ec46e  uneven (inkcherry)
9685879  fix uneven (inkcherry)
7b99b03  fix training (inkcherry)
570645f  refine code (inkcherry)
3729b64  remove skip bcase&reduce (inkcherry)
62d8858  fix typo (inkcherry)
dd17313  format (inkcherry)
93cf6f5  refine code (inkcherry)
87c4bc2  refine code (inkcherry)
1714bb5  refine (inkcherry)
dadf915  update yuan (inkcherry)
86c9399  optimize usage of move function (inkcherry)
2526dc6  refine args usage (inkcherry)
c9fd699  format (inkcherry)
797e71f  zero1 compatible (inkcherry)
86ae65e  remove wa (inkcherry)
3e40024  fix cpu device name (inkcherry)
7d94b77  fix lm-head (inkcherry)
b297950  add detach (inkcherry)
67ce220  fix ipex intergration (inkcherry)
f818be9  fix tied_embedding (inkcherry)
11c98f6  Merge remote-tracking branch 'origin/master' into autotp_training (inkcherry)
e22b625  format (inkcherry)
8531b64  Merge branch 'master' into autotp_training (tjruwase)
8d19e01  Merge branch 'master' into autotp_training (loadams)
060d48b  remove outdated comments (inkcherry)
6667ba1  Enhance unit test coverage (inkcherry)
84c9335  update ut (inkcherry)
cb29d7c  sequential some tests (inkcherry)
a49e77e  format (inkcherry)
0ef5274  use parameterized save path (inkcherry)
481088d  Merge remote-tracking branch 'my/autotp_training' into autotp_training (inkcherry)
f740de0  refactor infer/training path (inkcherry)
726004d  format (inkcherry)
bd8de77  remove empty line (inkcherry)
c334da0  remove autotp_size config from zero scope (inkcherry)
29eef07  update (inkcherry)
ba47ed1  format (inkcherry)
bbde63f  fix layer typo and rename (inkcherry)
bdca62c  fix python3.9 (inkcherry)
5d89422  refine code (inkcherry)
0a9caff  refine (inkcherry)
c923a3b  refine config (inkcherry)
92be193  improve ut coverage for save (inkcherry)
23bd0fc  fix process exit early (inkcherry)
358f395  improve ut coverage (inkcherry)
cdfb54c  Merge remote-tracking branch 'origin/master' into autotp_training (inkcherry)
6d030c4  fix zero1 regression (inkcherry)
f9e7756  Merge branch 'master' into autotp_training (inkcherry)
6e7f846  fix ci (inkcherry)
c4fde7e  Merge branch 'autotp_training' of https://github.com/inkcherry/DeepSp… (inkcherry)
05bcecd  skip overflow test (inkcherry)
86f1c77  Merge branch 'master' into autotp_training (inkcherry)
668cb1a  Skip xpu tests until the ci is updated (inkcherry)
2e042a4  Merge branch 'autotp_training' of https://github.com/inkcherry/DeepSp… (inkcherry)
e08a234  Merge branch 'master' into autotp_training (delock)
20588f2  Merge branch 'master' into autotp_training (tjruwase)
1e05996  Merge branch 'master' into autotp_training (hwchen2017)
This additional code block handles the general case of an MLP that includes a chunk layer, but the returned module is still named with a GLM prefix. It would be better to rename GLM_LinearLayer to something like GateUpPack_LinearLayer.

Thanks for the comments, modified :)
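
For readers unfamiliar with why an "MLP including chunk layer" might need its own linear-layer class under tensor parallelism, here is a minimal sketch. It is not the code from this PR. It assumes a fused weight whose first half holds the gate rows and whose second half holds the up rows, and the function names shard_fused_gate_up and naive_shard are hypothetical. Under that assumption, a plain contiguous split hands a rank only gate rows, so a later chunk into two halves no longer yields a gate part and an up part. Slicing each half separately and re-packing keeps both on every rank.

```python
def naive_shard(rows, tp_size, rank):
    """Contiguous split of the fused weight rows (illustrative only)."""
    step = len(rows) // tp_size
    return rows[rank * step:(rank + 1) * step]


def shard_fused_gate_up(rows, tp_size, rank):
    """Slice the gate half and the up half separately, then re-pack them.

    Assumes `rows` is laid out as [gate rows..., up rows...] and that each
    half divides evenly by tp_size; uneven splits are out of scope here.
    """
    if len(rows) % 2 != 0:
        raise ValueError("fused weight must have an even number of rows")
    half = len(rows) // 2
    if half % tp_size != 0:
        raise ValueError("this sketch only handles evenly divisible halves")
    gate, up = rows[:half], rows[half:]
    step = half // tp_size
    lo, hi = rank * step, (rank + 1) * step
    # Each rank keeps a matching slice of gate and up, packed in the same order.
    return gate[lo:hi] + up[lo:hi]


# Row labels stand in for weight rows: g* = gate projection, u* = up projection.
rows = ["g0", "g1", "g2", "g3", "u0", "u1", "u2", "u3"]
naive_rank0 = naive_shard(rows, tp_size=2, rank=0)
packed_rank0 = shard_fused_gate_up(rows, tp_size=2, rank=0)
packed_rank1 = shard_fused_gate_up(rows, tp_size=2, rank=1)
```

With two ranks, the naive split gives rank 0 only gate rows, while the packed split gives each rank half of the gate rows followed by the matching half of the up rows. Because the pattern is not specific to GLM models, a name such as the GateUpPack_LinearLayer suggested above describes it more accurately.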