Conformer lm #54
base: master
Conversation
…l/prepare_lm_training_data.py
We will add a decoding script to it.
See also https://tensorboard.dev/experiment/unF4gSyjRjua2DSKgb3BMg/
bos or eos symbols).
"""
# in future will just do:
# return self.words[self.sentences[i]].tolist()
Supported by k2-fsa/k2#833
Cool!
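For context, here is a small plain-Python sketch of what self.words[self.sentences[i]].tolist() is meant to compute once ragged-with-ragged indexing from k2-fsa/k2#833 is used. The data below is made up for illustration and this is not the actual k2 ragged API: sentence i stores word indices, and each word maps to its BPE piece IDs.

# Hypothetical data layout: sentences[i] holds word indices for sentence i,
# and words[w] holds the BPE piece IDs of word w.
sentences = [[0, 1], [2]]           # made-up word indices per sentence
words = [[5, 6], [7], [8, 9, 10]]   # made-up BPE piece IDs per word

def sentence_as_pieces(i: int) -> list:
    # Equivalent in spirit to words[sentences[i]].tolist() on ragged tensors:
    # gather the pieces of every word in sentence i, in order.
    return [piece for w in sentences[i] for piece in words[w]]

assert sentence_as_pieces(0) == [5, 6, 7]
assert sentence_as_pieces(1) == [8, 9, 10]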
    cnn_module_kernel,
)
self.encoder = MaskedLmConformerEncoder(encoder_layer, num_encoder_layers,
                                        norm=nn.LayerNorm(d_model))
We are using pre-normalization here.
You have placed a layer norm at the end of the encoder layer, see

x = self.norm_final(x)

You are using an extra layer norm here, which means you are doing

x = layernorm(layernorm(x))
See icefall/egs/librispeech/ASR/conformer_lm/conformer.py, lines 967 to 968 (at 0cfa8c8):

if self.norm is not None:
    x = self.norm(x)
I just realized that it is even worse.
You are using a layer norm at both ends of an encoder layer, but
encoder layers are connected end to end, which means the output of the layer norm
from the previous encoder layer is used as the input of the layer norm of the next encoder layer.
This "even worse" part is not quite right, because there are bypass connections, so there are paths involving residuals where the input is used without layer norm.
.. but yes, I agree that the LayerNorm at the end of the conformer encoder is redundant. I have since stopped using that. But in this particular case, it would take a long time to retrain the model if we were to fix it, so I'd say leave it as-is for now.
so I'd say leave it as-is for now.

Yes, I agree. After finishing the decoding script, I would recommend removing it
and re-running the whole pipeline.
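To make the structure being discussed concrete, here is a minimal sketch with hypothetical module and attribute names (not the actual MaskedLmConformerEncoder code): each pre-norm layer already ends with its own norm_final, so a LayerNorm passed to the encoder wrapper normalizes an already-normalized output, while the residual connections inside each layer still carry un-normalized activations.

import torch
import torch.nn as nn

class PreNormLayer(nn.Module):
    # Sketch of a pre-norm layer: the sublayer input is normalized, the residual
    # path bypasses that norm, and the layer output is normalized once at the end.
    def __init__(self, d_model: int):
        super().__init__()
        self.norm_ff = nn.LayerNorm(d_model)
        self.ff = nn.Linear(d_model, d_model)
        self.norm_final = nn.LayerNorm(d_model)

    def forward(self, x):
        x = x + self.ff(self.norm_ff(x))   # residual keeps the un-normalized input
        return self.norm_final(x)          # layer output is already normalized

class Encoder(nn.Module):
    def __init__(self, d_model: int, num_layers: int, norm: nn.Module = None):
        super().__init__()
        self.layers = nn.ModuleList(PreNormLayer(d_model) for _ in range(num_layers))
        self.norm = norm  # redundant when each layer already ends with norm_final

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        if self.norm is not None:
            x = self.norm(x)  # layernorm(layernorm(...)) on the final output
        return x

x = torch.randn(4, 16)
print(Encoder(16, 2, norm=nn.LayerNorm(16))(x).shape)  # torch.Size([4, 16])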
Co-authored-by: Fangjun Kuang <[email protected]>
lm_dir=data/lm_training_${vocab_size}
mkdir -p $lm_dir
log "Stage 9: creating $lm_dir/lm_data.pt"
./local/prepare_lm_training_data.py data/lang_bpe_${vocab_size}/bpe.model download/lm/librispeech-lm-norm.txt $lm_dir/lm_data.pt
Suggested change:

-./local/prepare_lm_training_data.py data/lang_bpe_${vocab_size}/bpe.model download/lm/librispeech-lm-norm.txt $lm_dir/lm_data.pt
+./local/prepare_lm_training_data.py $lang_dir/bpe.model $dl_dir/lm/librispeech-lm-norm.txt $lm_dir/lm_data.pt
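The point of the suggestion, presumably, is to reuse the $lang_dir and $dl_dir variables defined earlier in the script rather than hard-coding the data/lang_bpe_${vocab_size} and download/ paths.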
# Calling this on all copies of a DDP setup will sync the sizes so that
# all copies have the exact same number of batches.  I think
# this needs to be called with the GPU device, not sure if it would
# work otherwise.
It will not work for CPU devices as DDP requires GPU devices.
def _sync_sizes(self, device: torch.device = torch.device('cuda')):
    # Calling this on all copies of a DDP setup will sync the sizes so that
    # all copies have the exact same number of batches.  I think
Shall we mention that without doing this, the training process
will hang indefinitely at the end?
Mm, sure...
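As an illustration of why the sync matters, here is a minimal sketch with a hypothetical helper name (not the icefall implementation): if the ranks disagree on the number of batches, the rank with more batches keeps issuing collectives that the others never join, so training hangs at the end of the epoch; reducing with MIN across ranks makes every copy iterate the same number of times. With the NCCL backend the tensor has to live on the rank's GPU, which is why a GPU device is expected here.

import torch
import torch.distributed as dist

def sync_num_batches(num_batches: int, device: torch.device) -> int:
    # Return the smallest number of batches across all DDP ranks so that
    # every rank runs the same number of training steps.
    if not (dist.is_available() and dist.is_initialized()):
        return num_batches
    t = torch.tensor([num_batches], dtype=torch.int64, device=device)
    dist.all_reduce(t, op=dist.ReduceOp.MIN)  # NCCL requires a CUDA tensor here
    return int(t.item())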