
Fix typos #148

Open · wants to merge 1 commit into main
4 changes: 2 additions & 2 deletions examples/finetune_modernbert_on_glue.ipynb
@@ -348,7 +348,7 @@
" \"rte\": {\n",
" \"abbr\": \"RTE\",\n",
" \"name\": \"Recognize Textual Entailment\",\n",
-" \"description\": \"Predict whether one sentece entails another\",\n",
+" \"description\": \"Predict whether one sentence entails another\",\n",
" \"task_type\": \"Inference Tasks\",\n",
" \"domain\": \"News, Wikipedia\",\n",
" \"size\": \"2.5k\",\n",
@@ -528,7 +528,7 @@
"source": [
"### Tokenizer\n",
"\n",
-"Next we define our Tokenizer and a preprocess function to create the input_ids, attention_mask, and token_type_ids the model nees to train. For this example, including `truncation=True` is enough as we'll rely on our data collation function below to put our batches into the correct shape."
+"Next we define our Tokenizer and a preprocess function to create the input_ids, attention_mask, and token_type_ids the model needs to train. For this example, including `truncation=True` is enough as we'll rely on our data collation function below to put our batches into the correct shape."
]
},
{
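As a side note on the notebook cell fixed above: the preprocessing it describes typically boils down to a few lines. The sketch below is illustrative only — the checkpoint name, the `sentence1`/`sentence2` column names, and the `raw_datasets` variable are assumptions, not taken from the notebook:

```python
from transformers import AutoTokenizer

# Illustrative sketch of the preprocessing described in the cell above.
tokenizer = AutoTokenizer.from_pretrained("answerdotai/ModernBERT-base")  # assumed checkpoint

def preprocess(examples):
    # truncation=True is all we need here; padding is deferred to the data
    # collator, which pads each batch to the length of its longest sequence.
    return tokenizer(examples["sentence1"], examples["sentence2"], truncation=True)

# tokenized = raw_datasets.map(preprocess, batched=True)
```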
2 changes: 1 addition & 1 deletion main.py
@@ -361,7 +361,7 @@ def init_from_checkpoint(cfg: DictConfig, new_model: nn.Module):
new_model=new_model.model,
mode=cfg.get("mode", "tile_weights_from_middle"),
)
-print(f"Initalized model from checkpoint {cfg.checkpoint_run_name} with {n_params=:.4e} parameters")
+print(f"Initialized model from checkpoint {cfg.checkpoint_run_name} with {n_params=:.4e} parameters")


def main(cfg: DictConfig, return_trainer: bool = False, do_train: bool = True) -> Optional[Trainer]:
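Incidentally, the `{n_params=:.4e}` piece in the print statement fixed above is Python's self-documenting f-string syntax (3.8+) combined with scientific-notation formatting; for example:

```python
n_params = 149_014_272  # arbitrary example value
print(f"{n_params=:.4e}")  # prints: n_params=1.4901e+08
```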
4 changes: 2 additions & 2 deletions src/bert_layers/configuration_bert.py
@@ -24,7 +24,7 @@ def __init__(
create when initializing the model. You should be able to ignore this parameter in most cases.
Defaults to 512.
attention_probs_dropout_prob (float): By default, turn off attention dropout in MosaicBERT
-Note that the custom Triton Flash Attention with ALiBi implementation does not support droput.
+Note that the custom Triton Flash Attention with ALiBi implementation does not support dropout.
However, Flash Attention 2 supports ALiBi and dropout https://github.com/Dao-AILab/flash-attention
embed_dropout_prob (float): Dropout probability for the embedding layer.
attn_out_dropout_prob (float): Dropout probability for the attention output layer.
@@ -155,7 +155,7 @@ def __init__(
unpad_embeddings (bool): Unpad inputs before the embedding layer.
pad_logits (bool): Pad logits after the calculating the loss.
compile_model (bool): Compile the subset of the model which can be compiled.
-masked_prediction (bool): Use only pass the masked tokens throught the final MLM layers
+masked_prediction (bool): Use only pass the masked tokens through the final MLM layers
**kwargs: Additional keyword arguments.
"""
super().__init__(attention_probs_dropout_prob=attention_probs_dropout_prob, **kwargs)
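To make the `masked_prediction` description above concrete: during MLM pretraining only a small fraction of positions are masked, so routing just those positions through the final MLM layers saves compute. A conceptual sketch, not the repo's actual implementation, using the usual `-100` ignore-label convention:

```python
import torch

hidden_states = torch.randn(8, 128, 768)   # (batch, seq_len, hidden)
labels = torch.full((8, 128), -100)        # -100 = position not masked / ignored by the loss
labels[:, ::7] = 42                        # pretend roughly 15% of positions are masked

mask = labels != -100
masked_hidden = hidden_states[mask]        # (num_masked, hidden)
# The MLM head now only runs on masked_hidden instead of on all batch * seq_len tokens.
```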
2 changes: 1 addition & 1 deletion src/bert_layers/rotary.py
@@ -186,7 +186,7 @@ def __init__(
we add this option.
max_seqlen: if max_seqlen, device, and dtype are provided, we precompute the cos_sin_cache
up to max_seqlen. If the max_seqlen, device, or dtype during training/inference differ,
-the cos_sin_cache wll be recomputed during the forward pass.
+the cos_sin_cache will be recomputed during the forward pass.
"""
super().__init__()
self.dim = dim
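For context on the `cos_sin_cache` mentioned in this docstring: a rotary-embedding cache is typically precomputed once up to `max_seqlen` along these lines (a generic RoPE sketch, not this module's code):

```python
import torch

def build_cos_sin_cache(dim, max_seqlen, base=10000.0, device=None, dtype=torch.float32):
    # One frequency per pair of dimensions, one (cos, sin) row per position.
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, device=device).float() / dim))
    positions = torch.arange(max_seqlen, device=device, dtype=torch.float32)
    freqs = torch.outer(positions, inv_freq)   # (max_seqlen, dim // 2)
    return freqs.cos().to(dtype), freqs.sin().to(dtype)

# If the forward pass later sees a longer sequence, or a different device/dtype,
# the cache is rebuilt with the new values rather than reused.
```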
2 changes: 1 addition & 1 deletion src/evals/README.md
@@ -40,7 +40,7 @@ python eval.py yamls/ablations/checkpoint_name.yaml

## Automatically generate eval configs for multiple checkpoints and run evals on multiple GPUs

-`run_evals_from_checkpoints.py` can be used to automatically generate configs from the latest checkpoints in a given directory, and run all evals on all avalible GPUs.
+`run_evals_from_checkpoints.py` can be used to automatically generate configs from the latest checkpoints in a given directory, and run all evals on all available GPUs.

Run `python run_evals_from_checkpoints.py --help` for all options. All options from `generate_eval_config_from_checkpoint.py` are also available.

4 changes: 2 additions & 2 deletions src/flex_bert.py
@@ -41,7 +41,7 @@
__all__ = ["create_flex_bert_mlm", "create_flex_bert_classification"]


-# we want the efficent versions to have the same name as the TorchMetrics' name
+# we want the efficient versions to have the same name as the TorchMetrics' name
def rename_class(new_name):
def class_renamer(cls):
cls.__name__ = new_name
@@ -398,7 +398,7 @@ def create_flex_bert_classification(
First, it will switch the training loss to :class:`~torch.nn.MSELoss`.
Second, the returned :class:`.ComposerModel`'s train/validation metrics
will be :class:`~torchmetrics.MeanSquaredError` and
-:class:`~torchmetrics.SpearmanCorrCoef`. For the classifcation case
+:class:`~torchmetrics.SpearmanCorrCoef`. For the classification case
(when ``num_labels > 1``), the training loss is
:class:`~torch.nn.CrossEntropyLoss`, and the train/validation
metrics are :class:`~torchmetrics.MulticlassAccuracy` and
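The regression-versus-classification branching that this docstring (and the matching ones in `src/hf_bert.py` and `src/mosaic_bert.py` below) describes amounts to roughly the following. This is an illustrative sketch built directly on torchmetrics; the repo wraps some of these metrics in its own classes (e.g. `.BinaryF1Score`), so the exact classes differ:

```python
import torch.nn as nn
from torchmetrics import MatthewsCorrCoef, MeanSquaredError, SpearmanCorrCoef
from torchmetrics.classification import BinaryF1Score, MulticlassAccuracy

def loss_and_metrics(num_labels: int):
    if num_labels == 1:
        # Regression (e.g. STS-B): MSE loss, with MSE and Spearman correlation as metrics.
        return nn.MSELoss(), [MeanSquaredError(), SpearmanCorrCoef()]
    # Classification: cross-entropy loss, with accuracy and Matthews correlation.
    metrics = [
        MulticlassAccuracy(num_classes=num_labels, average="micro"),
        MatthewsCorrCoef(task="multiclass", num_classes=num_labels),
    ]
    if num_labels == 2:
        metrics.append(BinaryF1Score())
    return nn.CrossEntropyLoss(), metrics
```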
2 changes: 1 addition & 1 deletion src/hf_bert.py
@@ -186,7 +186,7 @@ def create_hf_bert_classification(
This will have two noteworthy effects. First, it will switch the training loss to :class:`~torch.nn.MSELoss`.
Second, the returned :class:`.ComposerModel`'s train/validation metrics will be :class:`~torchmetrics.MeanSquaredError` and :class:`~torchmetrics.SpearmanCorrCoef`.

-For the classifcation case (when ``num_labels > 1``), the training loss is :class:`~torch.nn.CrossEntropyLoss`, and the train/validation
+For the classification case (when ``num_labels > 1``), the training loss is :class:`~torch.nn.CrossEntropyLoss`, and the train/validation
metrics are :class:`~torchmetrics.MulticlassAccuracy` and :class:`~torchmetrics.MatthewsCorrCoef`, as well as :class:`.BinaryF1Score` if ``num_labels == 2``.
"""
try:
2 changes: 1 addition & 1 deletion src/mosaic_bert.py
@@ -230,7 +230,7 @@ def create_mosaic_bert_classification(
First, it will switch the training loss to :class:`~torch.nn.MSELoss`.
Second, the returned :class:`.ComposerModel`'s train/validation metrics
will be :class:`~torchmetrics.MeanSquaredError` and
-:class:`~torchmetrics.SpearmanCorrCoef`. For the classifcation case
+:class:`~torchmetrics.SpearmanCorrCoef`. For the classification case
(when ``num_labels > 1``), the training loss is
:class:`~torch.nn.CrossEntropyLoss`, and the train/validation
metrics are :class:`~torchmetrics.MulticlassAccuracy` and
2 changes: 1 addition & 1 deletion wandb_log_live_eval.py
@@ -45,7 +45,7 @@ def process_data(args):
try:
meta = parse_model_string(run.name)
except ValueError:
-print(f"Skipping run with unparseable name: {run.name}")
+print(f"Skipping run with unparsable name: {run.name}")
continue
task = meta["task"]
summary = run.summary