
Commit 2a71fda

Authored Jan 7, 2023
Add Codespell to CI, fix typos (#543)
* Add Codespell to CI, fix typos
1 parent e18868f · commit 2a71fda

18 files changed, +29 −19 lines changed
 

‎.github/workflows/check-style.yml

+8

@@ -24,3 +24,11 @@ jobs:
       - uses: isort/isort-action@master
         with:
           isortVersion: "5.10.1"
+
+  codespell:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v2
+      - uses: codespell-project/actions-codespell@v1
+        with:
+          only_warn: 1
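Because the job above only warns (`only_warn: 1`) instead of failing the build, contributors may still want to run the same check locally. Below is a minimal sketch of such a helper; the script itself is hypothetical, but the `codespell --skip=".git"` invocation matches the command suggested in CONTRIBUTING.md further down.

```python
# check_spelling.py -- hypothetical local helper mirroring the new CI job.
# Assumes the codespell package is installed (it is pinned in requirements-dev.txt below).
import subprocess
import sys

# Run codespell over the working tree, skipping the .git directory,
# i.e. the same --skip value suggested in CONTRIBUTING.md.
result = subprocess.run(["codespell", "--skip=.git"])
sys.exit(result.returncode)
```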

‎CONTRIBUTING.md

+2 −1

@@ -38,7 +38,8 @@ with the following rules:
 cannot be longer than 119 characters.
 * We use [black](https://github.com/psf/black) for code formatting and [isort](https://github.com/PyCQA/isort) for
 import sorting. Before submitting a PR, make sure to install and run `black .` and `isort .` in the root of the
-repository.
+repository. Also, you may want to check your code for typos by running `codespell --skip=".git"`, though there
+might be false positives.
 * We highly encourage the use of [typing](https://docs.python.org/3/library/typing.html) where applicable.
 * Use `get_logger` from `hivemind.utils.logging` to log any information instead of `print`ing directly to standard
 output/error streams.

‎README.md

+1 −1

@@ -29,7 +29,7 @@ see the [full list](#citation) of our papers below.
 ## Example Use Cases

 This section lists projects that leverage hivemind for decentralized training.
-If you have succesfully trained a model or created a downstream repository with the help of our library,
+If you have successfully trained a model or created a downstream repository with the help of our library,
 feel free to submit a pull request that adds your project to this list.

 * **Petals** ([webpage](https://petals.ml), [code](https://github.com/bigscience-workshop/petals)) — a decentralized platform for inference and fine-tuning of 100B+ language models.

‎benchmarks/benchmark_dht.py

+1 −1

@@ -51,7 +51,7 @@ async def store_and_get_task(
 latest: bool,
 node_killer: NodeKiller,
 ) -> Tuple[list, list, list, list, int, int]:
-"""Iteratively choose random peers to store data onto the dht, then retreive with another random subset of peers"""
+"""Iteratively choose random peers to store data onto the dht, then retrieve with another random subset of peers"""

 total_stores = total_gets = 0
 successful_stores = []

‎docs/modules/optim.rst

+1 −1

@@ -5,7 +5,7 @@

 This module contains decentralized optimizers that wrap your regular PyTorch Optimizer to train with peers.
 Depending on the exact configuration, Optimizer may perform large synchronous updates equivalent,
-or perform asynchrnous local updates and average model parameters.
+or perform asynchronous local updates and average model parameters.

 <br><br>

‎docs/user/dht.md

+1 −1

@@ -119,7 +119,7 @@ dht = hivemind.DHT(
 ], start=True)
 ```

-Thats it, now the two DHT nodes are connected. If you connect additional peers to the network, you only need to specify
+That's it, now the two DHT nodes are connected. If you connect additional peers to the network, you only need to specify
 one (or a subset) of peers as `initial_peers`.
 In case your peer operates behind a restrictive firewall, you may find it beneficial to set `client_mode=True`. In this
 case, the DHT instance will access others, but it will not announce that other peers can connect to it.
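To make the `client_mode=True` remark above concrete, here is a minimal sketch of a firewalled peer joining an existing swarm; the multiaddress is a placeholder, and only the keyword arguments already named on this page (`initial_peers`, `client_mode`, `start`) are used.

```python
import hivemind

# A peer behind a restrictive firewall: it can reach other peers through the DHT,
# but does not announce itself as reachable for inbound connections.
dht = hivemind.DHT(
    initial_peers=["/ip4/203.0.113.7/tcp/31337/p2p/QmPlaceholderPeerID"],  # placeholder address of an existing peer
    client_mode=True,
    start=True,
)
```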

‎hivemind/averaging/averager.py

+3 −3

@@ -62,7 +62,7 @@ class DecentralizedAverager(mp.Process, ServicerBase):
 :param min_matchmaking_time: when looking for group, wait for requests for at least this many seconds
 :param compression: optionally compress tensors with this compression algorithm before running all-reduce
 :param state_compression: a separate compression strategy for load_state_from_peers (default = no compression)
-:param tensor_infos: CompressionInfo for each respective tensor; this determines how the tensor will be comressed
+:param tensor_infos: CompressionInfo for each respective tensor; this determines how the tensor will be compressed
 :param averaging_alpha: optional "learning rate" for averaging. If specified, local parameters will be shifted
 towards the (estimated) average by this coefficient. By default, local parameters are set equal to average.
 :param request_timeout: when looking for group, wait for a response from leader for at most this many seconds.

@@ -376,7 +376,7 @@ def step(
 """
 Set up the averager to look for a group and run one round of averaging, return True on success, False on failure

-:param gather: optionally send this informaton to all peers in the next group and gather it from every groupmate
+:param gather: optionally send this information to all peers in the next group and gather it from every groupmate
 (this operation is known as all-gather). The gathered data will be available as the output of this function.
 :param scheduled_time: when matchmaking, assume that all-reduce will begin at this moment.
 By default, schedule all-reduce current time plus min_matchmaking_time seconds

@@ -651,7 +651,7 @@ async def rpc_download_state(

 def get_current_state(self) -> Tuple[Any, Sequence[torch.Tensor], Sequence[CompressionInfo]]:
 """
-Get current state and send it to a peer. executed in the host process. Meant to be overriden.
+Get current state and send it to a peer. executed in the host process. Meant to be overridden.
 :returns: a tuple of (small metadata, sequence of torch tensors)
 :note: metadata must be seriablizable with self.serializer (default = MSGPackSerializer)
 """

‎hivemind/averaging/partition.py

+1 −1

@@ -26,7 +26,7 @@ class TensorPartContainer:
 :param peer_fractions: for each peer, a target fraction of vector elements that this peer should average
 :param compression: optionally compress tensors with this compression algorithm before sending them to peers
 :param part_size_bytes: greedily split tensors into parts of up to this many bytes (after compression)
-:param tensor_infos: CompressionInfo for each respective tensor; this determines how the tensor will be comressed
+:param tensor_infos: CompressionInfo for each respective tensor; this determines how the tensor will be compressed
 :param return_deltas: if True, output tensors are differences (aggregated tensor - local tensor)
 :param prefetch: when compressing, pre-compute this many compressed tensors in background
 """

‎hivemind/compression/base.py

+1 −1

@@ -53,7 +53,7 @@ def compress(self, tensor: torch.Tensor, info: CompressionInfo, allow_inplace: b
 """
 Applies compression algorithm to a tensor based on their meta-parameters

-:param tensor: a pytorch tensor to compress; depending on the applicaiton, it is a full tensor or a part
+:param tensor: a pytorch tensor to compress; depending on the application, it is a full tensor or a part
 :param info: meta-information about the tensor; if partitioning is used, this still describes the full tensor
 :param allow_inplace: if True, compression can (but doesn't have to) to modify tensor in-place for efficiency
 :returns: a protobuf message that encodes the tensor

‎hivemind/dht/node.py

+1 −1

@@ -586,7 +586,7 @@ async def get_many_by_id(
 If min_expiration_time=float('inf'), this method will find a value with _latest_ expiration
 :param beam_size: maintains up to this many nearest nodes when crawling dht, default beam_size = bucket_size
 :param num_workers: override for default num_workers, see traverse_dht num_workers param
-:param return_futures: if True, immediately return asyncio.Future for every before interacting with the nework.
+:param return_futures: if True, immediately return asyncio.Future for every before interacting with the network.
 The algorithm will populate these futures with (value, expiration) when it finds the corresponding key
 Note: canceling a future will stop search for the corresponding key
 :param _is_refresh: internal flag, set to True by an internal cache refresher (if enabled)

‎hivemind/dht/routing.py

+1 −1

@@ -1,4 +1,4 @@
-""" Utlity data structures to represent DHT nodes (peers), data keys, and routing tables. """
+""" Utility data structures to represent DHT nodes (peers), data keys, and routing tables. """
 from __future__ import annotations

 import hashlib

‎hivemind/moe/server/server.py

+1 −1

@@ -302,7 +302,7 @@ def shutdown(self):
 logger.debug(f"Shutting down runtime")
 self.runtime.shutdown()

-logger.info("Server shutdown succesfully")
+logger.info("Server shutdown successfully")


 @contextmanager

‎hivemind/optim/grad_averager.py

+1 −1

@@ -29,7 +29,7 @@ class GradientAverager(DecentralizedAverager):
 (3) averaged gradients - gradient buffers that are aggregated in-place with peers, always in host memory

 :param parameters: pytorch parameters for which to aggregate gradients
-:param dht: a DHT isntance connected to the rest of the swarm. See hivemind.DHT docs
+:param dht: a DHT instance connected to the rest of the swarm. See hivemind.DHT docs
 :param prefix: a unique DHT key used for matchmaking. E.g. this can be your experiment name with optional suffixes
 :param reuse_grad_buffers: if True, use model's .grad buffers for accumulating gradients over multiple steps.
 This is more memory efficient, but it requires that the user does *not* call zero_grad or clip_by_whatever at all

‎hivemind/optim/optimizer.py

+2 −2

@@ -56,11 +56,11 @@ class Optimizer(torch.optim.Optimizer):

 Unlike regular training, your device may join midway through training, when other peers already made some progress.
 For this reason, any learning rate schedulers, curriculum and other **time-dependent features should be based on**
-``optimizer.local_epoch`` (and not the number ot calls to opt.step). Otherwise, peers that joined training late
+``optimizer.local_epoch`` (and not the number of calls to opt.step). Otherwise, peers that joined training late
 may end up having different learning rates. To do so automatically, specify ``scheduler=...`` parameter below.

 :What is an epoch?: Optimizer uses the term ``epoch`` to describe intervals between synchronizations. One epoch
-coresponds to processing certain number of training samples (``target_batch_size``) in total across all peers.
+corresponds to processing certain number of training samples (``target_batch_size``) in total across all peers.
 Like in PyTorch LR Scheduler, **epoch does not necessarily correspond to a full pass over the training data.**
 At the end of epoch, peers perform synchronous actions such as averaging gradients for a global optimizer update,
 updating the learning rate scheduler or simply averaging parameters (if using local updates).
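To make the `optimizer.local_epoch` advice above concrete, below is a minimal sketch of a learning rate schedule keyed on epochs rather than on the number of `opt.step` calls. Whether hivemind.Optimizer's `scheduler=...` parameter accepts exactly this factory form is an assumption here; the sketch only illustrates epoch-based decay with plain PyTorch.

```python
import torch

def make_epoch_scheduler(opt: torch.optim.Optimizer) -> torch.optim.lr_scheduler.LambdaLR:
    # Halve the learning rate every 10 epochs (global training progress),
    # not every opt.step() call, so late-joining peers end up on the same schedule.
    return torch.optim.lr_scheduler.LambdaLR(opt, lr_lambda=lambda epoch: 0.5 ** (epoch // 10))

# Standalone usage with a plain torch optimizer, just to show the factory shape:
params = [torch.nn.Parameter(torch.zeros(4))]
scheduler = make_epoch_scheduler(torch.optim.SGD(params, lr=0.1))
```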

‎hivemind/optim/power_sgd_averager.py

+1 −1

@@ -51,7 +51,7 @@ class PowerSGDGradientAverager(GradientAverager):

 :param parameters: pytorch parameters for which to aggregate gradients
 :param averager_rank: rank of compressed gradients
-:param dht: a DHT isntance connected to the rest of the swarm. See hivemind.DHT docs
+:param dht: a DHT instance connected to the rest of the swarm. See hivemind.DHT docs
 :param prefix: a unique DHT key used for matchmaking. E.g. this can be your experiment name with optional suffixes
 :param reuse_grad_buffers: if True, use model's .grad buffers for accumulating gradients over multiple steps.
 This is more memory efficient, but it requires that the user does *not* call zero_grad or clip_by_whatever at all

‎hivemind/utils/math.py

+1 −1

@@ -15,7 +15,7 @@ def orthogonalize_(matrix, eps: float = 1e-8):


 def get_flatten_greedy_dims(tensor: torch.Tensor, max_ndim: int = 2):
-"""get dims to flatten tensor upto max_ndim dimensions by merging small axes together"""
+"""get dims to flatten tensor up to max_ndim dimensions by merging small axes together"""
 dims = list(tensor.shape)
 while len(dims) > max_ndim:
 squeeze_ix = min(range(len(dims) - 1), key=lambda i: dims[i] * dims[i + 1])
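For context, the docstring fixed above belongs to a greedy axis-merging helper. The sketch below reconstructs that behavior from the visible diff lines; the merge step that follows the choice of `squeeze_ix` is not part of the hunk, so it is an assumption.

```python
import torch

def flatten_greedy_dims_sketch(tensor: torch.Tensor, max_ndim: int = 2) -> list:
    """Merge the adjacent pair of axes with the smallest product until at most max_ndim dims remain."""
    dims = list(tensor.shape)
    while len(dims) > max_ndim:
        # pick the adjacent pair whose merged size is smallest (same criterion as in the diff above)
        squeeze_ix = min(range(len(dims) - 1), key=lambda i: dims[i] * dims[i + 1])
        # assumed continuation: merge that pair into a single axis
        dims[squeeze_ix : squeeze_ix + 2] = [dims[squeeze_ix] * dims[squeeze_ix + 1]]
    return dims

print(flatten_greedy_dims_sketch(torch.zeros(4, 3, 2, 5)))  # [24, 5]: (3, 2) merges first, then (4, 6)
```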

‎requirements-dev.txt

+1

@@ -8,4 +8,5 @@ scikit-learn
 torchvision
 black==22.3.0
 isort==5.10.1
+codespell==2.2.2
 psutil

‎tests/test_averaging.py

+1 −1

@@ -356,7 +356,7 @@ def test_load_state_from_peers():
 class TestAverager(DecentralizedAverager):
 def get_current_state(self):
 """
-Get current state and send it to a peer. executed in the host process. Meant to be overriden.
+Get current state and send it to a peer. executed in the host process. Meant to be overridden.
 :returns: a tuple of (serializable_small_metadata, sequence of torch tensors)
 """
 nonlocal num_calls, super_metadata, super_tensors
