Commit efdcb0d

doc entry for MultiHeadAttention (#2218)
* docs for MultiHeadAttention
* cleanup
* news
* Update docs/src/models/nnlib.md (Co-authored-by: Saransh Chopra <[email protected]>)
* Update docs/src/models/nnlib.md (Co-authored-by: Saransh Chopra <[email protected]>)

Co-authored-by: Saransh Chopra <[email protected]>
1 parent 0038a60 commit efdcb0d

5 files changed: 35 additions, 11 deletions


.gitignore (1 addition, 0 deletions)

@@ -8,3 +8,4 @@ deps
 .vscode
 Manifest.toml
 LocalPreferences.toml
+.DS_Store

NEWS.md (5 additions, 3 deletions)

@@ -1,14 +1,16 @@
 # Flux Release Notes
 
+## v0.13.15
+* Added [MultiHeadAttention](https://github.com/FluxML/Flux.jl/pull/2146) layer.
 
 ## v0.13.14
 * Fixed various deprecation warnings, from `Zygone.@nograd` and `Vararg`.
+* Initial support for `AMDGPU` via extension mechanism.
+* Add `gpu_backend` preference to select GPU backend using `LocalPreference.toml`.
+* Add `Flux.gpu_backend!` method to switch between GPU backends.
 
 ## v0.13.13
 * Added `f16` which changes precision to `Float16`, recursively.
-* Initial support for AMDGPU via extension mechanism.
-* Add `gpu_backend` preference to select GPU backend using `LocalPreference.toml`.
-* Add `Flux.gpu_backend!` method to switch between GPU backends.
 
 ## v0.13.12
 * CUDA.jl 4.0 compatibility.

docs/src/models/layers.md (10 additions, 1 deletion)

@@ -10,7 +10,7 @@ The `Dense` exemplifies several features:
 
 * It take an `init` keyword, which accepts a function acting like `rand`. That is, `init(2,3,4)` should create an array of this size. Flux has [many such functions](@ref man-init-funcs) built-in. All make a CPU array, moved later with [`gpu`](@ref Flux.gpu) if desired.
 
-* The bias vector is always intialised [`Flux.zeros32`](@ref). The keyword `bias=false` will turn this off, i.e. keeping the bias permanently zero.
+* The bias vector is always initialised [`Flux.zeros32`](@ref). The keyword `bias=false` will turn this off, i.e. keeping the bias permanently zero.
 
 * It is annotated with [`@functor`](@ref Functors.@functor), which means that [`params`](@ref Flux.params) will see the contents, and [`gpu`](@ref Flux.gpu) will move their arrays to the GPU.
 
@@ -54,6 +54,15 @@ SamePad
 Flux.flatten
 ```
 
+## MultiHeadAttention
+
+The basic blocks needed to implement [Transformer](https://arxiv.org/abs/1706.03762) architectures. See also the functional counterparts
+documented in NNlib's [Attention](@ref) section.
+
+```@docs
+MultiHeadAttention
+```
+
 ### Pooling
 
 These layers are commonly used after a convolution layer, and reduce the size of its output. They have no trainable parameters.
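The commit itself only adds a `@docs` block for the new layer, so for orientation here is a minimal, hedged sketch of how `MultiHeadAttention` is typically used. The `nheads` keyword and the `(output, attention_scores)` return pair follow the layer's docstring as introduced in the linked PR; the exact sizes shown are illustrative rather than authoritative.

```julia
using Flux

# Self-attention with embedding size 64 and 8 heads
# (the embedding size is assumed to be divisible by the number of heads).
mha = MultiHeadAttention(64; nheads=8)

x = rand(Float32, 64, 10, 32)   # (embedding_dim, sequence_length, batch_size)
y, α = mha(x)                   # a single input is treated as query = key = value

size(y)   # expected (64, 10, 32): projected values, same layout as the input
size(α)   # expected (10, 10, 8, 32): attention scores per head and batch element
```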

docs/src/models/nnlib.md (18 additions, 6 deletions)

@@ -2,9 +2,20 @@
 
 Flux re-exports all of the functions exported by the [NNlib](https://github.com/FluxML/NNlib.jl) package. This includes activation functions, described on [their own page](@ref man-activations). Many of the functions on this page exist primarily as the internal implementation of Flux layer, but can also be used independently.
 
+
+## Attention
+
+Primitives for the [`MultiHeadAttention`](ref) layer.
+
+```@docs
+NNlib.dot_product_attention
+NNlib.dot_product_attention_scores
+NNlib.make_causal_mask
+```
+
 ## Softmax
 
-`Flux`'s `logitcrossentropy` uses `NNlib.softmax` internally.
+`Flux`'s [`Flux.logitcrossentropy`](@ref) uses [`NNlib.logsoftmax`](@ref) internally.
 
 ```@docs
 softmax
@@ -13,7 +24,8 @@ logsoftmax
 
 ## Pooling
 
-`Flux`'s `AdaptiveMaxPool`, `AdaptiveMeanPool`, `GlobalMaxPool`, `GlobalMeanPool`, `MaxPool`, and `MeanPool` use `NNlib.PoolDims`, `NNlib.maxpool`, and `NNlib.meanpool` as their backend.
+`Flux`'s [`AdaptiveMaxPool`](@ref), [`AdaptiveMeanPool`](@ref), [`GlobalMaxPool`](@ref), [`GlobalMeanPool`](@ref),
+[`MaxPool`](@ref), and [`MeanPool`](@ref) use [`NNlib.PoolDims`](@ref), [`NNlib.maxpool`](@ref), and [`NNlib.meanpool`](@ref) as their backend.
 
 ```@docs
 PoolDims
@@ -32,7 +44,7 @@ pad_zeros
 
 ## Convolution
 
-`Flux`'s `Conv` and `CrossCor` layers use `NNlib.DenseConvDims` and `NNlib.conv` internally.
+`Flux`'s [`Conv`](@ref) and [`CrossCor`](@ref) layers use [`NNlib.DenseConvDims`](@ref) and [`NNlib.conv`](@ref) internally.
 
 ```@docs
 conv
@@ -44,7 +56,7 @@ DenseConvDims
 
 ## Upsampling
 
-`Flux`'s `Upsample` layer uses `NNlib.upsample_nearest`, `NNlib.upsample_bilinear`, and `NNlib.upsample_trilinear` as its backend. Additionally, `Flux`'s `PixelShuffle` layer uses `NNlib.pixel_shuffle` as its backend.
+`Flux`'s [`Upsample`](@ref) layer uses [`NNlib.upsample_nearest`](@ref), [`NNlib.upsample_bilinear`](@ref), and [`NNlib.upsample_trilinear`](@ref) as its backend. Additionally, `Flux`'s [`PixelShuffle`](@ref) layer uses [`NNlib.pixel_shuffle`](@ref) as its backend.
 
 ```@docs
 upsample_nearest
@@ -60,7 +72,7 @@ pixel_shuffle
 
 ## Batched Operations
 
-`Flux`'s `Bilinear` layer uses `NNlib.batched_mul` internally.
+`Flux`'s [`Flux.Bilinear`](@ref) layer uses [`NNlib.batched_mul`](@ref) internally.
 
 ```@docs
 batched_mul
@@ -72,7 +84,7 @@ batched_vec
 
 ## Gather and Scatter
 
-`Flux`'s `Embedding` layer uses `NNlib.gather` as its backend.
+`Flux`'s [`Embedding`](@ref) layer uses [`NNlib.gather`](@ref) as its backend.
 
 ```@docs
 NNlib.gather
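As a quick orientation for the newly documented Attention primitives, here is a hedged sketch of how `NNlib.dot_product_attention` and `NNlib.make_causal_mask` might be called. The array layout (features × length × batch) and the `mask` keyword follow the NNlib docstrings as I understand them; the sizes are made up for illustration.

```julia
using NNlib

# Toy sizes: 8 features, key/value length 6, query length 4, batch of 2.
q = rand(Float32, 8, 4, 2)
k = rand(Float32, 8, 6, 2)
v = rand(Float32, 8, 6, 2)

# Scaled dot-product attention returns the attended values and the score array.
y, α = dot_product_attention(q, k, v)

# For causal self-attention, build a mask that blocks attention to future positions.
x = rand(Float32, 8, 5, 2)
mask = make_causal_mask(x)                    # boolean (5, 5) mask over the length dimension
y2, α2 = dot_product_attention(x, x, x; mask)
```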

docs/src/tutorials/2021-10-08-dcgan-mnist.md (1 addition, 1 deletion)

@@ -101,7 +101,7 @@ We will be using the [relu](https://fluxml.ai/Flux.jl/stable/models/nnlib/#NNlib
 We will also apply the weight initialization method mentioned in the original DCGAN paper.
 
 ```julia
-# Function for intializing the model weights with values
+# Function for initializing the model weights with values
 # sampled from a Gaussian distribution with μ=0 and σ=0.02
 dcgan_init(shape...) = randn(Float32, shape) * 0.02f0
 ```
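For context only, a small sketch of how such an initializer plugs into the `init` keyword that Flux layers accept (the same keyword described for `Dense` in layers.md above); the layer and input sizes here are invented for illustration.

```julia
using Flux

# Initializer from the tutorial: weights drawn from a Gaussian with μ = 0, σ = 0.02.
dcgan_init(shape...) = randn(Float32, shape) * 0.02f0

# Any layer with an `init` keyword can use it, e.g. a strided DCGAN-style convolution.
layer = Conv((4, 4), 1 => 64; stride=2, pad=1, init=dcgan_init)

x = randn(Float32, 28, 28, 1, 8)   # WHCN layout: 28x28 grayscale images, batch of 8
y = layer(x)                       # size(y) == (14, 14, 64, 8)
```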
