docs/src/models/layers.md (10 additions, 1 deletion)
@@ -10,7 +10,7 @@ The `Dense` exemplifies several features:
* It takes an `init` keyword, which accepts a function acting like `rand`. That is, `init(2,3,4)` should create an array of this size. Flux has [many such functions](@ref man-init-funcs) built-in. All make a CPU array, moved later with [`gpu`](@ref Flux.gpu) if desired.
-* The bias vector is always intialised[`Flux.zeros32`](@ref). The keyword `bias=false` will turn this off, i.e. keeping the bias permanently zero.
+* The bias vector is always initialised [`Flux.zeros32`](@ref). The keyword `bias=false` will turn this off, i.e. keeping the bias permanently zero.
* It is annotated with [`@functor`](@ref Functors.@functor), which means that [`params`](@ref Flux.params) will see the contents, and [`gpu`](@ref Flux.gpu) will move their arrays to the GPU.
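For instance, the `init` and `bias` keywords described above can be used like this (a minimal sketch, assuming Flux's `Dense(in => out)` pair syntax):

```julia
using Flux

# `init` may be any function of the form init(dims...); glorot_uniform is one built-in choice.
d1 = Dense(3 => 4, relu; init = Flux.glorot_uniform)

# `bias = false` removes the bias entirely, keeping it permanently zero.
d2 = Dense(3 => 4; bias = false)

d1(rand(Float32, 3, 5))  # 4×5 Matrix{Float32}
```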
@@ -54,6 +54,15 @@ SamePad
Flux.flatten
```
+
+## MultiHeadAttention
+
+The basic blocks needed to implement [Transformer](https://arxiv.org/abs/1706.03762) architectures. See also the functional counterparts
+documented in NNlib's [Attention](@ref) section.
+
+```@docs
+MultiHeadAttention
+```
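A sketch of how the new layer might be used for self-attention, assuming the interface where `MultiHeadAttention` is called on one array (or on separate query/key/value arrays) and returns both the output and the attention scores:

```julia
using Flux

mha = MultiHeadAttention(64; nheads = 8)
x = rand(Float32, 64, 10, 32)   # (embedding dimension, sequence length, batch)
y, α = mha(x)                   # self-attention: y is 64×10×32, α holds the attention scores
```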
### Pooling
These layers are commonly used after a convolution layer, and reduce the size of its output. They have no trainable parameters.
docs/src/models/nnlib.md (18 additions, 6 deletions)
@@ -2,9 +2,20 @@
Flux re-exports all of the functions exported by the [NNlib](https://github.com/FluxML/NNlib.jl) package. This includes activation functions, described on [their own page](@ref man-activations). Many of the functions on this page exist primarily as the internal implementation of Flux layers, but can also be used independently.
+
+## Attention
+
+Primitives for the [`MultiHeadAttention`](@ref) layer.
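One such primitive is `NNlib.dot_product_attention`; a minimal sketch, assuming its three-array form with an `nheads` keyword and a returned pair of output and attention weights:

```julia
using NNlib

q = rand(Float32, 64, 10, 32)   # (feature dimension, target length, batch)
k = rand(Float32, 64, 12, 32)   # (feature dimension, source length, batch)
v = rand(Float32, 64, 12, 32)

y, α = dot_product_attention(q, k, v; nheads = 8)   # y: 64×10×32, α: attention weights
```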
-`Flux`'s `AdaptiveMaxPool`, `AdaptiveMeanPool`, `GlobalMaxPool`, `GlobalMeanPool`, `MaxPool`, and `MeanPool` use `NNlib.PoolDims`, `NNlib.maxpool`, and `NNlib.meanpool` as their backend.
+`Flux`'s [`AdaptiveMaxPool`](@ref), [`AdaptiveMeanPool`](@ref), [`GlobalMaxPool`](@ref), [`GlobalMeanPool`](@ref), [`MaxPool`](@ref), and [`MeanPool`](@ref) use [`NNlib.PoolDims`](@ref), [`NNlib.maxpool`](@ref), and [`NNlib.meanpool`](@ref) as their backend.
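For example, with the default window and stride settings the layer and the primitive should agree (a minimal sketch, assuming 4-d WHCN input):

```julia
using Flux, NNlib

x = rand(Float32, 28, 28, 3, 16)   # width × height × channels × batch

layer = MaxPool((2, 2))
layer(x) == NNlib.maxpool(x, NNlib.PoolDims(x, (2, 2)))   # true: the layer defers to the primitive
```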
```@docs
PoolDims
@@ -32,7 +44,7 @@ pad_zeros
## Convolution
-`Flux`'s `Conv` and `CrossCor` layers use `NNlib.DenseConvDims` and `NNlib.conv` internally.
+`Flux`'s [`Conv`](@ref) and [`CrossCor`](@ref) layers use [`NNlib.DenseConvDims`](@ref) and [`NNlib.conv`](@ref) internally.
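For example, with the default stride, padding, and dilation the layer's forward pass reduces to a single `NNlib.conv` call (a sketch, assuming `bias = false` so that only the convolution itself is compared):

```julia
using Flux, NNlib

x = rand(Float32, 28, 28, 3, 16)
layer = Conv((3, 3), 3 => 8; bias = false)

cdims = NNlib.DenseConvDims(x, layer.weight)
NNlib.conv(x, layer.weight, cdims) ≈ layer(x)   # true for the default stride, pad, and dilation
```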
```@docs
conv
@@ -44,7 +56,7 @@ DenseConvDims
## Upsampling
-`Flux`'s `Upsample` layer uses `NNlib.upsample_nearest`, `NNlib.upsample_bilinear`, and `NNlib.upsample_trilinear` as its backend. Additionally, `Flux`'s `PixelShuffle` layer uses `NNlib.pixel_shuffle` as its backend.
+`Flux`'s [`Upsample`](@ref) layer uses [`NNlib.upsample_nearest`](@ref), [`NNlib.upsample_bilinear`](@ref), and [`NNlib.upsample_trilinear`](@ref) as its backend. Additionally, `Flux`'s [`PixelShuffle`](@ref) layer uses [`NNlib.pixel_shuffle`](@ref) as its backend.
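For example (a minimal sketch, assuming 4-d input, the default `:nearest` mode for `Upsample`, and a channel count divisible by the upscale factor squared for `PixelShuffle`):

```julia
using Flux, NNlib

x = rand(Float32, 8, 8, 4, 1)

up = Upsample(scale = 2)                    # nearest-neighbour by default
up(x) == NNlib.upsample_nearest(x, (2, 2))  # true: the layer forwards to the NNlib function

ps = PixelShuffle(2)                        # needs channels divisible by 2^2
ps(x) == NNlib.pixel_shuffle(x, 2)          # true: 8×8×4×1 becomes 16×16×1×1
```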