This is an implementation of relative embedding as described in the
paper ["DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing"](https://arxiv.org/abs/2111.09543).
This layer initializes an embedding matrix (of shape
`(2 * bucket_size, hidden_dim)`) for relative position encoding. It then
applies layer normalization on the embedding matrix and returns the relative
embedding matrix.

Args:
    hidden_dim: int. The size of the dense embedding.
    bucket_size: int. The size of the relative position buckets.
    layer_norm_epsilon: float. Epsilon value for the layer normalization
        layer.
    kernel_initializer: string or `keras.initializers` initializer.
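
Putting the description above together, here is a minimal sketch of such a layer, assuming Keras 3 (`keras` / `keras.ops`). The class name, the weight names, and the repetition of the normalized table across the batch axis are illustrative assumptions, not necessarily the library's exact implementation:

```python
import keras
from keras import ops


class RelativeEmbedding(keras.layers.Layer):
    """Sketch of a relative position embedding table with layer normalization."""

    def __init__(
        self,
        hidden_dim,
        bucket_size,
        layer_norm_epsilon=1e-5,
        kernel_initializer="glorot_uniform",
        **kwargs,
    ):
        super().__init__(**kwargs)
        self.hidden_dim = hidden_dim
        self.bucket_size = bucket_size
        # One learnable row per relative position bucket, in both directions.
        self.rel_embeddings = self.add_weight(
            shape=(2 * bucket_size, hidden_dim),
            initializer=kernel_initializer,
            name="rel_embedding",
        )
        self.layer_norm = keras.layers.LayerNormalization(
            epsilon=layer_norm_epsilon,
            name="rel_embeddings_layer_norm",
        )

    def call(self, inputs):
        # The table does not depend on the token values; `inputs` is only used
        # to read the batch size so the normalized table can be repeated once
        # per batch entry (an assumption about the intended usage).
        batch_size = ops.shape(inputs)[0]
        rel_embeddings = ops.expand_dims(self.rel_embeddings, axis=0)
        rel_embeddings = self.layer_norm(rel_embeddings)
        return ops.repeat(rel_embeddings, repeats=batch_size, axis=0)
```

For example, with `hidden_dim=8` and `bucket_size=4`, calling the layer on a `(2, 16, 8)` batch would return a `(2, 8, 8)` tensor: the `2 * bucket_size = 8` normalized bucket embeddings repeated for each of the two batch entries.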