
Attention mask unused? #5

Open
codeninja opened this issue Nov 14, 2020 · 1 comment

Comments

@codeninja

        ass_mask=torch.ones(q_size2*q_size1,1,1,q_size0).cuda()  #[31*128,1,1,11]
        x, self.attn_asset = attention(ass_query, ass_key, ass_value, mask=None, 
                             dropout=self.dropout)   

Within MultiHeadedAttention, ass_mask is built but never passed into the attention call here, so it appears to be unused. IIUC, the attention mask is needed to prevent look-ahead bias in the attention mechanism and should mask off future positions when the attention weights are computed.

If this mask is unused, what was its intent? Where is attention being masked, and how should the mask be applied?
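
For reference, a minimal sketch of how a causal (look-ahead) mask is typically built and applied, assuming the attention function follows the Annotated-Transformer-style signature used above (query, key, value, mask, dropout). The subsequent_mask helper and the commented usage lines are illustrative assumptions, not code from this repo:

    import math
    import torch

    def attention(query, key, value, mask=None, dropout=None):
        # Scaled dot-product attention (Annotated Transformer style).
        d_k = query.size(-1)
        scores = torch.matmul(query, key.transpose(-2, -1)) / math.sqrt(d_k)
        if mask is not None:
            # Positions where mask == 0 get a large negative score, so softmax
            # assigns them (near-)zero weight -- this is what blocks look-ahead.
            scores = scores.masked_fill(mask == 0, -1e9)
        p_attn = torch.softmax(scores, dim=-1)
        if dropout is not None:
            p_attn = dropout(p_attn)
        return torch.matmul(p_attn, value), p_attn

    def subsequent_mask(size):
        # Lower-triangular causal mask: position i may attend only to j <= i.
        return torch.tril(torch.ones(1, size, size)).bool()

    # Hypothetical usage inside MultiHeadedAttention: pass the mask instead of None.
    # x, attn = attention(ass_query, ass_key, ass_value,
    #                     mask=subsequent_mask(ass_query.size(-2)).to(ass_query.device),
    #                     dropout=self.dropout)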

@Ivsxk
Owner

Ivsxk commented Nov 25, 2020

This mask is indeed not used, since we make the portfolio decision 15 minutes ahead; it is not a long-sequence prediction task like translation. The mask was just an immature attempt at providing a long-term strategy, so you can simply ignore it.
