Conversation

@lucidrains

clean PR

@lucidrains changed the title from "Sha attn" to "Single head attention with differential LR" on Oct 7, 2021
@lucidrains changed the title from "Single head attention with differential LR" to "Single head attention with decoupled LR" on Oct 8, 2021
sha_sandwich_norm = true

[aux_decoder]
loss_weight = 0.25
@lucidrains (Author)

set this to 0 to turn off the auxiliary AR loss
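
as a minimal sketch of the snippet above, turning it off would just be (only the value changes):

[aux_decoder]
loss_weight = 0.0  # 0 disables the auxiliary AR loss entirely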

@lucidrains (Author)

the protocol should be to start off at 0.25 and search for higher values, up to 1.0, if you see continued improvement
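
a rough sketch of that search (the intermediate steps are only illustrative, not prescribed here):

[aux_decoder]
loss_weight = 0.25  # starting point
# if results keep improving, retry with e.g. 0.5, then 0.75, up to 1.0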

ff_dropout = 0.1
num_attn_heads = 1

use_isab_attn = true
@lucidrains (Author)

when using ISAB attention, num_attn_heads above should be set to at least 4
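
a minimal sketch of that fix to the snippet above (only this value changes):

num_attn_heads = 4  # 4 or more heads whenever use_isab_attn = true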

use_isab_attn = true
isab_num_latents = 6

weight_tie_attn_blocks = false
@lucidrains (Author)

for parameter saving when using ISAB blocks, which have twice as many attention parameters as S(M)HA blocks
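
so a parameter-conscious ISAB setup might look like the following sketch (assuming weight tying is the intended way to offset ISAB's doubled attention parameters):

use_isab_attn = true
isab_num_latents = 6
weight_tie_attn_blocks = true  # tie attention block weights to save parameters with ISAB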

num_attn_heads = 1 # number of attention heads; keep at 1 for single-head attention, or increase above 1 to turn on multi-head attention
dim_attn_head = 64 # dimension per attention head; keep at 64 by default, or lower to 32 to trade performance for efficiency

use_isab_attn = false # whether to use ISAB attention (induced-set attention block from the Set Transformer paper)
@lucidrains (Author)

if you were to set this to true, the number of attention heads needs to be increased to 4 or above. a good starting config would be:

num_attn_heads = 4
dim_attn_head = 64
use_isab_attn = true
isab_num_latents = 6
