I am confused by these two structures. In theory, the outputs of both are connected to their inputs. What magic makes the self-attention mechanism more powerful than a fully connected layer?
1 Answer
Ignoring details like normalization, biases, and so on, fully connected networks have fixed weights:
f(x) = σ(Wx)
where W is learned in training and fixed at inference.
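As a rough illustration, here is a minimal NumPy sketch (the dimension d, the tanh nonlinearity standing in for σ, and the name fc are my own illustrative choices, not from the answer):

```python
import numpy as np

rng = np.random.default_rng(0)

d = 4                                # feature dimension (arbitrary for the sketch)
W = rng.standard_normal((d, d))      # learned during training, then frozen

def fc(x):
    # Fully connected layer: the same W multiplies every input x.
    return np.tanh(W @ x)            # tanh stands in for the nonlinearity σ

x = rng.standard_normal(d)
print(fc(x).shape)                   # (4,)
```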
Self-attention layers are dynamic, changing the weight as it goes:
attn(x) = σ(Wx)
f(x) = σ(attn(x) * x)
Again, this is ignoring a lot of details; there are many different implementations for different applications, so you should really check a paper for the specifics.
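For contrast, here is a minimal NumPy sketch of a single-head self-attention layer in the same spirit (the separate query/key/value projections, the softmax, and the 1/sqrt(d) scaling follow the standard Transformer formulation rather than the simplified equations above, and all names and dimensions are illustrative). The point is that the mixing matrix A is recomputed from the input X on every forward pass, whereas W above never changes:

```python
import numpy as np

rng = np.random.default_rng(0)

d = 4          # feature dimension
n = 3          # number of tokens in the sequence
Wq = rng.standard_normal((d, d))   # learned projections, fixed at inference
Wk = rng.standard_normal((d, d))
Wv = rng.standard_normal((d, d))

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X):
    # X: (n, d) sequence of token vectors
    Q, K, V = X @ Wq.T, X @ Wk.T, X @ Wv.T
    # A is recomputed for every input: these are the "dynamic" weights.
    A = softmax(Q @ K.T / np.sqrt(d), axis=-1)   # (n, n)
    return A @ V

X = rng.standard_normal((n, d))
print(self_attention(X).shape)   # (3, 4)
```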
- i.e. f(x) = σ(σ(Wx) * x) in self-attention. Anyway, f(x) is a function of x, so theoretically speaking, multiple FC layers are able to simulate the same behavior as an attention layer. – tom_cat, Oct 6, 2020 at 3:47
- @tom_cat Theoretically speaking, multiple FC layers can simulate any function. – Oct 6, 2020 at 3:50
- Is it right to say that, to some extent, attention is a special type of FC whose weights are dynamically and indirectly determined by some other weights? @hkchengrex – tom_cat, Oct 6, 2020 at 5:50
- @tom_cat It is a matter of interpretation, but I wouldn't say that. I would say both FC and self-attention are cases of "connections" whose weights are determined by either a fixed or an input-dependent scheme. – Oct 6, 2020 at 5:54
- @hkchengrex Could you please explain what you mean by "dynamic, changing the weight as it goes" in this context? – Mar 4 at 14:17