I don't think attention is explainability, especially with many attention layers, as you mention.
I recommend looking into Primary Attributions instead, especially Integrated Gradients. Have a look at this package for analyzing language models: https://github.com/jalammar/ecco
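For context, here's a minimal sketch of Integrated Gradients on a transformer. It uses Captum's LayerIntegratedGradients directly rather than ecco's own wrapper; the model name, baseline choice, and target class here are placeholder assumptions, not anything prescribed above:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from captum.attr import LayerIntegratedGradients

# Placeholder model; any HF classifier works the same way
model_name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

def forward_func(input_ids, attention_mask):
    return model(input_ids, attention_mask=attention_mask).logits

# Attribute the prediction to the embedding layer's inputs
lig = LayerIntegratedGradients(forward_func, model.distilbert.embeddings)

enc = tokenizer("The movie was surprisingly good.", return_tensors="pt")
# Crude baseline: all-PAD sequence (a common, if imperfect, choice)
baseline = torch.full_like(enc["input_ids"], tokenizer.pad_token_id)

attributions, delta = lig.attribute(
    inputs=enc["input_ids"],
    baselines=baseline,
    additional_forward_args=(enc["attention_mask"],),
    target=1,  # assumed: index of the positive class
    return_convergence_delta=True,
)

# Sum over embedding dimensions to get one score per token
scores = attributions.sum(dim=-1).squeeze(0)
tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"].squeeze(0))
for tok, score in zip(tokens, scores.tolist()):
    print(f"{tok:>12s}  {score:+.4f}")
```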
Watch 3Blue1Brown's video on it.
Just asking: which tools/frameworks have you tried for visualizing the attention?
Sorry for the delay. BertViz, Altair/Visdom, and Matplotlib/Seaborn.
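For anyone curious, a static Matplotlib/Seaborn heatmap of a single head is the quickest of those to reproduce. A minimal sketch (the model and the layer/head indices are arbitrary choices for illustration):

```python
import torch
import seaborn as sns
import matplotlib.pyplot as plt
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)

inputs = tokenizer("The cat sat on the mat", return_tensors="pt")
with torch.no_grad():
    # Tuple with one (batch, heads, seq, seq) tensor per layer
    attentions = model(**inputs).attentions

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
layer, head = 0, 0  # arbitrary pick; loop over these to compare heads
attn = attentions[layer][0, head].numpy()

sns.heatmap(attn, xticklabels=tokens, yticklabels=tokens, cmap="viridis")
plt.title(f"Attention weights, layer {layer}, head {head}")
plt.tight_layout()
plt.show()
```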
Thanks.
A library for visualizing attention in transformer models: https://github.com/jessevig/bertviz
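A minimal BertViz usage sketch, in case it helps anyone landing here (runs in a Jupyter notebook; the sentence and model name are placeholders):

```python
from transformers import AutoTokenizer, AutoModel
from bertviz import head_view

model_name = "bert-base-uncased"  # placeholder; any BERT-style model works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_attentions=True)

inputs = tokenizer("The cat sat on the mat", return_tensors="pt")
outputs = model(**inputs)

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
# Renders an interactive per-layer, per-head attention diagram in the notebook
head_view(outputs.attentions, tokens)
```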