The inner-layer permutability is super interesting. Is that result published anywhere? It's consistent with this graph here, which seems to imply different layers are working in closely related latent spaces.
If you skip to the graph here that shows the attention and feed-forward displacements tending to align (after a 2D projection): is this something known/understood? Are the attention and feed-forward displacement vectors highly correlated and mostly pointing in the same direction?
Skip to the graph above this paragraph: "Again, the red arrow represents the input vector, each green arrow represents one block’s self-attention output, each blue arrow represents one block’s feed-forward network output. Arranged tip to tail, their endpoint represents the final output from the stack of 6 blocks, depicted by the gray arrow."
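One way to check the alignment question directly, rather than eyeballing a 2D projection, is to compare each block's attention and feed-forward outputs in the full residual-stream space. A rough sketch of that measurement, using random stand-in vectors where real hooked activations would go (names like `attn_disp` and `ffn_disp` are hypothetical, not from the post):

```python
# Hypothetical sketch: in a residual-stream transformer, each block adds its
# self-attention output and its feed-forward output to the running vector, so
# those additions can be treated as displacement vectors (the green and blue
# arrows in the post's figure) and compared by cosine similarity.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_blocks = 64, 6

# Stand-ins for captured activations; with a real model you would hook each
# block and record its (attn_out, ffn_out) per token instead of random noise.
attn_disp = rng.standard_normal((n_blocks, d_model))
ffn_disp = rng.standard_normal((n_blocks, d_model))

def cosine(a, b):
    """Cosine similarity between two vectors: +1 aligned, 0 orthogonal."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

for i in range(n_blocks):
    print(f"block {i}: cos(attn, ffn) = {cosine(attn_disp[i], ffn_disp[i]):+.3f}")
```

If the displacements really do point the same way, the per-block cosines measured on real activations would sit well above the near-zero values random high-dimensional vectors give; a 2D projection alone can exaggerate alignment.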
https://shyam.blog/posts/beyond-self-attention/