singular value spectrum
top right singular vectors (heatmap)
cross-layer flow: U_L vs V_(L+1) of ffn_down_up (cosine similarity)
rows = 64 left-singular vectors of layer L's composed dense FFN ("what L said");
cols = 64 right-singular vectors of layer L+1's composed dense FFN ("what L+1 listens for");
cell intensity = absolute cosine similarity. A white diagonal means concept i in L
directly excites concept i in L+1. Diffuse patterns are normal; that's where labelling
becomes interesting.
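
A minimal sketch of how such a heatmap could be computed, assuming "composed dense FFN" means the product W_down @ W_up with the nonlinearity ignored (an assumption; the panel text does not spell this out). The helper names, matrix shapes, and random weights below are hypothetical stand-ins, not taken from any real model. Because singular vectors are unit-norm, the cosine similarity reduces to a dot product, and the absolute value removes the arbitrary sign of each vector.

```python
import numpy as np


def top_singular_vectors(w, k=64):
    """Return the top-k left and right singular vectors of w, as columns."""
    u, _, vt = np.linalg.svd(w, full_matrices=False)
    return u[:, :k], vt[:k].T


def cross_layer_flow(w_l, w_next, k=64):
    """Absolute cosine similarity between U_L (rows) and V_(L+1) (cols)."""
    u_l, _ = top_singular_vectors(w_l, k)
    _, v_next = top_singular_vectors(w_next, k)
    # Singular vectors are unit-norm, so the dot product is the cosine.
    return np.abs(u_l.T @ v_next)


# Hypothetical stand-in weights; real use would load W_up and W_down per layer.
rng = np.random.default_rng(0)
d_model, d_ff = 128, 512


def random_composed_ffn():
    w_up = rng.standard_normal((d_ff, d_model))
    w_down = rng.standard_normal((d_model, d_ff))
    return w_down @ w_up  # assumed composition, nonlinearity ignored


w_l = random_composed_ffn()
w_next = random_composed_ffn()
flow = cross_layer_flow(w_l, w_next)  # shape (64, 64), diffuse for random weights

# Sanity check: if layer L+1 "listens" exactly for what L "said", the diagonal
# lights up. The right-singular vectors of w_l.T are the left-singular vectors
# of w_l, so this pairing should give a near-identity heatmap.
flow_echo = cross_layer_flow(w_l, w_l.T)
```

With unrelated random weights, `flow` has no structure, while `flow_echo` shows the white diagonal described above. Real layers would typically fall somewhere between the two.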