Would be interested to see ThunderKittens (great name!) tackle the flash attention backward pass, which is an order of magnitude harder than the forward pass.
Good news - we've actually included optimized causal and non-causal versions of the flash attention backward pass with TK. Would love for you to check them out!