I personally have been looking for "explain it like I'm a CS PhD with lots of experience and the ability to look stuff up". But I suspect your summary would be pretty handy as well.
I reckon you need tacit knowledge. Experience. Luckily on the order of 100 hours, not 10,000.
Build a GPT using Python and PyTorch. For a good course: Andrej Karpathy is your keyword. At $1000 his course would be great value. But actually it is free, which is even better ;-)
It won't take you to flash attention, but it will ramp you up to the point where you could probably read papers about it. I almost got that far, then life lifed me. But I was able to implement changes to the GPT architecture and do some “hey mum, I am doing SOTA (2021) machine learning”.