
Yeah, even the official FlashAttention repo is moving many implementations from CUTLASS to Triton, except for the main MHA forward/backward pass.
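For anyone who hasn't dug into what those kernels actually compute: the core of FlashAttention is tiling attention over key/value blocks with an "online softmax", so the full N×N score matrix never materializes. Here's a minimal NumPy sketch of that idea (single head, no masking or dropout; function names and the `block` parameter are my own, not from the FlashAttention codebase), checked against a naive reference:

```python
import numpy as np

def naive_attention(q, k, v):
    # Reference: full softmax(Q K^T / sqrt(d)) V, materializing all scores.
    s = q @ k.T / np.sqrt(q.shape[-1])
    p = np.exp(s - s.max(axis=-1, keepdims=True))
    p /= p.sum(axis=-1, keepdims=True)
    return p @ v

def flash_attention_sketch(q, k, v, block=4):
    # Tile over K/V blocks, keeping per-row running max (m), running
    # softmax denominator (l), and an unnormalized output accumulator.
    n, d = q.shape
    scale = 1.0 / np.sqrt(d)
    m = np.full(n, -np.inf)
    l = np.zeros(n)
    acc = np.zeros((n, d))
    for j in range(0, k.shape[0], block):
        kj, vj = k[j:j + block], v[j:j + block]
        s = q @ kj.T * scale                   # scores for this tile only
        m_new = np.maximum(m, s.max(axis=-1))  # updated running max
        p = np.exp(s - m_new[:, None])
        correction = np.exp(m - m_new)         # rescale old stats to new max
        l = l * correction + p.sum(axis=-1)
        acc = acc * correction[:, None] + p @ vj
        m = m_new
    return acc / l[:, None]                    # normalize at the end
```

The real kernels do the same bookkeeping per thread block in SRAM; the Triton versions express this loop almost as directly as the NumPy above, which is a big part of why they're easier to read than the CUTLASS ones.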


It was written in CUTLASS? No wonder Peter Kim found it valuable and worthwhile to de-obfuscate. Adopting a new programming language invented by OpenAI doesn't sound like a much better alternative, though. I'd be shocked if either of them could build code for AMD GPUs: it's easy to adapt CUDA code, but not when it's buried under tens of thousands of lines of framework. I like open source code to have clarity so I can optimize it for my own production environment myself. When people distribute code they've already productionized for themselves, it squeezes out all the alpha and informational value. Just because something's open source doesn't mean it's open. I think people mostly do it to lick the cookie without giving much away.


Triton has an AMD backend, although work is still ongoing.


You will also be able to use Triton to target Ryzen AI.



