rushingcreek on Oct 31, 2023 \| on: Phind Model beats GPT-4 at coding, with GPT-3.5 sp...

We leverage Flash Decoding (https://crfm.stanford.edu/2023/10/12/flashdecoding.html) in TensorRT-LLM to achieve 100 tokens per second on H100s.
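
For readers unfamiliar with the technique, here is a rough sketch of the idea behind Flash Decoding as described in the linked post: during decoding there is a single query, so the keys and values are split into chunks that can be processed in parallel, and the per-chunk results are merged with a log-sum-exp style rescaling. This is a plain-Python illustration only, not TensorRT-LLM code and not our implementation; the function names (`chunk_partial`, `split_kv_attention`) and the `chunk_size` parameter are made up for the example, and the chunks are processed in a loop rather than on separate GPU thread blocks.

```python
import math


def _scores(q, keys):
    # Scaled dot-product scores of one query against a list of keys.
    scale = 1.0 / math.sqrt(len(q))
    return [scale * sum(a * b for a, b in zip(q, k)) for k in keys]


def attention(q, keys, values):
    """Reference: ordinary softmax attention for a single query."""
    scores = _scores(q, keys)
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(dim)]


def chunk_partial(q, keys, values):
    """One chunk: unnormalized output, the chunk's max score, and its sum of exps."""
    scores = _scores(q, keys)
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    dim = len(values[0])
    out = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return out, m, sum(weights)


def split_kv_attention(q, keys, values, chunk_size):
    """Split keys/values into chunks, then merge the partial results."""
    # Step 1: independent per-chunk work (parallel on a GPU, sequential here).
    partials = [
        chunk_partial(q, keys[i:i + chunk_size], values[i:i + chunk_size])
        for i in range(0, len(keys), chunk_size)
    ]
    # Step 2: rescale every chunk to a shared max and reduce.
    global_max = max(m for _, m, _ in partials)
    dim = len(values[0])
    numer = [0.0] * dim
    denom = 0.0
    for out, m, total in partials:
        factor = math.exp(m - global_max)
        denom += total * factor
        for i in range(dim):
            numer[i] += out[i] * factor
    return [x / denom for x in numer]
```

The merged result should match the reference `attention` up to floating-point error for any chunk size; the benefit on real hardware comes from the chunks running concurrently when the batch is small and the context is long.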