This is a bit off-topic, but is there research into not having to evaluate the full LLM for each output token (perhaps at some cost to output quality), which would make it possible to run these models in a more compute- and memory-efficient way?