
If a 7-second video consumed 1k tokens, I'd assume the token budget must be insane to process such a prompt.
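Rough back-of-envelope (my own numbers, just extrapolating linearly from the ~1k tokens per 7 seconds above, which a real tokenizer won't follow exactly):

    # Assumes token count scales linearly with video length (an assumption,
    # not something the thread confirms).
    TOKENS_PER_SECOND = 1000 / 7  # ~143 tokens per second of video

    for label, seconds in [("1 minute", 60), ("1 hour", 3600), ("8-hour workday", 8 * 3600)]:
        print(f"{label}: ~{seconds * TOKENS_PER_SECOND / 1000:,.0f}k tokens")
    # 1 minute: ~9k tokens
    # 1 hour: ~514k tokens
    # 8-hour workday: ~4,114k tokens

Even a single hour of raw video at that rate blows well past today's typical context windows.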


Yeah, not feasible with today's methods and RAG/LoRA shenanigans, but the way the field is moving I wouldn't be surprised if new decoder paradigms made it possible.

Saw this yesterday, 1M context window, but I haven't had any time to look into it; just one example of the new developments happening every week:

https://www.reddit.com/r/LocalLLaMA/comments/1as36v9/anyone_...


That's a 7-second video from an HD camera. When recording a screen, you only really need to consider what's changing on the screen.


That's not true. Which parts of the screen count as important context can change depending on what just changed.


The point is you can do massive compression. It’s more like a sequence of sparse images than video.
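The "sequence of sparse images" framing is easy to sketch. A minimal sketch, assuming grayscale screen captures as numpy arrays and a made-up change threshold (both my own choices, not from the thread), keeping only frames that meaningfully differ from the last kept one:

    import numpy as np

    def sparse_frames(frames, threshold=0.01):
        # frames: iterable of 2-D numpy arrays (grayscale screen captures).
        # threshold: fraction of pixels that must differ before a frame is kept.
        kept = []
        last = None
        for frame in frames:
            if last is None:
                kept.append(frame)            # always keep the first frame
                last = frame
                continue
            changed = np.mean(frame != last)  # fraction of pixels that differ
            if changed > threshold:
                kept.append(frame)
                last = frame
        return kept

A 7-second recording at 30 fps is 210 frames, but a mostly static screen with a couple of UI changes collapses to a handful of kept frames.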


Unlikely to be a prompt. It would need to be some form of fine-tuning, like LoRA.
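For anyone unfamiliar: LoRA just trains a small low-rank update on top of frozen weights. A bare-bones PyTorch sketch of the idea (class name, rank, and scaling here are illustrative, not any particular library's API):

    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        # Wraps a frozen nn.Linear and adds a trainable low-rank update: W x + (B A) x.
        def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
            super().__init__()
            self.base = base
            for p in self.base.parameters():
                p.requires_grad_(False)        # freeze the original weights
            self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
            self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
            self.scale = alpha / rank

        def forward(self, x):
            # Base output plus the low-rank correction; only lora_a / lora_b get gradients.
            return self.base(x) + (x @ self.lora_a.T @ self.lora_b.T) * self.scale

Only the two small matrices are trained, which is why it's so much cheaper than full fine-tuning.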



