
Btw, shouldn't it be possible in theory to run the Mixtral MoE by loading each submodel sequentially, storing its outputs, and then doing the rest of the algorithm? That would make it easier to run on machines that cannot fit the whole model in memory.
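To make the idea concrete, here is a toy sketch of a single MoE layer evaluated one expert at a time. It is not Mixtral's actual code: the names `save_experts` and `moe_sequential`, the JSON files standing in for weight shards, the square expert matrices, and the top-2 softmax gating are all assumptions made for illustration.

```python
import json
import math
import os
import tempfile


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def save_experts(experts, directory):
    """Write each expert's weights to its own file (hypothetical shard layout)."""
    paths = []
    for i, weights in enumerate(experts):
        path = os.path.join(directory, f"expert_{i}.json")
        with open(path, "w") as f:
            json.dump(weights, f)
        paths.append(path)
    return paths


def moe_sequential(tokens, router, expert_paths, top_k=2):
    # Pass 1: route every token. Only the small router matrix is needed here.
    routes = []
    for tok in tokens:
        logits = matvec(router, tok)
        chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
        gates = softmax([logits[i] for i in chosen])
        routes.append(dict(zip(chosen, gates)))

    # Pass 2: load one expert at a time and accumulate its gated outputs,
    # so at most one expert's weights are held in memory.
    outputs = [[0.0] * len(tokens[0]) for _ in tokens]
    for idx, path in enumerate(expert_paths):
        wanted = [t for t, r in enumerate(routes) if idx in r]
        if not wanted:
            continue  # no token was routed to this expert, skip the load
        with open(path) as f:
            weights = json.load(f)
        for t in wanted:
            gate = routes[t][idx]
            out = matvec(weights, tokens[t])
            outputs[t] = [acc + gate * o for acc, o in zip(outputs[t], out)]
        del weights  # drop the expert before loading the next one
    return outputs


if __name__ == "__main__":
    # Four toy 2x2 experts; expert i scales its input by (i + 1).
    experts = [[[float(i + 1), 0.0], [0.0, float(i + 1)]] for i in range(4)]
    router = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
    with tempfile.TemporaryDirectory() as d:
        print(moe_sequential([[2.0, 1.0]], router, save_experts(experts, d)))
```

In a real model this would have to be repeated for every layer, and during generation for every new token, since each token's routing depends on the previous layer's output.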


Yes, but loading weights into memory takes time.


Yeah I imagine sequential inference would be slower. How long do you have to wait to load these weights on a personal PC? I have not tried using those systems so far.
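A rough way to reason about the wait is bytes to load divided by read throughput. The sketch below is only back-of-the-envelope arithmetic: the function names are made up, and the sizes and throughput in the example are placeholder values, not measurements of Mixtral or of any particular disk.

```python
def load_seconds(weight_bytes, bytes_per_second):
    """Lower bound on load time: bytes to read divided by read throughput."""
    if bytes_per_second <= 0:
        raise ValueError("bytes_per_second must be positive")
    return weight_bytes / bytes_per_second


def reload_seconds_per_token(expert_bytes, experts_per_layer, layers, bytes_per_second):
    """Cost of re-reading the selected experts in every layer for one generated token."""
    total_bytes = expert_bytes * experts_per_layer * layers
    return load_seconds(total_bytes, bytes_per_second)


if __name__ == "__main__":
    # Placeholder numbers: 10 GB of weights read at 2 GB/s.
    print(load_seconds(10 * 10**9, 2 * 10**9))
```

This ignores caching, decompression, and any overlap of loading with compute, so it gives a floor on the wait rather than a prediction.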



