Btw, shouldn't it in theory be possible to run Mixtral's MoE by loading each expert sequentially, storing its outputs, and then continuing with the rest of the forward pass? That would make it easier to run on machines that can't fit the whole model in memory.
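For the record, the idea sketches roughly like this. This is a toy numpy version, not Mixtral's actual implementation: the expert count, top-2 routing, `expert_store`, and `load_expert` are all made-up stand-ins, with a dict playing the role of weights on disk. The point is just that only one expert's weights are resident at a time while the gated outputs accumulate:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: 4 experts, top-2 routing per token (Mixtral-style gating).
NUM_EXPERTS, TOP_K, D = 4, 2, 8
tokens = rng.standard_normal((5, D))  # 5 token hidden states

# Stand-in for expert weights kept on disk; a real loader would read
# each expert's tensors from a checkpoint shard instead.
expert_store = {e: rng.standard_normal((D, D)) for e in range(NUM_EXPERTS)}

def load_expert(e):
    """Pretend to load one expert's weights into RAM."""
    return expert_store[e]

# Router: pick the top-k experts per token, with softmax gate weights.
logits = rng.standard_normal((tokens.shape[0], NUM_EXPERTS))
topk = np.argsort(logits, axis=1)[:, -TOP_K:]
gates = np.exp(np.take_along_axis(logits, topk, axis=1))
gates /= gates.sum(axis=1, keepdims=True)

# Sequential pass: load one expert at a time, run only the tokens
# routed to it, accumulate the gated outputs, then free the weights.
out = np.zeros_like(tokens)
for e in range(NUM_EXPERTS):
    tok_idx, slot = np.nonzero(topk == e)
    if tok_idx.size == 0:
        continue
    W = load_expert(e)  # only this expert's weights in memory now
    out[tok_idx] += gates[tok_idx, slot, None] * (tokens[tok_idx] @ W)
    del W               # release before loading the next expert

print(out.shape)  # (5, 8)
```

The trade-off is exactly the latency concern raised below: peak memory drops to one expert's weights, but every MoE layer now pays a load from disk per active expert instead of a lookup in RAM.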
Yeah, I imagine sequential inference would be slower. How long does it take to load those weights on a personal PC? I haven't tried those setups so far.