Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Speaking of browser automation, are there any LLMs or tools that hook up to actual desktop browsers and can automate the keyboard and mouse?

Which LLMs best drive these? Claude/Gemini, etc., or is anything local actually competent at it?

Can they understand layout and visual cues with a VLM or multimodality?

Are they robust enough to interact with threejs and videos and whatnot, or can they just blindly navigate the DOM?

 help





Consider applying for YC's Summer 2026 batch! Applications are open till May 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: