
I'm reminded of the infamous vending machine benchmark:

https://arxiv.org/abs/2502.15840

in which some LLMs simply failed to manage the basic task of keeping a vending machine stocked with products to sell, and some went completely off the rails, e.g. by threatening their (simulated) supplier with "ABSOLUTE FINAL ULTIMATE TOTAL NUCLEAR LEGAL INTERVENTION".


