For a specific bad thing like "rm -rf" that may be plausible, but this will break down when you try to enumerate all the other bad things it could possibly do.
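To make the enumeration problem concrete, here is a rough sketch comparing a denylist with an allowlist. The names `DENYLIST`, `ALLOWLIST`, `denylist_allows` and `allowlist_allows` are made up for illustration, and the check only looks at the first word of a single simple command. Nothing is executed; the strings are only parsed.

```python
import shlex

# Hypothetical policy sets, chosen only for illustration.
DENYLIST = {"rm"}
ALLOWLIST = {"ls", "cat", "grep"}


def denylist_allows(command: str) -> bool:
    """Allow anything whose program name is not explicitly denied."""
    argv = shlex.split(command)
    return bool(argv) and argv[0] not in DENYLIST


def allowlist_allows(command: str) -> bool:
    """Allow only programs that were explicitly approved."""
    argv = shlex.split(command)
    return bool(argv) and argv[0] in ALLOWLIST


# Three ways to delete the same directory, plus one harmless command.
commands = [
    "rm -rf /tmp/x",
    "find /tmp/x -delete",
    "python3 -c 'import shutil; shutil.rmtree(\"/tmp/x\")'",
    "ls /tmp",
]

for cmd in commands:
    print(f"{cmd!r}: denylist={denylist_allows(cmd)} allowlist={allowlist_allows(cmd)}")
```

The denylist stops the literal `rm` but lets the `find` and `python3` variants through, which is the breakdown described above. The allowlist rejects all three, at the cost of also rejecting anything useful that nobody thought to approve.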
We can, but if you want to stop private info from being leaked, then your only sure choices are to stop the agent from communicating with the outside world entirely or to not give it any private info to begin with.
And? If your LLM is controlling user-mode software, you can still easily capture and audit everything it does from the kernel's perspective: sandboxing, event tracing, and so on.
No need to "ask" for "proof". You can monitor the system in real time, detect malicious or potentially harmful activity, and stop it early, using the same tools and methodologies that security products have relied on for decades.
True, but we can easily validate that, regardless of what's happening inside the conversation, things like "rm -rf" aren't being executed.
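As a minimal sketch of that kind of check, assuming the agent's commands pass through a gate as plain strings before anything runs them: the function name `is_recursive_force_rm` is hypothetical, and it only inspects a single simple command, not pipelines, subshells, or aliases.

```python
import shlex


def is_recursive_force_rm(command: str) -> bool:
    """Return True if the command is an `rm` call with both recursive and force flags."""
    argv = shlex.split(command)
    # Compare the basename so that /bin/rm is treated the same as rm.
    if not argv or argv[0].rsplit("/", 1)[-1] != "rm":
        return False

    short_flags = set()
    long_flags = set()
    for arg in argv[1:]:
        if arg == "--":  # everything after "--" is an operand, not a flag
            break
        if arg.startswith("--"):
            long_flags.add(arg)
        elif arg.startswith("-") and len(arg) > 1:
            short_flags.update(arg[1:])  # "-rf" contributes both "r" and "f"

    recursive = bool(short_flags & {"r", "R"}) or "--recursive" in long_flags
    force = "f" in short_flags or "--force" in long_flags
    return recursive and force


for cmd in ["rm -rf /", "/bin/rm -r -f build", "rm notes.txt", "ls -rf"]:
    print(f"{cmd!r}: blocked={is_recursive_force_rm(cmd)}")
```

The check looks only at the command about to run, so it does not matter what was said in the conversation that produced it.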