I tried to do something similar well over a decade ago during an internal hackathon (the motivation back then being speeding up destructive integration tests). My idea was to have the memory be a file on tmpfs, and simply `cp --reflink` to get a copy-on-write clone. Then you wouldn't need to bother with userfaultfd or slow storage as the kernel would just magically do the right thing.
Unfortunately, the Linux kernel didn't support reflink on tmpfs (and still doesn't), and I'm not genius enough to have been able to implement that within 24 hours. :-)
I still believe it'd be nice to implement reflink for tmpfs, though. It's the perfect interface for copy-on-write forking of VM memory.
Glad to see the approach validated at scale! I hadn't seen your blog posts until they were linked here, going to dig into the userfaultfd path. Would love to chat if you're open to it.
It's important to refresh entropy immediately after a clone. Still, some code may never have been written with the assumption that it could be cloned (even though there's always been `fork`, of course). Because of this, we don't live clone across workspaces for unlisted/private sandboxes, and we limit the use case to dev envs where no secrets are stored.
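To illustrate why entropy matters here, below is a minimal sketch in Python. It is not how the VMs actually do it: `copy.deepcopy` merely stands in for cloning memory, and `reseed_after_clone` is a hypothetical helper name. It only shows that cloned PRNG state yields identical "random" values until each clone is reseeded from the OS.

```python
import copy
import os
import random


def reseed_after_clone(rng: random.Random) -> None:
    """Replace the inherited PRNG state with fresh bytes from the OS."""
    rng.seed(os.urandom(32))


# State that exists before the snapshot is taken.
parent_rng = random.Random(1234)

# deepcopy stands in for cloning memory: both "clones" hold identical state.
clone_a = copy.deepcopy(parent_rng)
clone_b = copy.deepcopy(parent_rng)

# Without reseeding, both clones hand out the same "random" token.
token_a = clone_a.getrandbits(128)
token_b = clone_b.getrandbits(128)
same_before_reseed = token_a == token_b

# Reseeding each clone right after it resumes makes the streams diverge.
reseed_after_clone(clone_a)
reseed_after_clone(clone_b)
same_after_reseed = clone_a.getrandbits(128) == clone_b.getrandbits(128)
```

In a real guest the state that needs refreshing lives in the kernel and in every userspace library that caches randomness, which is why code that never expected to be cloned is the hard part.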
Oh wow! Unexpected and cool to see this post on Hacker News! Since then we have evolved our VM infra a bit, and I've written two more posts about this.
First, we started cloning VMs using userfaultfd, which allows us to bypass the disk and let children read memory directly from parent VMs [1].
We also moved to storing memory snapshots compressed. To keep VM boots fast, we need to decompress on the fly as VMs read from the snapshot, so we split snapshots into 4 KB to 8 KB chunks that are each zstd-compressed [2].
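A rough sketch of the chunking idea, under stated assumptions: it uses `zlib` from the Python standard library as a stand-in for zstd, a fixed 4096-byte chunk size, and hypothetical helper names (`compress_chunks`, `read_range`). The point is only that compressing chunks independently lets a read decompress just the chunks it touches.

```python
import zlib

CHUNK_SIZE = 4096  # assumed; the comment above mentions chunks of roughly 4-8 KB


def compress_chunks(snapshot: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Compress each fixed-size chunk independently so it can be decoded alone."""
    return [
        zlib.compress(snapshot[start:start + chunk_size])
        for start in range(0, len(snapshot), chunk_size)
    ]


def read_range(chunks: list[bytes], offset: int, length: int,
               chunk_size: int = CHUNK_SIZE) -> bytes:
    """Return `length` bytes at `offset`, decompressing only the chunks touched."""
    if length <= 0:
        return b""
    first = offset // chunk_size
    last = (offset + length - 1) // chunk_size
    data = b"".join(zlib.decompress(c) for c in chunks[first:last + 1])
    start = offset - first * chunk_size
    return data[start:start + length]


snapshot = bytes(range(256)) * 64  # 16 KiB of stand-in "guest memory"
chunks = compress_chunks(snapshot)
piece = read_range(chunks, offset=4000, length=200)  # spans chunks 0 and 1
```

Smaller chunks mean less wasted decompression per page fault but a worse compression ratio, so the chunk size is a trade-off.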
Exactly, the result would have been different if the author had not disabled caching.
In this case it's because the iframes are loaded/unloaded multiple times, but we also spawn web workers where the same worker is spawned multiple times (for transpiling code in multiple threads, for example). In all those cases we rely on caching so we don't have to download the same worker code more than once.
If you want to be efficient in Amsterdam, you take the bike or public transport. That has been faster than cars even before this change, and now more so.
Yes! But I work on CodeSandbox, so that creates some bias :). We've been working on our own CDE solution, though we've taken a different spin to improve speed and cost.
Our solution is based on Firecracker, which enables us to "pause" (and clone) a VM at any point in time and resume it later exactly where it left off, within 1.5s. The benefit is that you don't have to wait for your environment to spin up when you request one, or when you continue working on one after some inactivity.
However, there's another benefit to that: we can now "preload" development environments. Whenever someone opens a pull request (even from local), we create a VM for it in the background. We run the dev server/LSPs/everything you need, and then pause the VM. Now whenever you want to review that pull request, we resume that environment and you can instantly review the code or check the dev server/preview like a deployment preview.
It also reduces cost. We can pause the VM after 5 minutes of inactivity, and when you come back, we'll resume it so it won't feel like the environment was closed at all. In other solutions you either need to keep a server running in the background, or increase the "hibernation timeout" to avoid a cold boot.
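The inactivity rule could be sketched roughly like this. `IdleTracker` and its methods are hypothetical names for illustration, not the actual implementation; the clock is injectable so the example stays deterministic.

```python
import time

IDLE_TIMEOUT_SECONDS = 5 * 60  # the 5 minutes of inactivity mentioned above


class IdleTracker:
    """Tracks the last activity time and reports when a VM may be paused."""

    def __init__(self, clock=time.monotonic, timeout=IDLE_TIMEOUT_SECONDS):
        self._clock = clock
        self._timeout = timeout
        self._last_activity = clock()

    def touch(self) -> None:
        """Record activity (a request, a keystroke, a new connection)."""
        self._last_activity = self._clock()

    def should_pause(self) -> bool:
        """True once the timeout has elapsed since the last activity."""
        return self._clock() - self._last_activity >= self._timeout


# A fake clock keeps the example deterministic.
now = [0.0]
tracker = IdleTracker(clock=lambda: now[0])

now[0] = 299.0
before_timeout = tracker.should_pause()

now[0] = 300.0
at_timeout = tracker.should_pause()

tracker.touch()  # the user comes back
after_activity = tracker.should_pause()
```

Because resuming is fast, the timeout can stay short without the user noticing, which is where the cost saving comes from.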
The first version we launched used the exact same approach (MAP_PRIVATE). Later on, we bypassed the file system by using shared memory and userfaultfd, because ultimately the NVMe became the bottleneck (https://codesandbox.io/blog/cloning-microvms-using-userfault... and https://codesandbox.io/blog/how-we-scale-our-microvm-infrast...).
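For readers unfamiliar with the MAP_PRIVATE approach, here is a minimal sketch of its copy-on-write behavior using Python's `mmap` module on a Unix system. A temporary file stands in for the memory snapshot; this is an illustration of the mapping semantics, not the actual VM setup.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE

# A temporary file stands in for the memory snapshot on disk.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"A" * PAGE)
    snapshot_path = f.name

fd = os.open(snapshot_path, os.O_RDONLY)
try:
    # Each "clone" maps the same file privately; writes never reach the file.
    clone_a = mmap.mmap(fd, PAGE, flags=mmap.MAP_PRIVATE,
                        prot=mmap.PROT_READ | mmap.PROT_WRITE)
    clone_b = mmap.mmap(fd, PAGE, flags=mmap.MAP_PRIVATE,
                        prot=mmap.PROT_READ | mmap.PROT_WRITE)

    clone_a[0:5] = b"hello"  # the kernel gives clone_a its own copy of this page

    a_view = bytes(clone_a[0:5])
    b_view = bytes(clone_b[0:5])  # still sees the original snapshot contents
    clone_a.close()
    clone_b.close()
finally:
    os.close(fd)

# The snapshot file itself is untouched.
with open(snapshot_path, "rb") as f:
    on_disk = f.read(5)
os.unlink(snapshot_path)
```

Untouched pages are still read through the file, which is why storage speed can become the limiting factor with this approach.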