Hacker News | fhuici's comments

Current serverless offerings come with a substantial litany of complex, frustrating and expensive issues that together constitute a far cry from the promise of serverless. In this post we look into why that is, what serverless should be, and how to get there.


On kraft.cloud we use Dockerfiles to build extremely specialized VMs for deployment. With this in place, we can have, say, an nginx server cold started and ready to serve at a public URL in about 20 milliseconds (not quite the 10ms you mention, but in the right ballpark, and we're constantly shaving that down). Heavier apps can take longer of course, but not too much (e.g., node/next < 100ms). Autoscale and scale to zero also operate on those timescales.

Underneath, we use specialized VMs (unikernels), a custom controller and load balancer, as well as a number of perf tweaks to achieve this. But it's (now) certainly possible.


Thanks, that is very interesting.

Still, that mostly confirms my experience: to achieve this level of performance, you need to optimize at a lower level, and this is not really achievable with Docker out of the box (a plain Linux host with the usual Docker runtime).


We don't need all of those layers and abstractions, of course. But if we do things right we also don't need to go the bare-metal server route -- cloud platforms, done right, can provide both strong, hardware-level (read: VM) isolation and fast starts.

On kraft.cloud (shameless plug) we build extremely specialized VMs (aka unikernels) where most of the code in them is the application's, and pair this with a fast, custom controller and other perf tweaks. We build from Dockerfiles, but when deploying we eliminate all of those layers you mention. Cold boot times are in milliseconds (e.g., nginx 20ms, a basic node app ~50ms), as are scale to zero and autoscale.


Actually it means both, in an unfortunate case of term overload. Though I can understand the embedded/IoT world being frustrated by this, as the term existed first within that context.


Both what? There is no definition other than running a workload "at the edge" near the requestor.

This is like the owner of a restaurant with two locations in the same city calling themselves a nationwide chain. It is just a flat out fabrication.


The edge within this context means running a server close, in terms of Internet latency, to users. For example, if a user is sending a request from Germany, then the response should come from a server running in, say, Frankfurt, not the US. There are now many providers that allow developers to deploy services at many different locations at once, and to ensure that client requests are routed to the closest available location. An understandable source of confusion is that wasm comes from the browser world, but it's also possible to run it as standalone (no-browser) server code.

Also not to be confused with the term edge within the context of IoT/embedded, where the edge is devices running at the very edge of the Internet, e.g., factory floors, trucks, etc.


Yes, though I'd like to point out that "scale to zero" is a loose term for anything that can be transparently scaled to 0 whenever an app/service is idle, and then woken up when traffic to the service arrives once again.

The problem in practice with Cloud Run (and similar products from other providers) is that it can take seconds or minutes for the platform to detect idleness, during which you're still paying, and then seconds to wake up -- during which users/clients have to wait for a response or possibly leave the service/site.

For my taste, real scale to 0 would be: detection and scale to 0 within < 1 second of idleness, and wakeup within an RTT, such that the mechanism is transparent to end users.
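To make that definition concrete, here's a trivial Python sketch of the two criteria. All numbers are illustrative assumptions, not measurements of any particular platform.

```python
# "Real" scale to zero, per the definition above:
#  - idleness detected (and instance stopped) within < 1 second, and
#  - wakeup within roughly one network RTT, so clients never notice.

def is_transparent_scale_to_zero(detect_s: float, wake_s: float, rtt_s: float) -> bool:
    """True if both the idle-detection and wakeup criteria are met."""
    return detect_s < 1.0 and wake_s <= rtt_s

# A platform that takes 15 minutes to notice idleness and 2 s to wake up:
print(is_transparent_scale_to_zero(detect_s=900.0, wake_s=2.0, rtt_s=0.05))   # False
# Sub-second detection with a ~20 ms wakeup on a 50 ms RTT:
print(is_transparent_scale_to_zero(detect_s=0.5, wake_s=0.02, rtt_s=0.05))    # True
```

The `detect_s` window is also the billing gap: with slow detection you keep paying for an idle instance until the platform notices.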

As a shameless plug, this is what we do at kraft.cloud (based on years of research, LF OSS work, unikernels, a custom controller and overall non-negligible engineering effort).


100% agree.

In almost all cloud deployments, whether transparently or not, you'll have a hypervisor/VM underneath for hardware-level/strong isolation reasons. Using wasm on top of that stack only for isolation purposes might not be the best use of it. Having said that, if wasm is useful for other reasons (e.g., you need to run wasm blobs on behalf of your users/customers), then my (admittedly biased) view is that you should run these in an extremely specialized VM that has the ability to run the blob and little else.

If you do this, it is entirely possible to have a VM that can run wasm and still only consume a few MBs and cold start/scale to 0 in milliseconds. On kraft.cloud we do this (e.g., https://docs.kraft.cloud/guides/wazero/ -- wazero, 20ms cold start).


On kraft.cloud we can run thousands of specialized VMs (aka unikernels) scaled to zero (we've run internal stress tests on this), meaning that when a request for one of them arrives we can wake it up and respond within roughly an RTT. You can take it out for a spin -- just use the -0 flag when deploying to enable scale to 0 (https://docs.kraft.cloud/guides/features/scaletozero/).


Interesting – are we talking actual Linux VMs here, with binary-compatible syscalls etc., or something that applications need to be specifically built or packaged for in some way?


Hi, on kraft.cloud we use Firecracker (FC), along with a custom controller and very specialized VMs (unikernels), to get extremely efficient deployments (e.g., millisecond cold starts). For a PHP web server, for instance, we can cold start things in about 30ms (https://docs.kraft.cloud/guides/php/). It's also possible to run wasm workloads/blobs (e.g., https://docs.kraft.cloud/guides/wazero/).

The builds are based on Dockerfiles, but for deployment we transparently convert that to unikernels.
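To illustrate, the build input can be a perfectly ordinary Dockerfile. This one is a hypothetical minimal example (not taken from the kraft.cloud docs):

```dockerfile
# Hypothetical minimal build input: a standard nginx image plus static
# content. At deploy time the platform converts the result into a
# specialized unikernel VM rather than running it as a container.
FROM nginx:alpine
COPY ./site /usr/share/nginx/html
```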


Fully agree -- doing reactive autoscaling when the actual boot time is slow is an inherently hard problem. We've done years of research into building specialized VMs (unikernels) and fast controllers to provide infra that allows VMs/containers to cold start, and thus autoscale/scale to zero, in milliseconds (e.g., a simple Node app cold starts in ~50ms). If interested, you can try it out at kraft.cloud, or check out info about the tech in our blogs (https://unikraft.io/blog/) or the corresponding LF OSS project (www.unikraft.org).


Unikraft is really cool, but Linux is not necessarily the blocker. You can boot to PID1 in firecracker in ~6ms, see my experiments: https://blog.davidv.dev/minimizing-linux-boot-times.html

