We initially wrote this for a less technical audience (where we spelled out MCP), then edited it to post here - it's not AI, just bad editing on my part. Fixed now.
I'm confused by your explanation. You originally spelled out MCP and then edited it. Did you originally have it as model context protocol and then edit it to model control plane? Or did you originally spell it out as model control plane and miss it in editing?
The only Beam-specific part is the sandboxes, but those can easily be swapped out for the vendor of your choice. The architecture we described isn't exclusive to our product.
Beam is an ultrafast AI inference platform. We built a serverless runtime that launches GPU-backed containers in less than 1 second and quickly scales out to thousands of GPUs. Developers use our platform to serve apps to millions of users around the globe.
We’re working on challenging problems, including:
* Low-level systems development: working with container runtimes, OCI image formats, and lazy-loading large files from content addressable storage
* Efficiently packing thousands of workloads into GPUs across multiple clouds
* Working with cutting-edge technologies, like GPU checkpoint restore and CRIU
You don’t need prior experience with AI/ML, only an interest in working on hard problems and shipping quickly.
You should look into beam.cloud (I'm the founder, but it's pretty great)
It lets you quickly run long-running jobs on the cloud by adding a simple decorator to your Python code:
from beam import function

# Some long training function
@function(gpu="A100-80")
def handler():
    return {}

if __name__ == "__main__":
    # Runs on the cloud
    handler.remote()
You should check out beam.cloud (I'm the founder). It's a modern FaaS platform for Python, with support for REST endpoints, task queues, scheduled jobs, and GPUs.
The major clouds don't support serverless GPUs because the architecture is fundamentally different from running CPU workloads. For Lambda specifically, there's no way to run multiple customer workloads on a single GPU with Firecracker.
A more general issue is that the workloads that tend to run on GPU are much bigger than a standard Lambda-sized workload (think a 20Gi image with a smorgasbord of ML libraries). I've spent time working around this problem and wrote a bit about it here: https://www.beam.cloud/blog/serverless-platform-guide
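To make the image-size point concrete, here's a back-of-envelope calculation. The bandwidth and layer-fetch figures are illustrative assumptions, not measurements from any particular cloud:

```python
# Rough cold-start estimate for pulling a large ML container image.
# All inputs are illustrative assumptions for the sake of the arithmetic.

IMAGE_SIZE_GIB = 20        # the "20Gi image" from the comment above
PULL_BANDWIDTH_GIB_S = 1.0 # assumed effective registry pull bandwidth

# Naive full-image pull: download everything before the container starts.
full_pull_seconds = IMAGE_SIZE_GIB / PULL_BANDWIDTH_GIB_S

# Lazy loading: only fetch the bytes actually touched at startup.
# Assume (illustratively) that startup reads ~5% of the image.
STARTUP_READ_FRACTION = 0.05
lazy_pull_seconds = (IMAGE_SIZE_GIB * STARTUP_READ_FRACTION) / PULL_BANDWIDTH_GIB_S

print(f"full pull: {full_pull_seconds:.0f}s")   # 20s
print(f"lazy load: {lazy_pull_seconds:.0f}s")   # 1s
```

Even with these generous numbers, a full pull of a 20 GiB image blows well past a sub-second cold-start budget, which is why lazy-loading from content-addressable storage (mentioned in the job posting above) matters so much for this class of workload.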
beam.cloud | Founding Software Engineer, Infrastructure | Full-time | REMOTE | New York, NY USA
Beam is building a cloud runtime for running remote containers on GPUs. We’re used by thousands of developers for powering their generative AI apps, including companies like Coca-Cola, and we’re backed by great investors like YC and Tiger.
We’re building gnarly low-level distributed systems. You’ll have a major impact on the product and ship features directly to users. If working on a new Pythonic cloud runtime sounds exciting, you might really like it here.
There are a number of good options here. The different axes are cost of GPUs, performance, and ease of use / developer experience. You might consider beam.cloud (I'm one of the founders), which is strongly oriented toward performance and developer experience.