Hacker News | mrbbot's comments

Slshx is a library for building strongly-typed Discord commands (https://discord.com/developers/docs/interactions/application...) that run on Cloudflare Workers (https://workers.cloudflare.com/), using a React-inspired syntax (hooks and JSX). It supports all Discord command types/options, autocomplete and interactive message components. During development, it automatically deploys your commands whenever you change your code.

I created this because I think Cloudflare Workers are a great fit for hosting Discord commands, but there wasn't an easy way to get started that had a fun development experience. I also wanted to see what a Miniflare-first (https://github.com/cloudflare/miniflare) library could look like.


Miniflare author here. Miniflare uses Node's `vm` module (https://nodejs.org/api/vm.html#vm_class_vm_script), which is slightly higher-level than the raw V8 isolates used by Cloudflare Workers, but built on the same underlying engine.

Without low-level isolate support in Node, it would be difficult to emulate CPU time limits. Miniflare will report how long requests take, but this includes I/O time.


I think the hard limit CF imposes is on wall-clock time, not CPU time specifically. The CPU-time limit is a soft limit (i.e. not something that kills your Worker, only something used after the fact to determine whether your Worker "gets to succeed").

From their docs:

> Cloudflare will bill for Duration charges based on the higher of your wall time or CPU time, with a multiple applied to the CPU time to account for the processing power allotted to your script. We will not bill for wall time Duration charges beyond the execution limit given.

I think what this means in practice (as someone who has tried to implement similar "user workload total-resource-spend limiting" before) is that Cloudflare:

1. starts a wall-clock timer and a CPU-accounting sampler when the task begins;

2. lets the workload run until either it completes or the timer goes off (if the timer goes off, the task is hard-killed);

3. if the task was hard-killed, the load-balancing layer is responsible for responding with a 503 (or whatever error CF uses for this case);

4. if the task wasn't hard-killed, calculates CPU-seconds spent as the area under the curve of CPU usage over wall-clock time, and checks whether the workload exceeded its CPU budget;

5. if the workload did exceed its budget, then even though the request was computed successfully, they toss the result away and reply with a 503 (or whatever) from the Workers control-plane layer;

6. if it didn't, they forward the result of your request back to the user.

(Meanwhile, handling memory limits is a lot easier — they can just ride the coattails of V8’s own per-ExecutionContext memory accounting, to trigger an OOM event on allocation when area-under-the-curve of GB-secs goes over-limit. But, as a workload might do all its allocations at the beginning and then exceed the memory GB-seconds limit due to the time component increasing, you need to do one final calculation of this at the same time you’re doing the final CPU-accounting check.)
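The quoted billing rule itself reduces to simple arithmetic. A rough sketch (the multiplier value here is an arbitrary placeholder, not Cloudflare's actual figure):

```javascript
// Hypothetical sketch of "bill on the higher of wall time or CPU time,
// with a multiple applied to the CPU time".
const CPU_MULTIPLIER = 8; // assumption for illustration only

function billableDurationMs(wallMs, cpuMs) {
  return Math.max(wallMs, cpuMs * CPU_MULTIPLIER);
}

// A mostly-I/O request: wall time dominates the bill.
console.log(billableDurationMs(200, 5));  // 200
// A CPU-heavy request: scaled CPU time dominates.
console.log(billableDurationMs(40, 30));  // 240
```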


> I think the hard limit CF imposes is of wall-clock time, not CPU time specifically.

No, it's the other way around. The enforced limit is strictly on CPU time, not wall time. We use Linux's timer_create() with CLOCK_THREAD_CPUTIME_ID to set a timer that delivers a signal when a CPU time threshold is reached. The signal handler immediately terminates execution of JavaScript using V8's TerminateExecution() (which only terminates the specific isolate).

It sounds like your experience is from a system where each guest runs in their own process. Cloudflare Workers runs many guest isolates in a single process, therefore we cannot simply kill the process when one guest misbehaves. So, we have to do everything very differently from what a container host would do.

The line you quoted from the docs is about billing. The enforcement of limits, and the calculation of billing, are completely unrelated.

Here are some reference links if you want to understand more about how our platform is implemented:

https://www.infoq.com/presentations/cloudflare-v8/

https://blog.cloudflare.com/mitigating-spectre-and-other-sec...


> "Cloudflare Workers runs many guest isolates in a single process, therefore we cannot simply kill the process when one guest misbehaves. So, we have to do everything very differently from what a container host would do."

Does this imply that another guest Worker can impact/takedown my Worker due to their tasks misbehaving?


> Does this imply that another guest Worker can impact/takedown my Worker due to their tasks misbehaving?

No, the system is designed to prevent that. Each guest runs in its own V8 isolate which we carefully control to prevent interference.

(Of course, all software has bugs from time to time. But we consider it a security flaw if one worker can somehow disrupt other workers, and handle it accordingly.)


Awesome. Thanks so much.

Any consideration of expanding the Worker use case to allow deploying full-blown web apps to the edge?


Of course! Many people are deploying full-blown web apps on Workers today. We're constantly working on new features to fill in gaps so more types of apps can be built entirely on Workers. Durable Objects, cron triggers, increasing the time limit from 50ms to 30 seconds (and eventually 15 minutes), etc. are all part of this, and we have more coming.
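For reference, the basic shape of a module-syntax Worker handling app routes looks something like this (a minimal sketch; a real app would add routing, KV/Durable Object bindings, etc.):

```javascript
// Minimal Worker-style handler object. In an actual Worker this object
// would be the module's default export (`export default worker;`).
const worker = {
  fetch(request) {
    const url = new URL(request.url);
    if (url.pathname === "/") {
      return new Response("<h1>Hello from the edge</h1>", {
        headers: { "content-type": "text/html" },
      });
    }
    return new Response("Not found", { status: 404 });
  },
};

console.log(worker.fetch(new Request("https://example.com/")).status);     // 200
console.log(worker.fetch(new Request("https://example.com/nope")).status); // 404
```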


Any talk of a SQL database offering? KV is awesome, but there are times when a relational database is a much better fit.

E.g. I'd love to be able to deploy a full blown Elixir/Phoenix/Postgres app to Cloudflare Workers.


That could be the case. I ran into this trying to rescale PNGs in pure JS and hit CPU time limits immediately, with lots of wall-clock time to spare. In practice it doesn't matter too much whether the CPU limit kills the task or not; we're talking 50ms at most.


isolated-vm is a decent way to use lower-level V8, but replicating `fetch` and the other CF Workers primitives is a big chunk of work: https://github.com/laverdet/isolated-vm

Deno is the best OSS out there for "faking" edge workers: https://deno.land/manual@v1.4.6/runtime/workers


vm2 does what you're looking for, IIRC. Do a dynamic import check with an optionalDependency, and write some glue over vm2 with a fallback to the built-in vm module.

Just a thought.

