The example just shows some of the library's functions being used, mainly storing and returning values. As for numbers like "(void*)0xcafe", that casts the number 0xcafe (in hex) to a void*. Why those constants? They're just easy to read in hex because they happen to spell a word. They could be anything depending on your application, like a pointer to data or some important constant, etc.
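A minimal sketch of what that cast does, assuming a mainstream platform where `uintptr_t` exists. `int_to_slot` and `slot_to_int` are hypothetical helper names, not functions from the library. Converting an integer to a pointer is implementation-defined in C, but on common platforms a value that fits in `uintptr_t` round-trips unchanged. The pointer is only carried around as a value and never dereferenced.

```c
#include <stdint.h>

/* Hypothetical helper: store an integer tag such as 0xcafe in a void* slot.
   Implementation-defined in C; the result must never be dereferenced. */
void *int_to_slot(uintptr_t value) {
    return (void *)value;
}

/* Hypothetical helper: recover the integer from the void* slot. */
uintptr_t slot_to_int(void *slot) {
    return (uintptr_t)slot;
}
```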
More precisely, I would describe it as a "pointer to anything". Its size is exactly the size of a pointer on the given system (e.g. 8 bytes on a 64-bit system).
If you happen to know what a void* points to, you can cast it to another type like "my_struct_t*" and dereference it. I would call it a workaround for C lacking true polymorphism.
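A sketch of the usual callback/user-data pattern this describes. `my_struct_t`, `count_hit` and `call_n_times` are hypothetical names for illustration only. The "library" side only ever sees a `void *`, while the callback knows the real type and casts it back before dereferencing.

```c
/* Hypothetical user type, only for illustration. */
typedef struct my_struct_t {
    const char *name;
    int hits;
} my_struct_t;

/* Generic callback type: the library passes an opaque void* through. */
typedef void (*callback_fn)(void *user_data);

/* The callback knows what user_data really points to, so it casts and dereferences. */
void count_hit(void *user_data) {
    my_struct_t *s = (my_struct_t *)user_data;
    s->hits++;
}

/* Library side: forwards the pointer without knowing its type. */
void call_n_times(callback_fn fn, void *user_data, int n) {
    for (int i = 0; i < n; i++)
        fn(user_data);
}
```

Nothing checks that the cast in `count_hit` matches what was passed in; that safety is entirely on the caller, which is the trade-off of this kind of "polymorphism".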
That is pretty much the opposite of what GP said: they're using void* to directly store an integer value rather than a pointer to anything (in that case).
You should be saving rdi + 0x0020 on Windows to save the current fiber storage pointer on each jump.
Certain Win32 API calls use fibers internally even if you don't use them yourself, so you will get a crash after bouncing around between coroutines.
I vaguely remember (or think) that Schannel is one such set of Win32 functions that will cause it.
The name is the Japanese word bunki (分岐), which means "to branch off". I consider the name quite fitting for a coroutine library; just do a Google image search for 分岐 and you will see what I mean.
* You can push data on the stack when creating a co-routine
* It has a slot for local storage in the coroutine (the library you linked calls it user_data).
* It should be pretty lightweight, since it uses assembly and saves as little state as possible based on the calling conventions.
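To make "saves as little state as possible" concrete, here is a hypothetical context layout (not bunki's actual one) for a hand-written x86-64 System V context switch. Under that ABI only rbx, rbp, r12 to r15, the stack pointer, the MXCSR control bits and the x87 control word are callee-saved; everything else is caller-saved, so the compiler has already spilled it before calling the switch function. The Windows x64 convention additionally treats rdi, rsi and xmm6 to xmm15 as nonvolatile, so a Windows port has more to save. The `user_data` field stands in for the local-storage slot mentioned above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout for an x86-64 System V context switch. Assembly that
   saves/restores these fields would hard-code the offsets, so they are fixed
   by declaring the fields in this order. */
typedef struct saved_context {
    uint64_t rsp;                      /* offset 0: stack pointer of the suspended coroutine */
    uint64_t rip;                      /* offset 8: resume address */
    uint64_t rbx, rbp, r12, r13, r14, r15; /* callee-saved general-purpose registers */
    uint32_t mxcsr;                    /* MXCSR control bits are callee-saved */
    uint16_t x87_cw;                   /* x87 control word is callee-saved */
    void *user_data;                   /* per-coroutine local storage slot */
} saved_context;
```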
There are differences, though. One big difference is that I have functions that let you call functions that generate deep call stacks on the thread stack.
Don't use stackful coroutines. Compilers don't understand them, and sooner or later you will get miscompilations or bugs, particularly if your code uses thread-local storage (which doesn't take a lot of effort; for example, "errno" is a thread-local variable).
Do you have any specific reason to say why it will cause miscompilation? Anything in particular that makes this UB? My understanding is that as long as your functions compile to the correct ABI, it shouldn't be a problem. All you're doing is setting different stack/frame pointers before calling the function.
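The "just set a different stack pointer and call the function" model can be tried without any assembly. This is a minimal sketch using the POSIX ucontext API (`getcontext`, `makecontext`, `swapcontext`), which was removed from newer POSIX editions but is still provided by glibc; it is a swapped-in technique, not how bunki works. `coro_yield`, `coro_body` and `sum_generated` are hypothetical names. The coroutine body runs on its own malloc'd stack and yields the values 1..n back to the caller.

```c
#define _XOPEN_SOURCE 700
#include <stdlib.h>
#include <ucontext.h>

static ucontext_t caller_ctx, coro_ctx;
static int yielded_value;
static int coro_done;
static int limit;

/* Save the coroutine's registers and stack pointer, resume the caller. */
static void coro_yield(int v) {
    yielded_value = v;
    swapcontext(&coro_ctx, &caller_ctx);
}

/* Runs on the private stack; returning switches to uc_link (the caller). */
static void coro_body(void) {
    for (int i = 1; i <= limit; i++)
        coro_yield(i);
    coro_done = 1;
}

/* Sums the values the coroutine yields; returns -1 on setup failure. */
int sum_generated(int n) {
    size_t stack_size = 64 * 1024;
    void *stack = malloc(stack_size);
    if (stack == NULL)
        return -1;
    limit = n;
    coro_done = 0;
    if (getcontext(&coro_ctx) != 0) {
        free(stack);
        return -1;
    }
    coro_ctx.uc_stack.ss_sp = stack;      /* the coroutine gets its own stack */
    coro_ctx.uc_stack.ss_size = stack_size;
    coro_ctx.uc_link = &caller_ctx;       /* where to go when coro_body returns */
    makecontext(&coro_ctx, coro_body, 0);

    int sum = 0;
    for (;;) {
        swapcontext(&caller_ctx, &coro_ctx); /* run until the next yield or the end */
        if (coro_done)
            break;
        sum += yielded_value;
    }
    free(stack);
    return sum;
}
```

As far as I know, glibc's `swapcontext` also saves and restores the signal mask with a system call on every switch, which is one reason hand-written switches that only touch the callee-saved registers are lighter.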
As for the errno thing: the whole point about coroutines is that they are cooperatively scheduled, they don't yield unless you specifically tell them to. As long as you don't insert a yield-point in between the function that sets errno and then reading the value (and you should also always read errno immediately anyway), I don't see what the problem is (except errno being a terrible API in general, but that ship has pretty much sailed).
The compiler can assume that the address of a thread-local variable such as errno stays the same across the whole execution of the function, and cache it in a register. Unfortunately, if a coroutine can change threads across yield points, you can end up reading the errno of the wrong thread.
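A small sketch of the fact this relies on: `&errno` evaluates to a different object on each thread, so an address computed before a yield on thread A still points at A's errno after the coroutine resumes on thread B. This does not reproduce the miscompilation itself (that depends on what the optimizer does); it only shows why a cached address goes stale. `errno_address_is_per_thread` is a hypothetical name; compile with `-pthread`.

```c
#include <errno.h>
#include <pthread.h>

struct errno_probe {
    int *cached_errno_addr;  /* &errno as computed on the first thread */
    int differs;             /* written by the second thread */
};

/* Runs on a second thread, like a coroutine resumed elsewhere after a yield. */
static void *probe(void *arg) {
    struct errno_probe *p = arg;
    p->differs = (p->cached_errno_addr != &errno);
    return NULL;
}

/* Returns 1 if errno lives at a different address on another thread,
   0 if not, -1 if the thread could not be created. */
int errno_address_is_per_thread(void) {
    struct errno_probe p = { &errno, 0 };
    pthread_t t;
    if (pthread_create(&t, NULL, probe, &p) != 0)
        return -1;
    pthread_join(t, NULL);   /* join makes p.differs safely visible here */
    return p.differs;
}
```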
Instead, C++ stackless coroutines explicitly promise that you will read the TLS of the currently executing thread in this case.
This is the failure mode we found, but I cannot rule out other possibilities.
That's really interesting, I hadn't considered the case of the compiler caching the address of errno. Obviously it can't cache the value across a function call (with or without a yield point), but caching the address does make sense, and I see how that would blow up if the coroutine was scheduled on a different thread.
I'd argue that the problem here is that you should obviously not use TLS with coroutines, since they can be scheduled on any thread. The concept of thread-local storage just doesn't make much sense for coroutines. The issue isn't necessarily that there's anything wrong with coroutines; it's rather with APIs that use TLS as an information side channel. errno is just a bad idea, basically.
Using TLS in coroutines may not make much sense, but using TLS in the runtime can be unavoidable. For example, the stack of running coroutines is per-thread. You can perhaps avoid that with a more complex API, but placing it in TLS is the obvious design.
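A sketch of that "obvious design", assuming a runtime that tracks nested coroutine resumes per thread. `coroutine`, `MAX_NESTING`, `push_running`, `pop_running` and `current_coroutine` are all hypothetical names, not part of any library mentioned here. Each thread gets its own stack of running coroutines through C11 `_Thread_local`.

```c
#include <stddef.h>

#define MAX_NESTING 64   /* hypothetical limit on nested resumes */

typedef struct coroutine {
    int id;              /* placeholder for real coroutine state */
} coroutine;

/* One stack of running coroutines per thread; the innermost is on top. */
static _Thread_local coroutine *running[MAX_NESTING];
static _Thread_local size_t running_depth;

/* Called by the runtime when resuming a coroutine; -1 if nested too deeply. */
int push_running(coroutine *co) {
    if (running_depth == MAX_NESTING)
        return -1;
    running[running_depth++] = co;
    return 0;
}

/* Called when a coroutine yields or finishes; NULL if nothing is running. */
coroutine *pop_running(void) {
    return running_depth ? running[--running_depth] : NULL;
}

coroutine *current_coroutine(void) {
    return running_depth ? running[running_depth - 1] : NULL;
}
```

Note that these runtime functions are exactly where a cached TLS address would hurt: if a coroutine resumes on another thread, `current_coroutine()` must read that thread's stack, not the one it saw before yielding.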
Unfortunately, this has happened and is not hypothetical, but at least for now we're stuck in the stackful hole we dug for ourselves. :( A few years back we had similar issues on Windows.
There are lots of things that happen when a program runs that a compiler is unaware of. Compilers don't run at runtime... So I am confused about what you mean by miscompilations.
As for the runtime, this is why people follow an ABI. Otherwise lots of things, like dynamic linking, would not work.
QEMU uses stackful coroutines, and it's a great programming model but we ended up fighting the compiler more than once. You can build a more traditional enter/yield API on top of C++ coroutines, see for example https://lore.kernel.org/qemu-devel/YjMuyMcwG09Tohyh@redhat.c....
Also, it uses malloc to create the stacks. My library lets you decide how you wish to allocate stacks. On Linux I recommend mmap, since you can create guard pages and such, but it's completely up to you. I also have functions that let you call another function on the thread stack. This is handy when you need to call a function that generates a deep call stack. Otherwise you have to mess around hopping back and forth between coroutines and the main thread context.
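A minimal sketch of allocating a coroutine stack with mmap and a guard page on Linux, assuming a downward-growing stack as on x86-64. `guarded_stack`, `guarded_stack_alloc` and `guarded_stack_free` are hypothetical names, not this library's API. The lowest page of the mapping is made inaccessible with `mprotect`, so running off the end of the stack faults instead of silently corrupting neighbouring memory.

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS under strict -std= modes on glibc */
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

typedef struct guarded_stack {
    void *mapping;       /* start of the whole mapping (the guard page) */
    size_t mapping_len;  /* guard page + usable pages */
    void *lo;            /* lowest usable byte; the stack grows down toward it */
    size_t usable;       /* usable bytes, rounded up to whole pages */
} guarded_stack;

/* Returns 0 on success, -1 on failure. */
int guarded_stack_alloc(guarded_stack *s, size_t want) {
    long ps = sysconf(_SC_PAGESIZE);
    if (ps <= 0)
        return -1;
    size_t page = (size_t)ps;
    size_t usable = (want + page - 1) / page * page;
    size_t len = usable + page;              /* one extra page for the guard */
    void *m = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (m == MAP_FAILED)
        return -1;
    /* Guard page at the low end, below where the stack grows. */
    if (mprotect(m, page, PROT_NONE) != 0) {
        munmap(m, len);
        return -1;
    }
    s->mapping = m;
    s->mapping_len = len;
    s->lo = (char *)m + page;
    s->usable = usable;
    return 0;
}

void guarded_stack_free(guarded_stack *s) {
    munmap(s->mapping, s->mapping_len);
}
```

The initial stack pointer for a new coroutine would then be `(char *)s.lo + s.usable` (aligned as the ABI requires); with ucontext you would instead pass `s.lo` and `s.usable` as `uc_stack.ss_sp` and `uc_stack.ss_size`.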
There is a really good blog post to understand coroutines from an assembly perspective here: https://blog.dziban.net/coroutines/
I ported the Intel-syntax assembly in that blog post to AT&T syntax and assembled it with the GNU Assembler: https://github.com/samsquire/assembly (as coroutines.S)
There are also Protothreads and Tina: http://dunkels.com/adam/pt/ https://github.com/slembcke/Tina
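For contrast with the stackful approach, Protothreads is built on the stackless "resume with a switch statement" trick. This is a hand-written sketch of that general technique, not the Protothreads API; `counter_co` and `counter_next` are hypothetical names. The resume point and every variable that must survive a yield live in a struct, because there is no private stack to keep locals on.

```c
/* Hypothetical stackless generator state: no private stack, so anything that
   must survive a "yield" (the loop counter, the resume point) lives here. */
typedef struct counter_co {
    int resume_point;   /* 0 = start, 1 = after a yield, -1 = finished */
    int i;
    int limit;
} counter_co;

/* Returns 1 and stores the next value in *out, or 0 when finished. */
int counter_next(counter_co *co, int *out) {
    switch (co->resume_point) {
    case 0:
        for (co->i = 1; co->i <= co->limit; co->i++) {
            co->resume_point = 1;
            *out = co->i;
            return 1;           /* "yield" */
    case 1:;                    /* the next call jumps back in here */
        }
    }
    co->resume_point = -1;      /* no case matches -1, so later calls return 0 */
    return 0;
}
```

Because the switch jumps back into the middle of the loop, a local `int i` would not survive between calls; this is the main usability cost of stackless coroutines compared with the stackful ones discussed above.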