Bunki, a C Coroutine Library

samsquire · on March 13, 2023

Thanks for this.

There is a really good blog post to understand coroutines from an assembly perspective here: https://blog.dziban.net/coroutines/

I ported the Intel-syntax assembly in that blog post to AT&T syntax and assembled it with the GNU Assembler: https://github.com/samsquire/assembly (as coroutines.S)

There are also Protothreads and Tina: http://dunkels.com/adam/pt/ https://github.com/slembcke/Tina

anfilt · on March 13, 2023

No problem. If you have any questions feel free to ask.

qprofyeh · on March 13, 2023

My C is a bit rusty. What do (void*)0xcafe and (void*)0xbeef in the example mean?

anfilt · on March 13, 2023

The example just shows some of the library's functions being used, mainly storing and returning values. As for numbers like "(void*)0xcafe", that casts the number 0xcafe (in hex) to a void*. Why those constants? They are just easy to read in hex, since they happen to spell words. They could be anything, depending on your application, like a pointer to data or some important constant, etc.

qprofyeh · on March 13, 2023

Thanks, so void* is the any type in C. And the argument is not necessarily a pointer nor should it be dereferenced. I think I understand.

Warwolt · on March 13, 2023

More particularly, I would describe it as a "pointer to anything". The size of the value will be exactly the size used for pointers on the given system (e.g. 8 bytes on a 64-bit system).

If you happen to know what a void* is pointing to, you can cast it to another type like "my_struct_t*" and dereference the value. I would call it a workaround for C lacking true polymorphism.

quietbritishjim · on March 15, 2023

That is pretty much the opposite of what GP said: they're using void* to directly store an integer value rather than a pointer to anything (in that case).

delfinom · on March 13, 2023

Should be saving rdi + 0x0020 on windows to save the current fiber storage pointer on jump.

Certain Win32 API calls actually use fibers deep down, even if you don't use them yourself. You will get a crash after bouncing around. I vaguely remember (or think) that schannel is one such set of Win32 functions that will cause it.

anfilt · on March 13, 2023

I will take a look at this. If I need to save the fiber_data field from the TEB on Windows, that's easy enough to add.

iainmerrick · on March 13, 2023

The name is the Japanese word bunki (分岐), which means to branch off. I consider the name quite fitting for a coroutine library; just do a Google image search for (分岐) and you will see what I mean.

Good name! That's very nice.

anfilt · on March 13, 2023

Thanks, I am definitely happy with the name.

dagurp · on March 13, 2023

Bunki in Icelandic means batch or stack.

anfilt · on March 13, 2023

Lol, even better, since it is a stackful coroutine library. =)

actionfromafar · on March 13, 2023

I bet it's related to "bunch".

dagurp · on March 14, 2023

It wouldn't surprise me but it's used in a slightly different way.

We also have "búnt" which is used for a bunch of flowers

froh · on March 13, 2023

to save you a click: the image search gives you railway switches. sophisticated ones.

stefanos82 · on March 13, 2023

How does it compare with https://github.com/edubart/minicoro ?

anfilt · on March 13, 2023

Is there anything in particular you're curious about?

A few things are similar:

* It can yield from anywhere.

* You can push data onto the stack when creating a coroutine.

* It has a slot for local storage in the coroutine (the library you linked calls it user_data).

* It should be pretty lightweight, since it uses assembly and saves as little state as possible based on the calling conventions.

There are differences, though. One big difference is that I have functions that let you call a function on the thread's own stack, which is useful when that function generates a deep call stack.

bonzini · on March 13, 2023

Don't use stackful coroutines. Compilers don't understand them, and sooner or later you will get miscompilations or bugs, particularly if your code uses thread-local storage (which doesn't take much effort; for example, "errno" is a thread-local variable).

OskarS · on March 13, 2023

Do you have any specific reason to say why it will cause miscompilation? Anything in particular that makes this UB? My understanding is that as long as your functions compile to the correct ABI, it shouldn't be a problem. All you're doing is setting different stack/frame pointers before calling the function.

As for the errno thing: the whole point about coroutines is that they are cooperatively scheduled, they don't yield unless you specifically tell them to. As long as you don't insert a yield-point in between the function that sets errno and then reading the value (and you should also always read errno immediately anyway), I don't see what the problem is (except errno being a terrible API in general, but that ship has pretty much sailed).

bonzini · on March 14, 2023

Yes, I do. We experienced it in QEMU, and the workaround is hideous (https://patchew.org/QEMU/20211201170120.286139-1-stefanha@re...). What happens is that errno can be compiled to

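    address_of_errno = &errno;   /* address computed once */
    ...                          /* yield; may resume on another thread */
    foo = *address_of_errno;     /* may read the old thread's errno */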

The compiler can assume that the address stays the same for the whole execution of the function and cache it in a register. Unfortunately, if a coroutine can move between threads across yield points, you can end up reading the errno of the wrong thread.

In contrast, C++ stackless coroutines explicitly promise that you will read the TLS of the currently executing thread in this case.

This is what we found, but I cannot rule out other failure modes.

OskarS · on March 14, 2023

That's really interesting, I hadn't considered the case of the compiler caching the address of errno. Obviously it can't cache the value across a function call (with or without a yield point), but caching the address does make sense, and I see how that would blow up if the coroutine was scheduled on a different thread.

I'd argue that the problem here is that you should obviously not use TLS with coroutines, since they can be scheduled on any thread. The concept of thread-local storage just doesn't make much sense for coroutines. The issue isn't necessarily that there's anything wrong with coroutines, it's rather with APIs that use TLS as an information side-channel. errno is just a bad idea, basically.

But point taken.

bonzini · on March 14, 2023

Using TLS in coroutines may not make much sense, but using TLS in the runtime can be unavoidable. For example, the stack of running coroutines is per-thread. You can perhaps avoid that with a more complex API, but placing it in TLS is the obvious design.

Unfortunately, this has happened and is not hypothetical, but at least for now we're stuck in the stackful hole we dug for ourselves. :( A few years back we had similar issues on Windows.

anfilt · on March 13, 2023

There are lots of things that happen when a program runs that a compiler is unaware of. Compilers don't run at runtime... So I am confused about what you mean by miscompilations.

As for runtime, this is why people follow an ABI. Otherwise lots of things, like dynamic linking, etc., would not work.

bonzini · on March 14, 2023

See here: https://news.ycombinator.com/item?id=35149189

QEMU uses stackful coroutines, and it's a great programming model but we ended up fighting the compiler more than once. You can build a more traditional enter/yield API on top of C++ coroutines, see for example https://lore.kernel.org/qemu-devel/YjMuyMcwG09Tohyh@redhat.c....

ranger_danger · on March 13, 2023

how does this compare with byuu's libco?

anfilt · on March 14, 2023

I went to look for this library. It looks like the website for that project is gone: https://byuu.org/library/libco/

So unless there is an authoritative mirror somewhere I can't tell you.

bonzini · on March 14, 2023

https://github.com/creationix/libco

ranger_danger · on March 21, 2023

that one is old and missing aarch64. here's the latest https://github.com/ares-emulator/ares/tree/master/libco

anfilt · on March 27, 2023

So, looking at that, I will say that if you look at the API exposed in "libco.h", it does not offer many options beyond context switching:

    cothread_t co_active(void);
    cothread_t co_derive(void*, unsigned int, void (*)(void));
    cothread_t co_create(unsigned int, void (*)(void));
    void co_delete(cothread_t);
    void co_switch(cothread_t);
    int co_serializable(void);

Also, it uses malloc to create the stacks. My library lets you decide how you wish to allocate stacks. On Linux I recommend mmap, since you can create guard pages and such, but it's completely up to you. I also have functions that let you call another function on the thread stack. This is handy for calling a function that generates a deep call stack; otherwise you have to mess around hopping back and forth between coroutines and the main thread context.