Porting Rust's Std to Rustix (sunfishcode.online)
224 points by jpgvm on Jan 4, 2022 | 51 comments


As a believer in removing unnecessary layers of abstraction, this looks interesting. Is there a technical reason Rustix doesn’t support Windows, or is that just not a priority at the moment?

(For a bit of background info: It might seem counterintuitive, but on Windows, a “native” application “should” actually call the programming language agnostic Kernel32.dll functions directly. At least that’s the documented, stable way of doing things. Instead, Rust’s std currently goes through libc, which is also fine, but on Windows is an abstraction built on top of the Windows APIs, and is historically a lot less stable. This can cause code bloat and distribution hassles.

This is a bit different from Linux, where the abstractions are layered the other way round: the OS (POSIX) spec mostly assumes an existing libc. As a result, on Linux, going libc-less is a bit harder in my limited experience — though certainly possible if you know what you’re doing.)


> Instead, Rust's std currently goes through libc

Are you sure about this? Rust's std's File::open calls CreateFile on Windows, not libc open. Isn't CreateFile the correct Kernel32.dll API to call?

https://github.com/rust-lang/rust/blob/master/library/std/sr...


It is. I haven’t checked the complete source code. What I know for sure is that a program using Rust’s std still requires the C runtime to link and run, and from that perspective, it doesn’t really matter if some or even many parts of the std don’t actually use it.


I see. I wonder where Rust's std is using libc on Windows. I know for a fact that the filesystem portion of std doesn't call libc at all on Windows.


Rust needs the C runtime for startup/shutdown code (and vcruntime for panics). It also needs basic memory functions such as memcpy and stack probes. There is no way round this except rewriting those in Rust.


Rust already ships with its own memcpy, so it should be possible: https://github.com/rust-lang/compiler-builtins/blob/master/s...
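
To make the idea concrete, here is a minimal sketch of a byte-wise copy written in plain Rust. It is not the compiler-builtins implementation linked above (which is more elaborate); `copy_bytes` is a hypothetical name chosen for illustration, and it is deliberately not exported as `memcpy`.

```rust
/// Copy `n` bytes from `src` to `dest`, one byte at a time, and return `dest`
/// (mirroring the return convention of C's memcpy).
///
/// # Safety
/// `src` must be valid for reads of `n` bytes, `dest` must be valid for writes
/// of `n` bytes, and the two regions must not overlap.
pub unsafe fn copy_bytes(dest: *mut u8, src: *const u8, n: usize) -> *mut u8 {
    let mut i = 0;
    while i < n {
        // SAFETY: the caller guarantees both pointers are valid for `n` bytes.
        unsafe {
            *dest.add(i) = *src.add(i);
        }
        i += 1;
    }
    dest
}
```

A real replacement would also have to cover memmove, memset and memcmp, and would copy in larger units than single bytes where alignment allows.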


The startup/shutdown code is for the C runtime. If you avoid the C runtime you don't need it.


At a minimum Rust would have to run the C initializers as these are used even in pure Rust code. It might also make use of the security cookie but I'm uncertain about that. Also the C runtime is needed by the SEH handling code (in vcruntime) so Rust would have to replace that before it can replace the C startup/shutdown.

To be clear, this is feasible but it'll require someone knowledgeable to put in the work of rewriting this in Rust.


SEH doesn't need the C runtime; that is only required when using the C language extensions for frame-based exception handling. With the newer vectored exception handling, plain Win32 calls will do:

https://docs.microsoft.com/en-us/windows/win32/debug/vectore...

Granted, they are more cumbersome to use, and trying to avoid the C runtime this way is probably not worth the effort outside of special cases.


Sure. I did not mean to imply that SEH itself requires the C runtime. However, the __CxxFrameHandler3 that Rust/llvm uses would need to be rewritten.

IIRC llvm itself generates calls to __CxxFrameHandler3 when SEH is being used, making it awkward to replace with something that's not compatible.


https://crates.io/crates/r0 exists (not everything you're talking about but like, there's some of this stuff around, in some contexts. We'll see if the whole pile of stuff needed ever gets ported or not, I'm guessing yes but on a long timeframe.)


> This is a bit different from Linux, where the abstractions are layered the other way round: the OS (POSIX) spec mostly assumes an existing libc.

FWIW the POSIX spec assumes libc, but on Linux specifically the syscall interface is stable.
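
As a rough illustration of what "stable syscall interface" buys you, here is a sketch that asks the kernel for the process ID without touching libc. It assumes Linux on x86_64 or aarch64; `raw_getpid` is a hypothetical name, and the syscall numbers (39 and 172) are the getpid numbers for those two architectures. This is not how rustix structures its backend, just the bare mechanism.

```rust
use std::arch::asm;

/// Ask the kernel for our process ID directly (Linux x86_64).
#[cfg(all(target_os = "linux", target_arch = "x86_64"))]
pub fn raw_getpid() -> u32 {
    let ret: isize;
    // SAFETY: getpid takes no arguments, touches no user memory, and cannot fail.
    unsafe {
        asm!(
            "syscall",
            inlateout("rax") 39isize => ret, // 39 = getpid on x86_64
            lateout("rcx") _,                // rcx and r11 are clobbered by `syscall`
            lateout("r11") _,
            options(nostack),
        );
    }
    ret as u32
}

/// Ask the kernel for our process ID directly (Linux aarch64).
#[cfg(all(target_os = "linux", target_arch = "aarch64"))]
pub fn raw_getpid() -> u32 {
    let ret: isize;
    // SAFETY: getpid takes no arguments, touches no user memory, and cannot fail.
    unsafe {
        asm!(
            "svc 0",
            in("x8") 172isize, // 172 = getpid on aarch64
            lateout("x0") ret,
            options(nostack),
        );
    }
    ret as u32
}
```

Calling `raw_getpid()` should return the same value as `std::process::id()`. The numbers and calling convention are part of the kernel's ABI on Linux, which is what makes hard-coding them reasonable there and not on macOS.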


Note that the macOS syscall interface is explicitly unstable, so Rust needs the POSIX/libc path anyway.


Also on Solaris/Illumos/SmartOS/..., and on OpenBSD (where it's more or less verboten entirely since system-call-origin verification).


AFAIU, you don't need to use libc on OpenBSD, however only one contiguous region of code is permitted to invoke syscalls, and so you can't mix-and-match ad hoc libraries. See https://man.openbsd.org/msyscall, which implements one-time registration of a contiguous range of pages permitted to invoke syscalls.


Thanks, that’s a great correction/clarification.


Is there a reason "native" and "should" are in scare quotes? I think "should" is pretty well defined - after all, we've got an RFC for it!

Edit: I'm not altogether sure why this is being downvoted, but in case it wasn't clear: this is a sincere question and not some kind of pedantic rhetorical question. (I don't really mind whether people upvote or downvote the comment, except to the extent that it's a signal that I perhaps wasn't clear in what I was asking.)


There isn't anything objectively wrong with using libc as an abstraction on Windows, especially since Win10 ships the "Universal C Runtime" out of the box (earlier you had to deploy a separate crt for each Visual Studio version).

Personally I'm also in favour of cutting out libc and invoking the windows API directly. But others might argue that using it as a shared abstraction that's very similar on Windows, Linux, OSX, BSD, is worth it.

And "native" is ill defined. It's just one more abstraction layer in the libc -> Win32-API -> NT-API -> Kernel chain. Libc isn't comparable to a java runtime or electron, which is what people usually mean when they talk about applications not being "native".


Kind of, it is still a language runtime, especially on non-UNIX OSes.

The recent thread about cutting down the startup of C applications on Linux proves the point that it isn't a zero-cost library.


I didn’t downvote, in fact, I’m a bit amused because the reason I used quotes is exactly because I’m not using any official definition! There’s no consensus definition of “native”, and I’m using “should” purely in the colloquial sense of “something I’d like to see”.


Just a quick reminder, using syscalls without libc is something that will cause trouble with OpenBSD. From [1]: "The eventual goal would be to disallow system calls from anywhere but the region mapped for libc"

[1] https://lwn.net/Articles/806776/


I think Linux is about the only system where this makes sense. I don't know of another system where the syscall ABI is rigidly defined and stable like it is on Linux. On most other systems the syscall ABI tends to be "call this C function in that shared library".

However, most computers in the world run Linux nowadays.


While many run the Linux kernel, it isn't always the pure upstream version following the Linus vision.

Many devs targeting embedded and mobile devices are painfully aware of that reality.


However good you think Linux is on this front, the BSDs, particularly OpenBSD, are better.


That is irrelevant in the context of the `linux-raw` backend of this project, which does not target the BSDs.


There is a saying: the best camera is the one you have with you when needed.

For programming - whatever you use and are comfortable with is the best. One need not be concerned with what others like / use as long as it does not impede one's own work.


> This project promotes several other goals as well, such as promoting I/O safety concepts and APIs, helping test some of the infrastructure used by cap-std, and helping set the stage for future projects related to sandboxing, WASI, nameless, and other areas.

I'd be interested in hearing more about this. I have a lot of thoughts about sandboxing and Rust, including both build and runtime sandboxing. Would be cool to understand if others are working on this and chat about it. I'm familiar with cap-std, but curious about any other initiatives or where the discussions are happening.


This looks fantastic. In fact, in the best possible way, it looks boring. I'm amazed that this wasn't always how it was implemented. I hope this can be upstreamed into std.


It was mostly to save time. You need the libc path to support macOS anyway, and by using libc you can share most code between macOS and Linux (and the BSDs). Once that was done, I think a Linux-only syscall path was not justified at the time.

Now I think Rust has enough resources to try this.


Interesting, thanks for the detail! That makes perfect sense as a rationale. I'm glad the Rust team now has enough time to refine these things - thanks for all your work :)


This looks like such a fascinating project. I would love to contribute to this sort of low-level work, but frankly I don't know enough yet. I'm going to have to dig into that repo a bunch.

Anyone have any personal favorite books or articles that would be good for getting up to speed on this kind of thing?


I recommend checking out this issue [1] if you want to get your feet wet. Attempting to build a new Rust program and documenting what prevented it from working is a great way to understand how everything is implemented. Pretty satisfying if you can get a program working, too.

[1] https://github.com/sunfishcode/mustang/issues/22


This is an excellent place to start. Thank you!


For this project, you'd want The Linux Programming Interface. Information here: https://man7.org/tlpi/


I have a copy I grabbed for an OS class. I should take that back out. Much appreciated!


> A path to a Rust on Linux without libc

Such wonderful news! This will enable so many good things, easier cross builds and being able to run Rust code and the ecosystem in more places.

Thank you!


Does this mean we could have a single binary for x86_64 linux instead of two (glibc and musl)?

x86_64-unknown-linux-gnu

x86_64-unknown-linux-musl


No, it means there will be three binaries instead of two.

Note that statically linked musl binaries run fine on glibc Linux, so you can just distribute musl binaries. That works right now. In the future, you will just distribute Rustix binaries, which will hopefully be smaller than musl binaries (because they don't need to support the C API).


I expect 4 options:

* glibc linked dynamically

* musl linked dynamically

* musl linked statically

* direct syscalls from rust

and possibly additional options choosing if math functions, memcpy, etc. should use an implementation shipped with rust or one shipped with libc.

The big question is whether we'll get to a point where almost all Rust applications don't use a libc at all.


> * musl linked dynamically

> * musl linked statically

This is handled through the orthogonal "crt-static" target-feature.

IIRC each supported CRT has a preference, but for some that can be overridden (musl would be one of those, possibly one of few of those since statically linking glibc is generally recommended against, and so's pretty much every non-linux libc, when that's an option at all).
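
For anyone who wants to see which combination a given build ended up with, here is a small sketch using compile-time `cfg!` checks. `libc_flavor` and `crt_linkage` are hypothetical helper names; the `target_env` values and the `crt-static` target feature are the ones rustc itself exposes.

```rust
/// Which C library the target was built against, as seen by `cfg!`.
pub fn libc_flavor() -> &'static str {
    if cfg!(target_env = "musl") {
        "musl"
    } else if cfg!(target_env = "gnu") {
        "gnu"
    } else {
        "other"
    }
}

/// Whether the C runtime is linked statically or dynamically for this build.
pub fn crt_linkage() -> &'static str {
    if cfg!(target_feature = "crt-static") {
        "static"
    } else {
        "dynamic"
    }
}
```

Building with `-C target-feature=+crt-static` (or `-crt-static`) in RUSTFLAGS is how the default gets overridden on targets that allow it, and `crt_linkage()` will reflect that choice.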


If this was merged into Rust, I assume it means the use of `unsafe` in std would be "pushed down" one level into Rustix instead.

Wonder if that means it would be easier to verify the correctness of `unsafe` use inside Rust itself?


The goal is to get this merged into Rust's std library; they wouldn't write std on top of Rustix.

> Wonder if that means it would be easier to verify the correctness of `unsafe` use inside Rust itself?

If you mean usage of unsafe inside std (which the Rust compiler does depend upon), that is one of the explicit goals of Rustix. The usage of unsafe will be much more narrow overall, mostly surrounding the system calls themselves.
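
A sketch of what "unsafe only around the system call" can look like, assuming Linux on x86_64 or aarch64. The names `sys_write` and `write_fd` are made up for this example and are not rustix's API; the point is only that the `unsafe` block shrinks to the single instruction, while the public function is safe and takes a `BorrowedFd` so the descriptor is known to be open for the duration of the call.

```rust
use std::arch::asm;
use std::io;
use std::os::fd::{AsRawFd, BorrowedFd};

/// Issue write(2) directly on Linux x86_64 and return the raw kernel result.
#[cfg(all(target_os = "linux", target_arch = "x86_64"))]
fn sys_write(fd: BorrowedFd<'_>, buf: &[u8]) -> isize {
    let fd_arg = fd.as_raw_fd() as isize;
    let ret: isize;
    // SAFETY: `buf` is a valid slice for `buf.len()` bytes and `fd` is a live
    // descriptor for the whole call; the kernel only reads from `buf`.
    unsafe {
        asm!(
            "syscall",
            inlateout("rax") 1isize => ret, // 1 = write on x86_64
            in("rdi") fd_arg,
            in("rsi") buf.as_ptr(),
            in("rdx") buf.len(),
            lateout("rcx") _,
            lateout("r11") _,
            options(nostack),
        );
    }
    ret
}

/// Issue write(2) directly on Linux aarch64 and return the raw kernel result.
#[cfg(all(target_os = "linux", target_arch = "aarch64"))]
fn sys_write(fd: BorrowedFd<'_>, buf: &[u8]) -> isize {
    let fd_arg = fd.as_raw_fd() as isize;
    let ret: isize;
    // SAFETY: same reasoning as the x86_64 version above.
    unsafe {
        asm!(
            "svc 0",
            in("x8") 64isize, // 64 = write on aarch64
            inlateout("x0") fd_arg => ret,
            in("x1") buf.as_ptr(),
            in("x2") buf.len(),
            options(nostack),
        );
    }
    ret
}

/// Safe wrapper: callers never see raw pointers or raw descriptor numbers.
pub fn write_fd(fd: BorrowedFd<'_>, buf: &[u8]) -> io::Result<usize> {
    let ret = sys_write(fd, buf);
    if ret < 0 {
        // Linux reports failure as a negated errno value in the return register.
        Err(io::Error::from_raw_os_error(-ret as i32))
    } else {
        Ok(ret as usize)
    }
}
```

With something like this, `write_fd(file.as_fd(), b"hello")` on an open `std::fs::File` returns the number of bytes written, and the only code an auditor has to read closely is the two `asm!` blocks.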


The port of libstd linked in the article still uses rustix as a separate crate.

In any case, I hope it remains separate. Third-party no_std code would benefit from it.


I am under the impression that no_std code is primarily the concern of embedded systems that might not have much of an OS [1] at all, never mind a suite of system calls to Linux.

[1] Like a typical RTOS, that may provide some basic communication primitives, thread creation, and a hardware abstraction layer (HAL).


That is not necessarily the case. Embedded is a big user, but there are other reasons to use no_std, such as explicitly wanting to avoid heap allocations, or wanting greater control over the native APIs being called.


Also Wasm support (can be viewed as a kind of embedded) and better portability. Anything that helps one separate the platform and more precisely define the platform boundary the better.

Libc is necessary, but not good.


Yeah, that's what I mean. Sounds good!


Possibly dumb q, but potentially useful to other languages?


It looks like the goal of this project is to allow Rust code to avoid having to use FFI to call C code. In order to be useful to non-Rust languages, this project would either have to expose a C-compatible interface for FFI (which somewhat defeats the point of trying to avoid C), or else the other language would need to natively support Rust FFI (which would be a large amount of ongoing labor, since Rust doesn't have a stable ABI).


I believe that it would open the door to writing a Linux-only libc in Rust, which could be cool


There is in fact already a library that does that https://github.com/redox-os/relibc



