As someone who has worked far more with dynamic linking on Windows, I've always found the ELF system needlessly complex: the extra indirections of the GOT/PLT mechanism and PIC are avoided by simply linking DLLs with different preferred load addresses (so they usually don't need to be relocated, but can be if necessary), and the only thing that needs to be associated with a symbol is an address. That makes things like in-memory executable compression and sharing data between processes especially convenient. The ability to easily "import X from Y" is also nice.
Anyone with experience from the other side (working with ELFs and then moving into Windows) want to share their views?
I've worked extensively with both ELF and PE systems --- I was on the Windows Perf team and the Windows Phone core team, and now I do low-level Android goo at Facebook.
I vastly prefer the Windows shared library model. In addition to the advantages you mention, the Windows per-DLL symbol namespace system is much better than ELF's hazardous model: in ELF, accidental interposition is a big risk, so you have to very carefully namespace the symbols exported from a shared object. In Windows (and in OS X), symbol name collisions are simply not a problem: there's no global namespace in which symbols can collide. Yes, you still have DLL _name_ collisions, but SxS addresses that problem nicely. As a result, hosting unrelated bits of code in the same process is very common in the Windows world and uncommon in the ELF world. RTLD_LOCAL and RTLD_DEEPBIND are completely unnecessary.
Another advantage Windows has in practice is default symbol visibility. Windows DLLs export only the symbols you explicitly instruct your compiler and linker to export, whether through export files or compiler annotations. The default in ELF systems is to export everything that's not file-static, which is particularly fun when combined with the namespace problem. While Unixish compilers can be configured to work like Windows and export only the needed symbols, I've found that very few people bother; they then go on to wonder why their shared libraries are slow and their binaries so large. (-Bsymbolic helps, of course.)
If I were benevolent POSIX dictator for life, one of my edicts (although not my first one) would be to require an OS-X-style two-level namespace and hidden symbol visibility by default. Yes, LD_PRELOAD interposition gets harder. Just deal with it and modify functions directly.
The ELF dynamic linking mechanism is designed to emulate static linking. That's like designing cars to neigh and occasionally kick people to death with robot legs that exist only for this purpose.
Also, it's a minor thing, but LoadLibrary in Windows returns a pointer to the PE header. dlopen is nowhere near that simple, nor is the in-memory representation of a shared object as useful. (It'd also be nice if dladdr1 got some documentation. Also, it'd be nice if Bionic weren't even more awful than glibc in this respect.)
> The default in ELF systems is to export everything that's not file-static.
...and then import them all again, even when caller and callee are in the same file. I think this is one of the most bizarre aspects of the ELF mechanism. I can certainly see that it allows the extra flexibility of overriding functions, but I've never thought "I'd like to be able to easily replace any function in my application with one from a library". I don't imagine it's a common use case, since on Windows the equivalent would be a PE importing itself (is that even possible?)
> The ELF dynamic linking mechanism is designed to emulate static linking. That's like designing cars to neigh and occasionally kick people to death with robot legs that exist only for this purpose.
So I'm only passingly familiar with the state of OSes from before I was born, but between "your code works unchanged between static and dynamic linking; you just have to change your build system, which is already OS-dependent" and "how you export and import symbols in the source code varies depending on how you're linking, which OS you're building on, and which compiler you're using", the former seems less insane.
Sure, but static linking isn't dynamic linking. Pretending that it is has done much harm, but I don't think it's led to demonstrable benefits. If you want to split a single module into two dynamically-linked parts, you need to define the interface between these parts anyway for versioning purposes. If you're doing that work anyway, it's not hard to add export tags at the same time.
You're right, though, about history being a factor. Windows was born with dynamic linking; shared libraries were a Unix bolt-on. (Then again, symlink was a bolt-on feature too, but it's well-integrated these days.)
It's also interesting to note that on Windows, there's no such thing as a static executable in the sense you might have one in Unix. Every system call must go through ntdll.dll or it'll stop working on the next major upgrade, which will scramble the system call numbers. On Windows, the ABI compatibility boundary is ntdll/kernel32/user32/etc., while on Unix, the ABI boundary is the kernel-userspace boundary.
The Windows way of doing it is much better. It places fewer constraints on the kernel and lets you implement 32-bit-to-64-bit system call thunks entirely in userspace, completely avoiding a major class of security vulnerability.
> Every system call must go through ntdll.dll or it'll stop working on the next major upgrade, which will scramble the system call numbers.
Wow, really? So you can't make system calls from assembly language on Windows?
> It places fewer constraints on the kernel and lets you implement 32-bit-to-64-bit system call thunks entirely in userspace, completely avoiding a major class of security vulnerability.
Well, there is the vDSO. What kind of thunks and security vulnerabilities are you talking about, though?
I don't have much experience with the Windows model, but if I'm understanding you correctly . . .
ELF does support load-time relocation, as appears to be done under Windows if two base addresses conflict. That just isn't the standard way of doing things, for several reasons.
First, load-time relocation imposes a cost at startup (whereas PIC imposes a smaller cost throughout the lifetime of the process). I'm given to understand that, back in the day, the startup delay for programs using large shared libraries could be quite noticeable.
Second, load-time relocation reduces in-memory text section sharing in the case where relocation is performed. If your goal in using shared libraries is saving RAM, this is a problem.
Third, it's a matter of inertia. Originally, UNIX shared libraries (a.out format) were built with a static, non-relocatable base address. Library authors had to coordinate via a central authority to ensure compatibility. PIC seemed like a good way to get as far away from that problem as possible - or so I understand.
Indirection through the GOT/PLT also serves another purpose, even in load-time relocation code - it enables replacing a symbol in one shared library with a symbol from another (eg via LD_PRELOAD). Though that's more of a side benefit than a justification.
Keep in mind that you don't need text relocations for code sequences that are naturally PC-relative, like jumps on x86, or pretty much anything on ARM and amd64. Windows DLLs don't need text relocations for references to other modules: the IAT (Import Address Table) does pretty much the same thing as the GOT and in pretty much the same way.
Um, ARM by default is very much not PIC: loads have only a 4 KB offset range, so references to the data section are stored as absolute addresses in the text section unless you explicitly request PIC, which adds another 'add rN, pc' instruction to each reference. (And I believe Windows doesn't even support these PIC references in ARM code.)
> Though I believe that a degree of indirection is preserved on AMD64 to enable symbol interposition?
You really don't want symbol interposition. You don't use it most of the time, and the rest of the time, you're just changing a call that looks like:
foo(1, 2);
to
(*g_foo)(1, 2);
Which do you think is faster? There's a reason everything on Android is compiled with -Bsymbolic (which kills interposition for calls between functions in the same module). You really should be compiling all your code with -Bsymbolic -fvisibility=hidden and explicitly exporting the symbols you want other modules to call.
The first is faster, of course. Whether or not I want the second depends entirely on what I'm doing :) It certainly shouldn't be the default - it isn't very useful during normal work - but I've been glad to have it before.
I wouldn't recommend -Bsymbolic by default unless you know it's safe for your environment, though. There is software that uses symbol interposition to 'productive' ends in production (not much of it, thank heavens). Mobile platforms are something of a special case.
C++ exceptions exempted (see -Bsymbolic-functions), anything broken with -Bsymbolic is inherently broken and ought to be fixed. The same goes for symbol interposition. There are ways of providing extension points that are saner than allowing any shared library that happens to be loaded into your process the ability to override one of your functions.
Ought to be fixed doesn't mean will be fixed, unfortunately. Especially when the people doing the fixing wouldn't be the developers. Fortunately there aren't too many applications that rely on interposition to function.
As for myself: I've only ever used symbol interposition for debugging, instrumentation, etc . . . for which it was quite useful (as I've said). I pay attention to what my libraries export, so accidental interposition has never been a problem for me. (Making that easier by default is something that I would support.) I'll happily discuss the matter further, but I'm not interested in arguing it.
"back in the day, the startup delay for programs using
large shared libraries could be quite noticeable.
The startup delay caused by relocation of shared libraries is a big enough problem today that the Google Chrome team jumps through several hoops to mitigate it. They collect DLL load addresses from systems where Chrome is installed and calculate an optimal base address for chrome.dll so that the likelihood of relocation is minimized.
Or maybe I dreamt all of this, because I couldn't find the original article where I read it.
One disadvantage of the Windows system is, as far as I understand it, that once relocation happens, the libraries can't be shared in memory anymore. This somewhat defeats the purpose of a shared library.
After the loader has patched in the new addresses, the pages making up the library contain different data in each process that relocated it, and differ from the data in the PE file. So the PE file can't simply be mmapped, and the same physical pages can't be used by different processes.
I'm not saying this is a big issue, but I think it's a difference worth mentioning.
I just happen to know from work, where I do VB6, that the loader is inefficient when it has to probe for a load address, so it's best to assign one that doesn't collide with the other DLLs/OCXs that are loaded.