> You can't use individual functions without knowing the data structures they op...

karpierz · on Sept 22, 2024

The question on my mind: how do you figure out what the functions do without reverse engineering?

If I were to guess, you're saying that you reverse engineer the API boundary without reverse engineering the implementation. But then figuring out what the API contact is without documentation seems intractable for most API boundaries.

boricj · on Sept 22, 2024

For context, my tooling is a Ghidra extension, so there's all the usual Ghidra stuff that applies here.

Indeed, it depends what the API boundary is for the selection to be exported.

If it's the whole program without some well-known libraries (like the C runtime library for a statically linked executable or the Psy-Q SDK for a PlayStation video game), then the API boundary is trivial in the sense that it's a documented one. The hard part is actually figuring where those libraries are so that you can cut them out of your selection while exporting.

If it's a particular subset internal to the program then it's trickier because you don't have that (but if you know you want to export it, then you must already know something about it). Traditional reverse-engineering techniques do apply here, it's just that you only care about the ABI boundary instead of the whole implementation, so it's usually less work to figure out.

However, if you get it wrong then the toolchain will usually not detect it and you'll have some very exotic undefined behavior on your hands when you try to run your Frankensteinized program. Troubleshooting these issues can be very tricky because my tooling doesn't generate debugging symbols, so the debugging experience is atrocious.

I've always managed to muddle through so far, but one really nasty case did take me a couple of weekends to track down (don't cut across a variable when you're exporting because you'll truncate it, which can lead among other things to corrupting whatever was laid out next in memory at runtime when the original delinked code tries to write to it).