That was dumb CPU design. You can use SYSCALL/SYSRET in a way which triggers a general protection fault in ring 0 (kernel or hypervisor), but with the stack pointer still set by the caller. The sequence of checks in SYSRET on Intel processors is badly chosen, because the AMD spec is ambiguous.
One obvious question: Is there a way to fix the problem compatibly?
AMD doesn't seem to be vulnerable, so you'd imagine that code couldn't rely on this behavior because then it would fail on AMD chips, but... well, once you've read enough of The Old New Thing, you get a feel for how gonzo some developers get with This Never Fails and This Will Never Change, and if they have to write different codepaths for Intel and AMD instead of doing things sanely, that doesn't seem such a huge imposition to the irrational mind.
There is, and it's been fixed in Linux for a few years. Possibly the severity of the issue wasn't even realized back then.
In linux-mainline/arch/x86/entry/entry_64.S, at syscall_return, you'll find a comment explaining that a more complicated and slower method will be used to return to userspace should a few conditions not be met. https://github.com/torvalds/linux/blob/master/arch/x86/entry...
[PATCH] x86_64: When user could have changed RIP always force IRET
Intel EM64T CPUs handle uncanonical return addresses differently
from AMD CPUs.
The exception is reported in the SYSRET, not the next instruction.
This leads to the kernel exception handler running on the user stack
with the wrong GS because the kernel didn't expect exceptions
on this instruction.
This version of the patch has the teething problems that plagued an earlier
version fixed.
This is CVE-2006-0744
Thanks to Ernie Petrides and Asit B. Mallick for analysis and initial
patches.
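For illustration, the idea behind that patch can be sketched in C roughly as follows. This is a simplified sketch, not the kernel's actual code: the names is_canonical48, choose_return_path, RETURN_SYSRET and RETURN_IRET are made up for this example, 48-bit virtual addresses are assumed, and the real entry code checks further conditions as well.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; this only illustrates the canonical-address test. */
enum return_path { RETURN_SYSRET, RETURN_IRET };

/* Assumes 48-bit virtual addresses: bits 63..47 must be all 0 or all 1. */
static bool is_canonical48(uint64_t addr)
{
    uint64_t upper = addr >> 47;          /* the top 17 bits */
    return upper == 0 || upper == 0x1ffff;
}

/*
 * If the saved user RIP is non-canonical, SYSRET on Intel would fault
 * in ring 0 with the user's stack pointer already loaded, so fall back
 * to the slower IRET path instead.
 */
static enum return_path choose_return_path(uint64_t user_rip)
{
    return is_canonical48(user_rip) ? RETURN_SYSRET : RETURN_IRET;
}
```

With this sketch, a return address of 0x0000800000000000 (the first non-canonical address above the lower half) would take the IRET path, so the fault is delivered the way the kernel expects.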
Well, at least they documented it. Intel does mention the conditions under which a #GP(0) will result in their instruction set manual. I'd say Microsoft is partly to blame here.
Quick question: why do you think AMD decided canonical addresses had to be sign-extended rather than just lead with zeroes? Wouldn't an access to 0xffff800000000000 also break if the address space was extended?
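On the second part of that question, a small C sketch may help. It assumes the usual rule that every bit above the implemented width must be a copy of the top implemented bit. The function name is_canonical and the va_bits parameter are made up for this example, and 57 bits is used only as an example of a wider address space.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if addr is canonical for a virtual address
 * width of va_bits, i.e. bits 63..(va_bits-1) are all 0 or all 1.
 * Assumes 1 < va_bits <= 64.
 */
static bool is_canonical(uint64_t addr, unsigned va_bits)
{
    uint64_t upper = addr >> (va_bits - 1);
    uint64_t all_ones = (UINT64_C(1) << (64 - va_bits + 1)) - 1;
    return upper == 0 || upper == all_ones;
}
```

Under these assumptions, is_canonical(0xffff800000000000, 48) and is_canonical(0xffff800000000000, 57) both hold, so an upper-half address keeps working after the address space is widened. What changes is the hole in the middle: 0x0000800000000000 is non-canonical at 48 bits but becomes canonical at 57.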
More "compatible" than easier really (I mean, two subtractions is hardly a heavyweight process). The trick was already in long use in shipping OSes and they didn't want to break anything.
So, inevitably, there's a Windows exploit.