
The latest gcc 11 optimizes the ifs nicely: https://godbolt.org/z/771foExcG

  getCountry:
        mov     eax, OFFSET FLAT:.LC0
        cmp     edi, 258
        ja      .L1
        mov     edi, edi
        mov     rax, QWORD PTR CSWTCH.1[0+rdi*8]
  .L1:
        ret
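For anyone reading the assembly without opening the link, here is a rough hand-written C equivalent of what the compiler produced: a default value, one unsigned bounds check, and one indexed load from a table. The table contents and the name get_country_by_table are placeholders I made up; the real source and its 259-entry table are only visible behind the godbolt link.

```c
#include <stddef.h>

/* Placeholder data: the real country codes and names are not shown in
   this thread, so these entries are purely illustrative. */
static const char *const country_table[] = {
    "Unknown",   /* 0 */
    "Country A", /* 1 */
    "Country B", /* 2 */
    "Country C", /* 3 */
};

#define TABLE_LEN (sizeof country_table / sizeof country_table[0])

/* Hypothetical hand-written version of the compiled shape. */
const char *get_country_by_table(unsigned int cc)
{
    const char *result = "Unknown";   /* like: mov eax, OFFSET FLAT:.LC0 */
    if (cc < TABLE_LEN)               /* like: cmp edi, 258 / ja .L1     */
        result = country_table[cc];   /* like: mov rax, CSWTCH.1[rdi*8]  */
    return result;
}
```

Because the comparison is unsigned, a negative int argument would also fall through to the default, which is why a single ja is enough in the assembly.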


Anyone know what the purpose of the mov edi, edi instruction there is?

Edited to add: I understand that it's a NOP, but why would the compiler emit one here?


That instruction clears the upper 4 bytes of the rdi register. Note that the next instruction uses rdi in the address.

The edi register is the lower 4 bytes of rdi. On x86-64, instructions that write a 32-bit register zero the upper 4 bytes of the corresponding 64-bit register. This helps performance because it eliminates data dependencies on the old values in those upper bytes.
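A small C model of that rule, in case it helps. The function names write32 and write16 are made up for illustration; they only mimic what the hardware does to a 64-bit register. The 16-bit case is included for contrast, since 8- and 16-bit writes keep the old upper bits and therefore do depend on the previous value.

```c
#include <stdint.h>

/* Model of a 32-bit register write (e.g. mov edi, edi): the result is the
   new low half zero-extended; the old contents play no part in it. */
uint64_t write32(uint64_t old_reg, uint32_t value)
{
    (void)old_reg;              /* old upper bits are discarded */
    return (uint64_t)value;
}

/* Model of a 16-bit register write (e.g. mov di, ax): the upper 48 bits
   of the old value are preserved, so the result depends on old_reg. */
uint64_t write16(uint64_t old_reg, uint16_t value)
{
    return (old_reg & ~(uint64_t)0xFFFF) | value;
}
```

In this model, mov edi, edi is write32(rdi, (uint32_t)rdi): the low half is unchanged and any garbage in the upper half is cleared.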


This is not a NOP; it explicitly clears the upper 32 bits of RDI, since the compiler cannot assume they are zero in this situation. If you change cc from an int to size_t (unsigned long on x86-64 Linux), the compiler will generate:

        mov     eax, OFFSET FLAT:.LC0
        cmp     rdi, 258
        ja      .L1
        mov     rax, QWORD PTR CSWTCH.1[0+rdi*8]

Note that in some cases the compiler can do this automatically via lifetime analysis but not in this freestanding example.
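To try the difference yourself, something like the following pair should be enough. The names and the three-entry table are hypothetical, not the code from the godbolt link. My understanding is that the x86-64 System V ABI leaves the upper half of the register unspecified for a 32-bit argument, so the first version typically needs the zero-extending mov before the indexed load, while the second can use rdi directly.

```c
#include <stddef.h>

/* Hypothetical table, just large enough to index. */
static const char *const names[] = { "zero", "one", "two" };

/* 32-bit index: must be widened to 64 bits before it can be used in an
   address, so expect a mov edi, edi (or similar) in the output. */
const char *lookup_u32(unsigned int i)
{
    return i < 3 ? names[i] : "none";
}

/* 64-bit index: already the full register width, no extension needed. */
const char *lookup_size(size_t i)
{
    return i < 3 ? names[i] : "none";
}
```

Both functions return the same results for in-range values; only the generated code for the index differs.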


It does nothing, so it's a no-op of sorts. I wonder if it's a branch-delay tactic of some kind? Surely the algorithm would be unchanged if it were removed.


It’s just a two-byte NOP. IIRC, it’s Intel’s recommended form for one of that length. Windows uses it for hot patching [0], but I can’t imagine that’s the reason here.

[0]: https://devblogs.microsoft.com/oldnewthing/20110921-00/?p=95...


clang does the same; I tested it back to clang 7.0.

So clang has done this for a long time, it seems.



