Here's how I implement little-endian parsing:

  #include <stdint.h>

  static uint32_t load32_le(const uint8_t s[4])
  {
      return (uint32_t)s[0]
          | ((uint32_t)s[1] <<  8)
          | ((uint32_t)s[2] << 16)
          | ((uint32_t)s[3] << 24);
  }
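Here's a quick sketch of how this might be used in practice: pulling 32-bit fields out of a byte buffer at arbitrary offsets, including odd ones. `read_field` and the example buffer are made up for illustration. Because the function only ever reads individual bytes, alignment never comes into it.

```c
#include <stddef.h>
#include <stdint.h>

/* Same function as above: assemble 4 bytes, least significant first. */
static uint32_t load32_le(const uint8_t s[4])
{
    return (uint32_t)s[0]
        | ((uint32_t)s[1] <<  8)
        | ((uint32_t)s[2] << 16)
        | ((uint32_t)s[3] << 24);
}

/* Hypothetical helper: read the little-endian 32-bit word that starts at
   byte offset `off`. There is no alignment requirement, since only single
   bytes are ever accessed. */
static uint32_t read_field(const uint8_t *buf, size_t off)
{
    return load32_le(buf + off);
}
```

For example, with a buffer starting `AA 78 56 34 12`, `read_field(buf, 1)` gives `0x12345678`, and `read_field(buf, 0)` gives `0x345678AA`.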

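I start with unsigned chars (well, `uint8_t` to be precise, which has the advantage of not compiling at all if you happen to target a DSP with 32-bit chars, since `uint8_t` simply doesn't exist there). Then I convert each byte to an unsigned 32-bit integer, and only then do I shift it. The order matters: without the cast, `s[3]` would be promoted to a signed `int`, and with the usual 32-bit `int`, `s[3] << 24` overflows whenever `s[3]` is 0x80 or more, which is undefined behavior. Since every value is unsigned and already fits in 8 bits, there is no need to mask anything.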
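The same "convert first, then shift" recipe extends to 64 bits. Here's a sketch of a hypothetical `load64_le` (not part of the code above). Here the cast is mandatory rather than just safer: shifting a 32-bit value by 32 or more is undefined regardless of its signedness.

```c
#include <stdint.h>

/* Hypothetical 64-bit counterpart: widen each byte to uint64_t first,
   then shift it into place, least significant byte first. Widening
   before shifting is what makes the shifts by 32..56 well defined. */
static uint64_t load64_le(const uint8_t s[8])
{
    return (uint64_t)s[0]
        | ((uint64_t)s[1] <<  8)
        | ((uint64_t)s[2] << 16)
        | ((uint64_t)s[3] << 24)
        | ((uint64_t)s[4] << 32)
        | ((uint64_t)s[5] << 40)
        | ((uint64_t)s[6] << 48)
        | ((uint64_t)s[7] << 56);
}
```

With the bytes `08 07 06 05 04 03 02 01`, this returns `0x0102030405060708`.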

Note that on little-endian targets, modern compilers translate this whole thing into a single unaligned load. Also, I've noticed that using a macro instead of a function tends to make performance worse, not better, with modern compilers.
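For comparison, here's a sketch of the other common way to do an unaligned load: `memcpy` into a `uint32_t`, which compilers also tend to reduce to a single load. The catch is that it reads in the host's byte order, so it only agrees with `load32_le` on little-endian machines. `load32_native` and `host_is_little_endian` are hypothetical names. To check the single-load claim for your own compiler, you can compare the assembly of both functions (e.g. with `gcc -O2 -S`).

```c
#include <stdint.h>
#include <string.h>

/* Portable little-endian load, as above. */
static uint32_t load32_le(const uint8_t s[4])
{
    return (uint32_t)s[0]
        | ((uint32_t)s[1] <<  8)
        | ((uint32_t)s[2] << 16)
        | ((uint32_t)s[3] << 24);
}

/* Hypothetical: reads 4 bytes in *host* byte order. memcpy avoids both
   misaligned-pointer and strict-aliasing problems. */
static uint32_t load32_native(const uint8_t *s)
{
    uint32_t v;
    memcpy(&v, s, sizeof v);
    return v;
}

/* Hypothetical: returns 1 if the lowest-addressed byte of the value 1
   is 1, i.e. the host is little-endian. */
static int host_is_little_endian(void)
{
    const uint32_t one = 1;
    uint8_t first;
    memcpy(&first, &one, 1);
    return first == 1;
}
```

`load32_le` gives the same answer everywhere; `load32_native` is only a drop-in replacement when `host_is_little_endian()` is true.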