I start with unsigned chars (well, `uint8_t` to be precise, which has the advantage of not compiling at all if you happen to target a DSP with 32-bit chars). I then convert those chars to unsigned 32-bit integers, and only then do I shift them. There is no need to mask anything here: each byte is already unsigned and in the range 0 to 255, so no sign extension can occur.
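For reference, here is a minimal sketch of the pattern described above. The name `load32_le` and the little-endian byte order are assumptions made for illustration; swap the shift amounts if your data is big-endian.

```c
#include <stdint.h>

/* Hypothetical helper: assemble a 32-bit value from 4 bytes,
 * assuming the least significant byte comes first (little-endian). */
static uint32_t load32_le(const uint8_t s[4])
{
    /* Widen each byte to uint32_t BEFORE shifting, so the shift never
     * happens on a promoted (signed) int. No masking is needed because
     * uint8_t values cannot be negative or exceed 0xFF. */
    return  (uint32_t)s[0]
         | ((uint32_t)s[1] <<  8)
         | ((uint32_t)s[2] << 16)
         | ((uint32_t)s[3] << 24);
}
```

Because it only indexes bytes, this works at any alignment and gives the same result regardless of the host's own byte order.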
Note that modern compilers typically translate this whole thing into a single unaligned load operation, at least on targets whose native byte order matches. Even better, I've noticed that using a macro instead of a function tends to make performance worse with modern compilers.