I'm not seeing any benchmarks: since they tout speed as a major selling point, that surprises me.
Edit: For instance, the blog post mentions being inspired by MurmurHash, which touts its performance on its site along with benchmarks (http://sites.google.com/site/murmurhash/):
Any function that uses both of these violates strict aliasing rules and might miscompile on recent gcc versions. Using only one of them isn't enough to cause a violation, because char* is allowed to alias anything in C/C++. But if you use both, then you have a uint32_t* and a uint64_t* pointing to the same memory, which violates the language spec.
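The usual portable fix (my sketch, not code from the library under discussion) is to route every load through memcpy: memcpy is defined to copy bytes, so it may legally read any object, and gcc/clang fold these helpers into single load instructions at -O2:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers -- not taken from the hash library itself.
   memcpy reads the bytes of any object without violating strict
   aliasing, so both helpers may touch the same buffer safely. */
static uint32_t load_u32(const void *p) {
    uint32_t v;
    memcpy(&v, p, sizeof v);  /* compiles to one mov on x86 */
    return v;
}

static uint64_t load_u64(const void *p) {
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

Because the accesses happen through memcpy rather than through incompatibly typed lvalues, a function that mixes 32-bit and 64-bit loads of the same data stays within the spec.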
ffmpeg has a header of macros to avoid this problem. They have names like AV_RN32A (aligned, native-endian 32-bit read), AV_RL32 (unaligned, little-endian 32-bit read) and so forth.
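A macro in that spirit (a sketch of the idea, not ffmpeg's actual definition, which lives in libavutil/intreadwrite.h) assembles an unaligned little-endian 32-bit value byte by byte, which is both aliasing-safe and alignment-safe on any host:

```c
#include <stdint.h>

/* Sketch of an AV_RL32-style unaligned little-endian read.
   Accessing the buffer only through unsigned char* never violates
   strict aliasing, and byte reads never fault on alignment. */
#define RL32(p) \
    ((uint32_t)((const uint8_t *)(p))[0]        | \
     (uint32_t)((const uint8_t *)(p))[1] <<  8  | \
     (uint32_t)((const uint8_t *)(p))[2] << 16  | \
     (uint32_t)((const uint8_t *)(p))[3] << 24)
```

The result is the same on big- and little-endian machines, which is exactly the property the original code was missing.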
> Any function that uses both of these violates strict aliasing rules
More specifically, any function that uses both of these to access the same data violates strict aliasing rules. But this would imply that the data is being loaded redundantly, which seems unlikely in an implementation where speed is a top priority.
For example, I do not believe the following function violates strict aliasing rules:
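The comment's original example is not preserved here; a function in the same spirit (a hypothetical one of mine) uses both pointer types but dereferences each only on an object of the matching type, so no aliasing rule is broken:

```c
#include <stdint.h>

/* Hypothetical example: a and b point at distinct objects, and each
   pointer is dereferenced only with the type its pointee really has,
   so both pointer types can appear in one function legally. */
static uint64_t mix(const uint32_t *a, const uint64_t *b) {
    return (uint64_t)*a * 2654435761u ^ *b;
}
```

The violation only arises when the two pointer types are made to refer to the same storage.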
I am surprised at how half-baked this code sounds. Why would Google release a new hash library that does not support big endian platforms, uses unaligned memory access on little endian platforms, is strict-aliasing unsafe, and is implemented in C++?
For what it's worth, Snappy (the compression library Google released a couple of weeks ago) has many of these same limitations. I wanted to port both of these to the RISCy, big-endian architecture I work on as part of my research, so finding out how unportable they were was kind of a bummer for me. But honestly it's a pretty reasonable tradeoff for Google: x86 is the name of the game in cheap commodity servers (sorry, POWER7/SPARC). If it were me, I might have put a bit of thought/work into portability before open-sourcing it, but I'd rather the code be out there than not, despite its limitations.
What does this do:
#define LIKELY(x) (__builtin_expect(!!(x), 1))
Oh, to answer myself: it helps predict the branch direction by indicating which outcome is the common case in the test. http://kerneltrap.org/node/4705
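A typical use (a made-up example, not from the library) marks the error path as unlikely, so gcc lays the code out with the hot path as the fall-through:

```c
/* __builtin_expect is a gcc/clang extension; !!(x) normalizes any
   truthy value to exactly 0 or 1 before passing it to the builtin. */
#define LIKELY(x)   (__builtin_expect(!!(x), 1))
#define UNLIKELY(x) (__builtin_expect(!!(x), 0))

/* Hypothetical function: division with a rarely-taken error branch. */
static int checked_div(int num, int den, int *out) {
    if (UNLIKELY(den == 0))   /* hint: callers almost never pass 0 */
        return -1;
    *out = num / den;
    return 0;
}
```

The hint changes code layout and (on some targets) emitted prediction prefixes, but never the result, so the function behaves identically either way.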
A lot of instruction sets (including x86) let you encode branch prediction hints into the actual instructions to help out the branch predictors. But for x86 at least these hints didn't turn out to be too useful, so they're ignored (except on P4, I believe).
In fact, it looks like the author of MurmurHash also developed a test suite for hash functions which includes performance testing: http://code.google.com/p/smhasher/wiki/SMHasher