It's somewhat similar in that both approaches turn writes into sequential I/O, writing to sorted arrays (for stratified b-trees) or to SSTables (for BigTable and Cassandra), and then merging them together with more sequential I/O. This is the not-so-secret sauce that keeps inserts fast compared to vanilla b-trees.

However, stratified b-trees bring some algorithmic rigor to the merging of those arrays, which provides guarantees on search time.
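
To make the shared write pattern concrete, here is a minimal sketch (in Python, with invented names; not BigTable's, Cassandra's, or the paper's actual code) of buffering writes in memory, flushing them as sorted runs with one sequential write, and merging runs later:

    import bisect

    class LogStructuredStore:
        def __init__(self, flush_threshold=4):
            self.memtable = {}            # in-memory buffer of recent writes
            self.runs = []                # on-disk sorted arrays, newest first
            self.flush_threshold = flush_threshold

        def put(self, key, value):
            self.memtable[key] = value
            if len(self.memtable) >= self.flush_threshold:
                # one sequential write of a sorted run; no in-place updates
                self.runs.insert(0, sorted(self.memtable.items()))
                self.memtable = {}

        def compact(self):
            # merge all runs sequentially into a single sorted array
            merged = {}
            for run in reversed(self.runs):   # oldest first, so newest wins
                merged.update(run)
            self.runs = [sorted(merged.items())]

        def get(self, key):
            if key in self.memtable:
                return self.memtable[key]
            for run in self.runs:             # probe newest run first
                i = bisect.bisect_left(run, (key,))
                if i < len(run) and run[i][0] == key:
                    return run[i][1]
            return None

Note that get() has to probe every un-merged run in the worst case, which is exactly where the search-time guarantees (and the Bloom filters discussed below) come in.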

The authors do indeed provide a comparison with Cassandra that should answer your question: http://www.acunu.com/2011/03/cassandra-under-heavy-write-loa...

Summary: 1) Single-key read performance in Cassandra depends somewhat on when the last compaction was done (Bloom filters help here, but even a couple of false positives per query costs random I/O; the fewer SSTables there are, the less random I/O), so write spikes lead to poor read performance, and 2) Bloom filters don't work for range queries, so those tank in Cassandra in general. Stratified b-trees don't need Bloom filters and perform better, and more consistently, in both cases.
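
For the single-key case, the read path looks roughly like this (hypothetical names, not Cassandra's actual classes); every Bloom filter false positive costs one wasted random read, so read cost grows with the number of un-compacted SSTables:

    def read_key(key, sstables):
        # sstables ordered newest first; bloom_filter and random_read are
        # hypothetical stand-ins for the per-SSTable structures
        for sstable in sstables:
            if sstable.bloom_filter.might_contain(key):  # in-memory check
                value = sstable.random_read(key)         # one random disk I/O
                if value is not None:
                    return value                         # true positive
                # otherwise the filter gave a false positive; the I/O was wasted
        return None

With n SSTables and false-positive rate p, even a miss costs about n*p random reads on average, and range queries can't use the filters at all.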



Thanks for the links. The statistics on latency distribution are quite impressive, to say the least.

Why do you say that stratified b-trees don't need Bloom filters? Yes, the improved merging discipline reduces the number of arrays to read, but presumably there is often >1 array, which is sufficient to make Bloom filters desirable. Even if you only have two arrays, doubling the number of random I/Os per key lookup is easily enough of a penalty to make Bloom filters worthwhile. The paper itself seems to indicate that Bloom filters are used:

"In the simplest construction, we embed into each array a B-tree and maintain a Bloom filter on the set of keys in each array. A lookup for version v involves querying the Bloom filter for each array tagged with version v, then performing a B-tree walk on those arrays matching the filter."


The paper doesn't say, so I'm making some assumptions here, but if you had a Bloom filter per array, then as these doubling arrays get really big, all the Bloom filter would tell you is that the target entry is probably somewhere in that giant array. A false positive in the Bloom filter would still cause a pretty significant amount of work.

Stratified b-trees use forward pointers to narrow the search in the next array down the tree. As with regular b-trees, the smaller root arrays will likely be cached in memory, so the number of random I/Os will be small.
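
Here is a rough sketch of how I read the forward-pointer idea (the layout and names are my assumptions, not the paper's): each entry in a smaller array also records where its key region starts in the next, larger array, so each level only needs a local binary search rather than a full descent from that array's root.

    import bisect

    def cascading_lookup(key, levels):
        # levels: sorted arrays from smallest to largest; entries are
        # (key, value, fwd_ptr) triples, fwd_ptr indexing into the next level
        lo = 0
        for level in levels:
            i = bisect.bisect_left(level, (key,), lo, len(level))
            if i < len(level) and level[i][0] == key:
                return level[i][1]
            # follow the forward pointer of the nearest preceding entry
            # into the next (larger) array
            lo = level[i - 1][2] if i > 0 else 0
        return None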



