Hacker News

Can someone summarize the tradeoffs between radix trees and the various kinds of B-trees (e.g., LSM-trees, B-epsilon trees, classic B(+)-trees)?

For something like this application, which seems heavily read-biased, a classic B-tree seems like it would be great. Or even a Bloom filter, although the false-positive tradeoff there is maybe not suitable.



Most search trees only need a total ordering on the elements. A radix tree has stricter requirements: the data must be splittable into parts (e.g. x into x[0]..x[n] and y into y[0]..y[n]) such that each part has a total order and the original key order equals the lexicographic order of the parts.
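A minimal sketch of that requirement, using byte strings (my example, not from the thread): splitting a key into its bytes and comparing the parts lexicographically must agree with comparing the whole keys.

```python
# Sketch: radix trees need keys that decompose into ordered parts such
# that lexicographic order of the parts equals the order of the keys.
# Byte strings satisfy this: byte-by-byte comparison gives string order.

def parts(key: bytes) -> list:
    """Split a key into its per-level parts (one byte per radix level)."""
    return list(key)

a, b = b"The car", b"The cat"
# Lexicographic comparison of the parts agrees with comparing the keys:
assert (parts(a) < parts(b)) == (a < b)
assert (parts(b) < parts(a)) == (b < a)
```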

Even before going to a full radix tree, you can save a small constant by putting a B-tree at each radix level. Then you won't compare x[0] to y[0] again at levels 1..n once you know they are equal (think of a database of all sentences starting with 'The').

True radix trees go further: when every part is finite with a small contiguous domain (say 256 values) you can build a jump table, so each level is 1 array index instead of 8 comparisons plus an index. Additionally: no rebalancing is needed, and less memory is needed than for a B-tree.
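To make the jump-table idea concrete, here is a toy sketch (my own, assuming one byte per level and a 256-way fanout): descending one level is a single array index, with no comparisons at all.

```python
# Sketch: a 256-way radix node. Lookup at each level is one array index
# (children[b]) instead of ~log2(256) = 8 comparisons in a search tree.

class RadixNode:
    __slots__ = ("children", "is_key")
    def __init__(self):
        self.children = [None] * 256  # jump table indexed by byte value
        self.is_key = False           # does a key end at this node?

def insert(root: RadixNode, key: bytes) -> None:
    node = root
    for b in key:
        if node.children[b] is None:
            node.children[b] = RadixNode()
        node = node.children[b]
    node.is_key = True

def contains(root: RadixNode, key: bytes) -> bool:
    node = root
    for b in key:
        node = node.children[b]
        if node is None:
            return False
    return node.is_key

root = RadixNode()
insert(root, b"The cat")
assert contains(root, b"The cat")
assert not contains(root, b"The car")
assert not contains(root, b"The")  # prefix of a key is not itself a key
```

Note the structure never rebalances: the shape is fully determined by the keys, not by insertion order.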

Problems with radix trees: normally you don't store the keys themselves, you only rebuild them from the path, so you can't hand out references to stored keys. And the extra constraints on your datatype result in higher coupling.
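The "rebuild, don't store" point can be sketched like this (dict-based nodes for brevity, my own example; a real radix node would use the fixed 256-slot array): no key object lives in the tree, so iteration reconstructs each key by concatenating the path labels.

```python
# Sketch: the tree stores no key objects; every key is rebuilt from the
# labels on the path leading to it, so there is nothing to reference.

END = object()  # sentinel marking "a key ends at this node"

def insert(root: dict, key: bytes) -> None:
    node = root
    for b in key:
        node = node.setdefault(b, {})
    node[END] = True

def all_keys(node: dict, prefix: bytes = b""):
    """Rebuild every key by concatenating byte labels along the path."""
    if END in node:
        yield prefix
    for label, child in sorted(kv for kv in node.items() if kv[0] is not END):
        yield from all_keys(child, prefix + bytes([label]))

root = {}
for k in (b"The cat", b"The car", b"A dog"):
    insert(root, k)
assert list(all_keys(root)) == [b"A dog", b"The car", b"The cat"]
```

A side effect visible here: iteration comes out in sorted key order for free.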



