In the MySQL world, I think it is pretty common to run a replicated server and keep a copy of the binlogs, which are the replication messages the primary sends to the secondary to keep it in sync. That supposedly lets you reconstruct every state the DB has been in. It would depend on implementation details I haven't checked, though: it doesn't follow automatically from replication. But I know of some sites that do it that way.
There is also the concept of "purely functional data structures", used for example by Happstack. Those are data structures where you never mutate anything; instead you do updates by allocating new nodes and pointers, so an update takes O(log n) operations, since you have to reallocate the path to the root of a B-tree-like structure instead of just overwriting a cell. In principle that should still not be too bad if the amount of data isn't huge. For whatever reason, though, Happstack's performance is apparently disappointing even when you take its design into account.
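To make the path-copying idea concrete, here's a minimal Haskell sketch with a toy binary search tree (real systems use B-tree-like structures with much higher fan-out, but the structural sharing works the same way):

    data Tree a = Leaf | Node (Tree a) a (Tree a)

    insert :: Ord a => a -> Tree a -> Tree a
    insert x Leaf = Node Leaf x Leaf
    insert x t@(Node l v r)
      | x < v     = Node (insert x l) v r  -- rebuild left path; right subtree shared
      | x > v     = Node l v (insert x r)  -- rebuild right path; left subtree shared
      | otherwise = t                      -- value already present: reuse the whole tree

    toList :: Tree a -> [a]
    toList Leaf = []
    toList (Node l v r) = toList l ++ [v] ++ toList r

    main :: IO ()
    main = do
      let v1 = foldr insert Leaf [5, 3, 8 :: Int]
          v2 = insert 4 v1  -- v1 is untouched; v2 shares everything off the path
      print (toList v1)  -- [3,5,8]
      print (toList v2)  -- [3,4,5,8]

Only the O(log n) nodes along the insertion path are allocated; every old version remains a valid, readable snapshot for free.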
This is precisely what https://github.com/sirixdb/sirix does. A resource in a database is stored in a huge persistent structure of index pages.
The main index is a trie, which indexes revision numbers. The leaf nodes of this trie are "RevisionRootPages". Under each RevisionRootPage, another trie indexes the main data. Data is addressed through dense, unique, and stable 64-bit int nodeKeys. The user-defined secondary indexes are currently also stored as further tries under a RevisionRootPage.
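For illustration, here's roughly how a dense 64-bit nodeKey can be resolved through such a trie by slicing the key into per-level page offsets. The 10-bit-per-level fan-out below is an assumption for the sketch, not SirixDB's actual constant:

    import Data.Bits (shiftR, (.&.))
    import Data.Word (Word64)

    -- Illustrative parameter: each trie level consumes 10 bits of the
    -- nodeKey, giving a fan-out of 1024 references per inner page.
    bitsPerLevel :: Int
    bitsPerLevel = 10

    -- Resolve a nodeKey to a path of page offsets, from the topmost
    -- inner page down to the page that holds the node.
    pagePath :: Int -> Word64 -> [Int]
    pagePath levels key =
      [ fromIntegral ((key `shiftR` (l * bitsPerLevel)) .&. 0x3FF)
      | l <- reverse [0 .. levels - 1] ]

    main :: IO ()
    main = print (pagePath 3 1234567)  -- [1,181,647]

Because the keys are dense and stable, this lookup is pure arithmetic on the key; no comparisons or rebalancing are needed.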
The last layer of inner pages in a trie adds references to a predefined maximum number of data page fragments. The copy-on-write architecture does not simply copy whole data pages; what gets copied depends on the versioning algorithm. The default is a sliding snapshot algorithm, which copies changed/inserted/deleted nodes plus the nodes that fall out of a predefined window (the window is usually small, as the page fragments have to be read from random locations in parallel to reconstruct a full page). This reduces the amount of data stored for each new revision. The inner pages of the trie (as well as the data pages) are not page-aligned, so they might be small, and they are compressed before being written to persistent storage.
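A rough Haskell sketch of how a full page gets reconstructed from sliding-snapshot fragments; the Slot/Fragment types are made up for illustration, and the point is just that the newest fragment containing a node wins and the window bounds how far back you have to read:

    import qualified Data.Map.Strict as M

    -- Hypothetical types for the sketch: a "page" maps node slots to
    -- payloads, and each revision persists only a fragment of it.
    type Slot = Int
    type Fragment = M.Map Slot String

    -- Merge fragments listed newest-first: for each slot, the newest
    -- fragment containing it wins. The sliding window guarantees every
    -- slot appears within the last few fragments, so only those need
    -- to be fetched (in SirixDB, in parallel from random locations).
    reconstruct :: [Fragment] -> Fragment
    reconstruct = foldl M.union M.empty  -- M.union is left-biased: newest wins

    main :: IO ()
    main = do
      let r3 = M.fromList [(1, "a''")]            -- rev 3: slot 1 changed
          r2 = M.fromList [(2, "b'"), (3, "c")]   -- rev 2: slot 2 changed,
                                                  --   slot 3 re-copied as it
                                                  --   left the window
      print (reconstruct [r3, r2])  -- full page without reading rev 1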
Currently, it offers a single read-write transaction per resource, plus read-only transactions that don't take any locks.
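The general shape of that scheme (not SirixDB's actual API, just a sketch of the idea): readers atomically read the current immutable revision root and never block, while the single writer is serialized and publishes a new root on commit:

    import Control.Concurrent.MVar
    import Data.IORef

    data Store s = Store { currentRoot :: IORef s, writeLock :: MVar () }

    beginRead :: Store s -> IO s
    beginRead store = readIORef (currentRoot store)  -- pins one revision, lock-free

    commitWrite :: Store s -> (s -> s) -> IO ()
    commitWrite store f = withMVar (writeLock store) $ \_ -> do
      old <- readIORef (currentRoot store)
      writeIORef (currentRoot store) $! f old  -- readers see either old or new root

    main :: IO ()
    main = do
      lock <- newMVar ()
      root <- newIORef (0 :: Int)
      let store = Store root lock
      commitWrite store (+ 1)
      beginRead store >>= print  -- 1

Since every revision is immutable once written, a reader that has pinned a root can keep traversing it safely no matter what the writer does afterwards.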
Those logs, though, live at a lower level of abstraction than the DB itself (using them requires stepping outside the database proper).
What the parent is referring to is the actual database schema itself: it's essentially a giant versioned log.
Presumably there is some sort of ‘key frame’, however, or you’d need to go all the way back however many years to start rebuilding the object’s current state?
> Presumably there is some sort of ‘key frame’, however, or you’d need to go all the way back however many years to start rebuilding the object’s current state?
Correct. My criterion for making the snapshot feature a priority will be log recovery taking longer than 5 minutes. Based on current figures, I cannot see that happening anytime soon.
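For anyone unfamiliar with the pattern, the snapshot/‘key frame’ idea is just: persist the folded state at some point, then recovery only replays events logged after it. A toy sketch (the Event type and balance state are invented for illustration):

    data Event = Deposited Int | Withdrew Int

    apply :: Int -> Event -> Int
    apply bal (Deposited n) = bal + n
    apply bal (Withdrew n)  = bal - n

    -- Recovery folds the log tail over the latest snapshot instead of
    -- the initial state, so replay time is bounded by the number of
    -- events written since that snapshot.
    recover :: Int -> [Event] -> Int
    recover = foldl apply

    main :: IO ()
    main = do
      let snapshot    = 100                           -- state persisted earlier
          eventsAfter = [Deposited 50, Withdrew 30]   -- only the tail is replayed
      print (recover snapshot eventsAfter)            -- 120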