Hacker News

I don't know much about SQLite internals, but on the face of it it sounds hacky as hell (pardon if I'm wrong).

Wouldn't it be better to make a proper client-server API similar to traditional SQL databases, but on top of SQLite?



The point is to use dumb, cheap blob storage instead of a CPU-consuming server that someone has to manage.


There are many use cases where scaling a traditional centralized SQL database is problematic, and something like this can address them.

We have one: we run thousands of VMs at a time, all accessing the same "database". Since we already have a good amount of horizontally-scaled compute, having to maintain a separate database cluster, or large vertically-scaled database instance, to match our peak load requirements is problematic in terms of one or more of cost, complexity, and performance. In particular, horizontally-scaled distributed databases tend not to scale up and down efficiently, because of the complexity and time involved in joining the cluster, so the cost benefits of horizontal scaling of compute are lost.

An approach like this can fit well in cases like these.


I can see how this could work for a read-only database, but how would you resolve collisions on writes without making them super slow in the process?


Right, it certainly wouldn’t work for a big multi-tenant db. But the right data organization can make it work.

In our case, the data is organized such that there’s usually only one client at a time that would need to write to a given db file. Think for example of a folder per user, each containing a db file and various other files for that user. What we’re doing is actually even more granular than that - we have a folder for each object that a user could be working on at a given time.


Databases are really just data structures and algorithms along with some notion of durability.

Client/Server databases are just remote data structures. (E.g. Redis is short for "Remote Dictionary Server")

Sometimes you want your data structures and algorithms to run locally. Could be performance, privacy, cost, or any number of reasons.

Local, in-memory data structures hit a few bottlenecks. First, they may not fit in memory. A mechanism for keeping the dataset in larger storage (e.g. disk) and paging in the necessary bits as needed extends the range of datasets one can comfortably work with locally by quite a bit. That's standard SQLite.

A second potential bottleneck to local data structures is distribution. We carry computers in our pockets, on our watches, in our cars. Delivering large datasets to each of those locations may be impractical. Cloud based VFS allows the benefits of local data structures on the subset they need without requiring them to fetch the entire dataset. That can be a huge win if there's a specific subset they need.

It always depends on the use case, but when the case fits there are a lot of big wins here.


One obvious problem that I see with this approach is that it will break if there is any change in the storage format.

With a client-server architecture, the server code owns the data format; with this storage-level remote access, you have to ensure that all of your clients are updated simultaneously. Depending on your architecture, that might or might not be feasible.


For the typical uses for this, you'd tend to serve up a version of Sqlite compiled to wasm or JS to the frontend, so you can be sure it's the same one. Sqlite's storage format is also unusually stable:

https://www.sqlite.org/formatchng.html


It's better to ensure 100% compatibility with the data format so the new server versions can read the old format without conversions. For example, in ClickHouse, you can install the latest version over a version from 7 years ago, and it will work just fine with the old MergeTree tables.


The VFS layer in SQLite is begging to be used like this. For a read-only implementation (where you generate the SQLite databases out-of-band and upload them), the code nearly writes itself. It's maybe hacky in the sense of "I had to write a SQLite module for this" but it's a very nice fit for SQLite VFS.
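To make "the code nearly writes itself" concrete, here is a minimal sketch (in Python, with a local file standing in for blob storage) of the one operation a read-only VFS really has to supply: reading N bytes at offset O. The file names and helper are illustrative, not from any real implementation; in a cloud-backed VFS the seek-and-read would become a range request against the uploaded database blob.

```python
import os
import sqlite3
import struct
import tempfile

# Build a small SQLite database on disk; it stands in for a blob that
# was generated out-of-band and uploaded to cloud storage.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
con = sqlite3.connect(path)
con.execute("CREATE TABLE t (x INTEGER)")
con.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(100)])
con.commit()
con.close()

# The first 100 bytes of the file are the database header; the page
# size is a big-endian 16-bit integer at offset 16 (a stored value of
# 1 means 65536).
with open(path, "rb") as f:
    header = f.read(100)

magic = header[:16]                               # b"SQLite format 3\x00"
page_size = struct.unpack(">H", header[16:18])[0]
if page_size == 1:
    page_size = 65536

def read_page(n: int) -> bytes:
    """Fetch page n (1-based) as a byte range; in a cloud-backed VFS
    this seek+read becomes a range request against the blob."""
    with open(path, "rb") as f:
        f.seek((n - 1) * page_size)
        return f.read(page_size)

page1 = read_page(1)   # page 1 starts with the 100-byte header
```

Because SQLite only ever asks the VFS for page-aligned reads like this, a read-only backend needs no locking, no write path, and no server-side query logic at all.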


Yes this code was very straightforward to write and this approach lends itself very well to read-only databases of large datasets with no server infrastructure overhead. It's also "free" to host for the time being.


Depends on your needs: if it's just one or even multiple clients reading from the db, then range requests of blocks are a great option. Adding a server layer is a huge leap in complexity, as you now have code that has to be deployed, managed, secured, etc., vs. just simple blob storage for chunks (S3, etc.).
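As a rough sketch of what "range requests of blocks" means on the wire: blob stores honor an HTTP `Range: bytes=a-b` header and reply with a 206 containing only those bytes. The toy server below is a stand-in for S3 (the handler and data are invented for illustration); the client side is the part a range-request VFS would actually perform.

```python
import http.server
import threading
import urllib.request

# A 4 KiB in-memory "blob" standing in for a chunk in S3-style storage.
DATA = bytes(range(256)) * 16

class RangeHandler(http.server.BaseHTTPRequestHandler):
    """Serves byte ranges the way blob stores do: a 'Range: bytes=a-b'
    request gets a 206 response carrying only those bytes."""

    def do_GET(self):
        spec = self.headers["Range"].removeprefix("bytes=")
        start, end = (int(v) for v in spec.split("-"))
        chunk = DATA[start:end + 1]
        self.send_response(206)                   # Partial Content
        self.send_header("Content-Range", f"bytes {start}-{end}/{len(DATA)}")
        self.send_header("Content-Length", str(len(chunk)))
        self.end_headers()
        self.wfile.write(chunk)

    def log_message(self, *args):                 # keep the demo quiet
        pass

server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), RangeHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

# Client side: fetch only bytes 1024..2047 of the blob, i.e. one block.
req = urllib.request.Request(f"http://127.0.0.1:{port}/chunk",
                             headers={"Range": "bytes=1024-2047"})
with urllib.request.urlopen(req) as resp:
    status = resp.status
    body = resp.read()
server.shutdown()
```

There is no application code on the storage side at all, which is the complexity win being described: the "server" is just whatever static blob host already supports range GETs.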


It's not hacky though. SQLite explicitly supports VFS implementations, and they are in wide use throughout industry. All this does is use emscripten + asyncify to compile SQLite to JavaScript and then implement a VFS that calls into regular JavaScript functions. SQLite is notoriously well tested, so if it says the VFS works a certain way, it probably does.


How so? It’s not like the CPU or the network care.


It's difficult to implement correctly if there's more than one client. There is no single source-of-truth to manage any conflicts. Might be ok if the clients have only read-only access.



