I have a technical question that I'm not at all equipped to answer, and that might be stupid, like all questions outside one's own domain:
I recently discovered the joy that is ZFS and everything that comes with it. I understand that the technical underpinnings of git are actually extremely different (and mathematical) _but_ just how far is a ZFS snapshot from a git commit really? It seems like the gap between the two might not need a huge bridge. Could a copy-on-write filesystem benefit from more metadata that would come from being implemented in a more git-like way?
Conceptually, the two things are very much related, and a bird's-eye view shows a lot of similarities. But when you get into the weeds, there are some significant differences. git is optimized to store a great many historic states of files with minor differences between consecutive ones, and it assumes that these are essentially static, immutable snapshots. A COW file system that allows for snapshots is optimized more for allowing mutation around those snapshots (i.e. updating files one way in one snapshot and another way in another one). This, combined with the additional housekeeping required for a file system (disk block allocation, etc. - the actual core features), makes the implementations of the two things very different.
Very close. Actually, companies such as https://postgres.ai/ use ZFS storage to provide git-like features on top of Postgres: using copy-on-write on the underlying ZFS, you can "fork" a new branch of your DB, with all its data, instantly. Then both branches can live their lives independently.
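At the ZFS level, that kind of instant "fork" is just a snapshot plus a clone. A rough sketch (the pool and dataset names here are made up for illustration):

    # snapshot the dataset holding the Postgres data directory
    zfs snapshot tank/pgdata@before-experiment

    # clone it into a new writable dataset; this is the "fork",
    # instant and sharing all unmodified blocks with the original
    zfs clone tank/pgdata@before-experiment tank/pgdata-experiment

Point a second Postgres instance at the clone's mountpoint and the two copies diverge independently from there.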
I don't think ZFS has an equivalent of git merge, though.
I trust ZFS for providing RAID-like features via an HBA, just as much as using ZFS as a volume manager on top of a hardware RAID. My experience with BTRFS for RAID was disastrous.
I really like the ability to use zfs send/receive over ssh for offsite backups.
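For anyone who hasn't tried it, the basic shape is something like this (host, pool, and snapshot names are placeholders):

    # initial full send of a snapshot to the offsite pool
    zfs snapshot tank/data@2024-01-01
    zfs send tank/data@2024-01-01 | ssh backuphost zfs receive backuppool/data

    # afterwards, send only the blocks that changed between snapshots
    zfs snapshot tank/data@2024-02-01
    zfs send -i tank/data@2024-01-01 tank/data@2024-02-01 | ssh backuphost zfs receive backuppool/data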
I'll admit, I haven't kept up with BTRFS features after abandoning it, so some of the features may have improved.
ZFS has been in production use for 20 years and has seen a lot of improvements over that time; it is well tested and well understood.
BTRFS has never really achieved "production" status in most people's eyes (at least, that I've seen), and RedHat removed support for it completely (not that they support ZFS either).
ZFS is also very straightforward once you learn how it works, and that takes very little time. The commands and the documentation (man pages) are thorough and detailed. Conversely, trying to figure out how and why BTRFS does what it does has been a huge challenge for me, since nothing seems to be as straightforward as it could be. I'm not sure why that is.
Development on BTRFS is ongoing, but it is starting to feel as though it's never going to actually finish its core features, let alone add quality-of-life improvements. As an example of what I mean: I run a Gitlab server, which divides its data into tons of directories, some huge and some not, but many with different use cases: a Postgres database, large binary files, small text files, temporary files, etc. With ZFS, I set up my storage pool something like this:
gitlab/postgres
gitlab/git-data
gitlab/shared/lfs-data
gitlab/backups
Now everything is divided up, and I can specify different caching and block sizes on each one depending on the workload. When I'm going to do an upgrade, I can take an atomic recursive snapshot of gitlab/ and get snapshots of everything.
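To make that concrete, the kind of thing I mean looks roughly like this (the exact values are illustrative, not recommendations):

    # different block sizes and caching per dataset, depending on workload
    zfs set recordsize=8K gitlab/postgres          # match Postgres' 8K pages
    zfs set recordsize=1M gitlab/shared/lfs-data   # large, mostly-sequential blobs
    zfs set primarycache=metadata gitlab/backups   # keep backup reads out of the ARC

    # one atomic, recursive snapshot of the whole tree before an upgrade
    zfs snapshot -r gitlab@pre-upgrade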
BTRFS, as far as I can tell, doesn't let you tune as many fine-grained settings per subvolume, and it doesn't have atomic recursive snapshots (and touts this as a feature). I'm not sure if it supports a feature similar to zvols, where you can create a block device backed by the ZFS pool's storage (in case you need an ext4 file system but you want to be able to snapshot it, or similar).
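For comparison, the zvol side of that on ZFS is roughly this (the size and names are just examples):

    # create a 50G block device backed by the pool
    zfs create -V 50G gitlab/ext4vol

    # format and mount it like any other block device
    mkfs.ext4 /dev/zvol/gitlab/ext4vol
    mount /dev/zvol/gitlab/ext4vol /mnt/ext4data

    # and it can still be snapshotted like any other dataset
    zfs snapshot gitlab/ext4vol@before-change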
<anecdote> I have never once had a single issue with ZFS and data integrity, with the exception of cases where the underlying storage has failed. Meanwhile, I've had BTRFS lose data every single time I've tried to use it, often within days. Obviously lots of other people haven't had that issue, but suffice it to say that, personally, I don't trust it. </anecdote>
Meanwhile...
ZFS doesn't support reflinks like XFS and BTRFS do, so you can't do `cp -R --reflink=always somedir/ otherdir/` and just get a reflink copy (i.e. copy-on-write per-file). On XFS, and presumably BTRFS, I can do this and get a "copy" of a 30-50 GB git repository in less than a second, which takes up no extra space until I start modifying files. On ZFS, I have to do `cp -R somedir/ otherdir/` and it copies each file individually, reading the data from each file and then writing the data to the copy of the file.
ZFS also doesn't come as part of the kernel, so you can run into issues where you upgrade the kernel but for whatever reason ZFS doesn't upgrade (maybe the installed version of ZFS doesn't build against your new kernel version) and then you reboot and your ZFS pool is gone.
You also "can't" boot from ZFS, which is to say you can but if you do something like upgrade the ZFS kernel modules and then update your ZFS pool with features that Grub doesn't understand, you now cannot boot the system until you fix it by booting into a rescue image and updating the system with a new version of Grub. Ask me how I found that out.
In the end, my experience has been that ZFS is polished, streamlined, works great out of the box with no tuning necessary, and is extremely flexible as far as doing whatever it is I want to do with it. I see no real reason not to use ZFS, honestly, except for the "hassle" of installing/updating it yourself, and there's an Ubuntu PPA from jonathanf which provides very up-to-date ZFS packages so that you can get access to bug fixes or new features (filesystem features or tooling features) very quickly, with zero effort on your part.
Is it correct, then, that the original DB and the snapshotted DB share the unmodified blocks on the file system?
Assume 1 row per block:
Original DB "A" has 2 rows, a snapshot "B" is created, "B" deletes one row and adds a new one.
Is it true that the row which "B" took over from "A" and left unmodified resides on the same block for "A" and "B", so that if the block gets corrupted, both databases will have to deal with that corrupt row?
Yes, that's one of the core parts of copy-on-write.
It shouldn't matter if you have a reasonable setup. If you depend on other files on the drive continuing to work after blocks have started to go bad, that's not a good system.
The IBM/Rational Clearcase version control system is an example of building a VCS on top of a versioning file system (MVFS), though MVFS uses an underlying database instead of a copy-on-write snapshot mechanism. https://www.ibm.com/support/pages/about-multiversion-file-sy...
It would be nice if ZFS snapshots were more flexible. And you could say "like git" when talking about the user experience. But it would not be like git in terms of implementation. Git's implementation is not really copy-on-write. It's deduplication.
I'd say the git method is actually pretty low in metadata, and the way you'd improve ZFS snapshots doesn't involve making them more like git.
If you did get that huge amount of work done, you could then approximate git with snapshots alone. Right now, you'd probably want snapshots and dedup to work together to approximate git using ZFS.
Doesn't the zfs diff command mostly cover it? You have snapshots, diffs, and clones, which are basically equivalent to commits, diffs, and branches. You're missing commit logs, and that's it, right?
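The mapping I have in mind is roughly this (names are illustrative):

    # "commit": snapshot the current state
    zfs snapshot tank/project@v1

    # "diff": what changed between a snapshot and the live filesystem
    zfs diff tank/project@v1 tank/project

    # "branch": clone a snapshot into a new writable dataset
    zfs clone tank/project@v1 tank/project-feature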
If you introduce a new file onto multiple ZFS 'branches', there is no way to have it stored copy-on-write.
But copying a file from one 'branch' to another is the only way to emulate a cherry-pick or merge.
So after a while of active use with many branches, you're going to have a lot of redundant copies of files all over. You're no longer making proper use of snapshots, and it becomes less efficient than having a working directory and a directory full of commits that hard link to each other.
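The hard-link scheme at the end there is the classic rsync-style rotation trick; a minimal sketch (directory names are made up):

    # "commit2" starts as a tree of hard links to "commit1": instant, near-zero extra space
    cp -al commits/commit1 commits/commit2

    # overwrite only the files that changed; unchanged files stay shared via hard links
    rsync -a --delete working-dir/ commits/commit2/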
At Google, people have built both, and we use a version control system on top of a snapshotting filesystem. The snapshotting is for never losing code/state on your machines, and the version control system is for interfacing with others (code review, merging, etc.). While you could use one system for both, layering them makes it easier to adapt each to its specific workflow.