Last week I had a little chat with a very good friend of mine. We were talking about how to best configure his new home server. One of the topics was the choice of filesystems, which resulted in a longer discussion with some other friends (all long-term IT people and Linux enthusiasts).
So I decided to write down some parts of this discussion, for those of you that may have the same decision to make.
First of all: the usage scenario under discussion was focused on safe data storage and ways to protect your data, so the conversation soon narrowed down to the filesystems BTRFS and ZFS.
Why BTRFS or ZFS?
The first obvious question is: why those two filesystems? Well, basically one key requirement was snapshot capability, and that really narrows the number of (stable) Linux filesystems down to those two. There are other filesystems that directly support snapshots, as well as solutions that use underlying layers like a logical volume manager (LVM) to compensate for the lack of that feature in the filesystem itself. However, they either fall short in terms of stability, or are quite inflexible and hard to use due to the missing integration.
That leaves BTRFS and ZFS to take a closer look at.
Licensing
If you’ve read a little about those filesystems you’ll have seen quite some discussion about a licensing problem with ZFS. As ZFS was originally developed by Sun (now Oracle), they chose the Common Development and Distribution License (CDDL), which is considered to be incompatible with the GNU General Public License (GPL) of the Linux kernel.
This license prevents the inclusion of ZFS into the Linux kernel, and that’s why in order to use ZFS on Linux you need to build ZFS as an external kernel module (Debian/Ubuntu provide DKMS packages, which reduces the manual work to a minimum).
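For example, on a Debian-based system installing ZFS roughly looks like this (a sketch; package names are taken from Debian’s contrib section, your distribution may differ):

    # install the userland tools plus the DKMS package that builds
    # the kernel module for the currently running kernel
    sudo apt install zfsutils-linux zfs-dkms

    # load the freshly built module
    sudo modprobe zfs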
That said I’ll focus on technical details from now on and leave the ideological discussion about licenses aside.
Technical aspects and features
Snapshots
As mentioned above, one of the key features of both filesystems is snapshots. So let’s first talk about what snapshots are, how they are implemented and how they can be used (no worries, we’ll stick with the very basics to keep it short).
So most filesystems implement snapshots on top of their Copy On Write (COW) semantics. That means once a block (think of it as a part of a file) is changed, it is not written in its former place; instead a new free block is allocated and the data is written there. After that only the references to the blocks containing the file’s contents need to be updated (which usually requires only very little extra space).
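You can watch COW in action on BTRFS with a so-called reflink copy (a minimal sketch; the file names are just examples):

    # create a COW copy: no data blocks are duplicated, only new
    # references to the existing blocks are written
    cp --reflink=always bigfile.img bigfile-copy.img

    # du reports the full size for both files, yet df will show that
    # almost no additional space was consumed; blocks only get
    # duplicated once one of the two files is modified
    du -h bigfile.img bigfile-copy.img
    df -h .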
A snapshot then is little more than “freezing” a version of all the references. All references stored in this snapshot point to the file content (and metadata) of a certain point in time.
The big advantage of snapshots (in comparison to a copy/backup) is the reduced space requirement: Basically if you didn’t change anything between snapshots a new snapshot will require (almost) no extra space.
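In practice, creating a snapshot is a one-liner on both filesystems. A minimal sketch, assuming /home is a BTRFS subvolume and tank/home is a ZFS dataset (the names are just examples):

    # BTRFS: snapshot a subvolume (-r makes the snapshot read-only)
    btrfs subvolume snapshot -r /home /home/.snapshots/2024-06-01

    # ZFS: snapshot a dataset and list all existing snapshots
    zfs snapshot tank/home@2024-06-01
    zfs list -t snapshot

    # ZFS: roll the dataset back to that state if something went wrong
    zfs rollback tank/home@2024-06-01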
Compression
Both ZFS and BTRFS come with built-in compression support. This may or may not be of interest to you, depending on the data you store on your system.
Always keep in mind that such features come at an extra cost (compression takes extra CPU time, even if the selected compression algorithms are optimized to keep that overhead as small as possible).
On the other hand, nowadays most data formats already store their data in a more or less optimized way, so compression at the filesystem level may yield little to no advantage.
For example, if you’re only storing multimedia data like movies or audio, I wouldn’t expect relevant savings.
If you’re dealing with large text or XML files, however, you might profit from compression support.
To check out the potential, just take a representative directory of your data and try to store it in a ZIP file (or whatever compression format you prefer). If the resulting file is significantly smaller than the combined size of the files in that directory, you should consider enabling compression.
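A minimal sketch of such a test, plus how compression would then be enabled (the directory, dataset and device names are just examples):

    # rough estimate: compare the compressed archive against the raw data
    zip -rq /tmp/sample.zip /data/representative-dir
    du -sh /data/representative-dir /tmp/sample.zip

    # ZFS: enable transparent compression per dataset (lz4 is a cheap
    # default; zstd is available in OpenZFS >= 2.0) and check the result
    zfs set compression=lz4 tank/data
    zfs get compressratio tank/data

    # BTRFS: compression is chosen via a mount option
    mount -o compress=zstd /dev/sdb1 /mnt/data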
Deduplication
Deduplication is quite another beast: it’s usually done at block level and tries to find blocks with identical content. If such a block is found, it is not saved a second time; only a new reference to it is stored (which takes much less space, see COW above).
This sounds nice, but comes at a (much) higher cost than compression: in order to compare all existing blocks against a new one, at least a checksum of all those blocks needs to be available. And access to this information needs to be fast to keep the penalty acceptable. So those lookup tables are best kept in memory (or at least on a fast SSD); otherwise performance may degrade to an unacceptable level.
There are detailed descriptions of how much memory is recommended, but for now let’s stick with two basic assumptions: “more is better” and “bigger storage requires bigger lookup tables”.
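ZFS can estimate the achievable dedup ratio (and the size of the lookup table) before you actually switch the feature on, which is a good idea given the memory cost. A sketch, assuming a pool named tank:

    # simulate deduplication on an existing pool: prints a DDT histogram
    # and the expected dedup ratio without changing anything
    zdb -S tank

    # only if the estimate justifies the memory cost:
    zfs set dedup=on tank/data

    # inspect the dedup table of a pool where dedup is active
    zpool status -D tank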
In the upcoming ZFS 2.3.0 release there’ll be an opportunistic dedup option that uses the lookup table as long as memory (or settings) permit and otherwise skips the deduplication check. As a result some potential deduplication will be missed (roughly in proportion to the ratio of available RAM to recommended RAM).
Encryption
Built-in encryption is currently only available for ZFS; on BTRFS you’d typically put a dm-crypt/LUKS layer underneath the filesystem instead.
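On ZFS, encryption is configured per dataset. A minimal sketch using a passphrase key (the dataset name is just an example):

    # create an encrypted dataset; you'll be prompted for a passphrase
    zfs create -o encryption=on -o keyformat=passphrase tank/private

    # after a reboot the key has to be loaded before mounting
    zfs load-key tank/private
    zfs mount tank/private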
Checksums
Both BTRFS and ZFS use checksums for all stored data (and metadata). This becomes more and more relevant as the amount of stored data increases rapidly, and with that amount the likelihood of bit-rot increases as well.
So what is bit-rot anyway? As a bit is the smallest unit of data, it is also the smallest part of our data that can change. Bit-rot is just that, happening for some unspecified reason: a single bit of our data flipping.
On most classical filesystems you normally wouldn’t even notice this change, or if you do, it might be too late (because the file with the flipped bit may no longer be usable, or – in some situations even worse – may contain false data).
On a filesystem with checksum support those changes can at least be detected (so you know that you should no longer rely on the data) or, provided a redundant copy exists, even corrected (much like ECC RAM can correct single-bit errors).
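Both filesystems verify checksums on every read, and both offer a “scrub” operation that walks all stored data to find (and, given redundancy, repair) such errors. Assuming the example names from above:

    # BTRFS: start a scrub on a mounted filesystem and check its progress
    btrfs scrub start /mnt/data
    btrfs scrub status /mnt/data

    # ZFS: scrub a pool and review the checksum error counters afterwards
    zpool scrub tank
    zpool status tank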
Logical Volume Manager
If you’ve been around the Linux community for some time you will have heard the term Logical Volume Manager (LVM). Basically an LVM allows a more flexible way of partitioning your disks (or other block devices).
With classical partitioning you can only create or delete partitions (and the data within a partition is lost once you delete it). LVM allows you to move and resize its Logical Volumes (LVs, similar to partitions), and it can also combine multiple physical drives into one bigger logical one.
However, a classical LVM only handles block devices; a filesystem still needs to be created on top of it.
BTRFS and ZFS both integrate most of the LVM features into their filesystem, simplifying many tasks.
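To illustrate the difference, here is a mirrored setup built with the classical LVM stack versus the integrated approach (device names and sizes are just examples):

    # classical stack: LVM RAID plus a separate filesystem on top
    pvcreate /dev/sda /dev/sdb
    vgcreate vg0 /dev/sda /dev/sdb
    lvcreate --type raid1 -m 1 -L 500G -n data vg0
    mkfs.ext4 /dev/vg0/data

    # ZFS: pool, redundancy and filesystem in one step
    zpool create tank mirror /dev/sda /dev/sdb

    # BTRFS: the same idea, mirroring both data and metadata
    mkfs.btrfs -d raid1 -m raid1 /dev/sda /dev/sdb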