Btrfs

Introduction to Btrfs Filesystem

Btrfs (B-Tree Filesystem) is a modern copy-on-write (CoW) filesystem for Linux. Btrfs aims to implement many advanced filesystem features while focusing on fault tolerance, repair, and easy administration. It is designed to meet the needs of high-performance, large-capacity storage servers, and it is suitable for petabyte-scale data centers as well as smartphones. In this article, I am going to discuss the Btrfs filesystem and its features. So, let’s get started.

Copy on Write – CoW Filesystem:

Btrfs is a copy-on-write (CoW) filesystem. In a CoW filesystem, modified data is never overwritten in place. Instead, the filesystem copies the data, modifies the copy, and writes the modified data to a different, free location of the filesystem.

The main advantage of a CoW filesystem is that the original data extent is never modified: the changed data is written to a different extent, and only then does the filesystem switch over to the new copy. If the power fails during the modification, the original data is still intact, so Btrfs avoids data corruption and partially updated data.
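To make the copy-on-write idea concrete, here is a minimal Python sketch. It models storage as an append-only list of extents and a file as a list of references into that list; the class and method names are hypothetical, and real Btrfs tracks extents in on-disk B-trees, not Python lists. The point it shows is the ordering: the new extent is written first and the file's reference is switched last, so the old version stays readable until that switch.

```python
class CowStore:
    """Toy copy-on-write store: extents are only ever appended, never overwritten."""

    def __init__(self):
        self.extents = []  # "physical" storage
        self.files = {}    # file name -> list of extent indexes

    def write(self, name, chunks):
        """Create a file from byte chunks, one chunk per extent."""
        refs = []
        for chunk in chunks:
            self.extents.append(chunk)
            refs.append(len(self.extents) - 1)
        self.files[name] = refs

    def modify(self, name, position, new_chunk):
        """Replace one extent of a file without touching the old extent."""
        old_refs = self.files[name]
        self.extents.append(new_chunk)           # 1. write the new data elsewhere
        new_refs = list(old_refs)
        new_refs[position] = len(self.extents) - 1
        self.files[name] = new_refs              # 2. switch the reference last
        return old_refs                          # the old version is still intact

    def read(self, refs):
        return b"".join(self.extents[i] for i in refs)


store = CowStore()
store.write("report.txt", [b"hello ", b"world"])
old_version = store.modify("report.txt", 1, b"btrfs")
print(store.read(store.files["report.txt"]))  # b'hello btrfs'
print(store.read(old_version))                # b'hello world'
```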

The main disadvantage of a CoW filesystem is that big files tend to become fragmented as they are modified, so they need to be defragmented once in a while. Luckily, Btrfs supports online defragmentation, so you don’t have to unmount the filesystem to defragment it.

Main Features of Btrfs Filesystem:

The main features of the Btrfs filesystem are:

i) Extent-based file storage: In an extent-based filesystem, the unit of storage is called an extent. An extent is a contiguous area of storage that holds part or all of a file’s data. A small file may fit in a single extent, while a large (or fragmented) file needs several extents, and the filesystem keeps metadata that records which extents each file is using. Because a single extent record can describe a long contiguous run of blocks, Btrfs needs far less metadata than a filesystem that tracks every block individually. Smaller metadata improves both the storage efficiency and the performance of the filesystem.
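The following sketch shows why extent records are compact: it collapses a file’s list of physical block numbers into (file offset, starting block, length) records. The 4 KiB block size and the `Extent` type are assumptions made for this illustration, not Btrfs’s on-disk format.

```python
from dataclasses import dataclass

BLOCK_SIZE = 4096  # assumed block size for this illustration


@dataclass
class Extent:
    file_offset: int  # where the extent starts inside the file, in bytes
    disk_start: int   # first physical block of the extent
    num_blocks: int   # length of the contiguous run, in blocks


def blocks_to_extents(physical_blocks):
    """Collapse a file's physical block numbers into extent records."""
    extents = []
    for i, block in enumerate(physical_blocks):
        last = extents[-1] if extents else None
        if last and last.disk_start + last.num_blocks == block:
            last.num_blocks += 1  # block continues the current contiguous run
        else:
            extents.append(Extent(i * BLOCK_SIZE, block, 1))
    return extents


# A 1,000-block file stored in two contiguous runs.
blocks = list(range(5000, 5600)) + list(range(9000, 9400))
extents = blocks_to_extents(blocks)
print(len(blocks), "block pointers vs", len(extents), "extent records")
# 1000 block pointers vs 2 extent records
```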

ii) Huge file size support: In a Btrfs filesystem, a single file can be up to 2^64 bytes, or 16 EiB (exbibytes), in size, which is far larger than any file you are likely to need.
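The 16 EiB figure follows directly from the 2^64-byte limit, because one exbibyte is 2^60 bytes:

```latex
2^{64}\ \text{bytes} = 2^{4} \times 2^{60}\ \text{bytes} = 16\ \text{EiB}
```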

iii) Space-efficient packing of small files: Normally, no matter how small a file is, it occupies at least one whole block or extent, which wastes a lot of disk space when there are many small files. To solve this problem, Btrfs can store small files inline, directly in its metadata, instead of allocating separate data blocks for them.
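A rough Python sketch of the data blocks saved by inlining. It assumes 4 KiB data blocks and a 2,048-byte inline limit (compare Btrfs’s `max_inline` mount option; check your kernel’s documentation for the actual default). The metadata space taken up by inlined files is not counted, so the numbers only illustrate the data blocks that never get allocated.

```python
BLOCK_SIZE = 4096  # assumed data block size
MAX_INLINE = 2048  # assumed inline size limit for this illustration


def space_used(file_sizes, inline=True):
    """Bytes of data blocks consumed by a set of files.

    With inline=True, files no larger than MAX_INLINE are treated as stored
    in metadata and use no data blocks (metadata space is not counted here).
    Every other file is rounded up to whole blocks.
    """
    total = 0
    for size in file_sizes:
        if inline and size <= MAX_INLINE:
            continue                      # lives inside a metadata leaf
        blocks = -(-size // BLOCK_SIZE)   # ceiling division
        total += blocks * BLOCK_SIZE
    return total


sizes = [100] * 1000 + [10_000]         # 1,000 tiny files and one 10 KB file
print(space_used(sizes, inline=False))  # 4108288
print(space_used(sizes, inline=True))   # 12288
```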

iv) Space-efficient indexed directories: Btrfs indexes every directory entry in two different ways. For filename lookups, the entry is keyed by a hash of the filename. For listing the directory’s contents, the entry is also keyed by a sequence number assigned when the entry was created. Keeping both indexes makes both filename lookups and directory listings fast.
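A toy model of indexing every directory entry twice. It assumes the two keys are a hash of the filename (used for lookups) and a per-directory sequence number (used for listings); `zlib.crc32` is only a stand-in hash, and the `Directory` class is hypothetical rather than Btrfs’s on-disk layout.

```python
import zlib


class Directory:
    """Toy directory that indexes each entry twice (hypothetical model)."""

    def __init__(self):
        self.by_name_hash = {}  # hash of filename -> [(name, inode), ...]
        self.by_sequence = {}   # creation sequence number -> (name, inode)
        self.next_seq = 0

    def add(self, name, inode):
        key = zlib.crc32(name.encode())
        self.by_name_hash.setdefault(key, []).append((name, inode))
        self.by_sequence[self.next_seq] = (name, inode)
        self.next_seq += 1

    def lookup(self, name):
        """Find an inode by name: one hash lookup, then confirm the name."""
        for entry_name, inode in self.by_name_hash.get(zlib.crc32(name.encode()), []):
            if entry_name == name:  # different names can share a hash value
                return inode
        return None

    def listing(self):
        """List names in creation order by walking the sequence index."""
        return [self.by_sequence[seq][0] for seq in sorted(self.by_sequence)]


d = Directory()
d.add("notes.txt", 257)
d.add("archive.tar", 258)
print(d.lookup("archive.tar"), d.listing())  # 258 ['notes.txt', 'archive.tar']
```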

v) Dynamic inode allocation: Every file needs one inode. Many filesystems (e.g., Ext4) have a fixed number of inodes, set when the filesystem is created. So, if you create too many small files, you may still have a lot of free disk space but be unable to create any new files, and you can’t increase the number of inodes after the filesystem has been created.

Btrfs solves this problem by allocating inodes dynamically as they are required. So, you can create as many files as you want as long as you have free disk space.
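The difference between a fixed inode table and on-demand allocation can be sketched with two toy classes. Both are hypothetical models: the 160-byte per-inode cost in `DynamicInodeFs` is an assumed figure, and the inode count given to `FixedInodeFs` is arbitrary.

```python
class FixedInodeFs:
    """The inode count is fixed when the filesystem is created (Ext4-style)."""

    def __init__(self, free_bytes, inode_count):
        self.free_bytes = free_bytes
        self.free_inodes = inode_count

    def create(self, size):
        if self.free_inodes == 0 or size > self.free_bytes:
            return False  # out of inodes or out of space
        self.free_inodes -= 1
        self.free_bytes -= size
        return True


class DynamicInodeFs:
    """Inodes are allocated on demand, so only free space limits file creation."""

    def __init__(self, free_bytes, inode_cost=160):
        self.free_bytes = free_bytes
        self.inode_cost = inode_cost  # assumed metadata bytes per inode

    def create(self, size):
        needed = size + self.inode_cost
        if needed > self.free_bytes:
            return False  # out of space
        self.free_bytes -= needed
        return True


def fill_with_small_files(fs, size):
    """Create files of `size` bytes until creation fails; return how many fit."""
    count = 0
    while fs.create(size):
        count += 1
    return count


print(fill_with_small_files(FixedInodeFs(1_000_000, inode_count=100), 10))  # 100
print(fill_with_small_files(DynamicInodeFs(1_000_000), 10))                 # 5882
```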

vi) Writable snapshots and read-only snapshots: Btrfs supports snapshots. You can take a snapshot of a subvolume and later use it to restore your data if you accidentally remove some files or corrupt some data.

Btrfs snapshots can be writable (the default) or read-only. Once you’ve taken a read-only snapshot, you can’t change any files/directories in it. If you later need to change something in a read-only snapshot, you can turn it into a writable snapshot and then modify its files/directories.
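A small Python helper that builds the commands for these snapshot operations. `btrfs subvolume snapshot` (with `-r` for a read-only snapshot) and `btrfs property set ... ro false` are real btrfs-progs commands; the helper functions and the `/mnt/data` paths are hypothetical. By default the sketch only prints the commands; actually running them requires root privileges and a mounted Btrfs filesystem.

```python
import shlex
import subprocess


def snapshot_cmd(source, dest, read_only=False):
    """Build a `btrfs subvolume snapshot` command (writable unless read_only)."""
    cmd = ["btrfs", "subvolume", "snapshot"]
    if read_only:
        cmd.append("-r")
    return cmd + [source, dest]


def make_writable_cmd(snapshot):
    """Build the command that clears a snapshot's read-only property."""
    return ["btrfs", "property", "set", "-t", "subvol", snapshot, "ro", "false"]


def run(cmd, dry_run=True):
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)  # needs root and a mounted Btrfs filesystem


snap = "/mnt/data/.snapshots/before-upgrade"  # hypothetical path
run(snapshot_cmd("/mnt/data", snap, read_only=True))
run(make_writable_cmd(snap))
```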

vii) Subvolumes: A Btrfs filesystem can have many subvolumes. A subvolume is a separately named B-tree (an internal, logical filesystem root) inside the Btrfs filesystem. A subvolume is not a block device of its own, but you can mount Btrfs subvolumes individually. You can think of subvolumes as namespaces.

viii) Subvolume-aware quota support: You can also set quotas on subvolumes. Once a subvolume’s quota is exceeded, you won’t be able to add any new data to it. Quotas are managed with the btrfs tool itself, so you don’t need any separate programs to set them up.
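The commands for creating a subvolume, limiting its size, and mounting it on its own can be collected in a short Python sketch. The `btrfs subvolume create`, `btrfs quota enable`, and `btrfs qgroup limit` commands and the `subvol=` mount option are real; the function name, the device `/dev/sdb1`, and the paths are hypothetical, and the sketch assumes `/mnt/data` is where the top-level subvolume is mounted.

```python
import shlex


def subvolume_quota_commands(fs_root, name, limit, device, mountpoint):
    """Commands to create a subvolume, cap its size, and mount it by itself."""
    subvol = f"{fs_root}/{name}"
    return [
        ["btrfs", "subvolume", "create", subvol],
        ["btrfs", "quota", "enable", fs_root],        # quotas are off until enabled
        ["btrfs", "qgroup", "limit", limit, subvol],  # e.g. "10G"
        ["mount", "-o", f"subvol={name}", device, mountpoint],
    ]


# Hypothetical device and paths; print the commands instead of running them.
for cmd in subvolume_quota_commands("/mnt/data", "home", "10G", "/dev/sdb1", "/home"):
    print(shlex.join(cmd))
```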

ix) Checksums on data and metadata: To detect data corruption, Btrfs computes checksums of both the data and the filesystem’s metadata, using the crc32c algorithm by default. The checksums are stored in the filesystem and verified when the data is read, so filesystem errors and data corruption are detected automatically.

Btrfs also supports three other checksum algorithms: xxhash, sha256, and blake2b.
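CRC-32C, the default Btrfs checksum, is simple enough to implement directly. This is a plain bitwise Python version (slow, but easy to read) using the standard reflected Castagnoli polynomial 0x82F63B78, showing how a stored checksum exposes a single flipped bit. It is an illustration of the algorithm, not a byte-for-byte reproduction of how Btrfs seeds and stores checksums on disk.

```python
def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0x82F63B78
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF


block = bytearray(b"important data" * 100)
stored = crc32c(bytes(block))  # checksum recorded when the block is written

block[42] ^= 0x01              # simulate a single flipped bit on disk
print(crc32c(bytes(block)) == stored)  # False: corruption detected on read
print(hex(crc32c(b"123456789")))       # 0xe3069283, the standard CRC-32C check value
```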

x) Compression: Btrfs supports transparent file compression. Files are compressed automatically when they are written and decompressed automatically when they are read, so applications never have to deal with the compressed data.

Btrfs supports 3 compression algorithms: ZLIB, LZO, and ZSTD.

If you enable compression without naming an algorithm, ZLIB is used by default.
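Transparent compression means the compression step sits between the application and the disk. This sketch mimics that with Python’s `zlib` module: data is compressed on write and decompressed on read, and, as a choice made for this illustration, it is stored uncompressed when compression would not make it smaller. The function names are hypothetical; on a real system, compression is enabled with a mount option such as `compress=zstd`.

```python
import zlib


def write_extent(data: bytes):
    """Compress data on its way to "disk"; keep it raw if compression doesn't help."""
    compressed = zlib.compress(data)
    if len(compressed) < len(data):
        return ("zlib", compressed)
    return ("none", data)


def read_extent(extent):
    """Return the original bytes, whatever form the extent was stored in."""
    kind, payload = extent
    return zlib.decompress(payload) if kind == "zlib" else payload


log = b"INFO all systems nominal\n" * 1000
stored = write_extent(log)
print(stored[0], len(log), "->", len(stored[1]))  # the stored copy is much smaller
assert read_extent(stored) == log  # readers only ever see the original bytes
```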

xi) Integrated multiple device support: Btrfs has built-in volume management, similar to what a logical volume manager (LVM) provides. You can add multiple storage devices to a single Btrfs filesystem, and you can configure RAID arrays on it without any extra software.

Btrfs filesystem supports data striping, data mirroring, data striping+mirroring, and single and dual parity implementations.

Data striping: If you have added multiple storage devices to the same Btrfs filesystem, Btrfs can spread the data of a single file across different physical devices/partitions. This is called data striping. Data striping improves the read/write performance of the filesystem. RAID-0 is based on data striping.

Data mirroring: If you have added multiple storage devices to the same Btrfs filesystem, Btrfs can write each piece of data to more than one device. This is called data mirroring. RAID-1 is based on data mirroring; in Btrfs, RAID-1 keeps two copies of the data on two different devices, even when more than two devices are present.

Data striping+single parity: RAID-5 uses data striping and single distributed parity. If you have added multiple storage devices to a Btrfs filesystem, RAID-5 stripes the data across the devices and calculates parity blocks, which are also distributed across the devices. RAID-5 can survive a single drive failure.

Data striping+double parity: RAID-6 uses data striping and double distributed parity. If you have added multiple storage devices to a Btrfs filesystem, RAID-6 stripes the data across the devices and calculates two independent sets of parity blocks, which are distributed across the devices. RAID-6 can survive two drive failures. Otherwise, it works like RAID-5 (data striping+single parity).

Data striping+mirroring: RAID-10 uses data striping and data mirroring at the same time. A RAID-10 Btrfs filesystem needs at least 4 storage devices. The data is striped across the devices and every stripe is also mirrored, so half of the raw storage capacity holds the second copy of the data.
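RAID-5 parity is a byte-wise XOR: the parity chunk is the XOR of the data chunks in a stripe, so any single missing chunk equals the XOR of everything that survived. The sketch below models one stripe only, keeps parity on a fixed "device" instead of rotating it, and ignores how Btrfs actually allocates chunks; the function names are hypothetical. RAID-6’s second parity uses a different calculation (over a Galois field) that is not shown here.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data, data_devices, chunk):
    """Split one stripe of data across devices and append a parity chunk."""
    if len(data) > data_devices * chunk:
        raise ValueError("this sketch models a single stripe only")
    padded = data.ljust(data_devices * chunk, b"\0")
    chunks = [padded[i * chunk:(i + 1) * chunk] for i in range(data_devices)]
    return chunks + [xor_blocks(chunks)]  # last element lives on the parity device


def rebuild(stripe, lost):
    """Recover the chunk of failed device `lost` from all surviving chunks."""
    return xor_blocks([c for i, c in enumerate(stripe) if i != lost])


stripe = make_stripe(b"ABCDEFGHIJKL", data_devices=3, chunk=4)
print(stripe[:3])          # [b'ABCD', b'EFGH', b'IJKL'], one chunk per device
print(rebuild(stripe, 1))  # b'EFGH', rebuilt after "device 1" fails
```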

xii) SSD awareness and optimizations: Btrfs is SSD aware and has some SSD optimization features. It also supports TRIM/discard for SSD storage devices.

TRIM (called discard in Linux) lets the filesystem tell the SSD which blocks are no longer in use, so that the SSD can erase them in the background and reuse them efficiently. Btrfs can send these requests continuously as data is freed (the discard mount option), or you can send them in batches, for example by running fstrim periodically.

Discarding every block of a device (for example, with the blkdiscard command) makes all the data on it inaccessible. If you want to sell your SSD, this may come in handy.

xiii) Efficient incremental backup: Btrfs supports incremental backups. The first time you back up a Btrfs filesystem, you take a snapshot of it and back it up in full. Every later backup is compared with the previous snapshot, and only the changes are stored. So, subsequent backups take less disk space and finish faster.
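The idea of storing only the differences between two snapshots can be sketched with plain dictionaries that map paths to file contents. This is a hypothetical model: it compares file contents directly, whereas Btrfs works out the changes from its own metadata instead of re-reading every file.

```python
def incremental_backup(previous, current):
    """Return only what changed between two snapshots (path -> bytes)."""
    changed = {path: data for path, data in current.items()
               if previous.get(path) != data}
    deleted = sorted(path for path in previous if path not in current)
    return changed, deleted


def restore(base, changed, deleted):
    """Rebuild the newer snapshot from the older one plus the increment."""
    result = {path: data for path, data in base.items() if path not in deleted}
    result.update(changed)
    return result


monday = {"/etc/hosts": b"127.0.0.1 localhost\n", "/notes.txt": b"draft 1"}
tuesday = {"/etc/hosts": b"127.0.0.1 localhost\n", "/notes.txt": b"draft 2",
           "/todo.txt": b"buy disks"}

changed, deleted = incremental_backup(monday, tuesday)
print(sorted(changed))  # ['/notes.txt', '/todo.txt']
assert restore(monday, changed, deleted) == tuesday
```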

xiv) Background scrub: Scrub is a Btrfs process that reads the data and metadata, verifies their checksums, and repairs any blocks with errors using a redundant copy (for example, the mirror on RAID-1) if the filesystem has one.
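A minimal sketch of what a scrub does for one mirrored block: check every copy against the stored checksum and overwrite bad copies with a good one. It uses `zlib.crc32` as a stand-in for Btrfs’s crc32c, and the function name and status strings are hypothetical.

```python
import zlib


def scrub_block(copies, expected):
    """Verify every copy of a block against its checksum and repair bad ones.

    Returns (copies_after_scrub, status), where status is "ok", "repaired",
    or "unrecoverable" (no copy matches the checksum).
    """
    good = next((c for c in copies if zlib.crc32(c) == expected), None)
    if good is None:
        return list(copies), "unrecoverable"
    fixed = [c if zlib.crc32(c) == expected else good for c in copies]
    return fixed, ("ok" if fixed == list(copies) else "repaired")


original = b"family photos"
checksum = zlib.crc32(original)           # recorded when the block was written
mirror = [b"family phot\x00s", original]  # copy 0 has been corrupted

print(scrub_block(mirror, checksum))
# ([b'family photos', b'family photos'], 'repaired')
```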

xv) Online filesystem defragmentation: As explained earlier, Btrfs is a copy-on-write filesystem, and large files are stored in multiple extents. When you modify a large file, the extents to be modified are copied to other free extents and modified there, while the original extents are left untouched. As a result, large files become fragmented as they are modified: their extents are no longer contiguous but scattered around the storage device. Too much fragmentation makes reads and writes slower.

To solve this problem, Btrfs supports online defragmentation: you can defragment the filesystem while it is mounted and in use, without unmounting it. Defragmentation moves file extents around so that the extents of each large file are as contiguous as possible, which improves filesystem performance.
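Fragmentation can be measured by counting how many separate contiguous runs a file’s extents form. This hypothetical sketch represents each extent as a (disk start, length) pair in blocks; on a real system, defragmentation is started with the `btrfs filesystem defragment -r <path>` command.

```python
def count_fragments(extents):
    """Count contiguous runs in a file's extent list of (disk_start, length)."""
    fragments = 0
    expected_next = None
    for start, length in extents:
        if start != expected_next:  # this extent does not follow the previous one
            fragments += 1
        expected_next = start + length
    return fragments


# The same file before and after a hypothetical defragmentation pass.
before = [(1000, 8), (5000, 2), (1010, 8), (7000, 4)]
after = [(20000, 8), (20008, 2), (20010, 8), (20018, 4)]
print(count_fragments(before), "->", count_fragments(after))  # 4 -> 1
```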

xvi) Offline filesystem check: Btrfs comes with tools (such as btrfs check) that you can use to check an unmounted filesystem for errors and fix them. You can also use these tools to repair a broken Btrfs filesystem that can’t be mounted.
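Because the check runs offline, a wrapper script should refuse to run it on a mounted device. This sketch parses `/proc/mounts`-style text to enforce that before building a `btrfs check` command (`--repair` is a real option, but it writes to the filesystem, so back up first). The function names and the sample mount table are hypothetical.

```python
def is_mounted(device, mounts_text):
    """True if `device` is the source of any entry in /proc/mounts-style text."""
    return any(line.split()[0] == device
               for line in mounts_text.splitlines() if line.strip())


def check_cmd(device, mounts_text, repair=False):
    """Build a `btrfs check` command, refusing to touch a mounted filesystem."""
    if is_mounted(device, mounts_text):
        raise RuntimeError(f"{device} is mounted; unmount it before checking")
    cmd = ["btrfs", "check"]
    if repair:
        cmd.append("--repair")  # modifies the filesystem: back up first
    return cmd + [device]


# On a real system, pass the contents of /proc/mounts instead of this sample.
sample = "/dev/sda2 / ext4 rw,relatime 0 0\n/dev/sdc1 /mnt/data btrfs rw 0 0\n"
print(check_cmd("/dev/sdb1", sample))  # ['btrfs', 'check', '/dev/sdb1']
```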

xvii) In-place conversion of existing Ext2/3/4 and ReiserFS filesystems: Btrfs provides a utility, btrfs-convert, which you can use to convert an existing Ext2/3/4 or ReiserFS filesystem to a Btrfs filesystem.

The conversion program reads the metadata of the existing Ext2/3/4 (or ReiserFS) filesystem, creates Btrfs metadata, and stores it in the filesystem’s free space. Both the Btrfs and the original Ext2/3/4 (or ReiserFS) metadata are kept, and the Btrfs metadata points to the same data blocks used by the original files. The existing metadata and data blocks stay untouched because Btrfs is a Copy-on-Write (CoW) filesystem: when a file is modified, Btrfs writes the changed data to new free extents instead of overwriting the original blocks.
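The conversion workflow for an Ext2/3/4 source can be summarized as a few commands. `e2fsck -f`, `btrfs-convert <device>`, and `btrfs-convert -r <device>` (roll back, which only works while the saved original image still exists) are real commands; the function name and the device `/dev/sdb1` are hypothetical, and the filesystem must be unmounted.

```python
import shlex


def conversion_commands(device):
    """Commands for converting an unmounted Ext2/3/4 filesystem to Btrfs."""
    return {
        "check_first": ["e2fsck", "-f", device],      # make sure the source is clean
        "convert": ["btrfs-convert", device],
        "rollback": ["btrfs-convert", "-r", device],  # undo, if the saved image still exists
    }


for step, cmd in conversion_commands("/dev/sdb1").items():  # hypothetical device
    print(f"{step}: {shlex.join(cmd)}")
```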

xviii) Seed devices: Btrfs supports seed devices. You can create a read-only filesystem and use it as a template (seed device) for other Btrfs filesystems. When you add a writable device to it, only new or modified data is written to that device, and the original data on the seed device is kept as it is. This can save a lot of disk space and avoids storing redundant copies of the same data.
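The read/write behaviour of a seeded filesystem can be modelled as an overlay: reads fall through to the read-only seed unless the data has been rewritten, and all writes land on the new device. This is a hypothetical toy model; in practice a device is marked as a seed with `btrfstune -S 1`, mounted, given a new writable device with `btrfs device add`, and then remounted read-write.

```python
class SeededFilesystem:
    """Toy model: a read-only seed plus a writable device that holds only changes."""

    def __init__(self, seed):
        self._seed = dict(seed)  # private copy; never modified below
        self.new_device = {}     # receives every write

    def read(self, path):
        if path in self.new_device:  # newest data wins
            return self.new_device[path]
        return self._seed[path]      # fall back to the shared template

    def write(self, path, data):
        self.new_device[path] = data


template = {"/etc/os-release": b"NAME=Example", "/bin/app": b"\x7fELF..."}
node = SeededFilesystem(template)
node.write("/etc/hostname", b"node-1")
node.write("/etc/os-release", b"NAME=Example (patched)")
print(len(node.new_device), "paths stored on the new device")  # 2 paths stored on the new device
```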

xix) Send/receive subvolume changes: Btrfs can send the incremental changes of a subvolume to another Btrfs filesystem (which can also reside on another computer) that receives and applies them. This feature is used to take incremental backups of a Btrfs filesystem, either locally or remotely. Because Btrfs already knows which data changed between two snapshots, this is usually faster and more efficient than rsync, which has to scan and compare the files.
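An incremental transfer is a pipe from `btrfs send -p <parent> <snapshot>` into `btrfs receive <directory>`; both snapshots must be read-only. This Python sketch builds that pipeline and, when `dry_run` is off, runs it with `subprocess`. The function name and paths are hypothetical. For a remote backup, the receiving command would typically run on the other machine over ssh.

```python
import subprocess


def send_receive(snapshot, dest_dir, parent=None, dry_run=True):
    """Pipe `btrfs send` into `btrfs receive`; incremental when `parent` is given."""
    send = ["btrfs", "send"]
    if parent:
        send += ["-p", parent]  # send only the difference from `parent`
    send.append(snapshot)
    receive = ["btrfs", "receive", dest_dir]
    if dry_run:
        return send, receive
    sender = subprocess.Popen(send, stdout=subprocess.PIPE)
    receiver = subprocess.Popen(receive, stdin=sender.stdout)
    sender.stdout.close()  # let `btrfs send` see a broken pipe if receive exits
    if receiver.wait() != 0 or sender.wait() != 0:
        raise RuntimeError("btrfs send/receive failed")
    return send, receive


# Hypothetical read-only snapshots taken on Monday and Tuesday.
print(send_receive("/mnt/data/.snapshots/tue", "/backup",
                   parent="/mnt/data/.snapshots/mon"))
```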

xx) Batch/out-of-band deduplication: Btrfs supports batch, or out-of-band, deduplication. Deduplication takes place after files have been written to the filesystem: a userspace tool scans the filesystem for identical extents, and Btrfs then keeps only one copy of each extent, shared by all the files that contain it. The copy-on-write (CoW) principle keeps the shared extents safe: if one of the files is modified later, the changed data is written elsewhere. Deduplication can save a lot of disk space.
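The core of out-of-band deduplication is finding identical blocks and keeping one stored copy that several files reference. This sketch hashes fixed-size blocks with SHA-256; the tiny 4-byte block size and the function name are assumptions for readability. A real tool would also compare the bytes before sharing an extent instead of trusting the hash alone.

```python
import hashlib


def dedupe(files, block_size=4):
    """Keep one stored copy of each identical block; files become lists of references."""
    store = {}   # digest -> block bytes, stored once
    layout = {}  # file name -> list of digests
    for name, data in files.items():
        refs = []
        for i in range(0, len(data), block_size):
            block = data[i:i + block_size]
            digest = hashlib.sha256(block).hexdigest()
            store.setdefault(digest, block)  # only the first copy is kept
            refs.append(digest)
        layout[name] = refs
    return store, layout


files = {"a.img": b"AAAABBBBAAAA", "b.img": b"BBBBAAAA"}
store, layout = dedupe(files)
print(sum(len(d) for d in files.values()), "bytes ->",
      sum(len(b) for b in store.values()), "bytes stored")  # 20 bytes -> 8 bytes stored
```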

xxi) Swapfile support: If you’re using Linux kernel 5.0 or newer, you can create swapfiles on a Btrfs filesystem.

Swapfiles on a Btrfs filesystem have some limitations:

– The swapfile must be allocated as NoCoW (not copy-on-write).

– The swapfile must not have compression enabled.
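The usual way to meet both requirements is to mark the file NoCoW with `chattr +C` while it is still empty, before any space is allocated; NoCoW files are also never compressed. The sequence below follows the commonly documented recipe. The function name, the `/swap/swapfile` path, and the 2G size are hypothetical, and the commands must be run as root.

```python
import shlex


def swapfile_commands(path, size):
    """Commands (run as root) to create and enable a NoCoW swapfile on Btrfs."""
    return [
        ["truncate", "-s", "0", path],    # create an empty file
        ["chattr", "+C", path],           # NoCoW must be set while the file is empty
        ["fallocate", "-l", size, path],  # preallocate the swap space
        ["chmod", "600", path],           # keep swap contents private
        ["mkswap", path],
        ["swapon", path],
    ]


for cmd in swapfile_commands("/swap/swapfile", "2G"):  # hypothetical path and size
    print(shlex.join(cmd))
```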

Stability of Btrfs Filesystem:

The Btrfs filesystem is actively developed by the Btrfs team. Most of the features of the filesystem are stable at the time of this writing. Some of the advanced features are not yet stable enough for a production environment. The Btrfs team is working hard to solve these stability issues.

If you want to use the Btrfs filesystem on your production server, check the official Status page of the btrfs Wiki to find out whether the features you need are stable enough for you. Also, make sure to run some tests before the final deployment of your Btrfs filesystem, and remember to keep backups of your important data. Keeping backups is always important in production environments.

Future Replacement of Ext4 Filesystem:

Btrfs is being developed rapidly, and its developers also put a lot of effort into keeping it as stable as possible. Once all of its features are mature and stable enough, Btrfs may replace the Ext4 filesystem.

References:

[1] btrfs Wiki – https://btrfs.wiki.kernel.org/index.php/Main_Page
[2] BTRFS – The Kernel Tree Documentation – https://www.kernel.org/doc/html/latest/filesystems/btrfs.html
[3] Glossary – btrfs Wiki – https://btrfs.wiki.kernel.org/index.php/Glossary
[4] Features of the “Btrfs” Filesystem – https://www.thegeekdiary.com/features-of-the-btrfs-filesystem/
[5] Comparison of Filesystems – https://en.wikipedia.org/wiki/Comparison_of_file_systems
[6] Btrfs design – btrfs Wiki – https://btrfs.wiki.kernel.org/index.php/Btrfs_design
[7] perhaps running out of inodes could be taken “more seriously”? – https://lwn.net/Articles/724522/
[8] Making a Btrfs read-only snapshot writable – https://markandruth.co.uk/2016/12/29/making-a-btrfs-read-only-snapshot-writable
[9] Data striping – https://en.wikipedia.org/wiki/Data_striping
[10] FAQ – btrfs wiki – https://btrfs.wiki.kernel.org/index.php/FAQ
[11] Standard RAID levels – https://en.wikipedia.org/wiki/Standard_RAID_levels
[12] Trim (computing) – https://en.wikipedia.org/wiki/Trim_(computing)
[13] Solid state drive – ArchWiki – https://wiki.archlinux.org/index.php/Solid_state_drive#TRIM
[14] Btrfsck – btrfs Wiki – https://btrfs.wiki.kernel.org/index.php/Btrfsck
[15] Conversion from Ext3/4 and ReiserFS – btrfs Wiki – https://btrfs.wiki.kernel.org/index.php/Conversion_from_Ext3
[16] Incremental Backup – btrfs Wiki – https://btrfs.wiki.kernel.org/index.php/Incremental_Backup
[17] Deduplication – btrfs Wiki – https://btrfs.wiki.kernel.org/index.php/Deduplication
[18] Status – btrfs Wiki – https://btrfs.wiki.kernel.org/index.php/Status

About the author

Shahriar Shovon

Freelancer & Linux System Administrator. Also loves Web API development with Node.js and JavaScript. I was born in Bangladesh. I am currently studying Electronics and Communication Engineering at Khulna University of Engineering & Technology (KUET), one of the demanding public engineering universities of Bangladesh.