Bcachefs - encryption, fsck, and more

From: Kent Overstreet
Date: Wed Mar 15 2017 - 20:09:06 EST

It's been far too long since the last announcement - lots of stuff has been
happening. The biggest milestone has been all the breaking on disk format
changes finally landing, but there's been lots of other stuff going on, too.

On the subject of the breaking on disk format changes - there's an excellent
chance this'll be the last breaking change, so if you're thinking about trying
out bcachefs this is an excellent time. Also, if you have a filesystem in the
old format, code to read your filesystem is available in the bcachefs-v0 branches
of both linux-bcache and bcache-tools.

More information on getting started with bcachefs is available on the wiki:

What all has changed since the last announcement:

Related to the on disk format changes, we have...
- Encryption

We now have whole filesystem encryption - and this is modern authenticated
encryption, using ChaCha20 and Poly1305. Bcachefs's encryption isn't a direct
competitor to ext4's encryption - unlike ext4, we can't currently encrypt
only part of the filesystem, and then mount and use the rest of the
filesystem without providing the encryption key. It's more of a better
dm-crypt - block layer encryption is somewhat of a pile of hacks [1] and it's
not possible to do authenticated encryption at the block layer, but it is in
a copy on write filesystem.

In my (relatively brief) performance testing, bcachefs's encryption performs
for me almost identically to dm-crypt (which I was surprised by, given that
they're using completely different ciphers).

Before you go out and switch to bcachefs encryption though - please be aware,
the encryption design and code have seen some outside review but they really
do need more before I'd trust them with anything critical.

[1] https://sockpuppet.org/blog/2014/04/30/you-dont-want-xts/

- Backup superblocks

This has been badly needed since our superblocks are now often > 4k and thus
torn writes leading to checksum failures are a real issue.
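
To illustrate why a backup copy helps here, the following is a minimal sketch
of a torn write, not bcachefs's actual on disk format. The 4k atomic write
unit, the CRC32 checksum, and all function names are assumptions made for the
example: an 8k superblock is only half written before a crash, its checksum no
longer verifies, and the reader falls back to the backup copy.

```python
import zlib

SECTOR = 4096  # assumed atomic write unit, for illustration only


def seal(payload: bytes) -> bytes:
    """Prefix the payload with a CRC32 so readers can detect corruption."""
    return zlib.crc32(payload).to_bytes(4, "little") + payload


def is_valid(blob: bytes) -> bool:
    """Check the stored CRC32 against the payload that follows it."""
    stored = int.from_bytes(blob[:4], "little")
    return zlib.crc32(blob[4:]) == stored


def torn_write(old: bytes, new: bytes, sectors_written: int) -> bytes:
    """Simulate a crash after only the first sectors of `new` reached disk."""
    cut = sectors_written * SECTOR
    return new[:cut] + old[cut:]


def read_superblock(copies):
    """Return the first copy whose checksum verifies, or None."""
    for blob in copies:
        if is_valid(blob):
            return blob
    return None


old_sb = seal(b"A" * (2 * SECTOR - 4))  # 8k superblock, previous version
new_sb = seal(b"B" * (2 * SECTOR - 4))  # 8k superblock, version being written

primary = torn_write(old_sb, new_sb, sectors_written=1)  # crash mid-update
backup = old_sb                                          # not yet rewritten

recovered = read_superblock([primary, backup])
```

In this sketch the primary copy is a mix of new and old sectors and fails its
checksum, so `recovered` ends up being the intact backup.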

- New inode format

The new inode format is both more compact and more easily extensible than the
old one - average real world inode size is now 50-60 bytes. You know what
makes a filesystem feel fast? Being able to fit all your metadata in ram :)
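
As a rough back-of-the-envelope check on that claim, here is a small sketch.
The file count is hypothetical, 55 bytes is just the midpoint of the range
above, and the 256 byte figure (mke2fs's default ext4 inode size) is there
only for scale. It counts inodes alone, not dirents or extents.

```python
def metadata_bytes(inode_count: int, bytes_per_inode: int) -> int:
    """Total space taken by inodes alone, ignoring all other metadata."""
    return inode_count * bytes_per_inode


INODES = 10_000_000  # hypothetical number of files, for illustration

for label, size in (("55 byte inodes", 55), ("256 byte inodes", 256)):
    mib = metadata_bytes(INODES, size) / 2**20
    print(f"{label}: {mib:.0f} MiB")
```

With those assumed numbers the inodes come to roughly half a gigabyte at 55
bytes each, versus well over two gigabytes at 256 bytes each.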

- Lots of small changes for better support for multiple devices and replication

Multiple device support (including caching/tiering) is getting to be pretty
robust and usable (and people are successfully using it for their root
filesystems - for a while now, actually). The tooling is getting better; the
main priority at this point needs to be documentation.

For replication (i.e. raid1/10), the core functionality all works - you can
create a replicated filesystem, write data to it, and take one of the drives
offline while the filesystem is in use - it keeps working, and you can keep
writing data to it. However, there's still quite a few things that need to be
finished before it will actually be useful for protecting your data - we need
to add better tracking for which drives have data and how that data is
replicated (so we know whether we can take a drive offline or mount without
it without losing data), as well as replication aware disk space accounting
and rereplication/scrubbing. But it's coming.

Most of the activity lately has actually been happening in the userspace
tooling, though:

We now have a userspace fsck: we've actually had most of fsck implemented for
quite a while, but it was implemented in the kernel so it was only possible to
run it at mount time (it runs by default on every mount, because I err towards
paranoia). The new userspace fsck is much more convenient though - it takes all
the normal options (e.g. -n for dry run) and is able to prompt if it finds an

We didn't get a whole new fsck tool that runs in userspace - what's actually new
here is that I wrote a shim layer to build almost all the bcachefs code in
userspace as part of bcache-tools, which uses it as a library.

This is really cool, and it's made it easy to write some other very useful
tools/subcommands: One is "bcache dump", which takes a filesystem and dumps all
the metadata to a sparse qcow2 image. This is really useful for debugging - if
your bcachefs filesystem gets into a bad state and fsck isn't able to fix it,
dump the metadata and send it to me and I'll debug it from that. We've already
used it for exactly that with one user's filesystem - and for me the developer,
it was a hell of a lot easier to debug and teach fsck to fix that particular
issue that way instead of having to either get remote access, or debug by
sending the user patches and waiting for them to be tested. So of the recent
changes this might be the one I'm happiest about :)

We can also now migrate filesystems to bcachefs in place! The bcache migrate
command takes an existing filesystem, fallocates a big file in it, creates a new
filesystem (in userspace) on the block device but using only the space reserved
by that file it fallocated - and then walks the contents of the original
filesystem creating pointers to all your existing data.

You can then mount that new filesystem and verify that everything is correct
without overwriting anything in the existing filesystem (by passing mount the
offset where bcache migrate put the superblock) - and you can even mount both
the old and the new filesystems at the same time (use mount -o noexcl when
mounting the bcachefs filesystem) and use rsync --itemize-changes to verify that
the filesystems really are identical, which is how I test it.
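
If you'd rather script the comparison, the following is a minimal sketch of
the same idea in Python. It is a stand-in for the rsync check, not what the
migrate tooling does: it only compares regular file contents by SHA-256 and
ignores ownership, permissions, timestamps, xattrs and symlinks, all of which
rsync can check. The function names are made up for this example.

```python
import hashlib
import os


def file_digest(path: str) -> str:
    """SHA-256 of a file's contents, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def snapshot(root: str) -> dict:
    """Map each regular file's path (relative to root) to its digest."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(full) or not os.path.isfile(full):
                continue
            result[os.path.relpath(full, root)] = file_digest(full)
    return result


def compare_trees(old_root: str, new_root: str) -> list:
    """Return sorted (path, reason) pairs for every difference found."""
    old, new = snapshot(old_root), snapshot(new_root)
    diffs = []
    for path in sorted(old.keys() | new.keys()):
        if path not in new:
            diffs.append((path, "missing in new"))
        elif path not in old:
            diffs.append((path, "missing in old"))
        elif old[path] != new[path]:
            diffs.append((path, "content differs"))
    return diffs
```

With both filesystems mounted, calling compare_trees() on the two mount points
(e.g. the hypothetical /mnt/old and /mnt/new) should return an empty list if
every file made it across with identical contents.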

Aside from all that, there's been numerous fixes and performance improvements -
we're still looking for benchmarks/workloads where bcachefs lags other
filesystems, and as we find them they get fixed. Good rigorous performance
testing with new benchmarks is always appreciated.


If you're interested in helping out - come join us in the #bcache IRC channel,
on OFTC. We're trying to get a new website together and get some more
documentation written, so if you have skills in either of those areas (us kernel
programmers don't really do web design) your help would be greatly appreciated.

And as usual, I still need more funding - if you can chip in that's always
greatly appreciated - https://www.patreon.com/bcachefs - or if you're a company
that might be interested in making use of bcachefs, contact me.