Re: [PATCH v2 10/12] ext4: add basic fs-verity support
From: Eric Biggers
Date: Mon Nov 05 2018 - 20:11:47 EST
Hi Andreas,
On Mon, Nov 05, 2018 at 02:05:24PM -0700, Andreas Dilger wrote:
> On Nov 1, 2018, at 4:52 PM, Eric Biggers <ebiggers@xxxxxxxxxx> wrote:
> >
> > From: Eric Biggers <ebiggers@xxxxxxxxxx>
> >
> > Add basic fs-verity support to ext4. fs-verity is a filesystem feature
> > that enables transparent integrity protection and authentication of
> > read-only files. It uses a dm-verity like mechanism at the file level:
> > a Merkle tree is used to verify any block in the file in log(filesize)
> > time. It is implemented mainly by helper functions in fs/verity/.
> > See Documentation/filesystems/fsverity.rst for details.
> >
> > This patch adds everything except the data verification hooks that will
> > needed in ->readpages().
> >
> > On ext4, enabling fs-verity on a file requires that the filesystem has
> > the 'verity' feature, e.g. that it was formatted with
> > 'mkfs.ext4 -O verity' or had 'tune2fs -O verity' run on it.
> > This requires e2fsprogs 1.44.4-2 or later.
> >
> > In ext4, we choose to retain the fs-verity metadata past the end of the
> > file rather than trying to move it into an external inode xattr, since
> > in practice keeping the metadata in-line actually results in the
> > simplest and most efficient implementation. One non-obvious advantage
> > of keeping the verity metadata in-line is that when fs-verity is
> > combined with fscrypt, the verity metadata naturally gets encrypted too;
> > this is actually necessary because it contains hashes of the plaintext.
>
> On the plus side, this means that the verity data will automatically be
> invalidated if the file is truncated or extended, but on the negative side
> it means that the verity Merkle tree needs to be recalculated for the
> entire file if e.g. the file is appended to.
>
> I guess the current implementation will generate the Merkle tree in
> userspace, but at some point it might be useful to generate it on-the-fly
> to have proper data integrity from the time of write (e.g. like ZFS)
> rather than only allowing it to be stored after the entire file is written?
>
> Storing the Merkle tree in a large xattr inode would allow this to change
> in the future rather than being stuck with the current implementation. We
> could encrypt the xattr data just as easily as the file data (which should
> be done anyway even for non-verity files to avoid leaking data), and having
> the verity attr keyed to the inode version/size/mime(?) would ensure the
> kernel knows it is stale if the inode is modified.
>
> I'm not going to stand on my head and block this implementation, I just
> thought it is worthwhile to raise these issues now rather than after it
> is a fait accompli.
>
That would actually be the least of the problems for adding write support.
Adding write support would require at least:
- A way to maintain consistency between the data and hashes, including all
levels of hashes, since corruption after a crash (especially of potentially
the entire file!) is unacceptable. The main options for solving this are data
journalling, copy-on-write, and log-structured volume. But it's very hard to
retrofit existing filesystems with new consistency mechanisms. Data
journalling can always be used, but is very slow.
- An on-disk format that allows dynamically growing/shrinking each level of the
Merkle tree; or, using a different authenticated dictionary structure, such as
an authenticated skiplist rather than a Merkle tree. This would drastically
increase the complexity over a regular Merkle tree.
Compare it to dm-verity vs. dm-integrity. dm-verity is read-only and very
simple; the kernel just uses a Merkle tree that is generated by userspace.
On the other hand, dm-integrity supports writes but is slow, much more complex,
and doesn't even actually do full-device authentication since it authenticates
each sector independently, i.e. there is no Merkle tree.
I don't think it would make sense for the same device-mapper target to support
these quite different use cases. And the same general concepts apply at the
filesystem level; for these reasons and others (note that per-block checksums
like btrfs and ZFS wouldn't need a Merkle tree), write support is very
intentionally outside the scope of fs-verity.
So I think any arguments for doing things differently in fs-verity need to be
made in the context of read-only files.
Thanks,
Eric