Re: [RFC PATCH 02/10] fs-verity: add data verification hooks for ->readpages()

From: Gao Xiang
Date: Sat Aug 25 2018 - 02:32:36 EST

Hi Eric,

Thanks for your detailed reply.

My English is not very good; I cannot write as logically and quickly as you do, and I may use some words improperly.
I just want to express my personal concerns; please bear with me, thanks. :)

On 2018/8/25 12:16, Eric Biggers wrote:
> Hi Gao,
> On Sat, Aug 25, 2018 at 10:29:26AM +0800, Gao Xiang wrote:
>> Hi,
>> On 2018/8/25 0:16, Eric Biggers wrote:
>>> +/**
>>> + * fsverity_verify_page - verify a data page
>>> + *
>>> + * Verify a page that has just been read from a file against that file's Merkle
>>> + * tree. The page is assumed to be a pagecache page.
>>> + *
>>> + * Return: true if the page is valid, else false.
>>> + */
>>> +bool fsverity_verify_page(struct page *data_page)
>>> +{
>>> +	struct inode *inode = data_page->mapping->host;
>>> +	const struct fsverity_info *vi = get_fsverity_info(inode);
>>> +	struct ahash_request *req;
>>> +	bool valid;
>>> +
>>> +	req = ahash_request_alloc(vi->hash_alg->tfm, GFP_KERNEL);
>>> +	if (unlikely(!req))
>>> +		return false;
>>> +
>>> +	valid = verify_page(inode, vi, req, data_page);
>>> +
>>> +	ahash_request_free(req);
>>> +
>>> +	return valid;
>>> +}
>>> +EXPORT_SYMBOL_GPL(fsverity_verify_page);
>>> +
>>> +/**
>>> + * fsverity_verify_bio - verify a 'read' bio that has just completed
>>> + *
>>> + * Verify a set of pages that have just been read from a file against that
>>> + * file's Merkle tree. The pages are assumed to be pagecache pages. Pages that
>>> + * fail verification are set to the Error state. Verification is skipped for
>>> + * pages already in the Error state, e.g. due to fscrypt decryption failure.
>>> + */
>>> +void fsverity_verify_bio(struct bio *bio)
>>> +{
>>> +	struct inode *inode = bio_first_page_all(bio)->mapping->host;
>>> +	const struct fsverity_info *vi = get_fsverity_info(inode);
>>> +	struct ahash_request *req;
>>> +	struct bio_vec *bv;
>>> +	int i;
>>> +
>>> +	req = ahash_request_alloc(vi->hash_alg->tfm, GFP_KERNEL);
>>> +	if (unlikely(!req)) {
>>> +		bio_for_each_segment_all(bv, bio, i)
>>> +			SetPageError(bv->bv_page);
>>> +		return;
>>> +	}
>>> +
>>> +	bio_for_each_segment_all(bv, bio, i) {
>>> +		struct page *page = bv->bv_page;
>>> +
>>> +		if (!PageError(page) && !verify_page(inode, vi, req, page))
>>> +			SetPageError(page);
>>> +	}
>>> +
>>> +	ahash_request_free(req);
>>> +}
>>> +EXPORT_SYMBOL_GPL(fsverity_verify_bio);
>> Out of curiosity, I quickly scanned the fs-verity source code and have some minor questions...
>> If anything is wrong, please point it out; thanks in advance...
>> My first question is: 'Is there any way to skip verifying pages in a bio?'
>> I am thinking about the following:
>> If metadata and data pages are mixed in such a filesystem, they could be submitted together in one bio, but the metadata could be unsuitable for this kind of verification.
> Pages below i_size are verified, pages above are not.
> With my patches, ext4 and f2fs won't actually submit pages in both areas in the
> same bio, and they won't call the fs-verity verification function for bios in
> the data area. But even if they did, there's also a check in verify_page() that

I think you mean the hash area?
Yes, I understand your design; it is a wonderful job for ext4/f2fs for now, as Ted said.

> skips the verification if the page is above i_size.

I don't think it can be that simple for all cases.

If a filesystem submits contiguous reads from different mappings in one bio (something like mixed FILE_MAPPING and META_MAPPING),
the page->index values are actually unreliable (a logical page index for FILE_MAPPING, but a physical page index for META_MAPPING),
and the data may be organized across multiple bios by design for filesystem-specific use (such as compression).

In that case you cannot do the verification by checking `if the page is above i_size', and it could be hard to integrate.

>> The second question is related to the first --- 'Is there any way to verify a partial page?'
>> Taking scalability into consideration, some files could be totally or partially inlined in metadata.
>> Is there any way to deal with them in a per-file approach? At least --- support in the interface?
> Well, one problem is that inline data has its own separate I/O path; see
> ext4_readpage_inline() and f2fs_read_inline_data(). So it would be a large
> effort to support features like encryption and verity which require
> postprocessing after reads, and probably not worthwhile especially for verity
> which is primarily intended for large files.

Yes, for the current users, ext4 and f2fs, it is absolutely wonderful.

I have to admit I am curious about Google's fs-verity roadmap for future Android
(I need to work out whether it is designed to replace dm-verity; currently I think it is not),

since it is very important for deciding whether our EROFS should support fs-verity in the near future...

I could give some EROFS use cases if you have time to discuss them.

EROFS uses a more aggressive inline approach: it does not only inline the data of small files.
It is designed to inline the last page of every file whenever that page is reasonably small (eg. only a few bytes), eg.

        IN META_MAPPING                             blk-aligned (FILE_MAPPING)
+--------------------------------------+  +--------+--------+     +----------+.........
| inode A + inlined last page  inode B |  | page 0 | page 1 | ... | page n-1 |. page n .
+--------------------------------------+  +--------+--------+     +----------+.........

(page n lives inline in META_MAPPING next to inode A, not in the blk-aligned data area.)

In principle, this approach could also be used by read-write file systems to save more storage space.
I think it is still easy to handle for uncompressed files if you do the zero padding as you said below.

But when _compression_ is considered, especially VLE (variable-length extent) compression, I think verification should not rely on the `bio' directly, because:
1) The bio ends with compressed data rather than plain FILE_MAPPING data, and those pages could come from META_MAPPING
(for caching compressed pages on purpose) or from FILE_MAPPING (for decompressing in-place to save redundant META_MAPPING memory).

I think the data should be decompressed first and then verified by fs-verity, but more file pages can be involved than compressed pages
(eg. 128KB compressed to 32KB: we submit 8 pages but decompression ends with 32 pages), so it is not the original bio any more...
(actually, I think it is not a bio at all any more...)

2) EROFS VLE is more complicated: a bio could end at a compressed page while only part of a file page has been decompressed, eg.

   ... | compressed page X | compressed page X+1 | ...
             end of bio Y  ^  start of bio Y+1
               \                      /
            | plain data (file page) |

which means one bio could decompress only part of a page's data; the page could become Uptodate from two bios rather than one.
I have no idea how to run fs-verity on a page like this...

In that case, as Ted said, `it could call fsverity after assembling the page in the page cache.'

> A somewhat separate question is whether the zero padding to a block boundary
> after i_size, before the Merkle tree begins, is needed. The answer is yes,
> since mixing data and metadata in the same page would cause problems. First,
> userspace would be able to mmap the page and see some of the metadata rather
> than zeroes. That's not a huge problem, but it breaks the standard behavior.
> Second, any page containing data cannot be set Uptodate until it's been
> verified. So, a special case would be needed to handle reading the part of the
> metadata that's located in a data page.

Yes, having thought it over, I agree there should be zero padding to a block boundary,
as you said, due to the Uptodate and mmap scenarios, if the inode's (file) mapping is used directly for verification.

>> At last, I hope filesystems could select the on-disk position of the hash tree and 'struct fsverity_descriptor'
>> rather than having them fixed at the end of verity files... I think it would be better if fs-verity provided such support and interfaces... hmmm... :(
> In theory it would be a much cleaner design to store verity metadata separately
> from the data. But the Merkle tree can be very large. For example, a 1 GB file
> using SHA-512 would have a 16.6 MB Merkle tree. So the Merkle tree can't be an
> extended attribute, since the xattrs API requires xattrs to be small (<= 64 KB),
> and most filesystems further limit xattr sizes in their on-disk format to as
> little as 4 KB. Furthermore, even if both of these limits were to be increased,
> the xattrs functions (both the syscalls, and the internal functions that
> filesystems have) are all based around getting/setting the entire xattr value.
> Also when used with fscrypt, we want the Merkle tree and fsverity_descriptor to
> be encrypted, so they don't leak plaintext hashes. And we want the Merkle
> tree to be paged into memory, just like the file contents, to take advantage of
> the usual Linux memory management.
> What we really need is *streams*, like NTFS has. But the filesystems we're
> targetting don't support streams, nor does the Linux syscall interface have any
> API for accessing streams, nor does the VFS support them.
> Adding streams support to all those things would be a huge multi-year effort,
> controversial, and almost certainly not worth it just for fs-verity.
> So simply storing the verity metadata past i_size seems like the best solution
> for now.
> That being said, in the future we could pretty easily swap out the calls to
> read_mapping_page() with something else if a particular filesystem wanted to
> store the metadata somewhere else. We actually even originally had a function
> ->read_metadata_page() in the filesystem's fsverity_operations, but it turned
> out to be unnecessary and I replaced it with directly calling
> read_mapping_page(), but it could be changed back at any time.

OK, I got it.

I will look into it and think it over again. Thanks again for your detailed reply. :)

Gao Xiang

> - Eric