Re: [RFC PATCH v4 1/8] iomap: zeroing needs to be pagecache aware

From: Dave Chinner
Date: Sun Jun 02 2024 - 18:23:14 EST


On Fri, May 31, 2024 at 06:11:25AM -0700, Christoph Hellwig wrote:
> On Wed, May 29, 2024 at 05:51:59PM +0800, Zhang Yi wrote:
> > XXX: how do we detect a iomap containing a cow mapping over a hole
> > in iomap_zero_iter()? The XFS code implies this case also needs to
> > zero the page cache if there is data present, so trigger for page
> > cache lookup only in iomap_zero_iter() needs to handle this case as
> > well.
>
> If there is no data in the page cache and either a whole or unwritten
> extent it really should not matter what is in the COW fork, a there
> obviously isn't any data we could zero.
>
> If there is data in the page cache for something that is marked as
> a hole in the srcmap, but we have data in the COW fork due to
> COW extsize preallocation we'd need to zero it, but as the
> xfs iomap ops don't return a separate srcmap for that case we
> should be fine. Or am I missing something?

If the data extent is a hole, xfs_buffered_write_iomap_begin()
doesn't even check the cow fork for extents if IOMAP_ZERO is being
done. Hence if there is a pending COW extent that extends over a
data fork hole (cow fork preallocation can do that, right?), then we
may have data in the page cache over an unwritten extent in the COW
fork.

This code:

/* We never need to allocate blocks for zeroing or unsharing a hole. */
if ((flags & (IOMAP_UNSHARE | IOMAP_ZERO)) &&
imap.br_startoff > offset_fsb) {
xfs_hole_to_iomap(ip, iomap, offset_fsb, imap.br_startoff);
goto out_unlock;
}

The comment, IMO, indicates the issue here: we're not going to
allocate blocks in IOMAP_ZERO, but we do need to map anything that
might contain page cache data for the IOMAP_ZERO case. If "data
hole, COW unwritten, page cache dirty" can exist as the comment in
xfs_setattr_size() implies, then this code is broken and needs
fixing.

I don't know what that fix looks like yet - I suspect that all we
need to do for IOMAP_ZERO is to return the COW extent in the srcmap,
and then the zeroing code should do the right thing if it's an
unwritten COW extent...

-Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx