Re: [PATCH v2 5/9] iomap: Support arbitrarily many blocks per page

From: Darrick J. Wong
Date: Wed Sep 23 2020 - 01:01:06 EST


On Wed, Sep 23, 2020 at 03:48:59AM +0100, Matthew Wilcox wrote:
> On Tue, Sep 22, 2020 at 09:06:03PM -0400, Qian Cai wrote:
> > On Tue, 2020-09-22 at 18:05 +0100, Matthew Wilcox wrote:
> > > On Tue, Sep 22, 2020 at 12:23:45PM -0400, Qian Cai wrote:
> > > > On Fri, 2020-09-11 at 00:47 +0100, Matthew Wilcox (Oracle) wrote:
> > > > > Size the uptodate array dynamically to support larger pages in the
> > > > > page cache. With a 64kB page, we're only saving 8 bytes per page today,
> > > > > but with a 2MB maximum page size, we'd have to allocate more than 4kB
> > > > > per page. Add a few debugging assertions.
> > > > >
> > > > > Signed-off-by: Matthew Wilcox (Oracle) <willy@xxxxxxxxxxxxx>
> > > > > Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>
> > > >
> > > > Some syscall fuzzing will trigger this on powerpc:
> > > >
> > > > .config: https://gitlab.com/cailca/linux-mm/-/blob/master/powerpc.config
> > > >
> > > > [ 8805.895344][T445431] WARNING: CPU: 61 PID: 445431 at fs/iomap/buffered-
> > > > io.c:78 iomap_page_release+0x250/0x270
> > >
> > > Well, I'm glad it triggered. That warning is:
> > > WARN_ON_ONCE(bitmap_full(iop->uptodate, nr_blocks) !=
> > > PageUptodate(page));
> > > so there was definitely a problem of some kind.
> > >
> > > truncate_cleanup_page() calls
> > > do_invalidatepage() calls
> > > iomap_invalidatepage() calls
> > > iomap_page_release()
> > >
> > > Is this the first warning? I'm wondering if maybe there was an I/O error
> > > earlier which caused PageUptodate to get cleared again. If it's easy to
> > > reproduce, perhaps you could try something like this?
> > >
> > > +void dump_iomap_page(struct page *page, const char *reason)
> > > +{
> > > + struct iomap_page *iop = to_iomap_page(page);
> > > + unsigned int nr_blocks = i_blocks_per_page(page->mapping->host, page);
> > > +
> > > + dump_page(page, reason);
> > > + if (iop)
> > > + printk("iop:reads %d writes %d uptodate %*pb\n",
> > > + atomic_read(&iop->read_bytes_pending),
> > > + atomic_read(&iop->write_bytes_pending),
> > > + nr_blocks, iop->uptodate);
> > > + else
> > > + printk("iop:none\n");
> > > +}
> > >
> > > and then do something like:
> > >
> > > if (bitmap_full(iop->uptodate, nr_blocks) != PageUptodate(page))
> > > dump_iomap_page(page, NULL);
> >
> > This:
> >
> > [ 1683.158254][T164965] page:000000004a6c16cd refcount:2 mapcount:0 mapping:00000000ea017dc5 index:0x2 pfn:0xc365c
> > [ 1683.158311][T164965] aops:xfs_address_space_operations ino:417b7e7 dentry name:"trinity-testfile2"
> > [ 1683.158354][T164965] flags: 0x7fff8000000015(locked|uptodate|lru)
> > [ 1683.158392][T164965] raw: 007fff8000000015 c00c0000019c4b08 c00c0000019a53c8 c000201c8362c1e8
> > [ 1683.158430][T164965] raw: 0000000000000002 0000000000000000 00000002ffffffff c000201c54db4000
> > [ 1683.158470][T164965] page->mem_cgroup:c000201c54db4000
> > [ 1683.158506][T164965] iop:none
>
> Oh, I'm a fool. This is after the call to detach_page_private() so
> page->private is NULL and we don't get the iop dumped.
>
> Nevertheless, this is interesting. Somehow, the page is marked Uptodate,
> but the bitmap is deemed not full. There are three places where we set
> an iomap page Uptodate:
>
> 1. if (bitmap_full(iop->uptodate, i_blocks_per_page(inode, page)))
> SetPageUptodate(page);
>
> 2. if (page_has_private(page))
> iomap_iop_set_range_uptodate(page, off, len);
> else
> SetPageUptodate(page);
>
> 3. BUG_ON(page->index);
> ...
> SetPageUptodate(page);
>
> It can't be #2 because the page has an iop. It can't be #3 because the
> page->index is not 0. So at some point in the past, the bitmap was full.
>
> I don't think it's possible for inode->i_blksize to change, and you
> aren't running with THPs, so it's definitely not possible for thp_size()
> to change. So i_blocks_per_page() isn't going to change.
>
> We seem to have allocated enough memory for ->iop because that's also
> based on i_blocks_per_page().
>
> I'm out of ideas. Maybe I'll wake up with a better idea in the morning.
> I've been trying to reproduce this on x86 with a 1kB block size
> filesystem, and haven't been able to yet. Maybe I'll try to setup a
> powerpc cross-compilation environment tomorrow.

FWIW I managed to reproduce it with the following fstests configuration
on a 1k block size fs on a x86 machinE:

SECTION -- -no-sections-
FSTYP -- xfs
MKFS_OPTIONS -- -m reflink=1,rmapbt=1 -i sparse=1 -b size=1024
MOUNT_OPTIONS -- -o usrquota,grpquota,prjquota
HOST_OPTIONS -- local.config
CHECK_OPTIONS -- -g auto
XFS_MKFS_OPTIONS -- -bsize=4096
TIME_FACTOR -- 1
LOAD_FACTOR -- 1
TEST_DIR -- /mnt
TEST_DEV -- /dev/sde
SCRATCH_DEV -- /dev/sdd
SCRATCH_MNT -- /opt
OVL_UPPER -- ovl-upper
OVL_LOWER -- ovl-lower
OVL_WORK -- ovl-work
KERNEL -- 5.9.0-rc4-djw

The kernel is more or less iomap-for-next.

--D