Re: [PATCH v12 08/20] dax,ext2: Replace the XIP page fault handler with the DAX page fault handler

From: Andrew Morton
Date: Tue Jan 13 2015 - 17:47:59 EST


On Tue, 13 Jan 2015 16:53:34 -0500 Matthew Wilcox <willy@xxxxxxxxxxxxxxx> wrote:

> /*
> * Lock ordering in mm:
> *
> * inode->i_mutex (while writing or truncating, not reading or faulting)
> * mm->mmap_sem
>
> > > In the worst case, the file still has blocks
> > > + * allocated past the end of the file.
> > > + */
> > > + size = (i_size_read(inode) + PAGE_SIZE - 1) >> PAGE_SHIFT;
> > > + if (unlikely(vmf->pgoff >= size)) {
> > > + error = -EIO;
> > > + goto out;
> > > + }
> >
> > How does this play with holepunching? Checking i_size won't work there?
>
> It doesn't. But the same problem exists with non-DAX files too, and
> when I pointed it out, it was met with a shrug from the crowd. I saw a
> patch series just recently that fixes it for XFS, but as far as I know,
> btrfs and ext4 still don't play well with pagefault vs hole-punch races.

What are the user-visible effects of the race?
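
For reference, the check quoted above rounds i_size up to whole pages
and fails any fault at or beyond that page count. A minimal userspace
sketch of the arithmetic follows. The 4096-byte page size and the
helper names are assumptions for illustration, not part of the patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed for illustration: 4 KiB pages (the kernel gets these from the arch). */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (UINT64_C(1) << SKETCH_PAGE_SHIFT)

/* Number of pages needed to cover i_size bytes, rounding up. */
static uint64_t size_in_pages(uint64_t i_size)
{
	/* Guard against wraparound in the round-up addition. */
	assert(i_size <= UINT64_MAX - (SKETCH_PAGE_SIZE - 1));
	return (i_size + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT;
}

/* True when a fault at page offset pgoff is at or past EOF (the -EIO case). */
static bool fault_beyond_eof(uint64_t pgoff, uint64_t i_size)
{
	return pgoff >= size_in_pages(i_size);
}
```

Note that this only ever consults i_size, so a hole punched inside
the file passes the check unchanged.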

> > > + memset(&bh, 0, sizeof(bh));
> > > + block = (sector_t)vmf->pgoff << (PAGE_SHIFT - blkbits);
> > > + bh.b_size = PAGE_SIZE;
> >
> > ah, there.
> >
> > PAGE_SIZE varies a lot between architectures. What are the
> > implications of this?
>
> At the moment, you can only do DAX for blocksizes that are equal to
> PAGE_SIZE. That's a restriction that existed for the previous XIP code,
> and I haven't fixed it all for DAX yet. I'd like to, but it's not high on
> my list of things to fix. Since these are in-memory filesystems, there's
> not likely to be high demand to move the filesystem between machines.

hm, I guess not.
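
To make the PAGE_SIZE dependence concrete, here is a rough userspace
sketch of the pgoff-to-block shift from the quoted hunk. The helper
names and the example shift values are assumptions for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors: block = (sector_t)vmf->pgoff << (PAGE_SHIFT - blkbits) */
static uint64_t pgoff_to_block(uint64_t pgoff, unsigned page_shift,
			       unsigned blkbits)
{
	/* The shift is only meaningful when blocksize <= pagesize. */
	assert(blkbits <= page_shift);
	return pgoff << (page_shift - blkbits);
}

/* Number of filesystem blocks that one page spans. */
static unsigned blocks_per_page(unsigned page_shift, unsigned blkbits)
{
	assert(blkbits <= page_shift);
	return 1u << (page_shift - blkbits);
}
```

With 4k blocks and 4k pages (both shifts 12) the mapping is 1:1.
With 4k blocks on a 64k-page machine (page shift 16) one page spans
16 blocks, while bh.b_size = PAGE_SIZE still asks for a single
page-sized mapping.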

This means that our users will need to mkfs their filesystems with
blocksize==pagesize. The "error: unsupported blocksize for dax" printk
should get the message across, but a mention in
Documentation/filesystems/dax.txt's "Shortcomings" section wouldn't
hurt.
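
Something along these lines would let a user check an existing
filesystem before trying DAX on it. This is only a sketch: the
function names are made up, and it assumes statvfs()'s f_bsize
reports the block size the filesystem was made with, which I believe
holds for ext2 but have not checked elsewhere:

```c
#include <stdbool.h>
#include <sys/statvfs.h>
#include <unistd.h>

/* Pure comparison, kept separate so it is easy to test. */
static bool blocksize_matches_pagesize(unsigned long bsize, long pagesize)
{
	return pagesize > 0 && bsize == (unsigned long)pagesize;
}

/*
 * Returns 1 if the filesystem holding path has blocksize == pagesize,
 * 0 if it does not, and -1 if either value could not be obtained.
 */
static int path_blocksize_ok(const char *path)
{
	struct statvfs sv;
	long pagesize = sysconf(_SC_PAGESIZE);

	if (pagesize <= 0 || statvfs(path, &sv) != 0)
		return -1;
	return blocksize_matches_pagesize(sv.f_bsize, pagesize) ? 1 : 0;
}
```

Calling path_blocksize_ok() on the mount point and getting 0 back
would correspond to the "unsupported blocksize" case above.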
