Re: [PATCH v6 4/7] dax: add support for fsync/msync

From: Ross Zwisler
Date: Tue Jan 05 2016 - 10:50:48 EST


On Tue, Jan 05, 2016 at 12:13:46PM +0100, Jan Kara wrote:
> On Sun 03-01-16 10:13:06, Dan Williams wrote:
> > On Wed, Dec 23, 2015 at 11:39 AM, Ross Zwisler
> > <ross.zwisler@xxxxxxxxxxxxxxx> wrote:
> > > To properly handle fsync/msync in an efficient way DAX needs to track dirty
> > > pages so it is able to flush them durably to media on demand.
> > >
> > > The tracking of dirty pages is done via the radix tree in struct
> > > address_space. This radix tree is already used by the page writeback
> > > infrastructure for tracking dirty pages associated with an open file, and
> > > it already has support for exceptional (non struct page*) entries. We
> > > build upon these features to add exceptional entries to the radix tree for
> > > DAX dirty PMD or PTE pages at fault time.
> > >
> > > Signed-off-by: Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx>
> >
> > I'm hitting the following report with the ndctl dax test [1] on
> > next-20151231. I bisected it to
> > commit 3cb108f941de "dax-add-support-for-fsync-sync-v6". I'll take a
> > closer look tomorrow, but in case someone can beat me to it, here's
> > the back-trace:
> >
> > ------------[ cut here ]------------
> > kernel BUG at fs/inode.c:497!
>
> I suppose this is the check that mapping->nr_exceptional is zero, isn't it?
> Hum, I don't see how that could happen given we call
> truncate_inode_pages_final() just before the clear_inode() call which
> removes all the exceptional entries from the radix tree. And there's not
> much room for a race during umount... Does the radix tree really contain
> any entry or is it an accounting bug?
>
> Honza

I think this is a bug with the existing way that we handle PMD faults. The
issue is that the PMD path doesn't properly remove radix tree entries for zero
pages covering holes. The PMD path calls unmap_mapping_range() to unmap the
range out of the struct address_space, but it is missing a call to
truncate_inode_pages_range() or similar to clear out those entries in the
radix tree. Up until now we didn't notice, we just had an orphaned entry in
the radix tree, but with my code we then find the page entry in the radix
tree when handling a PMD fault, we remove it and add in a PMD entry. This
causes us to be off on both our mapping->nrpages and mapping->nrexceptional
counts.

In the PTE path we properly remove the pages from the radix tree when
upgrading from a hole to a real DAX entry via the delete_from_page_cache()
call, which eventually calls page_cache_tree_delete().

I'm working on a fix now (and making sure all the above is correct).

- Ross
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/