Re: Re: [RFC PATCH 0/8] dax: Add a dax-rmap tree to support reflink

From: Ruan Shiyang
Date: Thu Jun 04 2020 - 22:31:22 EST




On 2020/6/5 9:30 AM, Dave Chinner wrote:
On Thu, Jun 04, 2020 at 07:51:07AM -0700, Darrick J. Wong wrote:
On Thu, Jun 04, 2020 at 03:37:42PM +0800, Ruan Shiyang wrote:


On 2020/4/28 2:43 PM, Dave Chinner wrote:
On Tue, Apr 28, 2020 at 06:09:47AM +0000, Ruan, Shiyang wrote:

On 2020/4/27 20:28:36, "Matthew Wilcox" <willy@xxxxxxxxxxxxx> wrote:

On Mon, Apr 27, 2020 at 04:47:42PM +0800, Shiyang Ruan wrote:
This patchset is an attempt to resolve the shared 'page cache' problem
for fsdax.

In order to track multiple mappings and indexes on one page, I
introduced a dax-rmap rb-tree to manage the relationship. A dax entry
will be associated more than once if it is shared. The second time we
associate this entry, we create the rb-tree and store its root in
page->private (not used in fsdax). We insert the (->mapping, ->index)
pair in dax_associate_entry() and delete it in dax_disassociate_entry().
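
To make the idea concrete, here is a minimal sketch of what such a
per-page rb-tree could look like; the struct and helper below are
illustrative only, not the patchset's actual code:

#include <linux/rbtree.h>
#include <linux/fs.h>

/*
 * Hypothetical sketch of the per-page reverse-mapping tree described
 * above.  Each node records one (mapping, index) pair that maps the
 * shared page; the tree root would live in page->private, which fsdax
 * does not otherwise use.
 */
struct dax_rmap_node {
        struct rb_node          rb;
        struct address_space    *mapping;
        unsigned long           index;
};

/* Insert a new node, keyed by (mapping, index). */
static void dax_rmap_insert(struct rb_root *root, struct dax_rmap_node *new)
{
        struct rb_node **link = &root->rb_node, *parent = NULL;

        while (*link) {
                struct dax_rmap_node *cur =
                        rb_entry(*link, struct dax_rmap_node, rb);

                parent = *link;
                if (new->mapping < cur->mapping ||
                    (new->mapping == cur->mapping && new->index < cur->index))
                        link = &parent->rb_left;
                else
                        link = &parent->rb_right;
        }
        rb_link_node(&new->rb, parent, link);
        rb_insert_color(&new->rb, root);
}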

Do we really want to track all of this on a per-page basis? I would
have thought a per-extent basis was more useful. Essentially, create
a new address_space for each shared extent. Per page just seems like
a huge overhead.

Per-extent tracking is a nice idea to me. I hadn't thought of it
yet...

But the extent info is maintained by the filesystem. I think we need a
way to obtain this info from the FS when associating a page. It may be
a bit complicated. Let me think about it...

That's why I want the -user of this association- to do a filesystem
callout instead of keeping its own naive tracking infrastructure.
The filesystem can do an efficient, on-demand reverse mapping lookup
from its own extent tracking infrastructure, and there's zero
runtime overhead when there are no errors present.
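
Sketched as an interface, the callout described here might look
something like the following; the struct and member names are invented
purely for illustration:

#include <linux/types.h>

struct super_block;

/*
 * Hypothetical interface, names invented for illustration only.  The
 * memory-failure code would call into the filesystem with the poisoned
 * byte range of the pmem device; the filesystem resolves the owners on
 * demand from its own reverse-mapping metadata (e.g. the XFS rmap
 * btree), so no per-page state is kept while there are no errors.
 */
struct dax_media_failure_ops {
        int (*media_failure)(struct super_block *sb, u64 dev_offset,
                             u64 len, int mf_flags);
};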

Hi Dave,

I ran into some difficulties when trying to implement the per-extent
rmap tracking, so I re-read your comments and found that I had
misunderstood what you described here.

I think what you mean is: we don't need the in-memory dax-rmap tracking
now. We just ask the FS for the owner information associated with a
page at memory-failure time. So the per-page (or even per-extent)
dax-rmap is needless in this case. Is that right?

Right. XFS already has its own rmap tree.

*nod*

Based on this, we only need to store the extent information of a fsdax
page in its ->mapping (obtained by searching the FS), and then look up
the owners of this page (also by searching the FS) when a memory
failure or another rmap case occurs.

I don't even think you need that much. All you need is the "physical"
offset of that page within the pmem device (e.g. 'this is the 307th 4k
page == offset 1257472 since the start of /dev/pmem0') and xfs can look
up the owner of that range of physical storage and deal with it as
needed.
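
In code, that translation is just page-distance arithmetic; a sketch,
assuming the first pfn of the pmem range is available (the helper name
is made up):

#include <linux/mm.h>   /* PAGE_SHIFT */

/*
 * Sketch only: convert a poisoned page's pfn into a byte offset from
 * the start of the pmem device, which is all the filesystem needs for
 * an rmap lookup.  E.g. the page at index 307 is 307 << 12 == 1257472
 * bytes into /dev/pmem0, matching the example above.
 */
static u64 pmem_byte_offset(unsigned long pfn, unsigned long dev_first_pfn)
{
        return (u64)(pfn - dev_first_pfn) << PAGE_SHIFT;
}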

Right. If we have the dax device associated with the page that had
the failure, then we can determine the offset of the page into the
block device address space and that's all we need to find the owner
of the page in the filesystem.

Note that there may actually be no owner - the page that had the
fault might land in free space, in which case we can simply zero
the page and clear the error.
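
The free-space case would be a simple early-out on the filesystem side;
roughly like the following, where every helper name is hypothetical (a
real implementation would query the XFS rmap btree for the range):

static int fs_handle_media_failure(struct super_block *sb, u64 dev_offset,
                                   u64 len)
{
        if (!fs_range_has_owner(sb, dev_offset, len)) {
                /*
                 * The poisoned range sits in free space: nothing maps
                 * it, so just zero it and clear the error record.
                 */
                return fs_zero_and_clear_poison(sb, dev_offset, len);
        }

        /* Otherwise resolve the owning files/offsets and notify them. */
        return fs_notify_owners(sb, dev_offset, len);
}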

OK. Thanks for pointing that out.


So a fsdax page is no longer associated with a specific file, but with
the FS (or the pmem device). I think that's easier to understand and
implement.

Effectively, yes. But we shouldn't need to actually associate the
page with anything at the filesystem level because it is already
associated with a DAX device at a lower level via a dev_pagemap.
The hardware page fault already runs through this code,
memory_failure_dev_pagemap(), before it gets to the DAX code, so
really all we need to do is have that function pass us the page, the
offset into the device and, say, the struct dax_device associated with
that page, so we can get to the filesystem superblock, which we can
then use for rmap lookups...
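
As a rough sketch of that plumbing, where dax_dev_of_pgmap(),
dev_pgmap_pfn_to_offset() and fs_notify_media_failure() are all
assumptions shown only to make the call chain concrete:

static int memory_failure_dev_pagemap_sketch(unsigned long pfn, int mf_flags,
                                             struct dev_pagemap *pgmap)
{
        struct dax_device *dax_dev = dax_dev_of_pgmap(pgmap);
        u64 offset = dev_pgmap_pfn_to_offset(pgmap, pfn); /* bytes into the device */

        /*
         * Hand the poisoned range to whoever claimed the dax device (the
         * filesystem); it can reach its superblock from there and run the
         * rmap lookup to find and act on the owners.
         */
        return fs_notify_media_failure(dax_dev, offset, PAGE_SIZE, mf_flags);
}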


OK. I was just thinking about how I can execute the FS rmap search from the memory-failure path. Thanks again for pointing that out. :)


--
Thanks,
Ruan Shiyang.

Cheers,

Dave.