Re: [PATCH 0/3] dax: clear poison on the fly along pwrite

From: Dan Williams
Date: Thu Sep 16 2021 - 15:04:09 EST


On Thu, Sep 16, 2021 at 12:12 AM Christoph Hellwig <hch@xxxxxxxxxxxxx> wrote:
>
> On Wed, Sep 15, 2021 at 01:27:47PM -0700, Dan Williams wrote:
> > > Yeah, Christoph suggested that we make the clearing operation explicit
> > > in a related thread a few weeks ago:
> > > https://lore.kernel.org/linux-fsdevel/YRtnlPERHfMZ23Tr@xxxxxxxxxxxxx/
> >
> > That seemed to be tied to a proposal to plumb it all the way out to an
> > explicit fallocate() mode, not make it a silent side effect of
> > pwrite().
>
> Yes.
>
> > >
> > > Each of the dm drivers has to add their own ->clear_poison operation
> > > that remaps the incoming (sector, len) parameters as appropriate for
> > > that device and then calls the lower device's ->clear_poison with the
> > > translated parameters.
> > >
> > > This (AFAICT) has already been done for dax_zero_page_range, so I sense
> > > that Dan is trying to save you a bunch of code plumbing work by nudging
> > > you towards doing s/dax_clear_poison/dax_zero_page_range/ to this series
> > > and then you only need patches 2-3.
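To make the per-target plumbing described above concrete, here is a minimal userspace sketch of what a dm-linear style ->clear_poison could look like: the stacked target translates the incoming (sector, len) into the lower device's address space and forwards the call. Everything here is hypothetical (struct dev, struct dev_ops, linear_clear_poison(), leaf_clear_poison()); it is not the device-mapper or DAX API. It assumes len is a sector count and that a target only forwards ranges it fully owns.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t sector_t;

struct dev;

/* Hypothetical per-device operations table (not a real kernel struct). */
struct dev_ops {
	/* Clear poison for nr sectors starting at sector (device-relative). */
	int (*clear_poison)(struct dev *d, sector_t sector, sector_t nr);
};

/* Hypothetical device: a leaf (pmem) or a linear target stacked on one. */
struct dev {
	const struct dev_ops *ops;
	struct dev *lower;     /* NULL for the leaf (pmem) device */
	sector_t begin;        /* first sector this target covers in the stacked device */
	sector_t start;        /* matching sector in the lower device */
	sector_t nr_sectors;   /* length of this target */
	sector_t last_sector;  /* leaf only: last range it was asked to clear */
	sector_t last_nr;
};

/* Leaf: a real driver would clear media errors here; this only records the request. */
static int leaf_clear_poison(struct dev *d, sector_t sector, sector_t nr)
{
	assert(d->lower == NULL);
	d->last_sector = sector;
	d->last_nr = nr;
	return 0;
}

/* Linear target: remap (sector, nr) into the lower device and forward it. */
static int linear_clear_poison(struct dev *d, sector_t sector, sector_t nr)
{
	/* Reject ranges not fully contained in this target (-EINVAL). */
	if (sector < d->begin || nr > d->nr_sectors ||
	    sector - d->begin > d->nr_sectors - nr)
		return -22;
	return d->lower->ops->clear_poison(d->lower,
					   d->start + (sector - d->begin), nr);
}

static const struct dev_ops leaf_ops = { .clear_poison = leaf_clear_poison };
static const struct dev_ops linear_ops = { .clear_poison = linear_clear_poison };
```

For example, with a linear target that maps its sectors 0..4095 onto sector 2048 onward of the pmem device, clearing 8 sectors at sector 100 of the stacked device forwards a clear of 8 sectors at sector 2148 to the leaf. Each additional stacking layer would repeat the same translate-and-forward step.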
> >
> > Yes, but it sounds like Christoph was saying don't overload
> > dax_zero_page_range(). I'd be ok splitting the difference and having a
> > new fallocate clear poison mode map to dax_zero_page_range()
> > internally.
>
> That was my gut feeling. If everyone feels 100% comfortable with
> zeroing as the mechanism to clear poisoning I'll cave in. The most
> important bit is that we do that through a dedicated DAX path instead
> of abusing the block layer even more.

...or just rename dax_zero_page_range() to dax_reset_page_range()?
Where reset == "zero + clear-poison"?
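A minimal userspace model of the proposed "reset == zero + clear-poison" semantics, assuming a pmem range with a per-page poison flag. All names (struct pmem_sim, sim_reset_page_range(), sim_read_page()) are hypothetical and only illustrate the contract; they are not dax_zero_page_range() or any existing driver interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SIM_PAGE_SIZE 4096
#define SIM_NR_PAGES  16

/* Hypothetical stand-in for a pmem range with a per-page poison flag. */
struct pmem_sim {
	unsigned char data[SIM_NR_PAGES][SIM_PAGE_SIZE];
	bool poisoned[SIM_NR_PAGES];
};

/* Hypothetical "reset": zero the pages AND clear their poison state. */
static int sim_reset_page_range(struct pmem_sim *p, size_t pgoff, size_t nr_pages)
{
	assert(p != NULL);
	if (pgoff > SIM_NR_PAGES || nr_pages > SIM_NR_PAGES - pgoff)
		return -22; /* -EINVAL: range outside the device */
	for (size_t i = pgoff; i < pgoff + nr_pages; i++) {
		memset(p->data[i], 0, SIM_PAGE_SIZE);
		p->poisoned[i] = false;
	}
	return 0;
}

/* Reads fail with -EIO on a poisoned page, modelling a media error. */
static int sim_read_page(const struct pmem_sim *p, size_t pgoff, unsigned char *buf)
{
	if (pgoff >= SIM_NR_PAGES)
		return -22;
	if (p->poisoned[pgoff])
		return -5;
	memcpy(buf, p->data[pgoff], SIM_PAGE_SIZE);
	return 0;
}
```

In this model, the reset call is the only thing that clears the poison flag, so a caller cannot zero a range without also making it readable again.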

> > > > BTW, our customer doesn't care about creating a DAX volume through DM, so.
> > >
> > > They might not care, but anything going upstream should work in the
> > > general case.
> >
> > Agree.
>
> I'm really worried about both partitions on DAX and DM passing through
> DAX because they deeply bind DAX to the block layer, which is just a bad
> idea. I think we also need to sort that whole story out before removing
> the EXPERIMENTAL tags.

I do think it was a mistake to allow for DAX on partitions of a pmemX
block-device.

DAX-reflink support may be the opportunity to start deprecating that
support:

1. Only enable DAX-reflink for direct mounting on /dev/pmemX without
   partitions (later add dax-device direct mounting).
2. Change the DAX-experimental warning to a deprecation notification
   for DAX on DM/partitions.
3. Continue to fail / never fix DAX-reflink for DM/partitions.
4. Direct people to use namespace provisioning for sub-divisions of
   PMEM capacity.
5. Finally, look into adding concatenation and additional software
   striping support to the new CXL region creation facility.