Re: [PATCH 01/19] dax: remove block device dependencies

From: Dan Williams
Date: Tue Jan 14 2020 - 15:39:14 EST


On Tue, Jan 14, 2020 at 12:31 PM Vivek Goyal <vgoyal@xxxxxxxxxx> wrote:
>
> On Thu, Jan 09, 2020 at 12:03:01PM -0800, Dan Williams wrote:
> > On Thu, Jan 9, 2020 at 3:27 AM Jan Kara <jack@xxxxxxx> wrote:
> > >
> > > On Tue 07-01-20 10:49:55, Dan Williams wrote:
> > > > On Tue, Jan 7, 2020 at 10:33 AM Vivek Goyal <vgoyal@xxxxxxxxxx> wrote:
> > > > > W.r.t partitioning, bdev_dax_pgoff() seems to be the pain point where
> > > > > dax code refers back to block device to figure out partition offset in
> > > > > dax device. If we create a dax object corresponding to "struct block_device"
> > > > > and store sector offset in that, then we could pass that object to dax
> > > > > code and not worry about referring back to bdev. I have written some
> > > > > proof of concept code and called that object "dax_handle". I can post
> > > > > that code if there is interest.
> > > >
> > > > I don't think it's worth it in the end especially considering
> > > > filesystems are looking to operate on /dev/dax devices directly and
> > > > remove block entanglements entirely.
> > > >
> > > > > IMHO, it feels useful to be able to partition and use a dax capable
> > > > > block device in the same way as a non-dax block device. It would be
> > > > > really odd if dax can't be enabled when the filesystem is on
> > > > > /dev/pmem0p1, but works when the filesystem is on
> > > > > /dev/mapper/pmem0p1.
> > > >
> > > > That can already happen today. If you do not properly align the
> > > > partition then dax operations will be disabled. This proposal just
> > > > extends that existing failure domain to make all partitions fail to
> > > > support dax.
> > >
> > > Well, I have some sympathy with the sysadmin that has /dev/pmem0 device,
> > > decides to create partitions on it for whatever (possibly misguided)
> > > reason and then ponders why the hell DAX is not working? And PAGE_SIZE
> > > partition alignment is so obvious and widespread that I don't count it as a
> > > realistic error case sysadmins would be pondering about currently.
> > >
> > > So I'd find two options reasonably consistent:
> > > 1) Keep status quo where partitions are created and support DAX.
> > > 2) Stop partition creation altogether; if anyone wants to split a pmem
> > > device further, they can use dm-linear for that (i.e., kpartx).
> > >
> > > But I'm not sure if the ship hasn't already sailed for option 2) to be
> > > feasible without angry users and Linus reverting the change.
> >
> > Christoph? I feel myself leaning more and more to the "keep pmem
> > partitions" camp.
> >
> > I don't see "drop partition support" effort ending well given the long
> > standing "ext4 fails to mount when dax is not available" precedent.
> >
> > I think the next least bad option is to have a dax_get_by_host()
> > variant that passes an offset and length pair rather than requiring a
> > later bdev_dax_pgoff() to recall the offset. This also prevents
> > needing to add another dax-device object representation.
>
> I am wondering what the conclusion is on this. I want this to make
> progress in some direction so that I can make progress on virtiofs DAX
> support.

I think we should at least try to delete the partition support and see
if anyone screams. Have a module option to revert the behavior so
people are not stuck waiting for the revert to land, but if it stays
quiet then we're in a better place with that support pushed out of the
dax core.
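
For readers following the thread, here is a rough userspace sketch of the
two ideas discussed above: the partition-offset-to-page-offset translation
that bdev_dax_pgoff() is described as performing, and the "offset and
length pair" alternative where the range is captured once instead of being
recalled from the block device later. This is not kernel code. Every name
prefixed with sketch_ or SKETCH_ is hypothetical, the 4096-byte page size
is an assumption, and overflow handling is omitted for brevity.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_SECTOR_SIZE 512ULL   /* bytes per sector */
#define SKETCH_PAGE_SIZE   4096ULL  /* assumed page size, not universal */

/* Hypothetical: offset/length pair captured once when the dax device is
 * looked up, so later calls do not need to consult the block device. */
struct sketch_dax_range {
	uint64_t start_sect;  /* partition start, in 512-byte sectors */
	uint64_t nr_sects;    /* partition length, in 512-byte sectors */
};

/* Translate a partition-relative sector into a page offset within the
 * whole dax device. Returns 0 on success, -ERANGE if the request falls
 * outside the partition, -EINVAL if the result is not page aligned. */
int sketch_dax_pgoff(const struct sketch_dax_range *r, uint64_t sector,
		     size_t size, uint64_t *pgoff)
{
	uint64_t phys_off;

	if (sector > r->nr_sects ||
	    size > (r->nr_sects - sector) * SKETCH_SECTOR_SIZE)
		return -ERANGE;

	phys_off = (r->start_sect + sector) * SKETCH_SECTOR_SIZE;
	if (phys_off % SKETCH_PAGE_SIZE || size % SKETCH_PAGE_SIZE)
		return -EINVAL;  /* misaligned partition: dax unusable */

	*pgoff = phys_off / SKETCH_PAGE_SIZE;
	return 0;
}

/* Usage example: an aligned partition works, a misaligned one does not. */
void sketch_demo(void)
{
	struct sketch_dax_range aligned = { .start_sect = 2048, .nr_sects = 8192 };
	struct sketch_dax_range misaligned = { .start_sect = 63, .nr_sects = 8192 };
	uint64_t pgoff = 0;

	assert(sketch_dax_pgoff(&aligned, 0, 4096, &pgoff) == 0);
	assert(sketch_dax_pgoff(&misaligned, 0, 4096, &pgoff) == -EINVAL);
}
```

The misaligned case corresponds to the existing failure mode mentioned
earlier in the thread, where a partition that does not start on a page
boundary ends up with dax disabled.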