Re: [PATCH v3 2/2] dax: add a sysfs knob to control memmap_on_memory behavior

From: Dan Williams
Date: Mon Dec 11 2023 - 20:01:03 EST


Verma, Vishal L wrote:
> On Tue, 2023-12-12 at 08:30 +0800, Huang, Ying wrote:
> > Vishal Verma <vishal.l.verma@xxxxxxxxx> writes:
> >
> > > Add a sysfs knob for dax devices to control the memmap_on_memory setting
> > > if the dax device were to be hotplugged as system memory.
> > >
> > > The default memmap_on_memory setting for dax devices originating via
> > > pmem or hmem is set to 'false' - i.e. no memmap_on_memory semantics, to
> > > preserve legacy behavior. For dax devices via CXL, the default is on.
> > > The sysfs control allows the administrator to override the above
> > > defaults if needed.
> > >
> > > Cc: David Hildenbrand <david@xxxxxxxxxx>
> > > Cc: Dan Williams <dan.j.williams@xxxxxxxxx>
> > > Cc: Dave Jiang <dave.jiang@xxxxxxxxx>
> > > Cc: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
> > > Cc: Huang Ying <ying.huang@xxxxxxxxx>
> > > Tested-by: Li Zhijian <lizhijian@xxxxxxxxxxx>
> > > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx>
> > > Reviewed-by: David Hildenbrand <david@xxxxxxxxxx>
> > > Signed-off-by: Vishal Verma <vishal.l.verma@xxxxxxxxx>
> > > ---
> > >  drivers/dax/bus.c                       | 47 +++++++++++++++++++++++++++++++++
> > >  Documentation/ABI/testing/sysfs-bus-dax | 17 ++++++++++++
> > >  2 files changed, 64 insertions(+)
> > >
> > > diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
> > > index 1ff1ab5fa105..2871e5188f0d 100644
> > > --- a/drivers/dax/bus.c
> > > +++ b/drivers/dax/bus.c
> > > @@ -1270,6 +1270,52 @@ static ssize_t numa_node_show(struct device *dev,
> > >  }
> > >  static DEVICE_ATTR_RO(numa_node);
> > >  
> > > +static ssize_t memmap_on_memory_show(struct device *dev,
> > > +                                    struct device_attribute *attr, char *buf)
> > > +{
> > > +       struct dev_dax *dev_dax = to_dev_dax(dev);
> > > +
> > > +       return sprintf(buf, "%d\n", dev_dax->memmap_on_memory);
> > > +}
> > > +
> > > +static ssize_t memmap_on_memory_store(struct device *dev,
> > > +                                     struct device_attribute *attr,
> > > +                                     const char *buf, size_t len)
> > > +{
> > > +       struct device_driver *drv = dev->driver;
> > > +       struct dev_dax *dev_dax = to_dev_dax(dev);
> > > +       struct dax_region *dax_region = dev_dax->region;
> > > +       struct dax_device_driver *dax_drv = to_dax_drv(drv);
> > > +       ssize_t rc;
> > > +       bool val;
> > > +
> > > +       rc = kstrtobool(buf, &val);
> > > +       if (rc)
> > > +               return rc;
> > > +
> > > +       if (dev_dax->memmap_on_memory == val)
> > > +               return len;
> > > +
> > > +       device_lock(dax_region->dev);
> > > +       if (!dax_region->dev->driver) {
> > > +               device_unlock(dax_region->dev);
> > > +               return -ENXIO;
> > > +       }
> >
> > I think that it should be OK to write to "memmap_on_memory" if no driver
> > is bound to the device.  We just need to avoid to write to it when kmem
> > driver is bound.
>
> Oh this is just a check on the region driver, not for a dax driver
> being bound to the device. It's the same as what things like
> align_store(), size_store() etc. do for dax device reconfiguration.
>
> That said, it might be okay to remove this check, as this operation
> doesn't change any attributes of the dax region (the other interfaces I
> mentioned above can affect regions, so we want to lock the region
> device). If removing the check, we'd drop the region lock acquisition
> as well.
>
> Dan, any thoughts on this?

Since this is a dev_dax attribute then this would have already been
synchronously shutdown when dax_region->dev->driver transitioned to
NULL. I.e. region unbind causes dev_dax deletion.

However, there's a different issue here as dev->driver was referenced
without the device_lock().

Additionally, I think this function also makes it clear that device lock
flavor of guard() would be useful:

DEFINE_GUARD(dev, struct device *, device_lock(_T), device_unlock(_T))

...then I would expect something like:

guard(dev)(dev);
if (dev_dax->memmap_on_memory != val && dev->driver &&
to_dax_drv(dev->driver)->type == DAXDRV_KMEM_TYPE)
return -EBUSY;
dev_dax->memmap_on_memory = val;
return len;

...maybe with some temp variables to reduce the derefence chain, buy you
get the idea. Only prevent changes while the device is active under
kmem.