Re: [PATCH RFC v2 14/18] dax/region: Support DAX device creation on dynamic DAX regions

From: Ira Weiny
Date: Wed Sep 06 2023 - 00:35:22 EST


Jonathan Cameron wrote:
> On Mon, 28 Aug 2023 22:21:05 -0700
> Ira Weiny <ira.weiny@xxxxxxxxx> wrote:
>
> > Dynamic Capacity (DC) DAX regions have a list of extents which define
> > the memory of the region which is available.
> >
> > Now that DAX region extents are fully realized support DAX device
> > creation on dynamic regions by adjusting the allocation algorithms
> > to account for the extents. Remember also references must be held on
> > the extents until the DAX devices are done with the memory.
> >
> > Redefine the region available size to include only extent space. Reuse
> > the size allocation algorithm by defining sub-resources for each extent
> > and limiting range allocation to those extents which have space. Do not
> > support direct mapping of DAX devices on dynamic devices.
> >
> > Enhance DAX device range objects to hold references on the extents until
> > the DAX device is destroyed.
> >
> > NOTE: At this time all extents within a region are created equally.
> > However, labels are associated with extents which can be used with
> > future DAX device labels to group which extents are used.
>
> This sound like a bad place to start to me as we are enabling something
> that is probably 'wrong' in the long term as opposed to just not enabling it
> until we have appropriate support.

I disagree. I don't think the kernel should be trying to process tags at
the lower level.

> I'd argue better to just reject any extents with different labels for now.

Again I disagree. This is less restrictive. The idea is that labels can
be changed such that user space can ultimately decided which extents
should be used for which devices. I have some work on that already.
(Basically it becomes quite easy to assign a label to a dax device and
have the extent search use only dax extents which match that label.)

>
> As this is an RFC meh ;)

Sure! ;-)

>
> Whilst this looks fine to me, I'm rather out of my depth wrt to the DAX
> side of things so take that with a pinch of salt.

NP

>
> Jonathan
>
>
> >
> > Signed-off-by: Ira Weiny <ira.weiny@xxxxxxxxx>
> > ---
> > drivers/dax/bus.c | 157 +++++++++++++++++++++++++++++++++++++++-------
> > drivers/dax/cxl.c | 44 +++++++++++++
> > drivers/dax/dax-private.h | 5 ++
> > 3 files changed, 182 insertions(+), 24 deletions(-)
> >
> > diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
> > index ea7ae82b4687..a9ea6a706702 100644
> > --- a/drivers/dax/bus.c
> > +++ b/drivers/dax/bus.c
>
> ...
>
>
> > @@ -1183,7 +1290,7 @@ static ssize_t mapping_store(struct device *dev, struct device_attribute *attr,
> > to_alloc = range_len(&r);
> > if (alloc_is_aligned(dev_dax, to_alloc))
> > rc = alloc_dev_dax_range(&dax_region->res, dev_dax, r.start,
> > - to_alloc);
> > + to_alloc, NULL);
> > device_unlock(dev);
> > device_unlock(dax_region->dev);
> >
> > @@ -1400,8 +1507,10 @@ struct dev_dax *devm_create_dev_dax(struct dev_dax_data *data)
> > device_initialize(dev);
> > dev_set_name(dev, "dax%d.%d", dax_region->id, dev_dax->id);
> >
> > + dev_WARN_ONCE(parent, is_dynamic(dax_region) && data->size,
> > + "Dynamic DAX devices are created initially with 0 size");
>
> dev_info() maybe more appropriate?

Unless I'm mistaken this can happen from userspace but only if something
in the code changes later. Because the dax layer is trying to support
non-dynamic regions (which dynamic may be a bad name), I was worried that
the creation with a size might slip through...

> Is this common enough that we need the
> _ONCE?

once is because it could end up spamming a log later if something got
coded up wrong.

>
>
> > rc = alloc_dev_dax_range(&dax_region->res, dev_dax, dax_region->res.start,
> > - data->size);
> > + data->size, NULL);
> > if (rc)
> > goto err_range;
> >
> > diff --git a/drivers/dax/cxl.c b/drivers/dax/cxl.c
> > index 44cbd28668f1..6394a3531e25 100644
> > --- a/drivers/dax/cxl.c
> > +++ b/drivers/dax/cxl.c
> ...
>
>
> > static int cxl_dax_region_create_extent(struct dax_region *dax_region,
> > struct cxl_dr_extent *cxl_dr_ext)
> > {
> > @@ -45,11 +80,20 @@ static int cxl_dax_region_create_extent(struct dax_region *dax_region,
> > /* device manages the dr_extent on success */
> > kref_init(&dr_extent->ref);
> >
> > + rc = dax_region_add_resource(dax_region, dr_extent,
> > + cxl_dr_ext->hpa_offset,
> > + cxl_dr_ext->hpa_length);
> > + if (rc) {
> > + kfree(dr_extent);
>
> goto for these and single unwinding block?

Yea. Done.

>
> > + return rc;
> > + }
> > +
> > rc = dax_region_ext_create_dev(dax_region, dr_extent,
> > cxl_dr_ext->hpa_offset,
> > cxl_dr_ext->hpa_length,
> > cxl_dr_ext->label);
> > if (rc) {
> > + dax_region_rm_resource(dr_extent);
> > kfree(dr_extent);
> as above.

Done.

>
> > return rc;
> > }
> > diff --git a/drivers/dax/dax-private.h b/drivers/dax/dax-private.h
> > index 250babd6e470..ad73b53aa802 100644
> > --- a/drivers/dax/dax-private.h
> > +++ b/drivers/dax/dax-private.h
> > @@ -44,12 +44,16 @@ struct dax_region {
> > /*
> > * struct dax_region_extent - extent data defined by the low level region
> > * driver.
> > + * @region: cache of dax_region
> > + * @res: cache of resource tree for this extent
> > * @private_data: lower level region driver data
>
> Not sure 'lower level' is well defined here. Is "region driver data"
> not enough?

For me it was not. I'll have to sleep on it. Technically there is no
dax_region 'driver' but only a dax_region device.

>
> > * @ref: track number of dax devices which are using this extent
> > * @get: get reference to low level data
> > * @put: put reference to low level data
> > */
> > struct dax_region_extent {
> > + struct dax_region *region;
> > + struct resource *res;
> > void *private_data;
> > struct kref ref;
> > void (*get)(struct dax_region_extent *dr_extent);
> > @@ -131,6 +135,7 @@ struct dev_dax {
> > unsigned long pgoff;
> > struct range range;
> > struct dax_mapping *mapping;
> > + struct dax_region_extent *dr_extent;
>
> Huh. Seems that ranges is in the kernel doc but not the
> bits that make that up. Maybe good to add the docs
> whilst here?

oh. sure. took me a couple of reads of this sentence.

I'm going to think on this too.

Ira

>
> > } *ranges;
> > };
> >
> >
>