Re: [PATCH v4 5/5] nvdimm: Schedule device registration on node local to the device

From: Dan Williams
Date: Thu Sep 20 2018 - 22:46:17 EST


On Thu, Sep 20, 2018 at 6:34 PM Alexander Duyck
<alexander.h.duyck@xxxxxxxxxxxxxxx> wrote:
>
>
>
> On 9/20/2018 5:36 PM, Dan Williams wrote:
> > On Thu, Sep 20, 2018 at 5:26 PM Alexander Duyck
> > <alexander.h.duyck@xxxxxxxxxxxxxxx> wrote:
> >>
> >> On 9/20/2018 3:59 PM, Dan Williams wrote:
> >>> On Thu, Sep 20, 2018 at 3:31 PM Alexander Duyck
> >>> <alexander.h.duyck@xxxxxxxxxxxxxxx> wrote:
> >>>>
> >>>> This patch is meant to force the device registration for nvdimm devices to
> >>>> be closer to the actual device. This is achieved by using either the NUMA
> >>>> node ID of the region, or of the parent. By doing this we can have
> >>>> everything above the region based on the region, and everything below the
> >>>> region based on the nvdimm bus.
> >>>>
> >>>> One additional change I made is that we hold onto a reference to the parent
> >>>> while we are going through registration. By doing this we can guarantee we
> >>>> can complete the registration before we have the parent device removed.
> >>>>
> >>>> By guaranteeing NUMA locality I see an improvement of as high as 25% for
> >>>> per-node init of a system with 12TB of persistent memory.
> >>>>
> >>>> Signed-off-by: Alexander Duyck <alexander.h.duyck@xxxxxxxxxxxxxxx>
> >>>> ---
> >>>> drivers/nvdimm/bus.c | 19 +++++++++++++++++--
> >>>> 1 file changed, 17 insertions(+), 2 deletions(-)
> >>>>
> >>>> diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
> >>>> index 8aae6dcc839f..ca935296d55e 100644
> >>>> --- a/drivers/nvdimm/bus.c
> >>>> +++ b/drivers/nvdimm/bus.c
> >>>> @@ -487,7 +487,9 @@ static void nd_async_device_register(void *d, async_cookie_t cookie)
> >>>> 		dev_err(dev, "%s: failed\n", __func__);
> >>>> 		put_device(dev);
> >>>> 	}
> >>>> +
> >>>> 	put_device(dev);
> >>>> +	put_device(dev->parent);
> >>>
> >>> Good catch. The child does not pin the parent until registration, but
> >>> we need to make sure the parent isn't gone while we're waiting for the
> >>> registration work to run.
> >>>
> >>> Let's break this reference count fix out into its own separate patch,
> >>> because this looks to be covering a gap that may need to be
> >>> recommended for -stable.
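A userspace sketch of the reference window being described. It assumes a hypothetical refcounted model_dev in place of struct device; model_get()/model_put() stand in for get_device()/put_device(), and none of these names are kernel API. The parent is pinned when registration is scheduled and released only after the async step has run. By then the registered child holds its own reference on the parent, as device_add() does in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical refcounted object standing in for struct device; the kernel
 * uses get_device()/put_device() on the embedded kobject instead.
 */
struct model_dev {
	struct model_dev *parent;
	int refs;
	bool released; /* set when the last reference is dropped */
};

static void model_get(struct model_dev *d)
{
	if (d)
		d->refs++;
}

static void model_put(struct model_dev *d)
{
	if (!d)
		return;
	assert(d->refs > 0);
	if (--d->refs == 0)
		d->released = true;
}

/* Scheduling side: pin parent and child before the work is queued,
 * like get_device(dev->parent); get_device(dev); in the patch. */
static void model_schedule_register(struct model_dev *dev)
{
	model_get(dev->parent);
	model_get(dev);
}

/* Async side: the parent must still be alive here. Registration makes the
 * child pin its parent (as device_add() does), then the scheduling-time
 * references are dropped. */
static void model_async_register(struct model_dev *dev)
{
	assert(!dev->parent || !dev->parent->released);
	model_get(dev->parent);
	model_put(dev);
	model_put(dev->parent);
}
```

In this scenario the parent's owner drops its reference while the work is still queued. Without the scheduling-time pin, the parent's count would reach zero before the child is ever added.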
> >>
> >> Okay, I guess I can do that.
> >>
> >>>
> >>>>
> >>>> static void nd_async_device_unregister(void *d, async_cookie_t cookie)
> >>>> @@ -504,12 +506,25 @@ static void nd_async_device_unregister(void *d, async_cookie_t cookie)
> >>>>
> >>>> void __nd_device_register(struct device *dev)
> >>>> {
> >>>> +	int node;
> >>>> +
> >>>> 	if (!dev)
> >>>> 		return;
> >>>> +
> >>>> 	dev->bus = &nvdimm_bus_type;
> >>>> +	get_device(dev->parent);
> >>>> 	get_device(dev);
> >>>> -	async_schedule_domain(nd_async_device_register, dev,
> >>>> -			&nd_async_domain);
> >>>> +
> >>>> +	/*
> >>>> +	 * For a region we can break away from the parent node,
> >>>> +	 * otherwise for all other devices we just inherit the node from
> >>>> +	 * the parent.
> >>>> +	 */
> >>>> +	node = is_nd_region(dev) ? to_nd_region(dev)->numa_node :
> >>>> +		dev_to_node(dev->parent);
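The node-selection rule in this hunk, modeled in plain C as a minimal sketch. struct model_dev and model_pick_node() are hypothetical stand-ins for struct device, is_nd_region() and to_nd_region(dev)->numa_node. This only mirrors the ternary above and is not the patch's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUMA_NO_NODE (-1) /* same value as the kernel's NUMA_NO_NODE */

/*
 * Hypothetical, simplified model of a device; not the kernel's
 * struct device or struct nd_region.
 */
struct model_dev {
	struct model_dev *parent;
	int numa_node;   /* what dev_to_node() would return */
	bool is_region;  /* stands in for is_nd_region(dev) */
	int region_node; /* stands in for to_nd_region(dev)->numa_node */
};

/*
 * Same rule as the hunk: a region uses the node recorded when it was
 * created; any other device uses its parent's node.
 */
static int model_pick_node(const struct model_dev *dev)
{
	if (dev->is_region)
		return dev->region_node;
	/* dev_to_node(dev->parent) likewise needs a non-NULL parent */
	assert(dev->parent != NULL);
	return dev->parent->numa_node;
}
```

With a bus on node 0, a region created for node 1 is scheduled on node 1, breaking away from its parent, while a non-region child of the bus stays on node 0.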
> >>>
> >>> Devices already automatically inherit the node of their parent, so I'm
> >>> not understanding why this is needed?
> >>
> >> That doesn't happen until you call device_add, which you don't call
> >> until nd_async_device_register. All that has been called on the device
> >> up to now is device_initialize which leaves the node at NUMA_NO_NODE.
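A minimal model of the ordering problem, assuming a hypothetical model_dev that tracks only a parent pointer and a node (not kernel API). device_initialize() leaves the node at NUMA_NO_NODE, and device_add() copies the parent's node only if none was set. Any node read between the two steps, which is when __nd_device_register() schedules the work, is therefore still NUMA_NO_NODE.

```c
#include <assert.h>
#include <stddef.h>

#define NUMA_NO_NODE (-1) /* same value as the kernel's NUMA_NO_NODE */

/* Hypothetical stand-in for struct device, tracking only the NUMA node. */
struct model_dev {
	struct model_dev *parent;
	int numa_node;
};

/* Like device_initialize(): the node starts out unset. */
static void model_initialize(struct model_dev *dev)
{
	dev->numa_node = NUMA_NO_NODE;
}

/* Like the node handling in device_add(): inherit the parent's node only
 * if no node has been set explicitly. */
static void model_add(struct model_dev *dev)
{
	if (dev->parent && dev->numa_node == NUMA_NO_NODE)
		dev->numa_node = dev->parent->numa_node;
}
```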
> >
> > Ooh, yeah, missed that. I think I'd prefer this policy to be moved out to
> > where we set the dev->parent before calling __nd_device_register, or
> > at least a comment here about *why* we know region devices are special
> > (i.e. because the nd_region_desc specified the node at region creation
> > time).
> >
>
> Are you talking about pulling the scheduling out or just adding a node
> value to the nd_device_register call so it can be set directly from the
> caller?

I was thinking everywhere we set dev->parent before registering, also
set the node...

> If you want, what I could do is pull the set_dev_node call from
> nvdimm_bus_uevent and place it in nd_device_register. That should stick
> as the node doesn't get overwritten by the parent if it is set after
> device_initialize. If I did that along with the parent bit I was already
> doing, then all that would be left to do is just use the dev_to_node
> call on the device itself.

...but this is even better.
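One possible reading of the proposal above, sketched in userspace C with hypothetical names (model_register() is not an nvdimm function). The node is set explicitly at registration time, either passed in by the caller (a region's creation-time node) or copied from the parent, so scheduling can simply use the device's own node. A later device_add()-style step then leaves it alone because it is no longer NUMA_NO_NODE.

```c
#include <assert.h>
#include <stddef.h>

#define NUMA_NO_NODE (-1) /* same value as the kernel's NUMA_NO_NODE */

/* Hypothetical stand-in for struct device. */
struct model_dev {
	struct model_dev *parent;
	int numa_node;
};

/* device_add()-style rule: only fill in the node if it is still unset. */
static void model_add(struct model_dev *dev)
{
	if (dev->parent && dev->numa_node == NUMA_NO_NODE)
		dev->numa_node = dev->parent->numa_node;
}

/*
 * Hypothetical register step: a caller that knows the node passes it,
 * otherwise the parent's node is copied. Either way the node is set
 * before scheduling, so the scheduler just reads the device's own node.
 */
static int model_register(struct model_dev *dev, int node)
{
	if (node == NUMA_NO_NODE && dev->parent)
		node = dev->parent->numa_node;
	dev->numa_node = node;  /* plays the role of set_dev_node() */
	return dev->numa_node;  /* plays the role of dev_to_node(dev) */
}
```

The region keeps its explicitly set node even after the add step runs, and a child registered without a node picks up the region's node rather than the bus's.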