Re: [PATCH v2 15/17] libnvdimm: Set numa_node to NVDIMM devices

From: Williams, Dan J
Date: Thu Jun 25 2015 - 14:35:12 EST


On Thu, 2015-06-25 at 11:45 -0600, Toshi Kani wrote:
> On Thu, 2015-06-25 at 05:37 -0400, Dan Williams wrote:
> > From: Toshi Kani <toshi.kani@xxxxxx>
> >
> > ACPI NFIT table has System Physical Address Range Structure entries that
> > describe a proximity ID of each range when ACPI_NFIT_PROXIMITY_VALID is
> > set in the flags.
> >
> > Change acpi_nfit_register_region() to map a proximity ID to its node ID,
> > and set it to a new numa_node field of nd_region_desc, which is then
> > conveyed to the nd_region device.
> >
> > The device core arranges for btt and namespace devices to inherit their
> > node from their parent region.
> >
> > Signed-off-by: Toshi Kani <toshi.kani@xxxxxx>
> > [djbw: move set_dev_node() from region 'probe' to 'create']
>
> Sorry, I failed to mention another issue, which led me to call
> set_dev_node() in probe. nd_async_device_register() calls device_add(),
> which does:
>
> 	/* use parent numa_node */
> 	if (parent)
> 		set_dev_node(dev, dev_to_node(parent));
>
> and overwrites numa_node with -1. Since the region's parent is ndbusN,
> we cannot set numa_node on the parent. So, I had to set it in probe.

In general, I still don't like leaving it up to ->probe(), which is
within its rights to fail and not set the node. How about the following,
which moves it to the bus uevent code? That should get triggered before
probe, so the numa_node is valid before userspace is ever notified about
the device.

device_add() does:

	kobject_uevent(&dev->kobj, KOBJ_ADD);
	bus_probe_device(dev);

...so I think we're good, agree? I also added a missing init of
ndr_desc.numa_node in arch/x86/kernel/pmem.c, see below.
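
To make the ordering argument concrete, here is a small userspace model
of it. This is only a sketch, not kernel code: struct model_device and
all model_* functions are hypothetical stand-ins for struct device,
device_add(), the bus ->uevent() hook, and bus_probe_device(). It
assumes the parent override runs before the KOBJ_ADD uevent, and that
the uevent runs before probe.

```c
#include <assert.h>
#include <stddef.h>

#define NUMA_NO_NODE (-1)

/* Hypothetical userspace stand-in; NOT the kernel's struct device. */
struct model_device {
	struct model_device *parent;
	int numa_node;		/* what dev_to_node() would report */
	int region_numa_node;	/* stand-in for nd_region->numa_node */
	int is_region;		/* stand-in for is_nd_pmem() || is_nd_blk() */
	int node_at_uevent;	/* node visible when KOBJ_ADD is emitted */
	int node_at_probe;	/* node visible when probe runs */
};

/* Models the bus uevent hook: re-apply the region's node. */
static void model_bus_uevent(struct model_device *dev)
{
	if (dev->is_region)
		dev->numa_node = dev->region_numa_node;
	dev->node_at_uevent = dev->numa_node;
}

/* Models probe: it only observes the node, it never has to set it. */
static void model_bus_probe(struct model_device *dev)
{
	dev->node_at_probe = dev->numa_node;
}

/* Models the relevant ordering inside device_add(). */
static void model_device_add(struct model_device *dev)
{
	assert(dev != NULL);

	/* use parent numa_node: this is the step that clobbers the value */
	if (dev->parent)
		dev->numa_node = dev->parent->numa_node;

	model_bus_uevent(dev);	/* KOBJ_ADD uevent comes first ... */
	model_bus_probe(dev);	/* ... then the driver probe */
}

/* Registers a region under a bus with no node; returns the node probe sees. */
static int model_region_node_after_add(int region_node)
{
	struct model_device bus = {
		.parent = NULL,
		.numa_node = NUMA_NO_NODE,
	};
	struct model_device region = {
		.parent = &bus,
		.numa_node = region_node,
		.region_numa_node = region_node,
		.is_region = 1,
	};

	model_device_add(&region);
	return region.node_at_probe;
}
```

In this model, model_region_node_after_add(1) returns 1 even though the
bus parent has NUMA_NO_NODE, because the uevent hook restores the node
after the parent override and before probe gets to look at it.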

8<-----
Subject: libnvdimm: Set numa_node to NVDIMM devices

From: Toshi Kani <toshi.kani@xxxxxx>

ACPI NFIT table has System Physical Address Range Structure entries that
describe a proximity ID of each range when ACPI_NFIT_PROXIMITY_VALID is
set in the flags.

Change acpi_nfit_register_region() to map a proximity ID to its node ID,
and set it to a new numa_node field of nd_region_desc, which is then
conveyed to the nd_region device.

The device core arranges for btt and namespace devices to inherit their
node from their parent region.

Signed-off-by: Toshi Kani <toshi.kani@xxxxxx>
[djbw: move set_dev_node() from region.c to bus.c]
Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>
---
 arch/x86/kernel/pmem.c       | 1 +
 drivers/acpi/nfit.c          | 6 ++++++
 drivers/nvdimm/bus.c         | 6 ++++++
 drivers/nvdimm/nd.h          | 2 +-
 drivers/nvdimm/region_devs.c | 1 +
 include/linux/libnvdimm.h    | 1 +
 6 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/pmem.c b/arch/x86/kernel/pmem.c
index 0f4ef472ab9e..64f90f53bb85 100644
--- a/arch/x86/kernel/pmem.c
+++ b/arch/x86/kernel/pmem.c
@@ -67,6 +67,7 @@ static __init int register_e820_pmem(void)
 		memset(&ndr_desc, 0, sizeof(ndr_desc));
 		ndr_desc.res = &res;
 		ndr_desc.attr_groups = e820_pmem_region_attribute_groups;
+		ndr_desc.numa_node = NUMA_NO_NODE;
 		if (!nvdimm_pmem_region_create(nvdimm_bus, &ndr_desc))
 			goto err;
 	}
diff --git a/drivers/acpi/nfit.c b/drivers/acpi/nfit.c
index 1f6f1b1a54f4..d96c8fe974dd 100644
--- a/drivers/acpi/nfit.c
+++ b/drivers/acpi/nfit.c
@@ -1392,6 +1392,12 @@ static int acpi_nfit_register_region(struct acpi_nfit_desc *acpi_desc,
 	ndr_desc->res = &res;
 	ndr_desc->provider_data = nfit_spa;
 	ndr_desc->attr_groups = acpi_nfit_region_attribute_groups;
+	if (spa->flags & ACPI_NFIT_PROXIMITY_VALID)
+		ndr_desc->numa_node = acpi_map_pxm_to_online_node(
+						spa->proximity_domain);
+	else
+		ndr_desc->numa_node = NUMA_NO_NODE;
+
 	list_for_each_entry(nfit_memdev, &acpi_desc->memdevs, list) {
 		struct acpi_nfit_memory_map *memdev = nfit_memdev->memdev;
 		struct nd_mapping *nd_mapping;
diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
index ec59f1f26d95..205344643852 100644
--- a/drivers/nvdimm/bus.c
+++ b/drivers/nvdimm/bus.c
@@ -48,6 +48,12 @@ static int to_nd_device_type(struct device *dev)

 static int nvdimm_bus_uevent(struct device *dev, struct kobj_uevent_env *env)
 {
+	/*
+	 * Ensure that region devices always have their numa node set as
+	 * early as possible.
+	 */
+	if (is_nd_pmem(dev) || is_nd_blk(dev))
+		set_dev_node(dev, to_nd_region(dev)->numa_node);
 	return add_uevent_var(env, "MODALIAS=" ND_DEVICE_MODALIAS_FMT,
 			to_nd_device_type(dev));
 }
diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
index b870de9add79..72c26461835d 100644
--- a/drivers/nvdimm/nd.h
+++ b/drivers/nvdimm/nd.h
@@ -96,7 +96,7 @@ struct nd_region {
 	u16 ndr_mappings;
 	u64 ndr_size;
 	u64 ndr_start;
-	int id, num_lanes, ro;
+	int id, num_lanes, ro, numa_node;
 	void *provider_data;
 	struct nd_interleave_set *nd_set;
 	struct nd_percpu_lane __percpu *lane;
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index 8f8c7ea485f1..55b424f6ba0d 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -736,6 +736,7 @@ static struct nd_region *nd_region_create(struct nvdimm_bus *nvdimm_bus,
 	nd_region->nd_set = ndr_desc->nd_set;
 	nd_region->num_lanes = ndr_desc->num_lanes;
 	nd_region->ro = ro;
+	nd_region->numa_node = ndr_desc->numa_node;
 	ida_init(&nd_region->ns_ida);
 	dev = &nd_region->dev;
 	dev_set_name(dev, "region%d", nd_region->id);
diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
index dc799a29ed1a..30b3deaafd51 100644
--- a/include/linux/libnvdimm.h
+++ b/include/linux/libnvdimm.h
@@ -89,6 +89,7 @@ struct nd_region_desc {
 	struct nd_interleave_set *nd_set;
 	void *provider_data;
 	int num_lanes;
+	int numa_node;
 };
 
 struct nvdimm_bus;