Re: [PATCH v1] virtio_pmem: populate numa information

From: Dan Williams
Date: Wed Oct 26 2022 - 17:50:08 EST


Pankaj Gupta wrote:
> > > > Compute the numa information for a virtio_pmem device from the memory
> > > > range of the device. Previously, the target_node was always 0 since
> > > > the ndr_desc.target_node field was never explicitly set. The code for
> > > > computing the numa node is taken from cxl_pmem_region_probe in
> > > > drivers/cxl/pmem.c.
> > > >
> > > > Signed-off-by: Michael Sammler <sammler@xxxxxxxxxx>
> > > > ---
> > > > drivers/nvdimm/virtio_pmem.c | 11 +++++++++--
> > > > 1 file changed, 9 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/drivers/nvdimm/virtio_pmem.c b/drivers/nvdimm/virtio_pmem.c
> > > > index 20da455d2ef6..a92eb172f0e7 100644
> > > > --- a/drivers/nvdimm/virtio_pmem.c
> > > > +++ b/drivers/nvdimm/virtio_pmem.c
> > > > @@ -32,7 +32,6 @@ static int init_vq(struct virtio_pmem *vpmem)
> > > >  static int virtio_pmem_probe(struct virtio_device *vdev)
> > > >  {
> > > >  	struct nd_region_desc ndr_desc = {};
> > > > -	int nid = dev_to_node(&vdev->dev);
> > > >  	struct nd_region *nd_region;
> > > >  	struct virtio_pmem *vpmem;
> > > >  	struct resource res;
> > > > @@ -79,7 +78,15 @@ static int virtio_pmem_probe(struct virtio_device *vdev)
> > > >  	dev_set_drvdata(&vdev->dev, vpmem->nvdimm_bus);
> > > >
> > > >  	ndr_desc.res = &res;
> > > > -	ndr_desc.numa_node = nid;
> > > > +
> > > > +	ndr_desc.numa_node = memory_add_physaddr_to_nid(res.start);
> > > > +	ndr_desc.target_node = phys_to_target_node(res.start);
> > > > +	if (ndr_desc.target_node == NUMA_NO_NODE) {
> > > > +		ndr_desc.target_node = ndr_desc.numa_node;
> > > > +		dev_dbg(&vdev->dev, "changing target node from %d to %d",
> > > > +			NUMA_NO_NODE, ndr_desc.target_node);
> > > > +	}
> > >
> > > This memory later gets hotplugged using "devm_memremap_pages", and I don't
> > > see whether 'target_node' is used in the fsdax case.
> > >
> > > It seems to me "target_node" is used mainly for a volatile range above
> > > persistent memory (e.g. the kmem driver?).
> > >
> > I am not sure if 'target_node' is used in the fsdax case, but it is
> > indeed used by the devdax/kmem driver when hotplugging the memory (see
> > 'dev_dax_kmem_probe' and '__dax_pmem_probe').
>
> Yes, but not currently for FS_DAX iiuc.

The target_node is only used by the dax_kmem driver. In the FSDAX case
the memory (persistent or otherwise) is mapped behind a block-device.
That block-device has affinity to a CPU initiator, but that memory does
not itself have any NUMA affinity or identity as a target.

So:

block-device NUMA node == closest CPU initiator node to the device

dax-device target node == memory-only NUMA node target, after onlining
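
To make the two notions concrete, here is a rough userspace sketch of the
fallback in the quoted hunk. It is not kernel code: region_desc_sketch and
resolve_nodes_sketch() are hypothetical names, and the two integer arguments
merely stand in for whatever memory_add_physaddr_to_nid() and
phys_to_target_node() would return for the range.

```c
#include <assert.h>

#define NUMA_NO_NODE (-1)

/* Hypothetical stand-in for the two node fields discussed in this thread. */
struct region_desc_sketch {
	int numa_node;   /* closest CPU initiator node to the device */
	int target_node; /* node the memory itself becomes once onlined */
};

/*
 * Mirrors the fallback in the quoted hunk: if no distinct target node is
 * known for the range, reuse the initiator-side node.
 *
 * initiator_nid and target_nid are assumptions of this sketch; they model
 * the results of the kernel lookups rather than calling them.
 */
static void resolve_nodes_sketch(struct region_desc_sketch *d,
				 int initiator_nid, int target_nid)
{
	assert(d); /* caller must pass a descriptor */

	d->numa_node = initiator_nid;
	d->target_node = target_nid;
	if (d->target_node == NUMA_NO_NODE)
		d->target_node = d->numa_node;
}
```

With initiator_nid = 0 and target_nid = NUMA_NO_NODE both fields end up 0;
with target_nid = 2 the target node stays 2 while numa_node remains 0. Only
the latter field would matter to a consumer like dax_kmem.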