RE: [Patch v2] PCI: hv: Fix NUMA node assignment when kernel boots with custom NUMA topology
From: Long Li
Date: Wed Jan 19 2022 - 00:56:45 EST
> Subject: RE: [Patch v2] PCI: hv: Fix NUMA node assignment when kernel boots
> with custom NUMA topology
>
> From: longli@xxxxxxxxxxxxxxxxx <longli@xxxxxxxxxxxxxxxxx> Sent: Tuesday,
> January 18, 2022 6:45 PM
> >
> > When the kernel boots with a NUMA topology with some NUMA nodes offline,
> > the PCI driver should only set an online NUMA node on the device. This
> > can happen during KDUMP where some NUMA nodes are not made online by
> > the KDUMP kernel.
> >
> > This patch also fixes the case where the kernel is booting with "numa=off".
> >
> > Signed-off-by: Long Li <longli@xxxxxxxxxxxxx>
>
> It seems like adding a "Fixes:" tag would be appropriate.
Okay, sending v3 with "Fixes:".
>
> >
> > Change from v1:
> > Use numa_map_to_online_node() to assign a node to device (suggested by
> > Michael Kelley <mikelley@xxxxxxxxxxxxx>)
> >
> > ---
> > drivers/pci/controller/pci-hyperv.c | 10 +++++++++-
> > 1 file changed, 9 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
> > index 6c9efeefae1b..c7519add6f13 100644
> > --- a/drivers/pci/controller/pci-hyperv.c
> > +++ b/drivers/pci/controller/pci-hyperv.c
> > @@ -2130,7 +2130,15 @@ static void hv_pci_assign_numa_node(struct hv_pcibus_device *hbus)
> > continue;
> >
> > if (hv_dev->desc.flags & HV_PCI_DEVICE_FLAG_NUMA_AFFINITY)
> > - set_dev_node(&dev->dev, hv_dev->desc.virtual_numa_node);
> > + /*
> > + * The kernel may boot with some NUMA nodes offline
> > + * (e.g. in a KDUMP kernel) or with NUMA disabled via
> > + * "numa=off". In those cases, adjust the host provided
> > + * NUMA node to a valid NUMA node used by the kernel.
> > + */
> > + set_dev_node(&dev->dev,
> > + numa_map_to_online_node(
> > + hv_dev->desc.virtual_numa_node));
>
> Double-check me, but I think this approach has a flaw in that
> numa_map_to_online_node() doesn't check the input node for being out-of-
> range. The call to node_online() uses the input node to index into the
> node_states bitmap (of type nodemask_t), and could go off the end of the
> bitmap if the input node isn't validated to be less than MAX_NUMNODES (or
> nr_node_ids).
Indeed, your concern is correct when "numa=off" is set.
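
To make the concern concrete, here is a minimal userspace sketch of the range check being discussed. It is only a model, not the driver change itself: MODEL_MAX_NODES, MODEL_NO_NODE, model_online_mask and the model_*() helpers are hypothetical stand-ins for MAX_NUMNODES, NUMA_NO_NODE, the online node mask and the kernel helpers. The fallback to the first online node is a simplification assumed for the sketch and is not meant to reproduce what numa_map_to_online_node() does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_MAX_NODES 64    /* hypothetical stand-in for MAX_NUMNODES */
#define MODEL_NO_NODE   (-1)  /* hypothetical stand-in for NUMA_NO_NODE */

/* One bit per node; models the bitmap of online nodes. */
static uint64_t model_online_mask;

/* Mark a node online in the model. */
static void model_set_online(unsigned int node)
{
	assert(node < MODEL_MAX_NODES);
	model_online_mask |= (uint64_t)1 << node;
}

/*
 * Like an unchecked bitmap test: the caller must guarantee that
 * node < MODEL_MAX_NODES, otherwise the shift is undefined (the
 * model's equivalent of running off the end of the bitmap).
 */
static bool model_node_online(unsigned int node)
{
	return (model_online_mask >> node) & 1;
}

/*
 * Validate the host-provided node BEFORE it is used as a bitmap
 * index. Out-of-range values yield MODEL_NO_NODE, an offline but
 * in-range node falls back to the first online node (assumption
 * made for this sketch only).
 */
static int model_pick_node(unsigned int host_node)
{
	if (host_node >= MODEL_MAX_NODES)
		return MODEL_NO_NODE;

	if (model_node_online(host_node))
		return (int)host_node;

	for (unsigned int n = 0; n < MODEL_MAX_NODES; n++)
		if (model_node_online(n))
			return (int)n;

	return MODEL_NO_NODE;
}
```

With only nodes 0 and 2 marked online, model_pick_node(2) returns 2, model_pick_node(1) falls back to 0, and an out-of-range value such as 1000 returns MODEL_NO_NODE without ever touching the bitmap. The point of the ordering is that the range test comes first, so the bitmap lookup never sees an index it cannot hold.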
>
> Michael
>
> >
> > put_pcichild(hv_dev);
> > }
> > --
> > 2.25.1