Re: [PATCH] pci: derive nearby CPUs from device's instead of bus' NUMA information

From: Andreas Herrmann
Date: Thu May 07 2009 - 04:52:24 EST


On Mon, Apr 20, 2009 at 01:03:41PM -0700, Jesse Barnes wrote:
> On Mon, 20 Apr 2009 10:47:47 +0200
> Andreas Herrmann <andreas.herrmann3@xxxxxxx> wrote:
>
> > On Fri, Apr 17, 2009 at 12:26:54PM -0700, Yinghai Lu wrote:
> > > On Fri, Apr 17, 2009 at 9:21 AM, Ingo Molnar <mingo@xxxxxxx> wrote:
> > > > const struct cpumask * cpumask_of_pcidev(struct pci_dev *dev)
> > > > {
> > > >        if (dev->numa_node == -1)
> > > >                return cpumask_of_pcibus(to_pci_dev(dev)->bus);
> > > >
> > > >        return cpumask_of_node(dev_to_node(dev));
> > > > }
> > > >
> > > > ? This would work fine in all cases.
> >
> > Yes, I think so. That's the general solution w/o additional
> > "ifdefing".
> >
> > > you are right, dev_to_node(dev) could return -1 on 64bit, if there
> > > is no memory on that node.
> >
> > Hmm, I thought just in the CONFIG_NUMA=n case -1 is returned.
> >
> > During initialization the struct device's numa_node is set to -1 and
> > later on the information is inherited from the parent numa_node.
> >
> > So what do I miss?
>
> I like the idea of cpumask_of_pcidev(), but it seems like
> cpumask_of_pcibus should return the same value. So if the node is
> unassigned or "equadistant" (there's code that treats -1 as both I
> think), cpumask_of_pcibus should figure out what the nearest CPUs are
> and return that, right?

Usually this is true.

But there is one special case.

Northbridge functions of AMD CPUs appear as devices 24-31 on bus 0
(each device has 4 or 5 functions, depending on the CPU family).

Requests to those devices (e.g. reading config space) are handled by
the processor(s) themselves and aren't routed to the PCI bus.
At most, such a request is routed to another processor (node) when it
targets a northbridge function of a different processor.

See commit 9b94b3a19b13e094c10f65f24bc358f6ffe4eacd for additional info.

That is why I think cpumask_of_pcidev should take precedence over
cpumask_of_pcibus. (The numa_node information of a PCI device can be
fixed up and may then differ from the node information of its PCI
bus.)
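
To illustrate the precedence I mean, here is a minimal user-space sketch.
It is not kernel code: the mock_* types and helpers, the NO_NODE constant,
and the 4-CPUs-per-node topology are all assumptions made up for this
example. It only shows that a device whose numa_node was fixed up gets the
CPUs of its own node, while a device without node information falls back
to the CPUs of its bus.

```c
#include <assert.h>

/* Mock stand-ins for illustration only, NOT the kernel definitions. */
#define NO_NODE   (-1)
#define MAX_NODES 4

typedef unsigned long mock_cpumask_t;	/* one bit per CPU */

struct mock_bus {
	int node;			/* node the bus is attached to */
};

struct mock_pci_dev {
	int numa_node;			/* NO_NODE if unknown */
	struct mock_bus *bus;
};

/* Hypothetical topology: node n owns CPUs 4n..4n+3. */
static const mock_cpumask_t node_to_cpus[MAX_NODES] = {
	0x000fUL, 0x00f0UL, 0x0f00UL, 0xf000UL
};
static const mock_cpumask_t all_cpus = 0xffffUL;

static mock_cpumask_t mock_cpumask_of_node(int node)
{
	/* Assumption: an unknown node means "all CPUs are equally near". */
	if (node < 0 || node >= MAX_NODES)
		return all_cpus;
	return node_to_cpus[node];
}

static mock_cpumask_t mock_cpumask_of_pcibus(const struct mock_bus *bus)
{
	return mock_cpumask_of_node(bus->node);
}

/* Device information wins; the bus is only the fallback. */
static mock_cpumask_t mock_cpumask_of_pcidev(const struct mock_pci_dev *dev)
{
	if (dev->numa_node == NO_NODE)
		return mock_cpumask_of_pcibus(dev->bus);
	return mock_cpumask_of_node(dev->numa_node);
}

/* Bus 0 sits on node 0, but one northbridge device was fixed up to node 1. */
static void northbridge_demo(void)
{
	struct mock_bus bus0 = { .node = 0 };
	struct mock_pci_dev nb_node1 = { .numa_node = 1, .bus = &bus0 };
	struct mock_pci_dev plain = { .numa_node = NO_NODE, .bus = &bus0 };

	/* Fixed-up device: CPUs of node 1, not those of the bus. */
	assert(mock_cpumask_of_pcidev(&nb_node1) == node_to_cpus[1]);
	/* No device information: fall back to the bus (node 0). */
	assert(mock_cpumask_of_pcidev(&plain) == node_to_cpus[0]);
}
```

Asking the bus alone would return the CPUs of node 0 for both devices in
northbridge_demo(), which is wrong for the fixed-up northbridge function.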


Regards,
Andreas

--
Operating | Advanced Micro Devices GmbH
System | Karl-Hammerschmidt-Str. 34, 85609 Dornach b. München, Germany
Research | Geschäftsführer: Thomas M. McCoy, Giuliano Meroni
Center | Sitz: Dornach, Gemeinde Aschheim, Landkreis München
(OSRC) | Registergericht München, HRB Nr. 43632

