Re: [PATCH] pci: derive nearby CPUs from device's instead of bus' NUMA information

From: Jesse Barnes
Date: Mon May 11 2009 - 17:54:54 EST


On Thu, 7 May 2009 10:51:36 +0200
Andreas Herrmann <andreas.herrmann3@xxxxxxx> wrote:

> On Mon, Apr 20, 2009 at 01:03:41PM -0700, Jesse Barnes wrote:
> > On Mon, 20 Apr 2009 10:47:47 +0200
> > Andreas Herrmann <andreas.herrmann3@xxxxxxx> wrote:
> >
> > > On Fri, Apr 17, 2009 at 12:26:54PM -0700, Yinghai Lu wrote:
> > > > On Fri, Apr 17, 2009 at 9:21 AM, Ingo Molnar <mingo@xxxxxxx>
> > > > wrote:
> > > > > const struct cpumask * cpumask_of_pcidev(struct pci_dev *dev)
> > > > > {
> > > > > Â Â Â Âif (dev->numa_node == -1)
> > > > > Â Â Â Â Â Â Â Âreturn cpumask_of_pcibus(to_pci_dev(dev)->bus);
> > > > >
> > > > > Â Â Â Âreturn cpumask_of_node(dev_to_node(dev));
> > > > > }
> > > > >
> > > > > ? This would work fine in all cases.
> > >
> > > Yes, I think so. That's the general solution w/o additional
> > > "ifdefing".
> > >
> > > > you are right, dev_to_node(dev) could return -1 on 64bit, if
> > > > there is no memory on that node.
> > >
> > > Hmm, I thought just in the CONFIG_NUMA=n case -1 is returned.
> > >
> > > During initialization the struct device's numa_node is set to -1
> > > and later on the information is inherited from the parent
> > > numa_node.
> > >
> > > So what do I miss?
> >
> > I like the idea of cpumask_of_pcidev(), but it seems like
> > cpumask_of_pcibus should return the same value. So if the node is
> > unassigned or "equidistant" (there's code that treats -1 as both I
> > think), cpumask_of_pcibus should figure out what the nearest CPUs
> > are and return that, right?
>
> Usually this is true.
>
> But there is one special case.
>
> Northbridge functions of AMD CPUs appear to be on bus 0 device 24-31
> (each having 4 or 5 functions depending on the CPU family).
>
> Requests to those devices (e.g. reading config space) are handled by
> the processor(s) themselves and aren't routed to the PCI bus.
> At most such requests are routed to another processor (node) if the
> request is for a northbridge function of a different processor.
>
> See 9b94b3a19b13e094c10f65f24bc358f6ffe4eacd for some additional info.
>
> That is why I think that using cpumask_of_pcidev should have
> precedence over cpumask_of_pcibus. (numa_node information of a PCI
> device can be fixed up and then differ from node information of the
> PCI bus.)

So we're making the generic code more confusing to handle an AMD
special case? Are the functions you mention likely to have drivers
that allocate memory or need cpumask_of_pcibus info? I guess there are
no nice solutions given the above split of the device across busses (in
a logical sense), so the cleanups Ingo suggested may be the best we can
do.
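
For reference, here is a rough user-space sketch of the fallback being
discussed: prefer the device's own node, and use the bus mask only when
the device has no node assigned. It is a model under stated assumptions,
not kernel code. Every name ending in _model, the NO_NODE constant, the
unsigned long bitmask standing in for struct cpumask, and the two-node
CPU layout are made up for illustration. The quoted snippet appears to
mix struct pci_dev and struct device accessors, so the model keeps the
node directly in its own device struct.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are the kernel's definitions. */
#define NO_NODE   (-1)          /* models an unassigned numa_node */
#define NR_NODES  2
#define ALL_CPUS  0xffUL        /* assumed: 8 CPUs in the system */

typedef unsigned long cpumask_model;    /* one bit per CPU */

/* Assumed layout: node 0 owns CPUs 0-3, node 1 owns CPUs 4-7. */
static const cpumask_model node_cpus_model[NR_NODES] = { 0x0fUL, 0xf0UL };

struct pci_bus_model {
	int node;                       /* NO_NODE if unknown */
};

struct pci_dev_model {
	int numa_node;                  /* NO_NODE unless set or fixed up */
	struct pci_bus_model *bus;
};

static cpumask_model cpumask_of_node_model(int node)
{
	assert(node >= 0 && node < NR_NODES);
	return node_cpus_model[node];
}

/* Bus-level answer: with no node information, every CPU is "near". */
static cpumask_model cpumask_of_pcibus_model(const struct pci_bus_model *bus)
{
	assert(bus != NULL);
	if (bus->node == NO_NODE)
		return ALL_CPUS;
	return cpumask_of_node_model(bus->node);
}

/* Device-level answer: the device's node wins, the bus is the fallback. */
static cpumask_model cpumask_of_pcidev_model(const struct pci_dev_model *dev)
{
	assert(dev != NULL);
	if (dev->numa_node == NO_NODE)
		return cpumask_of_pcibus_model(dev->bus);
	return cpumask_of_node_model(dev->numa_node);
}
```

With a bus on node 0, a device whose numa_node is still NO_NODE gets the
bus mask, while a device whose node was fixed up to 1 (as described
above for the northbridge functions) gets node 1's CPUs even though it
sits on the same bus. That difference is the whole argument for giving
the device lookup precedence.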

--
Jesse Barnes, Intel Open Source Technology Center