Re: [PATCH] pci: derive nearby CPUs from device's instead of bus' NUMA information

From: Andreas Herrmann
Date: Tue Jun 09 2009 - 01:48:20 EST


On Mon, May 11, 2009 at 02:54:23PM -0700, Jesse Barnes wrote:
> On Thu, 7 May 2009 10:51:36 +0200
> Andreas Herrmann <andreas.herrmann3@xxxxxxx> wrote:
> > On Mon, Apr 20, 2009 at 01:03:41PM -0700, Jesse Barnes wrote:
> > > On Mon, 20 Apr 2009 10:47:47 +0200
> > > Andreas Herrmann <andreas.herrmann3@xxxxxxx> wrote:
> > > > On Fri, Apr 17, 2009 at 12:26:54PM -0700, Yinghai Lu wrote:
> > > > > On Fri, Apr 17, 2009 at 9:21 AM, Ingo Molnar <mingo@xxxxxxx>
> > > > > wrote:
> > > > > > const struct cpumask * cpumask_of_pcidev(struct pci_dev *dev)
> > > > > > {
> > > > > > >        if (dev_to_node(&dev->dev) == -1)
> > > > > > >                return cpumask_of_pcibus(dev->bus);
> > > > > > >
> > > > > > >        return cpumask_of_node(dev_to_node(&dev->dev));
> > > > > > }
> > > > > >
> > > > > > ? This would work fine in all cases.
> > > >
> > > > Yes, I think so. That's the general solution w/o additional
> > > > "ifdefing".
> > > >
> > > > > you are right, dev_to_node(dev) could return -1 on 64bit, if
> > > > > there is no memory on that node.
> > > >
> > > > Hmm, I thought just in the CONFIG_NUMA=n case -1 is returned.
> > > >
> > > > During initialization the struct device's numa_node is set to -1
> > > > and later on the information is inherited from the parent
> > > > numa_node.
> > > >
> > > > So what do I miss?
> > >
> > > I like the idea of cpumask_of_pcidev(), but it seems like
> > > cpumask_of_pcibus should return the same value. So if the node is
> > > unassigned or "equidistant" (there's code that treats -1 as both I
> > > think), cpumask_of_pcibus should figure out what the nearest CPUs
> > > are and return that, right?
> >
> > Usually this is true.
> >
> > But there is one special case.
> >
> > Northbridge functions of AMD CPUs appear to be on bus 0 device 24-31
> > (each having 4 or 5 functions depending on the CPU family).
> >
> > Requests to those devices (e.g. reading config space) are handled by
> > the processor(s) themselves and aren't routed to the PCI bus.
> > At most such requests are routed to another processor (node) if the
> > request is for a northbridge function of a different processor.
> >
> > See 9b94b3a19b13e094c10f65f24bc358f6ffe4eacd for some additional info.
> >
> > That is why I think that using cpumask_of_pcidev should have
> > precedence over cpumask_of_pcibus. (numa_node information of a PCI
> > device can be fixed up and then differ from node information of the
> > PCI bus.)
>
> So we're making the generic code more confusing to handle an AMD
> special case?

Yes.

> Are the functions you mention likely to have drivers
> that allocate memory or need cpumask_of_pcibus info?

Rarely, or rather, not at the moment.

> I guess there are no nice solutions given the above split of the
> device across busses (in a logical sense), so the cleanups Ingo
> suggested may be the best we can do.

Yes, I think so.
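
For illustration, the precedence discussed above (use the device's own
node when it has one, otherwise fall back to the bus) can be sketched as
a small user-space model. This is only a sketch under stated
assumptions: NO_NODE, struct mock_bus, struct mock_pcidev and
nearest_node_of_pcidev() are hypothetical stand-ins, not kernel types or
functions, and the model returns a node number rather than a cpumask.

```c
#include <stddef.h>

/* Hypothetical stand-in for the "no node assigned" value (-1 in this thread). */
#define NO_NODE (-1)

/* Simplified mock types; these are NOT the kernel's struct pci_bus / struct pci_dev. */
struct mock_bus {
	int numa_node;            /* node the bus is attached to */
};

struct mock_pcidev {
	int numa_node;            /* node of the device itself, or NO_NODE */
	struct mock_bus *bus;     /* bus the device sits on */
};

/*
 * Prefer the device's own node information. Fall back to the bus's node
 * only when the device has no node assigned.
 */
static int nearest_node_of_pcidev(const struct mock_pcidev *dev)
{
	if (dev->numa_node == NO_NODE)
		return dev->bus->numa_node;

	return dev->numa_node;
}
```

In this model, a device whose node was fixed up to differ from its bus
(as with the northbridge functions) keeps its own node, while a device
without node information inherits the node of the bus.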


Regards,
Andreas

