Re: [PATCH 1/2][RESEND] x86/pci/amd: Restore early_fill_mp_bus_to_node

From: Andreas Herrmann
Date: Tue May 08 2012 - 03:44:01 EST


On Mon, May 07, 2012 at 09:44:16AM -0700, Bjorn Helgaas wrote:
> On Mon, May 7, 2012 at 12:35 AM, Andreas Herrmann
> <andreas.herrmann3@xxxxxxx> wrote:
> > On Fri, May 04, 2012 at 10:35:05AM -0600, Bjorn Helgaas wrote:
> >> On Fri, May 4, 2012 at 7:03 AM, Andreas Herrmann
> >> <andreas.herrmann3@xxxxxxx> wrote:
> >> > On Wed, May 02, 2012 at 11:33:17AM -0600, Bjorn Helgaas wrote:
> >> >> On Fri, Apr 27, 2012 at 8:36 AM, Andreas Herrmann
> >> >> <andreas.herrmann3@xxxxxxx> wrote:
> >> >> >
> >> >> > Once upon a time this function was overloaded with quirky stuff to fix
> >> >> > resource detection on systems w/ _CRS defects (seems that some Sun and
> >> >> > HP systems were affected).
> >> >> >
> >> >> > See commit 30a18d6c3f1e774de656ebd8ff219d53e2ba4029
> >> >> > (x86: multi pci root bus with different io resource range, on 64-bit)
> >> >> >
> >> >> > Restore the old function and thus decouple it from the quirk that is
> >> >> > CPU family specific (e.g. it won't work on AMD family 15h CPUs). BTW,
> >> >> > I assume that the _CRS stuff is working on current systems.
> >> >> >
> >> >> > This is required to properly initialize the numa_node information of
> >> >> > existing PCI busses and associated devices.
> >> >>
> >> >> I applied some of Yinghai's patches that also touch this area. Can
> >> >> you refresh these so they apply on top of my "next" branch
> >> >> (git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git next)?
> >> >
> >> > Arrgh, will adapt my patch and resend it (asap).
> >> >
> >> >> Can you also be more specific about what these patches fix?
> >> >
> >> >> My understanding is that amd_bus.c (1) sets NUMA info with
> >> >> set_mp_bus_to_node() and (2) figures out MMIO and I/O port apertures,
> >> >> which are only used when blind probing and when ignoring _CRS.
> >> >>
> >> >> It seems like the main change in this patch is that we skip (2)
> >> >> completely when family >= 0x11, and I don't understand what that could
> >> >> fix.
> >> >
> >> > The patch restores a very old function that was used to detect the
> >> > nearest node for a PCI bus, so yes it's used to do (1). IMHO this
> >> > function was totally screwed up with Yinghai's code to do (2). It
> >> > seems that Sun has (had?) some systems where (2) was req'd. I don't
> >> > care about this part. But I'd like to do (1) on all AMD CPU NUMA
> >> > systems.
> >>
> >> Thanks for the explanation. But I'm afraid I'm still confused.
> >>
> >> First, it sounds like you're trying to change the way we do part (1),
> >> i.e., the set_mp_bus_to_node() calls, but I think the effect of your
> >> patch is to stop doing part (2) in some cases.
> >>
> >> Second, I am pretty sure that the current early_fill_mp_bus_info()
> >> (before your patch) does the exact same set_mp_bus_to_node() calls as
> >> your early_fill_mp_bus_to_node() does.
> >
> >
> > I want to do (1) on all AMD CPUs that might be used in NUMA systems.
> >
> > What's done for (2) is very specific to certain AMD CPU families --
> > some of the register accesses are wrong/incomplete for newer AMD
> > CPUs. Furthermore, _CRS should provide the required info. I really
> > don't want to extend all the quirky stuff in (2) for future AMD CPUs.
>
> I'm all in favor of limiting part (2) to older AMD CPUs. I certainly
> don't want to maintain it for future CPUs.
>
> >> Finally, on all systems with ACPI, the set_mp_bus_to_node() call in
> >> pci_acpi_scan_root() should be doing what you need. In fact, that
> >> call happens later, so it should be overwriting the information filled
> >> in by amd_bus.c. If there's something wrong in this ACPI path, the
> >> most likely cause is a BIOS defect, such as a missing _PXM method on
> >> the PNP0A03/0A08 host bridge device.
> >
> > Good point. I'll check what's wrong in this ACPI path.
>
> I hope you find something, especially if it's a bug in the Linux code
> that interprets the NUMA info. Then we could fix that and limit both
> parts to older CPUs.

Put simply, there is no _PXM object for the host bridge devices, at
least on the systems that I checked.

I'll try to find out whether this is sort of "common BIOS practice" on
AMD boxes and how to avoid that in the future.

However, it seems that a fix in Linux is appropriate for existing
systems.
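
To illustrate the kind of fallback meant here, a minimal userspace sketch
(not the actual patch, and not kernel code). It assumes the behaviour
discussed above: the ACPI path overrides the early value only when the
host bridge really has a _PXM object. All names in it (bus_node,
init_bus_nodes, set_bus_node, resolve_bus_node, NODE_UNKNOWN) are
hypothetical stand-ins, not existing kernel symbols.

```c
#include <assert.h>

#define BUS_NR       256   /* PCI bus numbers 0x00..0xff */
#define NODE_UNKNOWN (-1)  /* hypothetical marker: no node known */

/* Hypothetical bus -> node table, standing in for the kernel's own. */
int bus_node[BUS_NR];

/* Mark every bus as having no known node. */
void init_bus_nodes(void)
{
    for (int bus = 0; bus < BUS_NR; bus++)
        bus_node[bus] = NODE_UNKNOWN;
}

/* Record a node for a bus, as an early northbridge probe would. */
void set_bus_node(int bus, int node)
{
    assert(bus >= 0 && bus < BUS_NR);
    bus_node[bus] = node;
}

/*
 * Pick the node for a root bus at scan time.
 * pxm_node is the node derived from _PXM, or NODE_UNKNOWN when the
 * host bridge has no _PXM object.
 */
int resolve_bus_node(int bus, int pxm_node)
{
    assert(bus >= 0 && bus < BUS_NR);
    if (pxm_node != NODE_UNKNOWN)
        bus_node[bus] = pxm_node;  /* firmware answer wins */
    return bus_node[bus];          /* otherwise keep the early value */
}
```

With a table like this, a bus whose node was recorded early keeps that
node when _PXM is missing, and a bus that was never recorded stays at
NODE_UNKNOWN.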


Andreas

