Re: [PATCH 3/3] Add NumaChip quirk

From: Steffen Persvold
Date: Tue Oct 25 2011 - 13:33:52 EST


On 10/25/2011 19:15, Bjorn Helgaas wrote:
[+cc linux-pci]

[]

The conflict we refer to in the patch is that since Linux thinks we have
those windows assigned to us, we get conflicts later on with real devices :

[ 5.887856] pnp 00:0e: disabling [mem 0x00000000-0x0009ffff] because it
overlaps 0000:00:1a.0 BAR 0 [mem 0x00000000-0x000fffff]
[ 5.899525] pnp 00:0e: disabling [mem 0x000c0000-0x000cffff] because it
overlaps 0000:00:1a.0 BAR 0 [mem 0x00000000-0x000fffff]
[ 5.911002] pnp 00:0e: disabling [mem 0x000e0000-0x000fffff] because it
overlaps 0000:00:1a.0 BAR 0 [mem 0x00000000-0x000fffff]

Yeah, this is gross, and this is definitely something Linux is doing
wrong. We don't have a consistent way of marking PCI BARs as
"disabled," so every zero-valued BAR seems to conflict with PNP
devices. Typically there are motherboard devices like your 00:0e that
reserve regions of low memory.

Lots of machines complain like this, not just NumaChip, and there's no
real ill effect. We say we're disabling a PNP device resource, but we
don't actually evaluate an _SRS method to tell the BIOS to do
anything. So I think we complain about the conflict but don't do
anything else.

Ah, ok. I didn't dive into it, so I didn't see if anything bad(tm) really happened when that PNP device got "disabled" resources.


I guess technically, the Linux PCI bus probing code should check the Command
register (offset 0x4) to see if MemorySpace is enabled (which in our case it
won't be) before checking the BAR registers.

The question is how we handle a device with MemorySpace disabled. In
most cases, I think we want to assign BAR resources to it so that if a
driver claims the device, we can enable MemorySpace and the device
will work. If the BIOS leaves MemorySpace disabled and Linux doesn't
assign BAR space at boot-time, we may be stuck because in general we
can't assign resources dynamically. Dynamic assignment might require
moving other devices, enlarging bridge windows, etc., which Linux
currently doesn't support.

Yes, I didn't want to open that can of worms :)


NumaChip sounds like an exception because you know you never care
about using those BARs. But I'm curious -- it looks like Linux didn't
even try to assign resources to them. I thought something in the
pci_assign_unassigned_resources() path would have tried to do
something with them. If we *did* assign resources to those BARs, I
assume nothing would break, since there's no driver that actually uses
them. Right?


Correct, the BARs are there and if something sensible were written to them (and MemorySpace was enabled in the Command register) NumaChip *would* respond to mmio accesses to that address range. However, those BARs are only for memory mapped configuration register (CSR) access which can be accessed in a "reserved" address range anyway which is what we refer to as global CSR space. This is what the 2 other sub-patches are using for talking to the NumaChip, and if someone would write any driver at some point it would most likely be tied to that type of access anyway.

The BARs can only be used for "local" CSR space which isn't that useful anyway so we don't bother assigning anything to it (or don't care if anything is). Besides, in a cluster with 1000s of NumaChip nodes all connected together (with different PCI segments etc. etc.) you really don't want to assign BARs anyway since you already have the global CSR space.

With this background, would you agree that it makes sense to still have the quirk or would you go about and solve it in a different way ?

Cheers,
--
Steffen Persvold, Chief Architect NumaChip
Numascale AS - www.numascale.com
Tel: +47 92 49 25 54 Skype: spersvold
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/