[PATCH 2/2] PCI: Fix disabling of bridge BARs when assigning bus resources

From: Logan Gunthorpe
Date: Thu Feb 14 2019 - 12:00:37 EST

One odd quirk of PLX switches is that their upstream bridge port has
256K of space allocated behind its BAR0 (most other bridge
implementations do not report any BAR space). The lspci output for such a
device looks like:

  04:00.0 PCI bridge: PLX Technology, Inc. PEX 8724 24-Lane, 6-Port PCI
          Express Gen 3 (8 GT/s) Switch, 19 x 19mm FCBGA (rev ca)
          (prog-if 00 [Normal decode])
      Physical Slot: 1
      Flags: bus master, fast devsel, latency 0, IRQ 30, NUMA node 0
      Memory at 90a00000 (32-bit, non-prefetchable) [size=256K]
      Bus: primary=04, secondary=05, subordinate=0a, sec-latency=0
      I/O behind bridge: 00002000-00003fff
      Memory behind bridge: 90000000-909fffff
      Prefetchable memory behind bridge: 0000380000800000-0000380000bfffff
      Kernel driver in use: pcieport

It's not clear what the purpose of the memory at 0x90a00000 is, and
currently the kernel never actually uses it for anything. In most cases,
it's safely ignored and does not cause a problem.

However, when the kernel assigns the resource addresses (with the
pci=realloc command line parameter, for example) it can inadvertently
disable the struct resource corresponding to the BAR. When this happens,
lspci will report this memory as ignored:

  Region 0: Memory at <ignored> (32-bit, non-prefetchable) [size=256K]

This is because the kernel reports a zero start address and zero flags
in the corresponding sysfs resource file and in /proc/bus/pci/devices.
Investigation with 'lspci -x', however, shows that the BIOS-assigned
address is still programmed in the device's BAR registers.

In many cases, this still isn't a problem. Nothing uses the memory,
so nothing is affected. However, a serious problem shows up when an IOMMU
is in use: the IOMMU will not reserve this range in the IOVA space because
the kernel no longer considers the resource valid. (See
dmar_init_reserved_ranges() for the Intel implementation of this.)

Without the proper reserved range, a DMA mapping may occasionally be
allocated an IOVA which the PCI bus will actually route to the BAR in the
PLX switch. As a result, some DMA writes never reach the RAM they target,
and some DMA reads return all FFs from the PLX BAR instead of data from
RAM.

The problem originates in pci_assign_unassigned_root_bus_resources().
When any resource from a bridge device fails to get assigned, the code
sets the resource's flags to zero. This makes sense for bridge resources,
as they will be re-enabled later, but for regular BARs, it disables them
permanently. To fix the problem, we only set the flags to zero for
bridge resources and treat all other resources the same as those of
non-bridge devices.

Reported-by: Kit Chow <kchow@xxxxxxxxxx>
Fixes: da7822e5ad71 ("PCI: update bridge resources to get more big ranges when allocating space (again)")
Signed-off-by: Logan Gunthorpe <logang@xxxxxxxxxxxx>
Cc: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>
Cc: Yinghai Lu <yinghai@xxxxxxxxxx>
---
 drivers/pci/setup-bus.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 56b7077f37ff..3695edd9c256 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1821,11 +1821,16 @@ void pci_assign_unassigned_root_bus_resources(struct pci_bus *bus)
 	/* restore size and flags */
 	list_for_each_entry(fail_res, &fail_head, list) {
 		struct resource *res = fail_res->res;
+		int idx;

 		res->start = fail_res->start;
 		res->end = fail_res->end;
 		res->flags = fail_res->flags;
-		if (fail_res->dev->subordinate)
+		idx = res - &fail_res->dev->resource[0];
+		if (fail_res->dev->subordinate &&
 			res->flags = 0;