Re: [PATCH 1/2] PCI: Prevent 64-bit resources from being counted in 32-bit bridge region

From: Bjorn Helgaas
Date: Sun Mar 03 2019 - 19:23:58 EST


Hi Logan,

Sorry for the delay. This code gives me a headache. I still remember
my headache from the last time we touched it. Help me understand
what's going on here.

On Thu, Feb 14, 2019 at 10:00:27AM -0700, Logan Gunthorpe wrote:
> When using the pci=realloc command line argument, with hpmemsize not
> equal to zero, some hierarchies of 32-bit resources can fail to be
> assigned in some situations. When this happens, the user will see
> some PCI BAR resources being ignored and some PCI Bridge windows
> being left unset. In lspci this may look like:
>
> Memory behind bridge: fff00000-000fffff
>
> or
>
> Region 0: Memory at <ignored> (32-bit, non-prefetchable) [size=256K]
>
> Ignored BARs mean the underlying device will not be usable.
>
> The possible situations where this can happen will be quite varied and
> depend highly on the exact hierarchy and how the realloc code ends up
> trying to assign the regions. It's known to at least require a
> large 64-bit BAR (>1GB) below a PCI bridge.

I guess the bug is that some BAR or window is unset when we actually
have space for it? We need to make this more concrete, e.g., with a
minimal example of a failure case, and then connect this code change
specifically with that.

"Ignored BARs" doesn't seem like the best terminology here. Can we
just say they're "unset" as you do for windows? Even that's a little
squishy because there's really no such thing as a clearly "unset" or
invalid value for a BAR. All we can say is that Linux *thinks* it's
unset because it happens to be zero (technically still a valid BAR
value) or it conflicts with another device.

Strictly speaking, the result is that we can't enable decoding for
that BAR type. Often that does mean the device is unusable, but in
some cases, e.g., an I/O BAR being unset and a driver using
pci_enable_device_mem(), the device *is* usable.

Surely realloc can fail even without a large 64-bit BAR? I don't
think there's a magic threshold at 1GB. Maybe an example would
illustrate the problem better.

> The cause of this bug is in __pci_bus_size_bridges() which tries to
> calculate the total resource space required for each of the bridge windows
> (typically IO, 64-bit, and 32-bit / non-prefetchable). The code, as
> written, tries to allocate all the 64-bit prefetchable resources
> followed by all the remaining resources. It uses two calls to
> pbus_size_mem() for this. If the first call to pbus_size_mem() fails
> it tries to fit all resources into the 32-bit bridge window and it
> expects the size of the 32-bit bridge window to be multiple GBs which
> will never be assignable under the 4GB limit imposed on it.

There are actually three calls to pbus_size_mem():

  1) If bridge has a 64-bit prefetchable window, find the size of all
     64-bit prefetchable resources below the bridge

  2) If bridge has no 64-bit prefetchable window, find the size of all
     prefetchable resources below the bridge

  3) Find the size of everything else (non-prefetchable resources plus
     any prefetchable ones that couldn't be accommodated above)
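
The three passes above can be modelled in a few lines of userspace C.
This is only a rough sketch, not the kernel code: pick_window() and
enum window are hypothetical names, the flag values are the ones I
recall from include/linux/ioport.h and should be treated as
assumptions, and it ignores both failed passes and bridges with no
prefetchable window at all.

```c
#include <stdbool.h>

/* Assumed values, mirroring include/linux/ioport.h as I recall it. */
#define IORESOURCE_MEM      0x00000200UL
#define IORESOURCE_PREFETCH 0x00002000UL
#define IORESOURCE_MEM_64   0x00100000UL

/* Hypothetical: which bridge window a memory resource is sized into. */
enum window { WIN_PREF, WIN_NONPREF };

/*
 * Hypothetical helper modelling the three sizing passes.
 * res_flags:        flags of the resource below the bridge
 * pref_window_64:   true if the bridge's prefetchable window is 64-bit
 */
static enum window pick_window(unsigned long res_flags, bool pref_window_64)
{
	unsigned long prefmask = IORESOURCE_MEM | IORESOURCE_PREFETCH;

	/* Pass 1: a 64-bit window only takes 64-bit prefetchable resources. */
	if (pref_window_64)
		prefmask |= IORESOURCE_MEM_64;
	/* Pass 2 (otherwise): the window takes any prefetchable resource. */

	if ((res_flags & prefmask) == prefmask)
		return WIN_PREF;

	/* Pass 3: everything else lands in the non-prefetchable window. */
	return WIN_NONPREF;
}
```

For example, pick_window(IORESOURCE_MEM | IORESOURCE_PREFETCH, true)
yields WIN_NONPREF: a 32-bit prefetchable resource is not sized into a
64-bit prefetchable window and falls through to the third pass.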

Sorry again for being so literal and unimaginative, but I don't
understand how the code "expects the size of the ... window to be
multiple GBs which will never be assignable ...". Whether things are
assignable just depends on what resources are available. It's not a
matter of "expecting" the window to be big enough; it just is big
enough or it isn't.

> There are only two reasons for pbus_size_mem() to fail: if there is no
> 64-bit/prefetchable bridge window, or if that window is already
> assigned (in other words, its resource already has a parent set). We know
> the former case can't be true because, in __pci_bus_size_bridges(), it's
> existence is checked before making the call. So if the pbus_size_mem()
> call in question fails, the window must already be assigned, and in this
> case, we still do not want 64-bit resources trying to be sized into the
> 32-bit catch-all resource.

I guess this question of putting a 64-bit resource in the 32-bit
non-prefetchable window (legal but undesirable) is a secondary thing,
not the chief complaint you're fixing?

> So to fix the bug, we must always set mask, type2 and type3 in cases
> where a 64-bit resource exists even if pbus_size_mem() fails.
>
> Reported-by: Kit Chow <kchow@xxxxxxxxxx>
> Fixes: 5b28541552ef ("PCI: Restrict 64-bit prefetchable bridge windows to 64-bit resources")
> Signed-off-by: Logan Gunthorpe <logang@xxxxxxxxxxxx>
> Cc: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>
> Cc: Yinghai Lu <yinghai@xxxxxxxxxx>
> ---
> drivers/pci/setup-bus.c | 17 ++++++++---------
> 1 file changed, 8 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> index ed960436df5e..56b7077f37ff 100644
> --- a/drivers/pci/setup-bus.c
> +++ b/drivers/pci/setup-bus.c
> @@ -1265,21 +1265,20 @@ void __pci_bus_size_bridges(struct pci_bus *bus, struct list_head *realloc_head)
>  		prefmask = IORESOURCE_MEM | IORESOURCE_PREFETCH;
>  		if (b_res[2].flags & IORESOURCE_MEM_64) {
>  			prefmask |= IORESOURCE_MEM_64;
> -			ret = pbus_size_mem(bus, prefmask, prefmask,
> +			pbus_size_mem(bus, prefmask, prefmask,
>  				  prefmask, prefmask,
>  				  realloc_head ? 0 : additional_mem_size,
>  				  additional_mem_size, realloc_head);
>
>  			/*
> -			 * If successful, all non-prefetchable resources
> -			 * and any 32-bit prefetchable resources will go in
> -			 * the non-prefetchable window.
> +			 * Given the existence of a 64-bit resource for this
> +			 * bus, all non-prefetchable resources and any 32-bit
> +			 * prefetchable resources will go in the
> +			 * non-prefetchable window.
>  			 */
> -			if (ret == 0) {
> -				mask = prefmask;
> -				type2 = prefmask & ~IORESOURCE_MEM_64;
> -				type3 = prefmask & ~IORESOURCE_PREFETCH;
> -			}
> +			mask = prefmask;
> +			type2 = prefmask & ~IORESOURCE_MEM_64;
> +			type3 = prefmask & ~IORESOURCE_PREFETCH;
>  		}
>
>  		/*
> --
> 2.19.0
>
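
To make the mask arithmetic in the hunk above concrete, here is a
rough userspace sketch of what mask, type2 and type3 evaluate to and
which resources they would then admit to the non-prefetchable window.
Several things here are assumptions rather than facts from the patch:
the flag values are the ones I recall from include/linux/ioport.h,
struct mem_masks and both helpers are hypothetical names, and the
matching rule (a resource is accepted when flags & mask equals one of
the three types, with the first type being IORESOURCE_MEM) is my
reading of how the final pbus_size_mem() call behaves.

```c
#include <stdbool.h>

/* Assumed values, mirroring include/linux/ioport.h as I recall it. */
#define IORESOURCE_MEM      0x00000200UL
#define IORESOURCE_PREFETCH 0x00002000UL
#define IORESOURCE_MEM_64   0x00100000UL

/* Hypothetical container for the arguments of the final sizing pass. */
struct mem_masks {
	unsigned long mask, type, type2, type3;
};

/* Values the patched hunk assigns when the bridge window is 64-bit. */
static struct mem_masks masks_after_patch(void)
{
	unsigned long prefmask = IORESOURCE_MEM | IORESOURCE_PREFETCH |
				 IORESOURCE_MEM_64;
	struct mem_masks m = {
		.mask  = prefmask,
		.type  = IORESOURCE_MEM,                  /* assumed first type */
		.type2 = prefmask & ~IORESOURCE_MEM_64,   /* 32-bit prefetchable */
		.type3 = prefmask & ~IORESOURCE_PREFETCH, /* 64-bit non-prefetchable */
	};

	return m;
}

/* Assumed matching rule for the non-prefetchable (catch-all) pass. */
static bool sized_into_nonpref(unsigned long flags, struct mem_masks m)
{
	unsigned long f = flags & m.mask;

	return f == m.type || f == m.type2 || f == m.type3;
}
```

Under those assumptions, a 64-bit prefetchable resource matches none
of the three types and so is never counted in the non-prefetchable
window, while 32-bit prefetchable and all non-prefetchable resources
are.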