[PATCH v5 3/5] PCI: Fix bug resulting in double hpmemsize being assigned to MMIO window

From: Nicholas Johnson
Date: Sun May 05 2019 - 10:41:45 EST


Background
==========================================================================

Solve bug report:
https://bugzilla.kernel.org/show_bug.cgi?id=203243

Currently, the kernel can sometimes assign the MMIO_PREF window
additional size into the MMIO window, resulting in double the MMIO
additional size, even if the MMIO_PREF window was successful.

This happens if in the first pass, the MMIO_PREF succeeds but the MMIO
fails. In the next pass, because MMIO_PREF is already assigned, the
attempt to assign MMIO_PREF returns an error code instead of success
(nothing more to do, already allocated).

Example of problem (more context can be found in the bug report URL):

Mainline kernel:
pci 0000:06:01.0: BAR 14: assigned [mem 0x90100000-0xa00fffff] = 256M
pci 0000:06:04.0: BAR 14: assigned [mem 0xa0200000-0xb01fffff] = 256M

Patched kernel:
pci 0000:06:01.0: BAR 14: assigned [mem 0x90100000-0x980fffff] = 128M
pci 0000:06:04.0: BAR 14: assigned [mem 0x98200000-0xa01fffff] = 128M

This was using pci=realloc,hpmemsize=128M,nocrs - on the same machine
with the same configuration, with a Ubuntu mainline kernel and a kernel
patched with this patch series.

This patch is vital for the next patch in the series. The next patch
allows the user to specify MMIO and MMIO_PREF independently. If the
MMIO_PREF is set to be very large, this bug will end up more than
doubling the MMIO size. The bug results in the MMIO_PREF being added to
the MMIO window, which means doubling if MMIO_PREF size == MMIO size.
With a large MMIO_PREF, without this patch, the MMIO window will likely
fail to be assigned altogether due to lack of 32-bit address space.

Patch notes
==========================================================================

Change find_free_bus_resource() to not skip assigned resources with
non-null parent.

Add checks in pbus_size_io() and pbus_size_mem() to return success if
resource returned from find_free_bus_resource() is already allocated.

This avoids pbus_size_io() and pbus_size_mem() returning error code to
__pci_bus_size_bridges() when a resource has been successfully assigned
in a previous pass. This fixes the existing behaviour where space for a
resource could be reserved multiple times in different parent bridge
windows. This also greatly reduces the number of failed BAR messages in
dmesg when Linux assigns resources.

Signed-off-by: Nicholas Johnson <nicholas.johnson-opensource@xxxxxxxxxxxxxx>
---
drivers/pci/setup-bus.c | 28 +++++++++++++++++++---------
1 file changed, 19 insertions(+), 9 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 5214815c7..e7126cc0e 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -752,11 +752,17 @@ static void pci_bridge_check_ranges(struct pci_bus *bus)
}
}

-/* Helper function for sizing routines: find first available
- bus resource of a given type. Note: we intentionally skip
- the bus resources which have already been assigned (that is,
- have non-NULL parent resource). */
-static struct resource *find_free_bus_resource(struct pci_bus *bus,
+/*
+ * Helper function for sizing routines: find first bus resource of a given
+ * type. Note: we do not skip the bus resources which have already been
+ * assigned (r->parent != NULL). This is because a resource that is already
+ * assigned (nothing more to be done) will be indistinguishable from one that
+ * failed due to lack of space if we skip assigned resources. If the caller
+ * function cannot tell the difference then it might try to place the
+ * resources in a different window, doubling up on resources or causing
+ * unforeseeable issues.
+ */
+static struct resource *find_bus_resource_of_type(struct pci_bus *bus,
unsigned long type_mask, unsigned long type)
{
int i;
@@ -765,7 +771,7 @@ static struct resource *find_free_bus_resource(struct pci_bus *bus,
pci_bus_for_each_resource(bus, r, i) {
if (r == &ioport_resource || r == &iomem_resource)
continue;
- if (r && (r->flags & type_mask) == type && !r->parent)
+ if (r && (r->flags & type_mask) == type)
return r;
}
return NULL;
@@ -863,14 +869,16 @@ static void pbus_size_io(struct pci_bus *bus, resource_size_t min_size,
resource_size_t add_size, struct list_head *realloc_head)
{
struct pci_dev *dev;
- struct resource *b_res = find_free_bus_resource(bus, IORESOURCE_IO,
- IORESOURCE_IO);
+ struct resource *b_res = find_bus_resource_of_type(bus, IORESOURCE_IO,
+ IORESOURCE_IO);
resource_size_t size = 0, size0 = 0, size1 = 0;
resource_size_t children_add_size = 0;
resource_size_t min_align, align;

if (!b_res)
return;
+ if (b_res->parent)
+ return;

min_align = window_alignment(bus, IORESOURCE_IO);
list_for_each_entry(dev, &bus->devices, bus_list) {
@@ -975,7 +983,7 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
resource_size_t min_align, align, size, size0, size1;
resource_size_t aligns[18]; /* Alignments from 1Mb to 128Gb */
int order, max_order;
- struct resource *b_res = find_free_bus_resource(bus,
+ struct resource *b_res = find_bus_resource_of_type(bus,
mask | IORESOURCE_PREFETCH, type);
resource_size_t children_add_size = 0;
resource_size_t children_add_align = 0;
@@ -983,6 +991,8 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,

if (!b_res)
return -ENOSPC;
+ if (b_res->parent)
+ return 0;

memset(aligns, 0, sizeof(aligns));
max_order = 0;
--
2.19.1