[PATCH v7 8/8] PCI: Fix bug resulting in double hpmemsize being assigned to MMIO window
From: Nicholas Johnson
Date: Mon Jul 01 2019 - 10:26:22 EST
Background
==========================================================================
Currently, the kernel can sometimes assign the MMIO_PREF window
additional size into the MMIO window, resulting in double the MMIO
additional size, even if the MMIO_PREF window was successful.
This happens if in the first pass, the MMIO_PREF succeeds but the MMIO
fails. In the next pass, because MMIO_PREF is already assigned, the
attempt to assign MMIO_PREF returns an error code instead of success
(nothing more to do, already allocated).
Example of problem (more context can be found in the bug report URL):
Mainline kernel:
pci 0000:06:01.0: BAR 14: assigned [mem 0x90100000-0xa00fffff] = 256M
pci 0000:06:04.0: BAR 14: assigned [mem 0xa0200000-0xb01fffff] = 256M
Patched kernel:
pci 0000:06:01.0: BAR 14: assigned [mem 0x90100000-0x980fffff] = 128M
pci 0000:06:04.0: BAR 14: assigned [mem 0x98200000-0xa01fffff] = 128M
This was using pci=realloc,hpmemsize=128M,nocrs - on the same machine
with the same configuration, with a Ubuntu mainline kernel and a kernel
patched with this patch series.
This patch is vital for the next patch in the series. The next patch
allows the user to specify MMIO and MMIO_PREF independently. If the
MMIO_PREF is set to be very large, this bug will end up more than
doubling the MMIO size. The bug results in the MMIO_PREF being added to
the MMIO window, which means doubling if MMIO_PREF size == MMIO size.
With a large MMIO_PREF, without this patch, the MMIO window will likely
fail to be assigned altogether due to lack of 32-bit address space.
Patch notes
==========================================================================
Change find_free_bus_resource() to not skip assigned resources with
non-null parent.
Add checks in pbus_size_io() and pbus_size_mem() to return success if
resource returned from find_free_bus_resource() is already allocated.
This avoids pbus_size_io() and pbus_size_mem() returning error code to
__pci_bus_size_bridges() when a resource has been successfully assigned
in a previous pass. This fixes the existing behaviour where space for a
resource could be reserved multiple times in different parent bridge
windows. This also greatly reduces the number of failed BAR messages in
dmesg when Linux assigns resources.
See related from Logan Gunthorpe (same problem, different solution):
https://lore.kernel.org/lkml/20190531171216.20532-2-logang@xxxxxxxxxxxx/T/#u
Solves bug report: https://bugzilla.kernel.org/show_bug.cgi?id=203243
Reported-by: Kit Chow <kchow@xxxxxxxxxx>
Reported-by: Nicholas Johnson <nicholas.johnson-opensource@xxxxxxxxxxxxxx>
Signed-off-by: Nicholas Johnson <nicholas.johnson-opensource@xxxxxxxxxxxxxx>
---
drivers/pci/setup-bus.c | 29 +++++++++++++++++++----------
1 file changed, 19 insertions(+), 10 deletions(-)
diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 873df5482..df4bf43b5 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -752,13 +752,18 @@ static void pci_bridge_check_ranges(struct pci_bus *bus)
}
/*
- * Helper function for sizing routines: find first available bus resource
- * of a given type. Note: we intentionally skip the bus resources which
- * have already been assigned (that is, have non-NULL parent resource).
+ * Helper function for sizing routines: find first bus resource of a given
+ * type. Note: we do not skip the bus resources which have already been
+ * assigned (r->parent != NULL). This is because a resource that is already
+ * assigned (nothing more to be done) will be indistinguishable from one that
+ * failed due to lack of space if we skip assigned resources. If the caller
+ * function cannot tell the difference then it might try to place the
+ * resources in a different window, doubling up on resources or causing
+ * unforeseeable issues.
*/
-static struct resource *find_free_bus_resource(struct pci_bus *bus,
- unsigned long type_mask,
- unsigned long type)
+static struct resource *find_bus_resource_of_type(struct pci_bus *bus,
+ unsigned long type_mask,
+ unsigned long type)
{
int i;
struct resource *r;
@@ -766,7 +771,7 @@ static struct resource *find_free_bus_resource(struct pci_bus *bus,
pci_bus_for_each_resource(bus, r, i) {
if (r == &ioport_resource || r == &iomem_resource)
continue;
- if (r && (r->flags & type_mask) == type && !r->parent)
+ if (r && (r->flags & type_mask) == type)
return r;
}
return NULL;
@@ -866,14 +871,16 @@ static void pbus_size_io(struct pci_bus *bus, resource_size_t min_size,
struct list_head *realloc_head)
{
struct pci_dev *dev;
- struct resource *b_res = find_free_bus_resource(bus, IORESOURCE_IO,
- IORESOURCE_IO);
+ struct resource *b_res = find_bus_resource_of_type(bus, IORESOURCE_IO,
+ IORESOURCE_IO);
resource_size_t size = 0, size0 = 0, size1 = 0;
resource_size_t children_add_size = 0;
resource_size_t min_align, align;
if (!b_res)
return;
+ if (b_res->parent)
+ return;
min_align = window_alignment(bus, IORESOURCE_IO);
list_for_each_entry(dev, &bus->devices, bus_list) {
@@ -978,7 +985,7 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
resource_size_t min_align, align, size, size0, size1;
resource_size_t aligns[18]; /* Alignments from 1MB to 128GB */
int order, max_order;
- struct resource *b_res = find_free_bus_resource(bus,
+ struct resource *b_res = find_bus_resource_of_type(bus,
mask | IORESOURCE_PREFETCH, type);
resource_size_t children_add_size = 0;
resource_size_t children_add_align = 0;
@@ -986,6 +993,8 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
if (!b_res)
return -ENOSPC;
+ if (b_res->parent)
+ return 0;
memset(aligns, 0, sizeof(aligns));
max_order = 0;
--
2.20.1