[PATCH 2/2] PCI: modify kernel parameters to differentiate between MMIO and MMIO_PREF sizes

From: Nicholas Johnson
Date: Wed Feb 06 2019 - 05:25:58 EST


Depends:
commit a1140e7bcb10ff96c192ee200e6cbf832f27158e
("PCI: fix serious bug when sizing bridges with additional size")

Background: I have come to find that the kernel parameters for reserving
resources for the hotplug bridges are not useful for Thunderbolt with
native PCI enumeration. You can only increase the size so far before it
fails to allocate the 32-bit MMIO windows.

Nature of problem: pci=hpmemsize=nnM is parsed from drivers/pci/pci.c
and used in drivers/pci/setup-bus.c. It overwrites pci_hotplug_mem_size
which has a default value set by DEFAULT_HOTPLUG_MEM_SIZE. When used,
this value is used to request additional space for MMIO and MMIO_PREF in
equal amounts.

The fallout: if you increase the value of pci=hpmemsize=nnM to be too
large to fit into the 32-bit address space, the MMIO window fails to
allocate and has zero-sized BARs. In the case of Thunderbolt, this means
the NHI does not have the resources needed to function, and the
thunderbolt.ko driver fails to probe the device. All Thunderbolt
functionality is lost. Even if the NHI worked, any Thunderbolt device
containing endpoints with 32-bit BARs (most of them) would fail to
initialise properly. The worst case happens if some resources end up
allocating with non-zero size, but too small. With some AMD Radeon
external GPUs, if certain combinations of BARs are assigned / failed to
assign, it can cause BUG_ON from arch/x86/mm/pat.c due to the buggy
amdgpu.ko driver trying to use a zero-sized BAR, instead of failing
gracefully. This more or less renders the system unusable.

The cause: because the kernel parameters do not allow the user to
specify the MMIO and MMIO_PREF hotplug bridge additional sizes
separately, they have no choice but to give unrealistic sizes that will
fail, or will be forced to make unpleasant compromises, resulting in
having to tolerate unstable or unpredictable operation.

The solution: My proposed patch depreciates
pci=hpmemsize=nnM,hpiosize=nn but allows them to work without any change
in user-visible functionality, with a KERN_WARNING message when either
are used. It adds the new parameters
pci=hp_io_size=nn,hp_mmio_size=nnM,hp_mmio_pref_size=nnM. If any of the
new kernel parameters are used, all of the newly-depreciated ones will
be overridden. This will allow for a grace period to allow people to
change to the new kernel parameters without unpleasant disruption. At a
time deemed appropriate by kernel developers, the old parameters can be
easily removed without the need to rework any code.

My testing: This works very well. I have not encountered any problems
and I am happily allocating 128M/64G under each Thunderbolt port with
nocrs. The success is dependent on my previous bug patch, which I
decided to separate from this feature patch. The old functionality still
works the same, and is overridden by the new commands if they are used.

Remarks: The printk at the end of pci_setup() in drivers/pci/pci.c has a
strange backquote ('`', under the tilde on the keyboard key) in the
format. I am not sure if it is a typo, or deliberate, and whether it
should change or if my KERN_WARNING logs need to use that symbol. Also,
the scripts/checkpatch.pl berates me for using printk, but the
alternative, pci_warn(), requires a pci_dev to work, and this message
does not pertain to a particular device.

Signed-off-by: Nicholas Johnson <nicholas.johnson-opensource@xxxxxxxxxxxxxx>
---
.../admin-guide/kernel-parameters.txt | 21 ++++++++--
drivers/pci/pci.c | 39 +++++++++++++++++--
drivers/pci/setup-bus.c | 26 +++++++------
include/linux/pci.h | 3 +-
4 files changed, 69 insertions(+), 20 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index b799bcf67..284752ff7 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3327,12 +3327,27 @@
the default.
off: Turn ECRC off
on: Turn ECRC on.
- hpiosize=nn[KMG] The fixed amount of bus space which is
+
+ hpiosize=nn[KMG] Depreciated. Overridden by hp_io_size
+ value if any of hp_io_size, hp_mmio_size
+ or hp_mmio_pref_size are used. This sets
+ hp_io_size to the given value if not
+ overridden.
+
+ hpmemsize=nn[KMG] Depreciated. Overridden if any of
+ hp_io_size, hp_mmio_size or
+ hp_mmio_pref_size are used. This sets
+ hp_mmio_size and hp_mmio_pref_size to
+ the given value if not overridden.
+ hp_io_size=nn[KMG] The fixed amount of bus space which is
reserved for hotplug bridge's IO window.
Default size is 256 bytes.
- hpmemsize=nn[KMG] The fixed amount of bus space which is
- reserved for hotplug bridge's memory window.
+ hp_mmio_size=nn[KMG] The fixed amount of bus space which is
+ reserved for hotplug bridge's 32-bit mmio window.
Default size is 2 megabytes.
+ hp_mmio_pref_size=nn[KMG] The fixed amount of bus space
+ which is reserved for hotplug bridge's 64-bit
+ mmio_pref window. Default size is 2 megabytes.
hpbussize=nn The minimum amount of additional bus numbers
reserved for buses below a hotplug bridge.
Default is 1.
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index c25acace7..64978bf84 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -85,12 +85,15 @@ unsigned long pci_cardbus_io_size = DEFAULT_CARDBUS_IO_SIZE;
unsigned long pci_cardbus_mem_size = DEFAULT_CARDBUS_MEM_SIZE;

#define DEFAULT_HOTPLUG_IO_SIZE (256)
-#define DEFAULT_HOTPLUG_MEM_SIZE (2*1024*1024)
-/* pci=hpmemsize=nnM,hpiosize=nn can override this */
+#define DEFAULT_HOTPLUG_MMIO_SIZE (2*1024*1024)
+#define DEFAULT_HOTPLUG_MMIO_PREF_SIZE (2*1024*1024)
+/* Override with pci=hp_io_size=nn,hp_mmio_size=nnM,hp_mmio_pref_size=nnM */
unsigned long pci_hotplug_io_size = DEFAULT_HOTPLUG_IO_SIZE;
-unsigned long pci_hotplug_mem_size = DEFAULT_HOTPLUG_MEM_SIZE;
+unsigned long pci_hotplug_mmio_size = DEFAULT_HOTPLUG_MMIO_SIZE;
+unsigned long pci_hotplug_mmio_pref_size = DEFAULT_HOTPLUG_MMIO_PREF_SIZE;

#define DEFAULT_HOTPLUG_BUS_SIZE 1
+/* Override with pci=hpbussize=nn,assign-busses */
unsigned long pci_hotplug_bus_size = DEFAULT_HOTPLUG_BUS_SIZE;

enum pcie_bus_config_types pcie_bus_config = PCIE_BUS_DEFAULT;
@@ -6144,6 +6147,15 @@ EXPORT_SYMBOL(pci_fixup_cardbus);

static int __init pci_setup(char *str)
{
+ /*
+ * Depreciated pci=hpmemsize=nnM but keep the functionality for now.
+ * If none of hp_io_size, hp_mmio_size or hp_mmio_pref_size are set
+ * then override hp_mmio_size and hp_mmio_pref_size with hpmemsize.
+ */
+ unsigned long pci_hotplug_io_size_old = DEFAULT_HOTPLUG_IO_SIZE;
+ unsigned long pci_hotplug_mem_size = DEFAULT_HOTPLUG_MMIO_SIZE;
+ bool use_new_pci_hotplug_params = 0;
+
while (str) {
char *k = strchr(str, ',');
if (k)
@@ -6176,9 +6188,23 @@ static int __init pci_setup(char *str)
} else if (!strncmp(str, "ecrc=", 5)) {
pcie_ecrc_get_policy(str + 5);
} else if (!strncmp(str, "hpiosize=", 9)) {
- pci_hotplug_io_size = memparse(str + 9, &str);
+ printk(KERN_WARNING "PCI: Depreciated option 'hpiosize', use 'hp_io_size' instead\n");
+ pci_hotplug_io_size_old =
+ memparse(str + 9, &str);
} else if (!strncmp(str, "hpmemsize=", 10)) {
+ printk(KERN_WARNING "PCI: Depreciated option 'hpmemsize', use 'hp_mmio_size' and 'hp_mmio_pref_size' instead\n");
pci_hotplug_mem_size = memparse(str + 10, &str);
+ } else if (!strncmp(str, "hp_io_size=", 11)) {
+ use_new_pci_hotplug_params = 1;
+ pci_hotplug_io_size = memparse(str + 11, &str);
+ } else if (!strncmp(str, "hp_mmio_size=", 13)) {
+ use_new_pci_hotplug_params = 1;
+ pci_hotplug_mmio_size =
+ memparse(str + 13, &str);
+ } else if (!strncmp(str, "hp_mmio_pref_size=", 18)) {
+ use_new_pci_hotplug_params = 1;
+ pci_hotplug_mmio_pref_size =
+ memparse(str + 18, &str);
} else if (!strncmp(str, "hpbussize=", 10)) {
pci_hotplug_bus_size =
simple_strtoul(str + 10, &str, 0);
@@ -6204,6 +6230,11 @@ static int __init pci_setup(char *str)
}
str = k;
}
+ if (!use_new_pci_hotplug_params) {
+ pci_hotplug_io_size = pci_hotplug_io_size_old;
+ pci_hotplug_mmio_size = pci_hotplug_mem_size;
+ pci_hotplug_mmio_pref_size = pci_hotplug_mem_size;
+ }
return 0;
}
early_param("pci", pci_setup);
diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index c7e0a5e2b..e4cdc6484 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1234,7 +1234,8 @@ void __pci_bus_size_bridges(struct pci_bus *bus, struct list_head *realloc_head)
{
struct pci_dev *dev;
unsigned long mask, prefmask, type2 = 0, type3 = 0;
- resource_size_t additional_mem_size = 0, additional_io_size = 0;
+ resource_size_t additional_io_size = 0, additional_mmio_size = 0,
+ additional_mmio_pref_size = 0;
struct resource *b_res;
int ret;

@@ -1267,8 +1268,9 @@ void __pci_bus_size_bridges(struct pci_bus *bus, struct list_head *realloc_head)
case PCI_CLASS_BRIDGE_PCI:
pci_bridge_check_ranges(bus);
if (bus->self->is_hotplug_bridge) {
- additional_io_size = pci_hotplug_io_size;
- additional_mem_size = pci_hotplug_mem_size;
+ additional_io_size = pci_hotplug_io_size;
+ additional_mmio_size = pci_hotplug_mmio_size;
+ additional_mmio_pref_size = pci_hotplug_mmio_pref_size;
}
/* Fall through */
default:
@@ -1287,9 +1289,8 @@ void __pci_bus_size_bridges(struct pci_bus *bus, struct list_head *realloc_head)
prefmask |= IORESOURCE_MEM_64;
ret = pbus_size_mem(bus, prefmask, prefmask,
prefmask, prefmask,
- realloc_head ? 0 : additional_mem_size,
- additional_mem_size, realloc_head);
-
+ realloc_head ? 0 : additional_mmio_pref_size,
+ additional_mmio_pref_size, realloc_head);
/*
* If successful, all non-prefetchable resources
* and any 32-bit prefetchable resources will go in
@@ -1310,9 +1311,9 @@ void __pci_bus_size_bridges(struct pci_bus *bus, struct list_head *realloc_head)
if (!type2) {
prefmask &= ~IORESOURCE_MEM_64;
ret = pbus_size_mem(bus, prefmask, prefmask,
- prefmask, prefmask,
- realloc_head ? 0 : additional_mem_size,
- additional_mem_size, realloc_head);
+ prefmask, prefmask,
+ realloc_head ? 0 : additional_mmio_pref_size,
+ additional_mmio_pref_size, realloc_head);

/*
* If successful, only non-prefetchable resources
@@ -1321,7 +1322,8 @@ void __pci_bus_size_bridges(struct pci_bus *bus, struct list_head *realloc_head)
if (ret == 0)
mask = prefmask;
else
- additional_mem_size += additional_mem_size;
+ additional_mmio_size +=
+ additional_mmio_pref_size;

type2 = type3 = IORESOURCE_MEM;
}
@@ -1342,8 +1344,8 @@ void __pci_bus_size_bridges(struct pci_bus *bus, struct list_head *realloc_head)
* window.
*/
pbus_size_mem(bus, mask, IORESOURCE_MEM, type2, type3,
- realloc_head ? 0 : additional_mem_size,
- additional_mem_size, realloc_head);
+ realloc_head ? 0 : additional_mmio_size,
+ additional_mmio_size, realloc_head);
break;
}
}
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 65f1d8c2f..b30d55697 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1959,7 +1959,8 @@ extern u8 pci_dfl_cache_line_size;
extern u8 pci_cache_line_size;

extern unsigned long pci_hotplug_io_size;
-extern unsigned long pci_hotplug_mem_size;
+extern unsigned long pci_hotplug_mmio_size;
+extern unsigned long pci_hotplug_mmio_pref_size;
extern unsigned long pci_hotplug_bus_size;

/* Architecture-specific versions may override these (weak) */
--
2.19.1