Re: [RESEND] Handle MPS mismatch for Switch Downstream Ports

From: Bjorn Helgaas

Date: Wed Apr 01 2026 - 17:37:53 EST


Thanks for the report, and sorry we missed your original email; I
couldn't find it in the lore archives, so maybe it got lost in
transit.

On Tue, Mar 31, 2026 at 04:10:56AM +0000, Devilliv Kelly wrote:
> Background
> ===========
> Commit 9f0e89359775 ("PCI: Match Root Port's MPS to endpoint's MPSS as
> necessary") added logic to reduce a Root Port's MPS when an endpoint's
> MPSS is smaller than the Root Port's current MPS setting. This ensures
> hot-added devices can work correctly.
>
> However, this logic only applies to ROOT_PORT type bridges:
>
>     mpss = 128 << dev->pcie_mpss;
>     if (mpss < p_mps && pci_pcie_type(bridge) == PCI_EXP_TYPE_ROOT_PORT) {
>         pcie_set_mps(bridge, mpss);
>         ...
>     }
>
> This leaves Switch Downstream Ports unhandled, which can cause issues
> when the Switch reports an incorrect or unexpected MPS value after
> secondary bus reset.
>
> Problem Description
> ===================
> We encountered a scenario where a PCIe Switch Downstream Port reports
> an MPS value larger than what the endpoint can support:
>
> Topology:
>   16:00.0 - Switch Upstream Port (MPS = 512 bytes, correct)
>   └── 17:00.0 - Switch Downstream Port (MPS = 2048 bytes after secondary bus reset)
>       └── 18:00.0 - Endpoint device (DevCap MaxPayload = 512 bytes)
>
> After a secondary bus reset, the Switch Downstream Port's MPS unexpectedly
> became 2048 bytes. When the kernel enumerates the endpoint device (18:00.0),
> it attempts to set the endpoint's MPS to 2048 to match the upstream bridge,
> but this fails because the endpoint only supports a maximum of 512 bytes.
>
> Kernel log shows:
>   pci 0000:18:00.0: can't set Max Payload Size to 2048; if necessary,
>   use "pci=pcie_bus_safe" and report a bug

Can you please collect the complete dmesg log when booted with
the 'dyndbg="file drivers/pci/* +p"' kernel parameter? (The double
quotes are a necessary part of the parameter.)

How did you initiate the reset, and which device was reset? What
caused the subsequent enumeration?

My guess is you used setpci to set the Secondary Bus Reset bit in the
16:00.0 Bridge Control register? And maybe you used a sysfs "rescan"
file to enumerate the endpoint?

If you used a sysfs reset interface or a driver called
pci_reset_function(), the kernel should have saved and restored config
space so the 17:00.0 MPS shouldn't change unexpectedly. Also, the
kernel would only let you set SBR in 16:00.0 if there was a single
device on bus 17, and switches typically have multiple downstream
ports.

> This results in NMI errors when the endpoint attempts DMA transactions:
>   Uhhuh. NMI received for unknown reason 2c on CPU 0.
>   Dazed and confused, but trying to continue
>
> Root Cause
> ==========
> The pci_configure_mps() function only adjusts the upstream bridge's MPS
> when the bridge is a ROOT_PORT. For DOWNSTREAM_PORT types (Switch ports),
> the kernel attempts to set the endpoint's MPS to the bridge's value
> without checking if the endpoint can support it.
>
> While the Switch firmware should ideally configure correct MPS values,
> the kernel should be robust enough to handle such cases and ensure
> proper MPS configuration for reliable operation.

Agreed. I think the SBR *should* reset MPS to the default value of
000b (128 bytes), but maybe this switch doesn't work that way.
Regardless, I agree that Linux should handle this better.
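
For reference, a minimal sketch of how the MPS field encoding maps to
byte sizes (000b = 128 bytes up to 101b = 4096 bytes).  The helper
names mps_field_to_bytes() and mps_bytes_to_field() are made up for
illustration and are not kernel functions:

```c
#include <assert.h>

/* MPS field encoding: 000b = 128 bytes, 001b = 256, ... 101b = 4096. */
static int mps_field_to_bytes(unsigned int field)
{
	assert(field <= 5);	/* 110b and 111b are reserved */
	return 128 << field;
}

/* Inverse mapping; returns -1 if bytes is not a valid MPS value. */
static int mps_bytes_to_field(int bytes)
{
	unsigned int field;

	for (field = 0; field <= 5; field++)
		if ((128 << field) == bytes)
			return (int)field;
	return -1;
}
```

With this mapping, the 2048 bytes reported by 17:00.0 corresponds to
a field value of 100b, and the endpoint's 512-byte limit to 010b.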

> Current Behavior
> ================
> 1. Endpoint's MPSS < Bridge's MPS
> 2. Bridge is DOWNSTREAM_PORT (not ROOT_PORT)
> 3. Kernel skips bridge MPS adjustment
> 4. pcie_set_mps(dev, p_mps) fails because p_mps > dev's capability
> 5. Device may not function correctly
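
The steps above can be modeled roughly in userspace as follows.  This
is only a sketch of the flow as described in the report, not the
actual pci_configure_mps() code; struct model_dev and
model_configure_mps() are hypothetical names, and all sizes are in
bytes:

```c
#include <stdbool.h>

/* Hypothetical model of a PCIe function; not a kernel structure. */
struct model_dev {
	int mps;		/* current Max Payload Size */
	int mpss;		/* Max Payload Size Supported (DevCap) */
	bool is_root_port;
};

/*
 * Returns 0 if dev ends up matching the bridge's MPS, or -1 if the
 * bridge's MPS exceeds what dev supports and nothing was adjusted.
 */
static int model_configure_mps(struct model_dev *dev,
			       struct model_dev *bridge)
{
	int p_mps = bridge->mps;

	if (dev->mps == p_mps)
		return 0;

	/* Only a Root Port is reduced to the endpoint's MPSS. */
	if (dev->mpss < p_mps && bridge->is_root_port) {
		bridge->mps = dev->mpss;
		p_mps = dev->mpss;
	}

	/* Models the device MPS write failing when p_mps > MPSS. */
	if (p_mps > dev->mpss)
		return -1;

	dev->mps = p_mps;
	return 0;
}
```

Assuming the endpoint starts at 128 bytes with MPSS = 512, a Switch
Downstream Port at 2048 makes this model return -1 and leaves both
devices unchanged, while the same values below a Root Port reduce the
Root Port to 512 and succeed.
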
>
> Workaround
> ==========
> The issue can be worked around by using the kernel parameter:
> pci=pcie_bus_safe
>
> However, this affects the entire system and may reduce performance
> for other devices.
>
> Questions for Discussion
> ========================
> 1. Was there a specific reason for restricting this logic to ROOT_PORT
> only? The commit message mentions avoiding impact on "other unrelated
> sub-topologies," but Switch Downstream Ports typically only have one
> endpoint below them.
>
> 2. Should we also consider propagating MPS changes up through multiple
> Switch levels in the hierarchy?
>
> References
> ==========
> - Commit 9f0e89359775: PCI: Match Root Port's MPS to endpoint's MPSS as necessary
> - Commit 27d868b5e6cf: PCI: Set MPS to match upstream bridge
> - https://bugzilla.kernel.org/show_bug.cgi?id=200527 (original ROOT_PORT fix)
>
> Kelly Devilliv