Re: [PATCH v5 00/11] PCI: Improve PCIe Capability RMW concurrency control

From: Bjorn Helgaas
Date: Thu Aug 10 2023 - 12:17:17 EST


On Mon, Jul 17, 2023 at 03:04:52PM +0300, Ilpo Järvinen wrote:
> PCI Express Capability RMW accessors don't properly protect against
> concurrent access. Link Control Register is written by a number of
> things in the kernel in a RMW fashion without any concurrency control.
> This could in the unlucky case lead to losing one of the updates. One
> of the most obvious path which can race with most of the other LNKCTL
> RMW operations seems to be ASPM policy sysfs write which triggers
> LNKCTL update. Similarly, Root Control Register can be concurrently
> accessed by AER and PME.
>
> Make pcie_capability_clear_and_set_word() (and other RMW accessors that
> call it) to use a per device spinlock to protect the RMW operations to
> the Capability Registers that require locking. Convert open-coded
> LNKCTL RMW operations to use pcie_capability_clear_and_set_word() to
> benefit from the locking.
>
> There's also a related series which improves ASPM service driver and
> device driver coordination by removing out-of-band ASPM state
> management from device drivers (which will remove some of the code
> fragments changed by this series but it has higher regression
> potential which is why it seems prudent to do these changes in two
> steps):
> https://lore.kernel.org/linux-pci/20230602114751.19671-1-ilpo.jarvinen@xxxxxxxxxxxxxxx/T/#t
>
> v5:
> - Remove reversed logic from a conditional
> - Use a variable for CCC setup
>
> v4:
> - Rebased on top of pci/main
> - Added patch to update documentation
>
> v3:
> - Split link retraining change off from ASPM patch & reorder it earlier
> - Adjust changelog to take into account the move of link retraining
> code into PCI core and no longer refer to ASPM (currently in
> pci/enumeration branch)
> - based on top of pci/main
>
> v2:
> - Keep the RMW ops caller API the same
> - Make pcie_capability_clear_and_set_word() a wrapper that uses
> locked/unlocked variant based on the capability reg
> - Extracted LNKCTL2 changes out from this series to keep this purely
> a series which fixes something (LNKCTL2 RMW lock is necessary only
> when PCIe BW control is introduced).
> - Added Fixes tags (it's a bit rathole but yeah, they're there now).
> - Renamed cap_lock to pcie_cap_lock
> - Changed ath1* to clear the ASPMC field before setting it
>
> Ilpo Järvinen (11):
> PCI: Add locking to RMW PCI Express Capability Register accessors
> PCI: Make link retraining use RMW accessors for changing LNKCTL
> PCI: pciehp: Use RMW accessors for changing LNKCTL
> PCI/ASPM: Use RMW accessors for changing LNKCTL
> drm/amdgpu: Use RMW accessors for changing LNKCTL
> drm/radeon: Use RMW accessors for changing LNKCTL
> net/mlx5: Use RMW accessors for changing LNKCTL
> wifi: ath11k: Use RMW accessors for changing LNKCTL
> wifi: ath12k: Use RMW accessors for changing LNKCTL
> wifi: ath10k: Use RMW accessors for changing LNKCTL
> PCI: Document the Capability accessor RMW improvements
>
> Documentation/PCI/pciebus-howto.rst | 14 ++++---
> drivers/gpu/drm/amd/amdgpu/cik.c | 36 +++++-------------
> drivers/gpu/drm/amd/amdgpu/si.c | 36 +++++-------------
> drivers/gpu/drm/radeon/cik.c | 36 +++++-------------
> drivers/gpu/drm/radeon/si.c | 37 +++++--------------
> .../ethernet/mellanox/mlx5/core/fw_reset.c | 9 +----
> drivers/net/wireless/ath/ath10k/pci.c | 9 +++--
> drivers/net/wireless/ath/ath11k/pci.c | 10 +++--
> drivers/net/wireless/ath/ath12k/pci.c | 10 +++--
> drivers/pci/access.c | 20 ++++++++--
> drivers/pci/hotplug/pciehp_hpc.c | 12 ++----
> drivers/pci/pci.c | 8 +---
> drivers/pci/pcie/aspm.c | 30 +++++++--------
> drivers/pci/probe.c | 1 +
> include/linux/pci.h | 34 ++++++++++++++++-
> 15 files changed, 136 insertions(+), 166 deletions(-)

Applied to pci/pcie-rmw for v6.6, thanks!

I removed the stable tags because we don't know of any actual problems
these fix.

Bjorn