Re: [PATCH v7 1/3] PCI: AtomicOps: Do not enable requests by RCiEPs

From: Kuehling, Felix

Date: Tue Mar 31 2026 - 16:12:59 EST



On 2026-03-31 15:01, Bjorn Helgaas wrote:
On Tue, Mar 31, 2026 at 02:39:26PM -0400, Kuehling, Felix wrote:
On 2026-03-31 14:09, Bjorn Helgaas wrote:
On Mon, Mar 30, 2026 at 08:01:57PM -0400, Kuehling, Felix wrote:
On 2026-03-30 17:42, Bjorn Helgaas wrote:
[+to amdgpu, bnxt_re, mlx5 IB, qedr, mlx5 maintainers]

On Mon, Mar 30, 2026 at 03:09:44PM +0200, Gerd Bayer wrote:
Since root complex integrated end points (RCiEPs) attach to a bus that
has no bridge device describing the root port, the capability to
complete AtomicOps requests cannot be determined with PCIe methods.

Change default of pci_enable_atomic_ops_to_root() to not enable
AtomicOps requests on RCiEPs.
I know I suggested this because there's nothing explicit that tells us
whether the RC supports atomic ops from RCiEPs [1]. But I'm concerned
that GPUs, infiniband HCAs, and NICs that use atomic ops may be
implemented as RCiEPs and would be broken by this.
FWIW, on AMD APUs our driver doesn't call pci_enable_atomic_ops_to_root. It
just assumes that the GPU can do atomic accesses because it doesn't actually
go through PCIe: https://elixir.bootlin.com/linux/v6.19.10/source/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c#L4785
What does this mean for the other branch that *does* use
pci_enable_atomic_ops_to_root()? Can any of those devices be RCiEPs?
Most AMD GPUs are not integrated endpoints. APUs are integrated. There are
A+A GPUs where the GPUs are separate from the CPU but part of the same
coherent data fabric as the CPU (adev->gmc.xgmi.connected_to_cpu == true).
Those may also be considered RCiEPs. (I'm not sure about that, is there an
easy way to check with lspci?) We may need to include that in the same
branch as APUs.
Yep, for RCiEPs, "lspci -v" should say something like this:

Capabilities: [64] Express Root Complex Integrated Endpoint

Dmesg logs from recent kernels would also include it like this:

pci 0000:00:02.0: [8086:5916] type 00 class 0x030000 PCIe Root Complex Integrated Endpoint

An RCiEP would be on the root bus; it would not be below a Root Port.
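
For reference, the type string that lspci and dmesg print is decoded from
the Device/Port Type field (bits 7:4) of the PCI Express Capabilities
register. A minimal user-space sketch of that decoding, assuming the raw
16-bit register value has already been read from config space. The
constants mirror include/uapi/linux/pci_regs.h; the two helper names are
hypothetical and exist only for illustration:

```c
#include <stdint.h>

/* Device/Port Type field of the PCI Express Capabilities register.
 * Values mirror include/uapi/linux/pci_regs.h. */
#define PCI_EXP_FLAGS_TYPE      0x00f0  /* bits 7:4 */
#define PCI_EXP_TYPE_ENDPOINT   0x0     /* Express Endpoint */
#define PCI_EXP_TYPE_LEG_END    0x1     /* Legacy Endpoint */
#define PCI_EXP_TYPE_ROOT_PORT  0x4     /* Root Port */
#define PCI_EXP_TYPE_RC_END     0x9     /* Root Complex Integrated Endpoint */

/* Hypothetical helper: extract the Device/Port Type field. */
static unsigned int pcie_port_type(uint16_t exp_flags)
{
	return (exp_flags & PCI_EXP_FLAGS_TYPE) >> 4;
}

/* Hypothetical helper: map the type to a short human-readable name. */
static const char *pcie_port_type_name(uint16_t exp_flags)
{
	switch (pcie_port_type(exp_flags)) {
	case PCI_EXP_TYPE_ENDPOINT:
		return "Endpoint";
	case PCI_EXP_TYPE_LEG_END:
		return "Legacy Endpoint";
	case PCI_EXP_TYPE_ROOT_PORT:
		return "Root Port";
	case PCI_EXP_TYPE_RC_END:
		return "Root Complex Integrated Endpoint";
	default:
		return "other";
	}
}
```

A register value of 0x0092 decodes to type 0x9 (RCiEP), while 0x0002
decodes to type 0x0 (a regular Express Endpoint).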

I'm getting this from lspci:
    Capabilities: [64] Express Endpoint, MSI 00

Regards,
  Felix



You can see that we did that for a new generation of A+A GPU here: https://gitlab.freedesktop.org/agd5f/linux/-/blob/amd-staging-drm-next/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c?ref_type=heads#L3920.
We'd need to confirm that the same works for MI200 A+A GPUs as well.
These drivers use pci_enable_atomic_ops_to_root():

amdgpu
bnxt_re (infiniband)
mlx5 (infiniband)
qedr (infiniband)
mlx5 (ethernet)

Maybe we should assume that because RCiEPs are directly integrated
into the RC, the RCiEP would only allow AtomicOp Requester Enable to
be set if the RC supports atomic ops?

I don't like making assumptions like that, but it'd be worse to break
these devices.

[1] https://lore.kernel.org/all/20260326164002.GA1325368@bhelgaas

Signed-off-by: Gerd Bayer <gbayer@xxxxxxxxxxxxx>
---
drivers/pci/pci.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 8479c2e1f74f1044416281aba11bf071ea89488a..135e5b591df405e87e7f520a618d7e2ccba55ce1 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -3692,15 +3692,14 @@ int pci_enable_atomic_ops_to_root(struct pci_dev *dev, u32 cap_mask)
/*
* Per PCIe r4.0, sec 6.15, endpoints and root ports may be
- * AtomicOp requesters. For now, we only support endpoints as
- * requesters and root ports as completers. No endpoints as
+ * AtomicOp requesters. For now, we only support (legacy) endpoints
+ * as requesters and root ports as completers. No endpoints as
* completers, and no peer-to-peer.
*/
switch (pci_pcie_type(dev)) {
case PCI_EXP_TYPE_ENDPOINT:
case PCI_EXP_TYPE_LEG_END:
- case PCI_EXP_TYPE_RC_END:
break;
default:
return -EINVAL;

--
2.51.0
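
To make the behavioral change in the hunk above concrete, here is a
stand-alone sketch that models only the device-type switch, not the rest
of pci_enable_atomic_ops_to_root() (which goes on to walk the bridges
toward the root port). The helper name and the allow_rciep parameter are
hypothetical: allow_rciep == true models the code before this patch,
false models it after.

```c
#include <errno.h>
#include <stdbool.h>

/* Device/Port Type values, mirroring include/uapi/linux/pci_regs.h. */
#define PCI_EXP_TYPE_ENDPOINT   0x0     /* Express Endpoint */
#define PCI_EXP_TYPE_LEG_END    0x1     /* Legacy Endpoint */
#define PCI_EXP_TYPE_ROOT_PORT  0x4     /* Root Port */
#define PCI_EXP_TYPE_RC_END     0x9     /* Root Complex Integrated Endpoint */

/*
 * Hypothetical helper modeling the switch on pci_pcie_type(dev).
 * Returns 0 if the device type may proceed as an AtomicOp requester,
 * -EINVAL otherwise. In the real function a 0 here only means the
 * type check passed; further checks follow.
 */
static int atomic_requester_type_check(int pcie_type, bool allow_rciep)
{
	switch (pcie_type) {
	case PCI_EXP_TYPE_ENDPOINT:
	case PCI_EXP_TYPE_LEG_END:
		return 0;
	case PCI_EXP_TYPE_RC_END:
		/* Accepted before the patch, rejected after it. */
		return allow_rciep ? 0 : -EINVAL;
	default:
		return -EINVAL;
	}
}
```

With the patch applied, a driver calling the real function on an RCiEP
would see -EINVAL, which is the case the discussion above is concerned
about for GPUs, HCAs, and NICs that may be implemented as RCiEPs.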