Re: [PATCH rdma-next v1 0/2] RDMA: fix cross-NIC same-host IPv6 RDMA-CM connect
From: Jason Gunthorpe
Date: Mon Jun 15 2026 - 20:00:44 EST
On Mon, Jun 15, 2026 at 05:46:19PM +0000, Alex Timofeyev wrote:
> RDMA-CM cannot establish an IPv6 RoCEv2 connection between two NICs that
> live on the same host. This shows up on hosts that pin one process per
> NUMA-local NIC and let those processes talk to each other over each NIC's
> global IPv6 GID (e.g. a storage daemon with one engine per NUMA node on
> dual ConnectX-7). rdma_resolve_addr() and ib_send_cm_req() both return
> success, but the destination NIC silently drops the frame and the peer
> never sees the REQ; the connection times out.
>
> The bug has two halves, one on each side of the connection:
>
> 1) Send side (patch 1, drivers/infiniband/core/addr.c)
>
> When the destination address is local, addr_resolve_neigh() copies the
> *source* device's MAC into the path record's destination MAC. That is
> right for true loopback (same netdev), but for a destination that lives
> on a different netdev of the same host the destination NIC will not
> accept a frame addressed to the source NIC's MAC and drops it in HW.
> The fix resolves the netdev that owns the destination address and uses
> its MAC.
I'm not sure about this. You need policy routing or a VRF set up so
that these local routes don't show up. Do you have that?

A local route lookup should only ever produce a local loopback AH. It
should never put a packet on the wire, and we shouldn't be trying to
mangle loopback routes at all.
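
For reference, an untested sketch of the kind of per-NIC VRF setup
meant above. The interface names (eth0, eth1), the VRF names, and the
table numbers are placeholders and are not taken from the report:

```
# One VRF per NIC, so each NIC's local routes live in their own table
# instead of the shared local table. All names below are hypothetical.
sysctl -w net.ipv6.conf.all.keep_addr_on_down=1  # keep global IPv6 addrs across enslavement
ip link add vrf-nic0 type vrf table 100
ip link add vrf-nic1 type vrf table 101
ip link set dev vrf-nic0 up
ip link set dev vrf-nic1 up
ip link set dev eth0 master vrf-nic0
ip link set dev eth1 master vrf-nic1
ip -6 route show table 100  # eth0's local routes should now be listed here
```

Whether this alone is enough for the RDMA-CM case here depends on the
rest of the routing configuration.
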
> 2) Receive side (patch 2, drivers/infiniband/core/cma.c)
>
> Once the REQ does reach the peer, validate_ipv6_net_dev() rejects it:
> rt6_lookup() of a same-host destination collapses onto the loopback
> netdev, so the strict rt6i_idev->dev == net_dev check fails with
> -EHOSTUNREACH even though the REQ arrived on the right net_dev. The fix
> accepts an RTF_LOCAL route when net_dev itself owns the listener
> address. This half is only observable once patch 1 lets the REQ
> arrive.
Same answer here: with proper routing, the lookup won't match a
loopback route and this check won't fail. Removing the check does not
seem correct.
Jason