Re: [Patch v4 03/12] net: mana: Handle vport sharing between devices

From: Jason Gunthorpe
Date: Fri Jul 29 2022 - 15:12:48 EST


On Fri, Jul 29, 2022 at 06:44:22PM +0000, Long Li wrote:
> > On Thu, Jul 21, 2022 at 05:58:39PM +0000, Long Li wrote:
> > > > > "vport" is a hardware resource that can either be used by an
> > > > > Ethernet device, or an RDMA device. But it can't be used by both
> > > > > at the same time. The "vport" is associated with a protection
> > > > > domain and doorbell, it's programmed in the hardware. Outgoing
> > > > > traffic is enforced on this vport based on how it is programmed.
> > > >
> > > > Sure, but how is it the user's problem to "get this configured right"
> > > > and what exactly is the user supposed to do?
> > > >
> > > > I would expect the allocation of HW resources to be completely
> > > > transparent to the user. Why is it not?
> > > >
> > >
> > > In the hardware, RDMA RAW_QP shares the same hardware resource (in
> > > this case, the vPort in hardware table) with the ethernet NIC. When an
> > > RDMA user creates a RAW_QP, we can't just shut down the ethernet. The
> > > user is required to make sure the ethernet is not in use when he
> > > creates this QP type.
> >
> > You haven't answered my question - how is the user supposed to achieve this?
>
> The user needs to configure the network interface so the kernel will not use it when the user creates a RAW QP on this port.
>
> This can be done via system configuration to not bring this
> interface online on system boot, or equivalently doing "ifconfig xxx
> down" to take the interface down when creating a RAW QP on this
> port.

That sounds horrible, why allow the user to even bind two drivers if
the two drivers can't be used together?

> > And now I also want to know why the ethernet device and rdma device can even
> > be loaded together if they cannot share the physical port?
> > Exclusivity is not a sharing model that any driver today implements.
>
> This physical port limitation only applies to the RAW QP. For RC QP,
> the hardware doesn't have this limitation. The user can create RC
> QPs on a physical port up to the hardware limits independent of the
> Ethernet usage on the same port.

.. and it is because you support sharing models in other cases :\

> Scenario 1: The Ethernet loses TCP connection.

> 1. User A runs a program listening on a TCP port, accepts an incoming
> TCP connection and is communicating with the remote peer over this
> TCP connection.
> 2. User B creates an RDMA RAW_QP on the same port on the device.
> 3. As soon as the RAW_QP is created, the program in 1 can't
> send/receive data over this TCP connection. After some period of
> inactivity, the TCP connection terminates.

It is a little more complicated than that, but yes, that could
possibly happen if the userspace captures the right traffic.

> Please note that this may also pose a security risk. User B with
> RAW_QP can potentially hijack this TCP connection from the kernel by
> framing the correct Ethernet packets and sending them over this QP to trick
> the remote peer, making it believe it's User A.

Any root user can do this with the netstack using e.g. tcpdump, bpf,
XDP, raw sockets, etc. This is why the capability is guarded by
CAP_NET_RAW. It is nothing unusual.

> Scenario 2: The Ethernet port state changes after RDMA RAW_QP is used on the port.
> 1. User uses "ifconfig ethx down" on the NIC, intending to make it offline
> 2. User creates an RDMA RAW_QP on the same port on the device.
> 3. User destroys this RAW_QP.
> 4. The ethx device in 1 reports carrier state in step 2; in many
> Linux distributions this brings it online without user
> interaction. "ifconfig ethx" shows its state changes to "up".

This I'm not familiar with; it actually sounds like a bug that the
RAW_QPs interfere with the netdev carrier state.

> the Mellanox NICs implement the RAW_QP. IMHO, it's better to have
> the user explicitly decide whether to use Ethernet or RDMA RAW_QP on
> a specific port.

It should all be carefully documented someplace.

Jason