Re: [PATCH rdma-next 0/2] Introduce mlx5 data direct placement (DDP)

From: Edward Srouji
Date: Fri Sep 06 2024 - 08:18:15 EST



On 9/6/2024 8:02 AM, Zhu Yanjun wrote:
External email: Use caution opening links or attachments


On 2024/9/5 20:23, Edward Srouji wrote:

On 9/4/2024 2:53 PM, Zhu Yanjun wrote:


On 2024/9/4 16:27, Edward Srouji wrote:

On 9/4/2024 9:02 AM, Zhu Yanjun wrote:


On 2024/9/3 19:37, Leon Romanovsky wrote:
From: Leon Romanovsky <leonro@xxxxxxxxxx>

Hi,

This series from Edward introduces the mlx5 data direct placement (DDP)
feature.

This feature allows WRs on the receiver side of the QP to be consumed
out of order, permitting the sender side to transmit messages without
guaranteeing arrival order on the receiver side.

When enabled, the completion ordering of WRs remains in-order,
regardless of the Receive WRs consumption order.

RDMA Read and RDMA Atomic operations on the responder side continue to
be executed in-order, while the ordering of data placement for RDMA
Write and Send operations is not guaranteed.

It is an interesting feature. If I understand it correctly, this
feature permits the user to consume data out of order for RDMA Write
and Send operations, while the completion ordering is still in order.

Correct.

In which scenarios can this feature be applied, and what benefits does
it bring?

I am just curious about this. Normally users consume the data in
order. In what scenario would the user consume the data out of order?

One of the main benefits of this feature is achieving higher bandwidth
(BW) by allowing responders to receive packets out of order (OOO).

For example, this can be utilized in devices that support multi-plane
functionality, as introduced in the "Multi-plane support for mlx5"
series [1]. When mlx5 multi-plane is supported, a single logical mlx5
port aggregates multiple physical plane ports. In this scenario, the
requester can "spray" packets across the multiple physical plane ports
without guaranteeing packet order, either on the wire or on the
receiver (responder) side.

With this approach, no barriers or fences are required to ensure
in-order packet reception, which optimizes the data path for
performance. This can result in better BW, theoretically achieving
line-rate performance equivalent to the sum of the maximum BW of all
physical plane ports, with only one QP.
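
To make the spraying idea concrete, here is a minimal sketch. It is not
mlx5 code: the round-robin policy, the function names, and the
per-plane rates in the usage note are all assumptions chosen for
illustration. It distributes packets of one QP over several planes and
computes the theoretical aggregate BW as the sum of the per-plane
maxima.

```python
from itertools import cycle


def spray(packet_ids, num_planes):
    """Assign packets to planes round-robin.

    ASSUMPTION: round-robin is only an illustrative policy; the thread
    does not say how the requester picks a plane for each packet.
    """
    planes = [[] for _ in range(num_planes)]
    for pkt, plane in zip(packet_ids, cycle(range(num_planes))):
        planes[plane].append(pkt)
    return planes


def aggregate_bw(per_plane_max_gbps):
    """Theoretical upper bound: sum of the maximum BW of all planes."""
    return sum(per_plane_max_gbps)
```

For example, `spray(range(8), 4)` places packets 0 and 4 on the first
plane, 1 and 5 on the second, and so on, and `aggregate_bw([100] * 4)`
gives 400 for four hypothetical 100 Gb/s planes. Because each plane can
have a different latency, packets sprayed this way may reach the
responder in a different order than they were sent.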

Thanks a lot for your quick reply. Without ensuring in-order packet
reception, this does optimize the data path for performance.

I agree with you.

But how does the receiver efficiently recover the correct packets from
the out-of-order packets?

Is the method implemented in software or hardware?


The packets carry a new field that the HW uses to determine the
correct message order (similar to PSN).

Once the packets arrive OOO at the receiver side, the data is scattered
directly (hence the DDP - "Direct Data Placement" name) by the HW.

So the efficiency is achieved by the HW: it also saves the required
context and metadata so that it can deliver the correct completions to
the user in order, once some WQEs can be considered an "in-order
window" and be delivered to the user.

The SW/applications may receive OOO WR_IDs though (because the first
CQE may have consumed a Recv WQE of any index on the receiver side),
and it is their responsibility to handle this from that point on, if
required.
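
The behaviour described above can be sketched with a small software
model. This is not how the HW is implemented; it is a toy simulation
under stated assumptions: messages carry a sequence number, an arriving
message consumes the next free Recv WQE in posting order, and
completions are released only once they extend the in-order window.
The function and variable names are hypothetical.

```python
def deliver_completions(posted_wr_ids, arrival_order):
    """Return WR_IDs in the order completions are reported to the user.

    posted_wr_ids: WR_IDs of the Recv WQEs, in posting order.
    arrival_order: message sequence numbers, in the order they arrive.
    """
    consumed = {}      # message sequence number -> WR_ID of the WQE it used
    completions = []   # WR_IDs, in completion (message) order
    next_seq = 0       # next message eligible for completion
    free_wqes = iter(posted_wr_ids)

    for seq in arrival_order:
        # ASSUMPTION: an arriving message takes the next free Recv WQE
        # and its data is placed immediately.
        consumed[seq] = next(free_wqes)
        # Release every completion that now extends the in-order window.
        while next_seq in consumed:
            completions.append(consumed.pop(next_seq))
            next_seq += 1
    return completions
```

With Recv WQEs posted as WR_IDs 100..103 and messages arriving in the
order 2, 0, 1, 3, this model reports completions for messages 0, 1, 2,
3 in that order, but the WR_IDs seen by the application are 101, 102,
100, 103. An application that needs the data therefore has to look up
its buffer by the WR_ID in each completion rather than assume that
Recv WQEs complete in posting order.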

Got it. It seems that all of the functionality is implemented in HW,
and the SW only receives OOO WR_IDs. Thanks a lot. Perhaps it is
helpful for RDMA LAG devices. It should enhance the performance ^_^

BTW, do you have any performance data with this feature?

Not yet. So far we have only tested it functionally.

But we should be able to measure its performance soon :).



Best Regards,
Zhu Yanjun



I am just interested in this feature and want to know more about this.

Thanks,

Zhu Yanjun


[1] https://lore.kernel.org/lkml/cover.1718553901.git.leon@xxxxxxxxxx/

Thanks,
Zhu Yanjun


Thanks

Edward Srouji (2):
   net/mlx5: Introduce data placement ordering bits
   RDMA/mlx5: Support OOO RX WQE consumption

  drivers/infiniband/hw/mlx5/main.c    |  8 +++++
  drivers/infiniband/hw/mlx5/mlx5_ib.h |  1 +
  drivers/infiniband/hw/mlx5/qp.c      | 51 +++++++++++++++++++++++++---
  include/linux/mlx5/mlx5_ifc.h        | 24 +++++++++----
  include/uapi/rdma/mlx5-abi.h         |  5 +++
  5 files changed, 78 insertions(+), 11 deletions(-)


--
Best Regards,
Yanjun.Zhu