RE: [PATCH rdma-next 0/2] Introduce mlx5 data direct placement (DDP)
From: Bernard Metzler
Date: Fri Sep 06 2024 - 09:03:21 EST
> -----Original Message-----
> From: Edward Srouji <edwards@xxxxxxxxxx>
> Sent: Thursday, September 5, 2024 2:23 PM
> To: Zhu Yanjun <yanjun.zhu@xxxxxxxxx>; Leon Romanovsky <leon@xxxxxxxxxx>;
> Jason Gunthorpe <jgg@xxxxxxxxxx>
> Cc: Leon Romanovsky <leonro@xxxxxxxxxx>; linux-kernel@xxxxxxxxxxxxxxx;
> linux-rdma@xxxxxxxxxxxxxxx; netdev@xxxxxxxxxxxxxxx; Saeed Mahameed
> <saeedm@xxxxxxxxxx>; Tariq Toukan <tariqt@xxxxxxxxxx>; Yishai Hadas
> <yishaih@xxxxxxxxxx>
> Subject: [EXTERNAL] Re: [PATCH rdma-next 0/2] Introduce mlx5 data direct
> placement (DDP)
>
>
> On 9/4/2024 2:53 PM, Zhu Yanjun wrote:
> >
> > On 2024/9/4 16:27, Edward Srouji wrote:
> >>
> >> On 9/4/2024 9:02 AM, Zhu Yanjun wrote:
> >>>
> >>> On 2024/9/3 19:37, Leon Romanovsky wrote:
> >>>> From: Leon Romanovsky <leonro@xxxxxxxxxx>
> >>>>
> >>>> Hi,
> >>>>
> >>>> This series from Edward introduces mlx5 data direct placement (DDP)
> >>>> feature.
> >>>>
> >>>> This feature allows WRs on the receiver side of the QP to be consumed
> >>>> out of order, permitting the sender side to transmit messages without
> >>>> guaranteeing arrival order on the receiver side.
> >>>>
> >>>> When enabled, the completion ordering of WRs remains in-order,
> >>>> regardless of the Receive WRs consumption order.
> >>>>
> >>>> RDMA Read and RDMA Atomic operations on the responder side continue to
> >>>> be executed in-order, while the ordering of data placement for RDMA
> >>>> Write and Send operations is not guaranteed.
> >>>
> >>> It is an interesting feature. If I understand it correctly, this
> >>> feature lets the user consume the data out of order for RDMA Write
> >>> and Send operations, but the completion ordering is still in order.
> >>>
> >> Correct.
> >>> In what scenarios can this feature be applied, and what benefits
> >>> does it bring?
> >>>
> >>> I am just curious about this. Normally users consume the data in
> >>> order. In what scenario would a user consume the data out of
> >>> order?
> >>>
> >> One of the main benefits of this feature is achieving higher bandwidth
> >> (BW) by allowing responders to receive packets out of order (OOO).
> >>
> >> For example, this can be utilized in devices that support multi-plane
> >> functionality, as introduced in the "Multi-plane support for mlx5"
> >> series [1]. When mlx5 multi-plane is supported, a single logical mlx5
> >> port aggregates multiple physical plane ports. In this scenario, the
> >> requester can "spray" packets across the multiple physical plane ports
> >> without guaranteeing packet order, either on the wire or on the
> >> receiver (responder) side.
> >>
> >> With this approach, no barriers or fences are required to ensure
> >> in-order packet reception, which optimizes the data path for
> >> performance. This can result in better BW, theoretically achieving
> >> line-rate performance equivalent to the sum of the maximum BW of all
> >> physical plane ports, with only one QP.
> >
> > Thanks a lot for your quick reply. Without ensuring in-order packet
> > reception, this does optimize the data path for performance.
> >
> > I agree with you.
> >
> > But how does the receiver get the correct packets from the out-of-order
> > packets efficiently?
> >
> > Is the method implemented in software or hardware?
>
>
> The packets have a new field that the HW uses to determine the
> correct message order (similar to PSN).
>
Interesting feature! It reminds me somewhat of iWARP RDMA with its
DDP sub-layer 😉
But can that extra field be compliant with the standardized wire
protocol?
Thanks,
Bernard.
> Once the packets arrive OOO at the receiver side, the data is scattered
> directly (hence the DDP - "Direct Data Placement" name) by the HW.
>
> So the efficiency is achieved by the HW, which also saves the required
> context and metadata so that it can deliver the correct completions to
> the user (in-order) once some WQEs form an "in-order window" that can
> be delivered to the user.
>
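The "in-order window" described above can be modelled in a few lines of
Python. This is only a software illustration under stated assumptions,
not the mlx5 HW logic: the function name `release_in_order` and the
0-based message sequence numbers are hypothetical, and each arrival
stands for one message whose data has been fully placed.

```python
def release_in_order(arrival_order):
    """Return the completion batches released after each message arrives.

    arrival_order: hypothetical 0-based message sequence numbers, in the
    order the receiver finishes placing them. Data placement happens on
    arrival; completions are released only for the contiguous prefix.
    """
    placed = set()        # messages whose data is already in memory
    next_to_complete = 0  # lowest sequence number not yet completed
    batches = []
    for seq in arrival_order:
        placed.add(seq)   # data is scattered right away, in any order
        batch = []
        # Release completions while the in-order window stays contiguous.
        while next_to_complete in placed:
            batch.append(next_to_complete)
            next_to_complete += 1
        batches.append(batch)
    return batches


batches = release_in_order([2, 0, 3, 1])
print(batches)  # [[], [0], [], [1, 2, 3]]
```

Message 2 is placed first but its completion is held back until
messages 0 and 1 have also been placed, so completions still come out
in order.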
> The SW/Applications may receive OOO WR_IDs though (because the first CQE
> may have consumed a Recv WQE of any index on the receiver side), and it's
> their responsibility to handle it from this point, if it's required.
>
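One way an application could cope with OOO WR_IDs is to look up each
receive buffer by the wr_id reported in the completion instead of
assuming that completions follow posting order. The Python sketch below
is a model of that idea only; `handle_completions`, the buffer names and
the wr_id values are hypothetical, and no verbs API is involved.

```python
def handle_completions(posted, completions):
    """Match completions to receive buffers by wr_id, not by posting order.

    posted: dict mapping wr_id -> buffer name, recorded when each receive
    was posted. completions: wr_ids in the order the CQEs were polled.
    Returns the handled (wr_id, buffer) pairs and the still-posted receives.
    """
    outstanding = dict(posted)
    handled = []
    for wr_id in completions:
        # Look the buffer up by wr_id; popping from a FIFO of posted
        # receives would pair the completion with the wrong buffer.
        buf = outstanding.pop(wr_id)
        handled.append((wr_id, buf))
    return handled, outstanding


posted = {10: "buf_a", 11: "buf_b", 12: "buf_c"}
handled, outstanding = handle_completions(posted, [12, 10])
print(handled)      # [(12, 'buf_c'), (10, 'buf_a')]
print(outstanding)  # {11: 'buf_b'}
```

Keying the bookkeeping on wr_id keeps the application correct whether
or not the receive WQEs are consumed in posting order.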
> >
> > I am just interested in this feature and want to know more about this.
> >
> > Thanks,
> >
> > Zhu Yanjun
> >
> >>
> >> [1] https://lore.kernel.org/lkml/cover.1718553901.git.leon@kernel.org/
> >>> Thanks,
> >>> Zhu Yanjun
> >>>
> >>>>
> >>>> Thanks
> >>>>
> >>>> Edward Srouji (2):
> >>>>   net/mlx5: Introduce data placement ordering bits
> >>>>   RDMA/mlx5: Support OOO RX WQE consumption
> >>>>
> >>>>  drivers/infiniband/hw/mlx5/main.c    |  8 +++++
> >>>>  drivers/infiniband/hw/mlx5/mlx5_ib.h |  1 +
> >>>>  drivers/infiniband/hw/mlx5/qp.c      | 51 +++++++++++++++++++++++++---
> >>>>  include/linux/mlx5/mlx5_ifc.h        | 24 +++++++++----
> >>>>  include/uapi/rdma/mlx5-abi.h         |  5 +++
> >>>>  5 files changed, 78 insertions(+), 11 deletions(-)
> >>>>
> >>>
> > --
> > Best Regards,
> > Yanjun.Zhu
> >