RE: [PATCH for-next v7 0/7] On-Demand Paging on SoftRoCE

From: Daisuke Matsuda (Fujitsu)
Date: Thu Dec 07 2023 - 01:37:28 EST


On Tue, Dec 5, 2023 10:51 AM Zhu Yanjun wrote:
>
> On 2023/12/5 8:11, Jason Gunthorpe wrote:
> > On Thu, Nov 09, 2023 at 02:44:45PM +0900, Daisuke Matsuda wrote:
> >>
> >> Daisuke Matsuda (7):
> >> RDMA/rxe: Always defer tasks on responder and completer to workqueue
> >> RDMA/rxe: Make MR functions accessible from other rxe source code
> >> RDMA/rxe: Move resp_states definition to rxe_verbs.h
> >> RDMA/rxe: Add page invalidation support
> >> RDMA/rxe: Allow registering MRs for On-Demand Paging
> >> RDMA/rxe: Add support for Send/Recv/Write/Read with ODP
> >> RDMA/rxe: Add support for the traditional Atomic operations with ODP
> >
> > What is the current situation with rxe? I don't recall seeing the bugs
> > that were reported get fixed?

Well, I suppose Jason is referring to the "blktests srp/002 hang".
cf. https://lore.kernel.org/linux-rdma/dsg6rd66tyiei32zaxs6ddv5ebefr5vtxjwz6d2ewqrcwisogl@ge7jzan7dg5u/T/

It is likely to be a timing issue. Bob reported that "siw hangs with the debug kernel",
so the hang does not appear to be specific to rxe.
cf. https://lore.kernel.org/all/53ede78a-f73d-44cd-a555-f8ff36bd9c55@xxxxxxx/T/
I think we need to decide whether to keep blocking patches to rxe, since nobody has managed to fix the issue so far.


There is another issue that causes a kernel panic:
[bug report][bisected] rdma_rxe: blktests srp lead kernel panic with 64k page size
cf. https://lore.kernel.org/all/CAHj4cs9XRqE25jyVw9rj9YugffLn5+f=1znaBEnu1usLOciD+g@xxxxxxxxxxxxxx/T/

https://patchwork.kernel.org/project/linux-rdma/list/?series=798592&state=*
Zhijian has submitted patches to fix this, and he received some comments on them.
It looks like he is heavily involved in the CXL driver these days.
I guess he is still working on it.

>
> Exactly. A problem is reported in the link
> https://www.spinics.net/lists/linux-rdma/msg120947.html
>
> It seems that a variable 'entry' set but not used
> [-Wunused-but-set-variable]

Yeah, I can revise the patch anytime.
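
For readers unfamiliar with the warning, here is a minimal sketch of the
pattern behind -Wunused-but-set-variable and one common way to resolve it.
This is not the code from the patch: struct item and count_positive_fixed()
are hypothetical names made up for illustration, and the actual fix in the
series may look different.

```c
#include <stddef.h>

/* Hypothetical stand-in type; NOT taken from the rxe patch. */
struct item {
	int key;
	int value;
};

/*
 * Pattern that triggers the warning (shown as a comment so this file
 * still builds cleanly with -Werror):
 *
 *	const struct item *entry;
 *	for (i = 0; i < n; i++) {
 *		entry = &items[i];	// assigned ...
 *		if (items[i].value > 0)	// ... but never read
 *			count++;
 *	}
 *
 * Two usual fixes: drop 'entry' entirely, or actually read it, as below.
 */
size_t count_positive_fixed(const struct item *items, size_t n)
{
	size_t count = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		const struct item *entry = &items[i];

		/* 'entry' is now read, so the warning no longer applies. */
		if (entry->value > 0)
			count++;
	}
	return count;
}
```

Which of the two fixes is appropriate depends on whether the variable was
meant to be used in the first place.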

>
> And ODP is an important feature. Should we suggest adding a test case
> for ODP to rdma-core to verify this feature?

rxe can share the same tests as mlx5.
I added test cases for Write, Read and Atomic operations with ODP,
and we can add more tests if there are any suggestions.
cf. https://github.com/linux-rdma/rdma-core/blob/master/tests/test_odp.py

Thanks,
Daisuke Matsuda

>
> Zhu Yanjun
>
> >
> > I'm reluctant to dig a deeper hole until it is done?
> >
> > Thanks,
> > Jason