Re: [PATCH for-next] RDMA/hns: Support mmapping reset state to userspace
From: Junxian Huang
Date: Tue Dec 17 2024 - 01:09:34 EST
On 2024/12/13 20:49, Jason Gunthorpe wrote:
> On Fri, Dec 13, 2024 at 05:37:58PM +0800, Junxian Huang wrote:
>>> But your reset flow partially disassociates the device, when the
>>> userspace goes back to sleep, or rearms the CQ, it should get a hard
>>> fail and do a full cleanup without relying on flushing.
>>
>> Not sure I got your point. When you said "the userspace goes back to sleep",
>> did you mean the ibv_get_async_event() API? Are you suggesting that userspace
>> should call ibv_get_async_event() to monitor async events and, when it gets a
>> fatal event, stop polling CQs and clean up everything instead of
>> continuing to wait for the remaining CQEs?
>
> Yes, it should do that as well. This is what the device fatal event is
> for.
>
> I'm also saying that any kernel system calls, like sleeping for CQ
> events, should start failing too.
>
> Jason
Thanks. I took a cursory look at some open-source userspace projects:
UCX and SPDK handle the device fatal event properly by doing cleanup,
but Ceph doesn't seem to have any special handling beyond logging.
Junxian