Re: [PATCH for-next] RDMA/cm: Rate limit destroy CM ID timeout error message
From: Jason Gunthorpe
Date: Wed Oct 15 2025 - 14:45:19 EST

On Wed, Oct 15, 2025 at 06:34:33PM +0000, Sean Hefty wrote:
> > > With this hack, running cmtime with 10.000 connections in loopback,
> > > the "cm_destroy_id_wait_timeout: cm_id=000000007ce44ace timed out.
> > > state 6 -> 0, refcnt=1" messages are indeed produced. Had to kill
> > > cmtime because it was hanging, and then it got defunct with the
> > > following stack:
> >
> > Seems like a bug, it should not hang forever if a MAD is lost..
>
> The hack skipped calling ib_post_send. But the result of that is a
> completion is never written to the CQ. The state machine or
> reference counting is likely waiting for the completion, so it knows
> that HW is done trying to access the buffer.

That does make sense; to be accurate, it has to trigger the completion
immediately. A better test would be to truncate the MAD or something
so it can't be rx'd.

Jason
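
To make the reasoning above concrete, here is a minimal userspace sketch of the three cases discussed: a normal send, the hack that skips the post, and a variant that skips the post but triggers the completion immediately. It is a toy model, not kernel code. Every name in it (toy_cm_id, toy_post_send, toy_destroy, the inject_mode values) is hypothetical, and it assumes only what the thread states: an outstanding send holds a reference that is dropped when its completion is processed.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical fault-injection modes; none of these names exist in the kernel. */
enum inject_mode {
	INJECT_NONE,       /* post the send normally */
	INJECT_SKIP_POST,  /* the hack: drop the send, generate no completion */
	INJECT_FAKE_COMP,  /* drop the send, but complete it immediately */
};

/* Toy stand-in for a CM ID: only the reference count matters here. */
struct toy_cm_id {
	int refcnt;        /* 1 for the owner, +1 per outstanding send */
	int pending_comps; /* completions waiting in the toy CQ */
};

/* Send completion handler: the buffer is no longer in use, drop the ref. */
static void toy_send_done(struct toy_cm_id *id)
{
	assert(id->refcnt > 0);
	id->refcnt--;
}

/* Post a MAD; the outstanding send pins the ID until its completion. */
static void toy_post_send(struct toy_cm_id *id, enum inject_mode mode)
{
	id->refcnt++;
	switch (mode) {
	case INJECT_NONE:
		id->pending_comps++; /* a completion will show up in the CQ */
		break;
	case INJECT_SKIP_POST:
		break;               /* nothing ever lands in the CQ */
	case INJECT_FAKE_COMP:
		toy_send_done(id);   /* synthesize the completion right away */
		break;
	}
}

/* Drain the toy CQ, running the handler for each completion. */
static void toy_poll_cq(struct toy_cm_id *id)
{
	while (id->pending_comps > 0) {
		id->pending_comps--;
		toy_send_done(id);
	}
}

/*
 * Drop the owner's reference, then poll a bounded number of times for
 * the count to reach zero. Returns 0 on success, -1 on timeout.
 */
static int toy_destroy(struct toy_cm_id *id, int max_polls)
{
	id->refcnt--;
	for (int i = 0; i < max_polls; i++) {
		toy_poll_cq(id);
		if (id->refcnt == 0)
			return 0;
	}
	printf("toy_destroy: timed out, refcnt=%d\n", id->refcnt);
	return -1;
}

/* Create an ID, send one MAD under the given mode, then destroy the ID. */
int toy_run(enum inject_mode mode)
{
	struct toy_cm_id id = { .refcnt = 1, .pending_comps = 0 };

	toy_post_send(&id, mode);
	return toy_destroy(&id, 10);
}
```

In this model toy_run(INJECT_SKIP_POST) is the only mode that times out, with one reference still held by the send that never completed. The toy gives up after a fixed number of polls so that it terminates; the hang reported in the thread is what an unbounded wait would look like in the same situation.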