Re: [RFC][PATCH v2 00/31] timers: Use del_timer_shutdown() before freeing timers

From: Trond Myklebust
Date: Fri Oct 28 2022 - 16:12:39 EST


Hi Steve,

> On Oct 28, 2022, at 14:50, Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
>
> Trond,
>
> I'm looking at a commit from 2005:
>
> 0f9dc2b16884b ("RPC: Clean up socket autodisconnect")
>
> Cancel autodisconnect requests inside xprt_transmit() in order to avoid
> races.
> Use more efficient del_singleshot_timer_sync()
>
>
> I'm working on adding a "shutdown" state to timers, making it required for
> freeing the timer. This is to address the numerous bugs we hit where timers
> get rearmed just before freeing and then cause a crash in the timer code,
> without knowing what timer it was that caused it.
>
> Having a specific shutdown state for timers will remove this problem
> because if something tries to rearm a shutdown timer, it will fail and a
> WARN_ON_ONCE() is triggered. See below in the "reply" part for a
> description of this effort.
>
> The reason for this email, is because that WARN_ON_ONCE() triggered on the
> mod_timer() from:
>
> static void
> xprt_schedule_autodisconnect(struct rpc_xprt *xprt)
> __must_hold(&xprt->transport_lock)
> {
> xprt->last_used = jiffies;
> if (RB_EMPTY_ROOT(&xprt->recv_queue) && xprt_has_timer(xprt))
> mod_timer(&xprt->timer, xprt->last_used + xprt->idle_timeout);
> }
>
> That's because xptr->timer was shutdown due to:
>
> int
> xprt_request_enqueue_receive(struct rpc_task *task)
> {
> [..]
> /* Turn off autodisconnect */
> del_singleshot_timer_sync(&xprt->timer);
> return 0;
> }
>
> Now singleshot means just that. It's a single shot and calling
> del_singleshot_timer_sync() will shut it down so that it can be freed. That
> also means that it can no longer be re-armed.
>
> I'm not sure what you meant by "Use more efficient del_singleshot_timer_sync()"
> but I'm guessing since that was written in 2005, it is no longer relevant,
> and del_timer_sync() should now be used.
>
> After replacing that with del_timer_sync(), the warning goes away.
>
> I just want to confirm that's OK with you.

I seem to vaguely remember that at the time, del_timer_sync() would loop in order to catch re-arming timers, whereas del_singleshot_timer_sync() would not, hence the commit message. The expectation for del_singleshot_timer_sync() was simply that the caller would ensure safety against re-arming, which was indeed the case for this code.

However if that del_singleshot_timer_sync() expectation has been strengthened to mean that you guarantee never to re-arm the timer at all, then I agree that we should switch to del_timer_sync().

Thanks!
Trond

_________________________________
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx