Re: [RFC][PATCH] timers: Add del_time_free() to be called before freeing timers
From: Steven Rostedt
Date: Fri Apr 08 2022 - 16:10:37 EST
On Fri, 8 Apr 2022 07:33:53 -1000
Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> That said, I would actually prefer a name-change: instead of making
> this about "del_timer_free()", can we please just make this about it
> being "final". Or maybe "del_timer_cancel()" or something like that?
>
> Because the actual _freeing_ will obviously happen later, and the
> function does nothing of the sort. In fact, there may be situations
I was originally thinking of calling it "del_timer_prepare_free()"
> where you don't free it at all, but just want to be in the situation
> where you want to make sure there are no pending timers until after
> you explicitly re-arm it, even if the timer would otherwise be
> self-arming.
We have that already, it's called "del_timer_sync()". And that's not used
when it should be, and does not catch the case of the timer being rearmed.
The idea of "del_timer_free()" or perhaps "del_timer_sync_terminate()?" is
that once called you will NEVER arm it again. And if you do, it's a bug.
>
> (That use-case would actually mean removing the WARN_ON_ONCE(), but I
> think that would be a "future use" issue, I'm *not* suggesting it not
> be done initially).
The point of this call is to be used when and only when the timer is about
to be freed. Not to just say "wait till it's done". Because we are hitting
a lot of bugs where the system crashes in the timer code while walking
through the link list of timers where one of the pending timers has been
freed, and we have no idea what timer that was.
Once this API is added, I would go around and add it to all locations that
del_timer(_sync) just before freeing it and call this instead. We already
found a few cases that there's a race after the del_timer_sync() that could
actually rearm the timer before it gets freed. This function would trigger
the WARN_ON(_ONCE).
>
> I also suspect that 99% of all del_timer_sync() users actually want
> that kind of explicit "del_timer_final()" behavior. Some may not
> _need_ it (because their re-arming already checks for "have I shut
> down?"), but I have this suspicion that we could convert a lot - maybe
> all - of the current del_timer_sync() users to this and try to see if
> we could just make it the rule.
We did find places that use del_timer_sync() that would later legitimately
rearm it. But I agree. Most would be del_timer_final() (or whatever).
>
> And then we could actually maybe start removing the explicit "shut
> down timer" code. See for example "work->canceling" logic for
> kthreads, which now does double duty (it disables re-arming the timer,
> _and_ it does some "double cancel work avoidance")
Yeah, that's the case we have with the hci_qca.c code that we have right
now. It has a timer where it can be rearmed in the work queue, and the
timer can wake the workqueue. This requires one of those dances to be able
to stop both and prevent one from enabling the other.
Hmm, I guess if we do remove the WARN_ON_ONCE(), and just not enable it
after del_timer_terminate/final() is called then it would remove this race.
You could terminate the timers and then destroy the work queue, and if
there's a pending work happening, you do not have to worry about it
rearming the timers, because they would be shut off, and the mod_timer()
would just return without issue.
Maybe add an API to allow both? One that will cause the warn on, as in (if
anything calls this again, warn about it), and the other that has it not
warn, but just be ignored.
We could differentiate between the two by making a stub function for the
ignore one.
On the warn case:
timer->fn = NULL;
for the non warn case:
timer->fn = timer_null();
and have:
if (WARN_ON_ONCE(!timer->fn) || unlikely(timer->fn == timer_null))
return;
del_timer_final(); // warn
del_timer_final_sync(); // no warn
or
del_timer_sync_final(); // warn
del_timer_sync_final_nowarn(); // no warn
??
[ Let the bike-shedding begin! ]
-- Steve
-- Steve