Re: kernel timer races

From: Andrey Savochkin (saw@saw.sw.com.sg)
Date: Sat May 27 2000 - 00:16:32 EST

Next message: dart@iowaone.net: "epic100 && >=2.3.99pre7 && ping -f == reproducible lockup"
Previous message: Andre Hedrick: "Re: Adaptec AAA-131U2 RAID"
In reply to: Andrew Morton: "Re: kernel timer races"
Next in thread: Andrew Morton: "Re: kernel timer races"
Reply: Andrew Morton: "Re: kernel timer races"
Reply: kuznet@ms2.inr.ac.ru: "Re: kernel timer races"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Andrew,

On Sat, May 27, 2000 at 02:37:04PM +1000, Andrew Morton wrote:
> Andrey Savochkin wrote:
> > It's more than del_timer_sync() because it ensures that the code segment
> > isn't used by timer handler, and, thus, is suitable for module cleanups.
>
> Could you please explain this a little more? I don't see it.

I meant that the 2.3.99 del_timer_sync() waits for timer_exit() call
in the handler. The proposed code of wait_on_timers() waits for the end of
run_timer_list() and, thus, guarantees that the handler has _returned_, not
just called timer_exit().

It means that
        set_do_not_readd_flag
        del_timer
        wait_on_timers()
        MOD_DEC_USE_COUNT
is safe, while
        del_timer_sync() /* 2.3.99 implementation */
        MOD_DEC_USE_COUNT
isn't.

> Were you referring to the existing (2.3.99) del_timer_sync()?

Yes.

> > I've looked more precisely over your timer-race-6 patch.
> > It has not so big overhead as I expected for the first time.
> > However, I think it has a potential problem. In del_timer() you drop the
> > spinlock, wait for the handler completion, and then acquire the lock again
> > and check if the timer has been added again. It doesn't look very robust.
>
> You are correct.
>
> But remember that this whole code path is a one-in-a-million thing. So
> having another one-in-a-million race within it has improved things by,
> umm, a million. Plus if it ever does happen, we blurt out a warning,
> and the end result is the same as what we have now.

I'm just proposing to take the assumption that the handler is non-destructive
if the code calls del_timer() and simplify and harden the code.

We cannot touch timer_list structure in run_timer_list() because the handler
_may_ destroy it's sure that other code doesn't call del_timer() or
mod_timer() on it at the same time.

But in del_timer() we may assume that the structure is valid during the whole
call. I suspect that in your timer-race-6 code we may just go to the
beginning of del_timer function after spinning on "*vtl == timer" and drop
all the added_timer stuff.

>
> But I'll look at adding a 'goto retry' into that code path. Just have
> to work out how to test it...

[snip]
> > I suggest to assume that if del_timer is called than the timer cannot be free'd
> > or destroyed. If it isn't true, there will be race conditions in any case.
> > The assumption will simplify things a lot.
>
> That is the assumption which the existing del_timer_sync() makes.
>
> Having recently looked at a zillion timer handlers I believe this
> assumption is safe. But adding timer_exit() to every timer handler is
> unattractive for obvious reasons (ditto the reinsert-protection flag).

Yes.

> The existing del_timer_sync also raises the possibility of the module
> being unloaded prior to the handler returning, but this is so
> ridiculously improbable that even I wouldn't make a fuss about it.

Not too much ridiculous.
It's a design problem. All design problems are problems independently of how
often they show.

>
> > What do you think about it?
>
> I think I need to think some more :) Fun stuff.
>
> I hereby do solemnly swear to write Documentation/kernel-timers.txt!

:-) You're taken at word!

Best regards
Andrey

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/

Next message: dart@iowaone.net: "epic100 && >=2.3.99pre7 && ping -f == reproducible lockup"
Previous message: Andre Hedrick: "Re: Adaptec AAA-131U2 RAID"
In reply to: Andrew Morton: "Re: kernel timer races"
Next in thread: Andrew Morton: "Re: kernel timer races"
Reply: Andrew Morton: "Re: kernel timer races"
Reply: kuznet@ms2.inr.ac.ru: "Re: kernel timer races"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

This archive was generated by hypermail 2b29 : Wed May 31 2000 - 21:00:17 EST