Re: kernel timer races

From: Andrey Savochkin (saw@saw.sw.com.sg)
Date: Fri May 26 2000 - 21:28:19 EST


Hi Andrew,

On Sat, May 27, 2000 at 02:01:43AM +1000, Andrew Morton wrote:
> But del_timer_sync has a few problems, as I mentioned. Plus: if the
> handler kfree's the timer (I've found a couple of these, BTW) then
> del_timer_sync() will hang if slab poisoning is turned on.

The places where handler kfree's the timer and the timer can be deleted by
del_timer_sync, would be very broken. It's obvious that if you manipulate
with the timer outside of the handler, you cannot destroy it inside the
handler. Caller of any kind of del_timer() should provide a valid structure.
But if the handler kfree's or destroys it by any other means, the structure may
happen to become invalid not only inside del_timer(), but just before the
call. So, preserving the timer_list structure in the handler is a natural
requirement for timer deletion.

In you spintimers.txt I see only one fishy spot with this respect:
./net/ipv6/reassembly.c. It looks like it's intrinsicly broken.

del_timer_sync() eliminates most race conditions.
The only real problem with it is that it doesn't guarantee that the code
segment may be released. And this problem may be solved by the proposed
wait_on_timers() call.

> > I like the Alan's proposal.
> > Actually, with wait_on_timers() and reinsert protection flag we do not need
> > del_timer_sync for most drivers. wait_on_timers() has larger overhead
> > because it waits for all timers, but, I think, it's acceptable for device
> > close call.
>
> Sure. But del_timer() + wait_on_timers() + dont-re-add-flag ==
> del_timer_sync()?

It's more than del_timer_sync() because it ensures that the code segment
isn't used by timer handler, and, thus, is suitable for module cleanups.

>
> It helps a bit; we also need to do something about the fact that the
> handler can be executing after the del_timer, even if it's the last time
> it'll run. Only a tiny percentage of the clients of the kernel timers
> are designed to cope with this.

wait_on_timers() solves it.

[snip]
> We have a choice of:
>
> 1: Give del_timer synchronous semantics and patch the
> above 25 files or
>
> 2: Fix del_timer_sync and then use it in ~250 files or
>
> 3: Throw up our hands and run away, as appeared to
> happen in 2.1.x
>
> I think you're implying that it should be option 2. That's OK - it's a
> lot more work but at least I get to read most of the kernel source, fix
> other bugs as they pop up and get to meet lots of maintainers :)
>
> But it's a lot more work and options 1 and 2 are equivalent. The only
> difference is in the names of the functions!

I don't insist on any of the options.
But you have certainly started a hard and painful work :-)

I've looked more precisely over your timer-race-6 patch.
It has not so big overhead as I expected for the first time.
However, I think it has a potential problem. In del_timer() you drop the
spinlock, wait for the handler completion, and then acquire the lock again
and check if the timer has been added again. It doesn't look very robust.
You relies on what may happen and what may not in the period when you do not
hold the lock. What if the timer has been readded by the handler and then
successfully deleted on third CPU?

I suggest to assume that if del_timer is called than the timer cannot be free'd
or destroyed. If it isn't true, there will be race conditions in any case.
The assumption will simplify things a lot.
What do you think about it?

Best regards
                Andrey

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Wed May 31 2000 - 21:00:17 EST