Re: [patches] kernel timer races

From: Andrew Morton (andrewm@uow.edu.au)
Date: Sat May 27 2000 - 00:07:03 EST


kuznet@ms2.inr.ac.ru wrote:
>
>
> In 2.2 moral equivalent of del_timer_sync() is:
>
> start_bh_atomic();
> del_timer();
> end_bh_atomic();

Yes, except start_bh_atomic() returns immediately if called from BH or
IRQ context.

One thing which we must avoid is spinning on completion of ALL BH's.
This is because we would end up spending much, much time waiting for
irrelevant things like netdevice->hard_start_xmit().

So to synchronise we need to spin on something which is specific to
timers. Ideally, something which is specific to the single timer in
which we are interested. That's what my proposed patch does.

> rather than:
>
> del_timer();
> synchronize_bh();
>
> And wait_on_timers() is meaningless exactly by the same reason:
> periodic timer can be restarted after del_timer().
>
> What's about solution with waiting inside del_timer()...
> Synchronous del_timer() is inclined to deadlocks, so that
> you will not avoid auditing all these 256 places. 8)

Correct. I've looked at all the files which contain timer functions as
well as the string "spin". There may be a couple I've missed, but the
deadlock-detection code will trap these the first time they happen to
anyone. The machine will freeze for half a second or so then it shouts
out a URL, stack backtrace, etc. We'll find 'em.

> About your analysis: _ALL_ the del_timer()'s in
> net/{core,ipv4,ipv6,packet,netlink,sched,unix} are "asynchronous"
> on your language and should be replaced with del_timer_async().

Yes. But not bridge, netrom, decnet, ...

> I not quite understood the problems, which you found there.
> Please, explain.

I assumed that most of this was correct because it has your name all
over it :) So a blanket s/del_timer/del_timer_async/ is in order.

So I didn't look closely at core, ipv4, etc. I'll do so, but it's not
urgent.

I spotted two potential-I-dont-understand-this-stuff problems in ipv6:

inet6_ifa_finish_destroy() does a del_timer() and then kfree's 'ifp'.
But if the handler is still running, it is accessing fields within *ifp
(I think). Not very important, because inet6_ifa_finish_destroy() looks
like a 'cannot happen' function.

In reassembly.c, both calls to del_timer() are followed by a
frag_kfree_s(fq), but the handler could still be running, and it
accesses *fq.

BTW: I prefer the name "del_timer_async" - it makes people THINK before
using it. I'd like to rename del_timer_sync to del_timer_can_deadlock
as well :)

-- 
-akpm-

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Wed May 31 2000 - 21:00:17 EST