Re: 2.6.13-rc6-rt9
From: Steven Rostedt
Date: Fri Aug 19 2005 - 01:40:41 EST
Deadlock finally found!!!
I've been debugging this all week, and at 2:30 in the morning I finally
found where it is. It really sucks when you need to debug on something
that doesn't have a serial port, and netconsole doesn't work that reliably.
This also explains why this only happened on my laptop. It is caused by
the dependent_sleeper, which is used quite a bit when you have SCHED_HT
turned on. Although I wouldn't be surprised if this deadlock exists
elsewhere.
The dependent_sleeper (used by SCHED_HT) grabs all the run queue locks
for the physical CPU. Then, still holding them, it calls
trace_stop_sched_switched. This is the call chain:

  dependent_sleeper
    +==> grabs CPU0 rq and CPU1 rq lock (saying CPU 0 and 1 are on the
         same physical CPU)
    -> trace_stop_sched_switched
       -> check_wakeup_timing
          -> down_trylock(max_mutex)
             -> rt_down_trylock
                -> __down_trylock
                   +==> grabs trace_lock

Now let's look at something that is unlocking at the same time:

  rt_up
    -> __up_mutex_nosavestate_inline
       -> ___up_mutex_nosavestate
          -> ____up_mutex
             +==> grabs trace_lock
             -> __up_mutex_waiter_nosavestate
                -> wake_up_process
                   -> try_to_wake_up
                      +==> grabs rq lock

Here we can see that the two paths take the rq lock and trace_lock in
opposite order, so we have a deadlock. I actually witnessed this using my
logger to show the traces. All I needed was the last few lines, so the
console was fine here.
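
To make the inversion concrete, here is a minimal userspace sketch that
replays the two chains above and flags the point where the second one
takes the locks in the opposite order. This is not kernel code: the names
path_acquire, order_seen and replay_reported_chains are made up for
illustration, and the two lock classes simply stand in for the rq lock
and trace_lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical lock classes standing in for the two locks in the traces. */
enum lock_class { RQ_LOCK, TRACE_LOCK, NR_CLASSES };

#define MAX_DEPTH 8

/* order_seen[a][b] is true once some path took b while already holding a. */
static bool order_seen[NR_CLASSES][NR_CLASSES];

/* The locks one code path currently holds, in acquisition order. */
struct path {
    enum lock_class held[MAX_DEPTH];
    int depth;
};

/*
 * Record that path p acquires `next`.
 * Returns false if another path was already seen taking one of our held
 * locks while holding `next`, i.e. the classic ABBA ordering.
 */
static bool path_acquire(struct path *p, enum lock_class next)
{
    bool ok = true;

    assert(p->depth < MAX_DEPTH);
    for (int i = 0; i < p->depth; i++) {
        enum lock_class prev = p->held[i];

        if (order_seen[next][prev])
            ok = false;                 /* reverse order already recorded */
        order_seen[prev][next] = true;  /* remember prev -> next */
    }
    p->held[p->depth++] = next;
    return ok;
}

/* Replays both chains; returns true if only the last acquisition inverts. */
static bool replay_reported_chains(void)
{
    struct path sleeper = { .depth = 0 };  /* dependent_sleeper side */
    struct path up = { .depth = 0 };       /* rt_up side */

    memset(order_seen, 0, sizeof(order_seen));

    bool a1 = path_acquire(&sleeper, RQ_LOCK);    /* dependent_sleeper */
    bool a2 = path_acquire(&sleeper, TRACE_LOCK); /* __down_trylock */
    bool b1 = path_acquire(&up, TRACE_LOCK);      /* ____up_mutex */
    bool b2 = path_acquire(&up, RQ_LOCK);         /* try_to_wake_up */

    return a1 && a2 && b1 && !b2;
}
```

The sketch only tracks ordering; it never blocks, so it can report the
inversion instead of hanging the way the real locks do.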
Ingo, can't you get rt.c to be more confusing? I mean it is too simple.
We need to add a few more underscores here and there :-) Seriously,
that rt.c is mind-boggling. It was nice before, now it is just screaming
for a cleanup (come now, do we really need the four underscores?). Same
with latency.c.
Well, there's the deadlock. I'm too tired to figure out all the paths
and what the heck is going on, so I'll give you the honour of writing
the patch. ;-)
Thanks,
-- Steve