Re: [PATCH 01/23 -v6] printk - dont wakeup klogd with interruptsdisabled

From: Pavel Machek
Date: Sat Jan 26 2008 - 04:10:52 EST


On Fri 2008-01-25 23:21:53, Steven Rostedt wrote:
> [ This patch is added to the series since the wakeup timings trace
> may lockup without it. ]
>
> I thought that one could place a printk anywhere without worrying.
> But it seems that it is not wise to place a printk where the runqueue
> lock is held.
>
> I just spent two hours debugging why some of my code was locking up,
> to find that the lockup was caused by some debugging printk's that
> I had in the scheduler. The printk's were only in rare paths so
> they shouldn't be too much of a problem, but after I hit the printk
> the system locked up.
>
> Thinking that it was locking up on my code I went looking down the
> wrong path. I finally found (after examining an NMI dump) that
> the lockup happened because printk was trying to wakeup the klogd
> daemon, which caused a deadlock when the try_to_wakeup code tries
> to grab the runqueue lock.
>
> This patch adds a runqueue_is_locked interface in sched.c for other
> files to see if the current runqueue lock is held. This is used
> in printk to determine whether it is safe or not to wakeup the klogd.
>
> And with this patch, my code ran fine ;-)
>
> Signed-off-by: Steven Rostedt <srostedt@xxxxxxxxxx>
> ---
> include/linux/sched.h | 2 ++
> kernel/printk.c | 14 ++++++++++----
> kernel/sched.c | 18 ++++++++++++++++++
> 3 files changed, 30 insertions(+), 4 deletions(-)
>
> Index: linux-mcount.git/kernel/printk.c
> ===================================================================
> --- linux-mcount.git.orig/kernel/printk.c 2008-01-25 21:46:50.000000000 -0500
> +++ linux-mcount.git/kernel/printk.c 2008-01-25 21:46:55.000000000 -0500
> @@ -590,9 +590,11 @@ static int have_callable_console(void)
> * @fmt: format string
> *
> * This is printk(). It can be called from any context. We want it to work.
> - * Be aware of the fact that if oops_in_progress is not set, we might try to
> - * wake klogd up which could deadlock on runqueue lock if printk() is called
> - * from scheduler code.
> + *
> + * Note: if printk() is called with the runqueue lock held, it will not wake
> + * up the klogd. This is to avoid a deadlock from calling printk() in schedule
> + * with the runqueue lock held and having the wake_up grab the runqueue lock
> + * as well.
> *
> * We try to grab the console_sem. If we succeed, it's easy - we log the output and
> * call the console drivers. If we fail to get the semaphore we place the output
> @@ -1003,7 +1005,11 @@ void release_console_sem(void)
> console_locked = 0;
> up(&console_sem);
> spin_unlock_irqrestore(&logbuf_lock, flags);
> - if (wake_klogd)
> + /*
> + * If we try to wake up klogd while printing with the runqueue lock
> + * held, this will deadlock.
> + */
> + if (wake_klogd && !runqueue_is_locked())
> wake_up_klogd();
> }

I guess you are going to kill me... but

CPU0 CPU1
if (!runqueue_is_locked()) {
locks runqueue
wake_up_klogd

...and we are dead. What is needed here is
"wake_up_klogd_if_you_can()" or something, that does trylock (atomic).

...but even this version is better than status quo, I'd say.

Pavel

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/