Re: [tip:timers/urgent] clocksource: Prevent potential kgdb deadlock

From: Jason Wessel
Date: Tue Jan 26 2010 - 15:47:19 EST

Next message: Matt Mackall: "Re: BUG at mm/slab.c:2990 with 2.6.33-rc5-tuxonice"
Previous message: Stefan Richter: "[PATCH] firewire: ohci: fix crashes with TSB43AB23 on 64bit systems"
In reply to: Andrew Morton: "Re: [tip:timers/urgent] clocksource: Prevent potential kgdb deadlock"
Next in thread: Jason Wessel: "[PATCH 4/4] softlockup: add sched_clock_tick() to avoid kernel warning on kgdb resume"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Andrew Morton wrote:
> On Tue, 26 Jan 2010 14:09:45 GMT tip-bot for Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>
>
>> --- a/kernel/time/clocksource.c
>> +++ b/kernel/time/clocksource.c
>> @@ -343,7 +343,19 @@ static void clocksource_resume_watchdog(void)
>> {
>> unsigned long flags;
>>
>> - spin_lock_irqsave(&watchdog_lock, flags);
>> + /*
>> + * We use trylock here to avoid a potential dead lock when
>> + * kgdb calls this code after the kernel has been stopped with
>> + * watchdog_lock held. When watchdog_lock is held we just
>> + * return and accept, that the watchdog might trigger and mark
>> + * the monitored clock source (usually TSC) unstable.
>> + *
>> + * This does not affect the other caller clocksource_resume()
>> + * because at this point the kernel is UP, interrupts are
>> + * disabled and nothing can hold watchdog_lock.
>> + */
>> + if (!spin_trylock_irqsave(&watchdog_lock, flags))
>> + return;
>> clocksource_reset_watchdog();
>> spin_unlock_irqrestore(&watchdog_lock, flags);
>> }
>>
>> @@ -458,8 +470,8 @@ void clocksource_resume(void)
>> * clocksource_touch_watchdog - Update watchdog
>> *
>> * Update the watchdog after exception contexts such as kgdb so as not
>> - * to incorrectly trip the watchdog.
>> - *
>> + * to incorrectly trip the watchdog. This might fail when the kernel
>> + * was stopped in code which holds watchdog_lock.
>> */
>> void clocksource_touch_watchdog(void)
>> {
>>
>
> It would be neater and a shade more reliable to add a separate
> clocksource_try_touch_watchdog() for kgdb's use. Which could be inside
> #ifdef CONFIG_KGDB.
>
>

The kernel debugger is the only user of this API function so
realistically you do not need a new function, and could simply rename
this one, if the name does not fit.

-- more notes on the problem --

I further diagnosed the problem, and it is reasonably rare that it
happens in the first place. I had an instrumented version of the kernel
debugger which showed the problem happening quite frequently, while
trying to fix hw breakpoints.

The problem is not actually an interprocessor deadlock, but that of
re-entering the spin_lock() when it is already held by the interrupted
task on the same cpu. There is no obvious way that I could see to check
for this specific condition and to abort.

For now the patch Thomas created is sufficient, and given some time
we'll decide how to best live with the side effects or to consider any
further changes.

Jason.

Tested-by: Jason Wessel <jason.wessel@xxxxxxxxxxxxx>

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Matt Mackall: "Re: BUG at mm/slab.c:2990 with 2.6.33-rc5-tuxonice"
Previous message: Stefan Richter: "[PATCH] firewire: ohci: fix crashes with TSB43AB23 on 64bit systems"
In reply to: Andrew Morton: "Re: [tip:timers/urgent] clocksource: Prevent potential kgdb deadlock"
Next in thread: Jason Wessel: "[PATCH 4/4] softlockup: add sched_clock_tick() to avoid kernel warning on kgdb resume"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]