Re: [PATCH 3/4] kgdb,clocksource: Prevent kernel hang in kernel debugger

From: Thomas Gleixner
Date: Tue Jan 26 2010 - 03:46:10 EST


On Mon, 25 Jan 2010, Jason Wessel wrote:
> This is a regression fix against: 0f8e8ef7c204988246da5a42d576b7fa5277a8e4
>
> Spin locks were added to clocksource_resume_watchdog(), which frequently
> cause the kernel debugger to deadlock on SMP systems.
>
> The kernel debugger can try for the lock, but if it fails it should
> continue to touch the clocksource watchdog anyway, else it will trip
> if the general kernel execution has been paused for too long.
>
> This introduces a possible race condition where the kernel debugger
> might not process the list correctly if a clocksource is being added
> or removed at the time of this call. This race is rare enough to be an
> acceptable trade-off against having the kernel debugger hang the kernel.

I'm not really happy about adding a race condition :)

If you stop the kernel in the middle of the watchdog code
(i.e. while watchdog_lock is held), then clocksource_reset_watchdog()
does not really guarantee that the TSC stays alive.
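To make the trade-off concrete, here is a rough userspace sketch of the approach described in the quoted patch: try for the lock, but reset the watchdog flags whether or not the lock was obtained. It is a model, not the patch itself. An atomic_flag stands in for watchdog_lock, a fixed array stands in for watchdog_list, interrupt disabling is not modeled, the flag value is illustrative, and the names model_trylock(), model_unlock() and touch_watchdog_try_anyway() are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model: an atomic_flag stands in for
 * watchdog_lock and a fixed array stands in for watchdog_list. */
#define CLOCK_SOURCE_WATCHDOG 0x10u /* illustrative flag value */

struct clocksource {
	const char *name;
	unsigned int flags;
};

static atomic_flag watchdog_lock = ATOMIC_FLAG_INIT;
static struct clocksource watchdog_list[] = {
	{ "tsc", CLOCK_SOURCE_WATCHDOG },
	{ "hpet", CLOCK_SOURCE_WATCHDOG },
};
#define NR_WATCHDOG (sizeof(watchdog_list) / sizeof(watchdog_list[0]))

/* Models spin_trylock(): returns true if the lock was acquired. */
static bool model_trylock(atomic_flag *lock)
{
	return !atomic_flag_test_and_set_explicit(lock, memory_order_acquire);
}

static void model_unlock(atomic_flag *lock)
{
	atomic_flag_clear_explicit(lock, memory_order_release);
}

/* Models the patch's approach: try for the lock, but reset the
 * watchdog flags whether or not it was acquired. Returns whether the
 * lock was taken so the caller can see which path ran. */
static bool touch_watchdog_try_anyway(void)
{
	bool got_lock = model_trylock(&watchdog_lock);

	/* When got_lock is false this walks the list with no protection:
	 * the window in which a half-finished list_add() on another CPU
	 * could be observed. */
	for (size_t i = 0; i < NR_WATCHDOG; i++)
		watchdog_list[i].flags &= ~CLOCK_SOURCE_WATCHDOG;

	if (got_lock)
		model_unlock(&watchdog_lock);
	return got_lock;
}
```

Taking the flag before the call plays the role of a CPU that was stopped inside the watchdog code: the trylock fails, yet the reset still runs over the list without the lock, which is the unprotected walk objected to above.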

> void clocksource_touch_watchdog(void)
> {
> - clocksource_resume_watchdog();
> + unsigned long flags;
> +
> + int got_lock = spin_trylock_irqsave(&watchdog_lock, flags);

So I prefer:

	if (!spin_trylock_irqsave(&watchdog_lock, flags))
		return;

If that results in the TSC being marked unstable, that is way better
than having a race which might even crash or lock up the machine when
the stop happened in the middle of a list_add().
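A minimal userspace sketch of the bail-out shape preferred above, assuming an atomic_flag as a stand-in for watchdog_lock and a small array as a stand-in for watchdog_list. Interrupt state is not modeled and the flag value is illustrative. The helpers model_trylock(), model_unlock() and reset_watchdog_locked() are hypothetical, and touch_watchdog() returns bool here only so the result can be checked; the kernel function returns void.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model: an atomic_flag stands in for
 * watchdog_lock and a fixed array stands in for watchdog_list. */
#define CLOCK_SOURCE_WATCHDOG 0x10u /* illustrative flag value */

struct clocksource {
	const char *name;
	unsigned int flags;
};

static atomic_flag watchdog_lock = ATOMIC_FLAG_INIT;
static struct clocksource watchdog_list[] = {
	{ "tsc", CLOCK_SOURCE_WATCHDOG },
	{ "hpet", CLOCK_SOURCE_WATCHDOG },
};
#define NR_WATCHDOG (sizeof(watchdog_list) / sizeof(watchdog_list[0]))

/* Models spin_trylock(): returns true if the lock was acquired. */
static bool model_trylock(atomic_flag *lock)
{
	return !atomic_flag_test_and_set_explicit(lock, memory_order_acquire);
}

static void model_unlock(atomic_flag *lock)
{
	atomic_flag_clear_explicit(lock, memory_order_release);
}

/* Roughly mirrors clocksource_reset_watchdog(): clears the watchdog
 * flag on every entry. The caller must hold watchdog_lock. */
static void reset_watchdog_locked(void)
{
	for (size_t i = 0; i < NR_WATCHDOG; i++)
		watchdog_list[i].flags &= ~CLOCK_SOURCE_WATCHDOG;
}

/* Bail out if the lock is busy (e.g. a stopped CPU holds it) instead
 * of walking a list that may be in the middle of an update. */
static bool touch_watchdog(void)
{
	if (!model_trylock(&watchdog_lock))
		return false;
	reset_watchdog_locked();
	model_unlock(&watchdog_lock);
	return true;
}
```

If the lock is busy the flags are left untouched, so the worst outcome is the watchdog later marking the TSC unstable, rather than a walk over a list that may be half-updated.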

Thanks,

tglx