Re: [PATCH] Timekeeping: Fix dead lock in update_wall_time by correct shift convertion.

From: john stultz
Date: Wed Mar 17 2010 - 11:59:43 EST


On Wed, 2010-03-17 at 13:14 +0800, Sonic Zhang wrote:
> With your new workaround, no dead loop. But are you sure this doesn't
> overflow the ntp_error after thousands of loops?
>
> timekeeper.ntp_error += tick_length << shift;
> timekeeper.ntp_error -= timekeeper.xtime_interval <<
> (timekeeper.ntp_error_shift + shift);


At some point, yes it could overflow, but for that to happen, we'd have
to have accumulated over 4 seconds of time error between calls to
update_wall_time. At a max error rate of 500ppm, that would mean over
two hours of delay between calls.
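
As a rough back-of-the-envelope check of those numbers, here is a small
userspace sketch (not kernel code). The helper name
stall_seconds_for_error() is made up for illustration, and the 4 second
and 500ppm inputs are simply the figures quoted above:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: how many seconds must pass between updates for
 * a clock drifting at `ppm` parts per million to build up `err_ns`
 * nanoseconds of error.
 *
 *   err_ns    = elapsed_ns * ppm / 1000000
 *   elapsed_s = elapsed_ns / 1000000000 = err_ns / (ppm * 1000)
 */
static uint64_t stall_seconds_for_error(uint64_t err_ns, uint64_t ppm)
{
	assert(ppm != 0);	/* no drift means the error never accumulates */
	return err_ns / (ppm * 1000);
}
```

Calling stall_seconds_for_error(4000000000ULL, 500) gives 8000 seconds,
which is a bit over 2.2 hours, so it matches the "over two hours"
estimate.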

The time subsystem can try to accommodate reasonable stalls in the
system, but I think there will always be windows in which KGDB could
cause the system to not recover (e.g., I know quite a bit of SCSI
hardware has heartbeat requirements, so I could imagine KGDB causing
those watchdogs to trigger and reset the device).

One approach would be to have KGDB suspend the timekeeping core, much as
is done over suspend/resume. This should be able to protect us from any
overflows, but I suspect it's unlikely that we'd want to go run other
kernel code when breaking into KGDB.

Thanks again for the testing. I'll try to send out an improved version
of the fix for testing later today. If you could confirm it works as
well, I'd appreciate it.

thanks
-john