Re: [PATCH] timer tsc ensure we allow for initial tsc and tsc sync

From: john stultz
Date: Fri Jan 27 2006 - 17:09:45 EST


On Fri, 2006-01-20 at 12:53 +0000, Andy Whitcroft wrote:
> We have been seeing "BUG: soft lockup detected on CPU#0!" messages
> from testing runs of mainline for some time. This has only been
> showing on a very small subset of the systems under test, as it
> turns out the slower ones.
>
> Resolving this issue is complex, not because the fix itself is
> complex but because of the timer rework which is currently pending
> in -mm. As a result this patch is against 2.6.16-rc1. So far
> we have had no such errors from runs against -mm, but I am unsure
> whether that system eliminates this issue, or mearly is lucky as
> faster systems are currently with mainline.

Hey Andy, Sorry for the slow reply.

The timekeeping rework is not going to go into 2.6.16 and is currently
out of -mm until I can resolve a few laptop issues.


> John perhaps you could comment? Also, how experimental is the timer
> code, is it likely to go into 2.6.16 or is it more experimental
> than that? If so perhaps we need to try and slip a fix like this
> underneath it.

I'd def try to push a fix in for the issue. I'll just merge my code
around the fix.



> timer tsc ensure we allow for initial tsc and tsc sync
>
> During early initialisation we select the timer source to use for
> the high resolution time source. When selecting the TSC we don't
> take into account the initial value of the TSC, this leads to a
> jump in the clock at the next clock tick. We also fail to take into
> account that the TSC synchronisation in an SMP system resets the TSC.
>
> In both cases this will lead to the timer believing that 0-N TSC
> ticks have passed since the last real timer tick. This will lead
> to the clock jumping ahead by nearly the maximum time represented
> by the lower 32bits of the TSC. For a 1GHz machine this is close to
> 4s, on slower boxes this can exceed 10s and trip the softlock tests.


This sounds very similar to bugme bug #5366
http://bugzilla.kernel.org/show_bug.cgi?id=5366


There's a test patch in there that maybe you could try?


thanks
-john

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/