Re: [PATCH] clocksource, prevent overflow in clocksource_cyc2ns
From: Prarit Bhargava
Date: Thu Apr 19 2012 - 07:56:50 EST
On 04/18/2012 08:18 PM, John Stultz wrote:
> On 04/18/2012 04:59 PM, Prarit Bhargava wrote:
>>
>> Hey John,
>>
>> Thanks for continuing to work on this. Coincidentally that exact patch was my
>> first attempt at resolving the problem as well. The problem is that even after
>> touching the clocksource watchdog and restoring irqs the printk buffer can take
>> a LONG time to flush -- and that still will cause an overflow comparison. So
>> fixing it with just a touch_clocksource_watchdog() isn't the right thing to do
>> IMO. Maybe a combination of the printk() patch you suggested earlier and the
>> touch_clocksource_watchdog() is the right way to go but I'll leave that up to
>> tglx and yourself to decide on a correct fix.
> :( That's a bummer. Something similar may be useful on the printk side as well.
Hmm ... I'll give that a shot.
>
>
>> There's also some additional information that I've been gathering on this issue;
>> I have seen *idle* systems switch to the hpet because the clocksource watchdog
>> hits the overflow comparison. As expected it happens much less frequently on
>> newer kernels (linux.git top of tree) than older stable kernels (2.6.32 based)
>> due to the difference in shift values but it is happening in both cases.
>
> Some of the recent adjustments for more robust shift calculations may partially
> be responsible for the improvement. Although I'm not sure why idle systems (that
> don't halt the TSC in idle) would trip this. Do let me know if you find any
> particular way of reproducing this.
>
>> The odd thing about this behaviour is that I would expect it to occur with the
>> same frequency on small systems as it does on large systems with linux.git as
>> the watchdog fires once/second. AFAICT I do not see this on small systems but
>> see it only on systems with greater than 24 cpus (both Intel and AMD).
> Hrm.
Yeah, it's odd. I have no idea why more cpus makes any difference :/
>
>> Using debug code similar to the dump code I previously provided, I can see that
>> every so often these large systems can hit a case where the tsc wraps and the
>> hpet is still monotonically increasing. When the unstable calculation is
>> performed the result is obviously affected by the overflow. Sometimes this
>> comparison overflow happens within 18 minutes, other times it can take hours or
>> days.
> TSC wraps? Are you sure that's what you see? Or do you have that switched? With
> the HPET wrapping?
Sorry, you're right -- the HPET wraps. I mistyped that.
>
>
>> The other part of this puzzle is that if I switch between the tsc and hpet every
>> 10 seconds, and run a gettimeofday() comparison program, the gettimeofday()
>> program will return a backwards time[1] event usually within half-an-hour. [I'm
>> just including this info here to point out that switching between clocksources
>> seems to cause some momentary instability. Before anyone points this out I will
>> say that this is not a "real world" bug. I'm trying to find out if anyone actually
>> does switch from the tsc to hpet (and back) on multi-purposed systems. I'm
>> hoping the answer to that is "no" :) ].
> So, there were some recent fixes for 3.4 to address an issue specifically around
> inconsistencies at clocksource switch time:
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=a939e817aa7e199d2fff05a67cb745be32dd5c2d
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=f695cf94837de53864180400cbac42cfa370426f
>
AFAICT I have both of these in my tree. It is linux-2.6.git as of
592fe8980688e7cba46897685d014c7fb3018a67.
I am doing
while (true)
do
    val=`ps aux | egrep $1 | wc -l`
    if [ $val -ne 2 ]; then
        exit 1
    fi
    echo "switching to tsc"
    echo tsc > /sys/devices/system/clocksource/clocksource0/current_clocksource
    sleep 10
    val=`ps aux | egrep $1 | wc -l`
    if [ $val -ne 2 ]; then
        exit 1
    fi
    echo "switching to hpet"
    echo hpet > /sys/devices/system/clocksource/clocksource0/current_clocksource
    sleep 10
done
where $1 is the pid of my gettimeofday() comparison test. As I said, the test
exits when a "backwards" time event occurs so the script above also bails.
>
> I definitely want to make sure any sort of inconsistencies like that are
> resolved. So let me know if you can still trigger anything like that with the
> latest 3.4 kernel.
I'll dig into this a bit more then -- I have a few things I want to investigate.
I'll also try the touch_clocksource_watchdog() in the printk() code and get
back to you in a few days.
P.