Re: /proc/stat btime accuracy problem

From: john stultz
Date: Wed Jun 01 2011 - 19:58:42 EST


On Wed, 2011-06-01 at 17:35 -0600, Bjorn Helgaas wrote:
> On Wed, Jun 1, 2011 at 4:35 PM, john stultz <johnstul@xxxxxxxxxx> wrote:
> > On Wed, 2011-06-01 at 14:50 -0600, Bjorn Helgaas wrote:
> >> timekeeping_init() basically does the following:
> >>
> >>     xtime = RTC
> >>     if (arch implements read_boot_clock())
> >>         wall_to_monotonic = -read_boot_clock()
> >>     else
> >>         wall_to_monotonic = -xtime
> >>
> >> So wall_to_monotonic records some approximation of the system boot
> >> time, which is then used to derive the "btime" reported in /proc/stat.
> >>
> >> The problem I'm seeing is that xtime is updated on timer ticks, so
> >> uninterruptible code, like kernel serial printk, makes us miss ticks,
> >> so xtime falls behind the RTC.
> >
> > Huh. So this sort of issue was common back when we had tick-based
> > timekeeping (in combination with troubled hardware), but with the
> > current clocksource based timekeeping, occasional lost ticks shouldn't
> > really affect time.
>
> Makes sense. Your presentation here was a great help:
> http://sr71.net/~jstultz/tod/ols-presentation-final.pdf
>
> > Can you explain a bit more about what kind of hardware this is happening
> > on, and what clocksource is being used?
>
> Sure. This is an x86 box. Normally we're using the TSC clocksource,
> and I don't think the issue happens after that. I guess my
> experimentation so far has been with uninterruptible time before we
> register *any* clocksource (or at least before I see any "Switching to
> clocksource" messages).

Huh.

So yeah, if we are very early at boot, we're likely using the jiffies
clocksource, which is basically a software-based tick counter, which
would be prone to lost-ticks issues if irqs were disabled for too long.
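
As a rough illustration of that failure mode, here is a toy model in
Python (not kernel code). HZ, the function name, and the rule that only
one timer interrupt can stay pending while irqs are off are simplifying
assumptions made for the sketch, and the irq-off windows are assumed not
to overlap:

```python
# Toy model of a tick-counting clock (not kernel code).
# HZ and the function name are illustrative assumptions.

HZ = 1000  # assumed tick rate: one timer interrupt per millisecond


def tick_clock_elapsed(real_seconds, irq_off_windows):
    """Return the seconds a pure tick counter would report.

    irq_off_windows is a list of non-overlapping (start, end) times in
    seconds during which interrupts are disabled. The model assumes at
    most one timer interrupt can be pending, so each window delivers a
    single tick when it ends and the rest are lost.
    """
    total_ticks = round(real_seconds * HZ)
    lost = 0
    for start, end in irq_off_windows:
        ticks_in_window = round((end - start) * HZ)
        # One pending tick fires when irqs are re-enabled; the rest vanish.
        lost += max(ticks_in_window - 1, 0)
    return (total_ticks - lost) / HZ
```

With a single 3-second irq-off window in a 10-second run,
tick_clock_elapsed(10, [(2, 5)]) gives 7.001, so the tick-based clock
ends up almost 3 seconds behind real time.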

Do you know if this is a relatively new issue?

My first instinct is "don't do that!" to whatever driver is disabling
irqs for so long. Do you know what's actually causing these long irq off
periods?

I assume you're noticing this offset by seeing that CLOCK_REALTIME is
off from the RTC right after boot? How severe is this? The RTC read
only has one-second granularity, so an error of up to ~1 second is
possible right at boot. For the offset to be noticeable, this must be
many seconds' worth of lost ticks, right?
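
One way to put a number on the offset from userspace is sketched below.
It is only a sketch: it assumes the RTC is exposed as rtc0 with the
sysfs since_epoch attribute and that the RTC keeps UTC. The helper names
(parse_btime, rtc_offset, report) are made up for this example.

```python
import time


def parse_btime(proc_stat_text):
    """Extract btime (boot time, seconds since the epoch) from /proc/stat text."""
    for line in proc_stat_text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0] == "btime":
            return int(fields[1])
    raise ValueError("no btime line found")


def rtc_offset(rtc_epoch_seconds, realtime_seconds):
    """CLOCK_REALTIME minus RTC; negative means the system clock is behind."""
    return realtime_seconds - rtc_epoch_seconds


def report():
    """Print the current offset and btime (Linux only)."""
    # Assumption: the RTC is rtc0 and is set to UTC.
    with open("/sys/class/rtc/rtc0/since_epoch") as f:
        rtc = int(f.read())
    now = time.clock_gettime(time.CLOCK_REALTIME)
    with open("/proc/stat") as f:
        btime = parse_btime(f.read())
    print("CLOCK_REALTIME - RTC: %.3f s" % rtc_offset(rtc, now))
    print("btime: %d" % btime)
```

Calling report() right after boot should show an offset within about a
second of zero if no ticks were lost, since the RTC value itself is
truncated to whole seconds.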

thanks
-john




