Re: gettimeofday resolution seriously degraded in test9

From: Gabriel Paubert
Date: Wed Oct 29 2003 - 05:11:52 EST


On Tue, Oct 28, 2003 at 10:21:20AM -0800, Stephen Hemminger wrote:
> On Tue, 28 Oct 2003 12:55:58 +0100
> Gabriel Paubert <paubert@xxxxxxx> wrote:
>
> > On Mon, Oct 27, 2003 at 05:17:38PM -0800, Stephen Hemminger wrote:
> > > Arghh... the patch was being way more agressive than necessary.
> > > tickadj which limits NTP is always 1 (for HZ=1000) so NTP will change
> > > at most 1 us per clock tick. This meant we only had to stop time
> > > for the last us of the interval.
> >
> > Hmm, I still don't like it. What does it do to timestamping in
> > interrupts in the kernel, especially when there is a burst of
> > interrupts?
> >
> > If I read it correctly, the time will be frozen between the time
> > the timer interrupt should have arrived and the time it is processed.
> > So the last micosecond of the interval could extend well into the next
> > interval, or do I miss something (I also suspect that it could
> > make PPSKit behave strangely for this reason)?
>
> The original problem all this is solving is that when NTP is slowing the clock
> there existed real cases where time appeared to go backwards. Assuming NTP was
> slowing the clock, then it would update the xtime by 999us at the next timer interrupt.
> If a program read time three times:
>
> A: xtime = t0
> B: A+1000 xtime = t0 + 1000
> C: B+1 xtime = t0 + 999
>
> To behave correctly C > B > A; but we were returning C < B

Well, it happens almost everyday when I come back to work
and wake my laptop up from sleep and starts receiving ntp packets
(broadcastclient mode). After a few minutes, I have a time adjustment
step of of typically a few hundred milliseconds either way. It's a
once per day event, but it happens.

But anyway, I understand your point. But by doing this you potentially
cause another problem of fairly large steps in time for interrupt handlers
that do timestamping.

Consider the following:

- t-2: interrupt A arrives and starts being serviced
- t-1: interrupt B arrives but delayed in the APIC
- t: timer interrupt arrives (it is delayed too)
- t+x1: return from interrupt A
- t+x2: interrupt B serviced
- gettimeofday for time stamping, the returned value will actually
be frozen at t-1 for HZ=1000 or t-5 for HZ=100, while the actual
time is t+something with something maybe up to a few tens of
microseconds, instead of t+x2-1 or t+x2-5 which would be
clearly better.
- t+x3: timer interrupt, time steps suddenly now (or in
the following BH, can't remember) from t-1 to the correct
value, creating a fairly large discontinuity.

So what I'm asking you is to change the code so that the discontinuities
are minimized. That's quite important for some applications; actually
in my case the out-of-order gettimeofday don't matter as long as the
steps are small because I don't sample fast enough to be affected
by them, but getting the timestamp wrong by tens of microseconds is bad
for evaluating the derivatives of the value read from a position encoder,
as needed for servo loops for example.



>
> The code does have bug if we are losing clock interrupts. The test for
> lost interrupts needs to be after the interval clamp.

If we are really losing clock interrupts, we have more serious problems
anyway (on x86 with classical 8253 timer, I don't know about other x86
systems, only that PPC recovers from them rather well). This misnamed
lost_jiffies variable is not the number of lost interrupts, but actually
the number of jiffies for which the bottom half has not (yet) been
performed.

> This should work better. Patch against 2.6.0-test9

Still unhappy...

Gabriel
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/