40ms/10ms error in do_gettimeofday()

From: Bernard Imbert (imbert@ipanematech.com)
Date: Wed Apr 05 2000 - 03:00:41 EST


I am developing an application that needs very accurate time stamping
of every incoming/outgoing Ethernet packet. For this purpose I use
Linux version 2.2.13 with NTP (version 4.0.98m) fed by a GPS receiver
(PPSKit-0.9.1). The Ethernet interface is based on a RealTek RTL8139
chip, driven by "rtl8139.c" release 1.08m. The CPU is an Intel x86
compatible (actually a Cyrix). Time is read with "do_gettimeofday()"
(kernel/time.c) which, as you know, eventually calls
"do_poor_nanotime()" (arch/i386/kernel/time.c).

From an external tool, I send one packet (of 256 bytes) every 10ms
on the Ethernet bus. When checking the time measurements made by my
application, I used to see occasional errors of about 40ms in the time
returned by do_gettimeofday(). Here is the reason I've found: if you're
(un)lucky enough, you may read the value 0 in the hardware counter, and
then the expression "LATCH-1" leads to an erroneous computation of the
elapsed nanoseconds (in arch/i386/kernel/time.c):

<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
--- time.c_orig Tue Apr 4 14:14:15 2000
+++ time.c_new Tue Apr 4 14:14:36 2000
@@ -282,7 +282,7 @@
         * computing nanoseconds. That way the intermediate result should not
         * overflow. After the division we undo the scaling.
         */
-       temp = ((LATCH-1) - count) * (time_tick >> 7) + LATCH / 2;
+       temp = (LATCH - count) * (time_tick >> 7) + LATCH / 2;
        count = (temp / LATCH) << 7;
        if (saved_pending_ticks)
                count += saved_pending_ticks * time_tick;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

(I guess the same fix should be applied in other places in the code
where LATCH is used, but I'm not experienced enough in kernel hacking
to do the whole job...)
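
For anyone who wants to check the arithmetic outside the kernel, here is
a minimal user-space sketch of the two forms of the expression. It is
not the kernel code: it assumes 32-bit unsigned arithmetic, LATCH = 11932
(i.e. a 1193180 Hz clock with HZ = 100) and a time_tick of 10,000,000
nanoseconds, none of which is stated above. The names elapsed_orig(),
elapsed_patched(), self_check() and TIME_TICK_NS are made up for the
example.

```c
#include <assert.h>
#include <stdint.h>

#define LATCH        11932u     /* assumed: (1193180 + HZ/2) / HZ, HZ = 100 */
#define TIME_TICK_NS 10000000u  /* assumed: length of one tick in nanoseconds */

/* Original form: subtracts the counter reading from LATCH-1. */
static uint32_t elapsed_orig(uint32_t count)
{
    uint32_t temp = ((LATCH - 1) - count) * (TIME_TICK_NS >> 7) + LATCH / 2;
    return (temp / LATCH) << 7;
}

/* Patched form: subtracts the counter reading from LATCH. */
static uint32_t elapsed_patched(uint32_t count)
{
    uint32_t temp = (LATCH - count) * (TIME_TICK_NS >> 7) + LATCH / 2;
    return (temp / LATCH) << 7;
}

/* Walk every reading from 0 to LATCH and check both forms. */
static void self_check(void)
{
    uint32_t count;

    for (count = 0; count <= LATCH; count++) {
        /* The patched form never reports more than one tick. */
        assert(elapsed_patched(count) <= TIME_TICK_NS);
        /* The original form stays in range up to LATCH-1 ... */
        if (count < LATCH)
            assert(elapsed_orig(count) < TIME_TICK_NS);
    }
    /* ... but the unsigned subtraction wraps when the reading is LATCH. */
    assert(elapsed_orig(LATCH) > TIME_TICK_NS);
}
```

Under these assumptions the original expression only goes wrong when the
subtraction becomes negative and wraps around, which yields an elapsed
time of several tens of milliseconds instead of at most one tick. Whether
the kernel variables really have these types and values would need to be
checked against the PPSKit sources.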

But now I have another problem: under the same conditions (one packet
every 10ms), I see time errors of about 10ms... The error is ALWAYS in
the same direction: the time returned by do_gettimeofday() is always
10ms LATE in these cases (when the time should be T, the function
returns T-10ms). These errors appear quite frequently (approx. 2 or 3
times every 5 minutes). I guess there is a race condition somewhere,
but I can't find it.... Any idea?
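
To count how often this second error shows up, the recorded time stamps
can be scanned for readings that lag the expected value by about one
tick. This is only a sketch of such a check, assuming the time stamps
are available as an array of microsecond values for packets sent 10ms
apart; count_late_stamps(), PERIOD_US and SLACK_US are hypothetical
names, and the 2ms tolerance is an arbitrary choice.

```c
#include <stddef.h>
#include <stdint.h>

#define PERIOD_US 10000  /* packets are sent every 10 ms */
#define SLACK_US   2000  /* assumed tolerance for normal jitter */

/*
 * Count the time stamps that read about one period less than expected,
 * where "expected" is the previous time stamp plus one period.
 */
static size_t count_late_stamps(const int64_t *ts_us, size_t n)
{
    size_t hits = 0;
    size_t i;

    for (i = 1; i < n; i++) {
        int64_t expected = ts_us[i - 1] + PERIOD_US;
        int64_t error = ts_us[i] - expected;  /* negative: clock lags */

        if (error >= -(PERIOD_US + SLACK_US) &&
            error <= -(PERIOD_US - SLACK_US))
            hits++;
    }
    return hits;
}
```

The stamp that follows a bad one shows an error of about +10ms relative
to its predecessor, so it falls outside the window and each bad reading
is counted once.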

Thanks for your help.
(I'm currently subscribing to this list, so please reply to my e-mail
address)
Bernard
