Re: [musl] Question about musl's time() implementation in time.c

From: Thomas Gleixner
Date: Thu Jun 16 2022 - 05:06:35 EST

On Wed, Jun 15 2022 at 14:09, Arnd Bergmann wrote:
> On Wed, Jun 15, 2022 at 1:28 AM Rich Felker <dalias@xxxxxxxx> wrote:
> Adding the kernel timekeeping maintainers to Cc. I think this is a
> reasonable argument, but it goes against the current behavior.
>
> We have four implementations of the time() syscall that one would
> commonly encounter:
>
> - The kernel syscall, using (effectively) CLOCK_REALTIME_COARSE
> - The kernel vdso, using (effectively) CLOCK_REALTIME_COARSE
> - The glibc interface, calling __clock_gettime64(CLOCK_REALTIME_COARSE, ...)
> - The musl interface, calling __clock_gettime64(CLOCK_REALTIME, ...)
>
> So even if everyone agrees that the musl implementation is the
> correct one, I think both Linux and glibc are more likely to stick with
> the traditional behavior to avoid breaking user space code such as the
> libc-test case that Zev brought up initially. At least Adhemerval's
> time() implementation in glibc[1] appears to have done this intentionally,
> while the Linux implementation has simply never changed this in an
> incompatible way since Linux-0.01 added time() and 0.99.13k added
> the high-resolution gettimeofday().

That's correct. Assume this call order:

clock_gettime(CLOCK_REALTIME, &tr);
clock_gettime(CLOCK_REALTIME_COARSE, &tc);
tt = time(NULL);

You can observe

tr.tv_sec > tc.tv_sec
tr.tv_sec > tt

but you can never observe

tc.tv_sec > tt
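As a rough way to check these ordering claims on a given machine, here is a minimal C sketch. It assumes Linux and a libc that exposes CLOCK_REALTIME_COARSE; struct order_stats and sample_order() are made-up names for illustration. It repeats the three calls in the order above and counts which relations actually show up:

```c
#define _GNU_SOURCE /* make CLOCK_REALTIME_COARSE visible on glibc */
#include <assert.h>
#include <time.h>

/* Hypothetical counters for the orderings discussed above. */
struct order_stats {
    unsigned long samples;
    unsigned long errors;               /* a clock_gettime() call failed */
    unsigned long fine_ahead_of_coarse; /* tr.tv_sec > tc.tv_sec: allowed */
    unsigned long fine_ahead_of_time;   /* tr.tv_sec > tt: allowed */
    unsigned long coarse_ahead_of_time; /* tc.tv_sec > tt: expected to stay 0 */
};

/* Take n samples, each using the call order tr, tc, tt. */
static struct order_stats sample_order(unsigned long n)
{
    struct order_stats st = { 0 };

    assert(n > 0);
    for (unsigned long i = 0; i < n; i++) {
        struct timespec tr, tc;
        time_t tt;

        if (clock_gettime(CLOCK_REALTIME, &tr) != 0 ||
            clock_gettime(CLOCK_REALTIME_COARSE, &tc) != 0) {
            st.errors++;
            continue;
        }
        tt = time(NULL);

        st.samples++;
        if (tr.tv_sec > tc.tv_sec)
            st.fine_ahead_of_coarse++;
        if (tr.tv_sec > tt)
            st.fine_ahead_of_time++;
        if (tc.tv_sec > tt)
            st.coarse_ahead_of_time++;
    }
    return st;
}
```

On a typical Linux system one would expect coarse_ahead_of_time to remain 0, while the other two counters become nonzero only when a sample happens to straddle a second boundary.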

The reason for this is historical: time() has a distinct performance
advantage because it boils down to a single read and does not require
the sequence count (at least on 64-bit). CLOCK_REALTIME_COARSE requires
the seqcount, but avoids the hardware read and the more expensive math.
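To make the seqcount point concrete, here is a heavily simplified user-space model using C11 atomics. It is not the kernel's vDSO code: struct model_time_data and the model_* functions are hypothetical. The time()-style reader needs only one load of the seconds value (a single instruction on 64-bit, which is where the caveat above comes from), whereas the coarse reader must retry until the sequence count shows that seconds and nanoseconds came from the same update:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical, heavily simplified stand-in for kernel-maintained time data. */
struct model_time_data {
    atomic_uint seq;      /* odd while the "tick" is updating */
    _Atomic int64_t sec;  /* realtime seconds as of the last tick */
    _Atomic int64_t nsec; /* nanoseconds as of the last tick */
};

static void model_init(struct model_time_data *d)
{
    atomic_init(&d->seq, 0u);
    atomic_init(&d->sec, 0);
    atomic_init(&d->nsec, 0);
}

/* Writer side (the timer tick): seq goes odd, data changes, seq goes even. */
static void model_tick(struct model_time_data *d, int64_t sec, int64_t nsec)
{
    unsigned s = atomic_load_explicit(&d->seq, memory_order_relaxed);

    assert(nsec >= 0 && nsec < 1000000000);
    atomic_store_explicit(&d->seq, s + 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&d->sec, sec, memory_order_relaxed);
    atomic_store_explicit(&d->nsec, nsec, memory_order_relaxed);
    atomic_store_explicit(&d->seq, s + 2, memory_order_release);
}

/* time()-style reader: one load of the seconds value, no seqcount. */
static time_t model_time(struct model_time_data *d)
{
    return (time_t)atomic_load_explicit(&d->sec, memory_order_relaxed);
}

/* Coarse-style reader: sec and nsec must come from the same tick, so retry
 * until the seqcount is even and unchanged. No hardware counter is read
 * and no cycles-to-nanoseconds conversion is done. */
static struct timespec model_coarse(struct model_time_data *d)
{
    struct timespec ts;
    unsigned s1, s2;

    do {
        s1 = atomic_load_explicit(&d->seq, memory_order_acquire);
        ts.tv_sec = (time_t)atomic_load_explicit(&d->sec, memory_order_relaxed);
        ts.tv_nsec = (long)atomic_load_explicit(&d->nsec, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);
        s2 = atomic_load_explicit(&d->seq, memory_order_relaxed);
    } while ((s1 & 1u) || s1 != s2);
    return ts;
}
```

The writer follows the usual C11 seqlock pattern (relaxed odd store, release fence, data stores, release even store); the reader pairs it with an acquire load and an acquire fence before re-reading the count.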

The costly part is the hardware read. Before the TSC became usable, the
hardware read took microseconds, so avoiding it was a significant
performance gain. Measured with perf over a loop of 1e9 reads (including
the loop overhead) on a reasonably recent Skylake (SKL) machine, the
average cost per invocation is:

time()                                    7 cycles
clock_gettime(CLOCK_REALTIME_COARSE)     21 cycles
clock_gettime(CLOCK_REALTIME) TSC        60 cycles
clock_gettime(CLOCK_REALTIME) HPET     6092 cycles (~2000 cycles syscall overhead)
clock_gettime(CLOCK_REALTIME) ACPI_PM  4096 cycles (~2000 cycles syscall overhead)
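For a rough comparison on other hardware, a sketch like the following could be used. It reports nanoseconds per call measured with CLOCK_MONOTONIC rather than the cycle counts perf gave above, it includes loop and switch overhead, and the enum and function names are hypothetical, so its absolute numbers are not directly comparable to the table:

```c
#define _GNU_SOURCE /* make CLOCK_REALTIME_COARSE visible on glibc */
#include <assert.h>
#include <time.h>

enum which_clock { USE_TIME, USE_COARSE, USE_REALTIME };

/* Volatile sink so the compiler cannot drop the calls being timed. */
static volatile long long sink;

static long long now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Average elapsed nanoseconds per call, loop overhead included. */
static double ns_per_call(enum which_clock which, unsigned long n)
{
    struct timespec ts;
    long long start, end;

    assert(n > 0);
    start = now_ns();
    for (unsigned long i = 0; i < n; i++) {
        switch (which) {
        case USE_TIME:
            sink = time(NULL);
            break;
        case USE_COARSE:
            clock_gettime(CLOCK_REALTIME_COARSE, &ts);
            sink = ts.tv_nsec;
            break;
        case USE_REALTIME:
            clock_gettime(CLOCK_REALTIME, &ts);
            sink = ts.tv_nsec;
            break;
        }
    }
    end = now_ns();
    return (double)(end - start) / (double)n;
}
```

The clocksource in use is reported in /sys/devices/system/clocksource/clocksource0/current_clocksource (the alternatives are listed in available_clocksource in the same directory); judging by the syscall overhead noted in the table, it decides whether CLOCK_REALTIME can be served cheaply or pays for a syscall.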

So in the end it boils down to performance and expectations. File
systems have chosen their timestamp granularity, and the underlying
mechanism used to obtain the timestamp, accordingly.

It's clearly not well documented, but I doubt that we can change the
implementation without running into measurable performance regressions.

VDSO-based time() vs. clock_gettime(CLOCK_REALTIME) on TSC is almost an
order of magnitude (7 vs. 60 cycles).

Thanks,

tglx