Re: [PATCH] vfs: replace current_kernel_time64 with ktime equivalent

From: Arnd Bergmann
Date: Mon Jun 25 2018 - 09:43:03 EST


On Wed, Jun 20, 2018 at 9:35 PM, Arnd Bergmann <arnd@xxxxxxxx> wrote:
> On Wed, Jun 20, 2018 at 6:19 PM, Andi Kleen <ak@xxxxxxxxxxxxxxx> wrote:
>> Arnd Bergmann <arnd@xxxxxxxx> writes:
>>>
>>> To clarify: current_kernel_time() uses at most millisecond resolution rather
>>> than microsecond, as tkr_mono.xtime_nsec only gets updated during the
>>> timer tick.
>>
>> Ah you're right. I remember now: the motivation was to make sure there
>> is basically no overhead. In some setups the full gtod can be rather
>> slow, particularly if it falls back to some crappy timer.
>
> This means, we're probably fine with a compile-time option that
> distros can choose to enable depending on what classes of hardware
> they are targeting, like
>
> struct timespec64 current_time(struct inode *inode)
> {
>         struct timespec64 now;
>         u64 gran = inode->i_sb->s_time_gran;
>
>         if (IS_ENABLED(CONFIG_HIRES_INODE_TIMES) &&
>             gran <= NSEC_PER_JIFFY)
>                 ktime_get_real_ts64(&now);
>         else
>                 ktime_get_coarse_real_ts64(&now);
>
>         return timespec64_trunc(now, gran);
> }
>
> With that implementation, we could still let file systems choose
> to get coarse timestamps by tuning the granularity in the
> superblock s_time_gran, which would result in nice round
> tv_nsec values that represent the actual accuracy.
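
As a rough userspace illustration of the truncation step in the quoted
sketch: this is not the kernel's timespec64_trunc() itself, and the name
trunc_to_gran() is made up for the example. It only assumes that the
granularity is given in nanoseconds, as s_time_gran is.

```c
#include <time.h>

/*
 * Truncate a timestamp to a granularity given in nanoseconds.
 * Hypothetical helper for illustration only.
 */
static struct timespec trunc_to_gran(struct timespec t, long gran)
{
	if (gran <= 1) {
		/* nanosecond granularity: keep the value as it is */
	} else if (gran >= 1000000000L) {
		/* one second or coarser: drop the sub-second part */
		t.tv_nsec = 0;
	} else {
		/* round down to a multiple of gran */
		t.tv_nsec -= t.tv_nsec % gran;
	}

	return t;
}
```

With a granularity of 1000000 (one millisecond), a tv_nsec of 123456789
becomes 123000000, so the stored value does not claim more precision
than the clock actually delivered.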

I've done some simple tests and found that on a variety of
x86, arm32 and arm64 CPUs, it takes between 70 and 100
CPU cycles to read the TSC and add it to the coarse
clock. For example, on a 3.1GHz Ryzen, the little test program
below reports:

vdso hires: 37.18ns
vdso coarse: 6.44ns
sysc hires: 161.62ns
sysc coarse: 133.87ns

On the same machine, it takes around 400ns (1240 cycles)
to write one byte into a tmpfs file with pwrite(). Adding 5% to
10% overhead for accurate timestamps would definitely be
noticed, so I guess we wouldn't enable that unconditionally,
but could do it as an opt-in mount option if someone had a
use case.
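
The arithmetic behind those estimates can be written down as a small
sketch. The helper names are made up, and it assumes that the in-kernel
cost of the accurate clock is roughly the hires/coarse difference
measured from userspace:

```c
/* Convert a duration in nanoseconds to CPU cycles at a given clock rate. */
static double ns_to_cycles(double ns, double ghz)
{
	return ns * ghz;
}

/*
 * Extra cost of the hires clock over the coarse clock, as a percentage
 * of the duration of one operation (here: one pwrite() to tmpfs).
 */
static double overhead_percent(double hires_ns, double coarse_ns,
			       double op_ns)
{
	return 100.0 * (hires_ns - coarse_ns) / op_ns;
}
```

For the vdso numbers, ns_to_cycles(37.18 - 6.44, 3.1) is about 95 cycles
and overhead_percent(37.18, 6.44, 400.0) is about 7.7%. The syscall
numbers give about 86 cycles and 6.9%.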

Arnd

---
/* measure times for high-resolution clocksource access from userspace */
#define _GNU_SOURCE	/* for syscall() and CLOCK_REALTIME_COARSE */
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <stdbool.h>
#include <sys/syscall.h>

/* read the clock through libc (normally the vDSO) or a real system call */
static int do_clock_gettime(clockid_t clkid, struct timespec *tp, bool vdso)
{
	if (vdso)
		return clock_gettime(clkid, tp);

	return syscall(__NR_clock_gettime, clkid, tp);
}

/* count how many clock reads complete within one second */
static int loop1sec(clockid_t clkid, bool vdso)
{
	int i;
	struct timespec t, start;

	do_clock_gettime(clkid, &start, vdso);
	i = 0;
	/* keep reading until t is one full second past start */
	do {
		do_clock_gettime(clkid, &t, vdso);
		i++;
	} while (t.tv_sec == start.tv_sec || t.tv_nsec < start.tv_nsec);

	return i;
}

int main(void)
{
	/* average time per call = one second / number of calls */
	printf("vdso hires: %7.2fns\n", 1000000000.0 /
	       loop1sec(CLOCK_REALTIME, true));
	printf("vdso coarse: %7.2fns\n", 1000000000.0 /
	       loop1sec(CLOCK_REALTIME_COARSE, true));
	printf("sysc hires: %7.2fns\n", 1000000000.0 /
	       loop1sec(CLOCK_REALTIME, false));
	printf("sysc coarse: %7.2fns\n", 1000000000.0 /
	       loop1sec(CLOCK_REALTIME_COARSE, false));

	return 0;
}