Re: [PATCH] FS: timerfd: Fix unexpected return value of timerfd_read function.

From: Thomas Gleixner
Date: Sat Aug 17 2019 - 15:23:52 EST


On Sat, 17 Aug 2019, Arul Jeniston wrote:

> Do you agree the possibility of returning zero value from timerfd_read()?

Obviosuly it happens.

> > That has absolutely nothing to do with CLOCK_REALTIME. Your machines
> > TSC is either going backwards or not synchronized between cores.
> >
> > Hint: Dell has a track record of BIOS doing the wrong things to TSC in
> > order to hide their 'value add' features stealing CPU time.
> We haven't seen any issue in Dell hardware but we would definitely check
> the possibility of hardware bug.

BIOS is the more likely candidate.

> Let us say, due to some reason the tsc goes backwards. Isn't it handled in
> clocksource_delta().

No. clocksource_delta() does damage limitation. It prevents insane large
time jumps which would result if the read out TSC value is less than the
base value which is used to calculate the delta. It cannot do more than

> Is timerfd_read expected to return 0 if tsc drifts backwards? If so, why it
> is not documented?
> Being a system call, we expect all return values of read() on timerfd to be
> documented in the man page.

We expect TSC not to go backwards. If it does it's broken and not usable as
a clocksource for the kernel. The problem is that this is not necessarily
easy to detect.

Fact is, that your machines TSC goes backwards or is not properly
synchronized between the cores. Otherwise the problem would not exist at
all. That's the problem which needs to be fixed and not papered over with
crude hacks and weird justifications.

> > ktime_get() is CLOCK_MONOTONIC and not CLOCK_REALTIME.
> We see the same base used for CLOCK_MONOTONIC, CLOCK_REALTIME timers here.
> both MONOTONIC, REALTIME timers hits the following code flow. we confirmed
> it through instrumentation.
> timerfd_read()-->hrtimer_forward_now()-->ktime_get()-->timekeeping_get_ns()-->timekeeping_get_delta()-->clocksource_delta().
> Do you want me to share any other logs to confirm it?

No. That's the case when you use a relative timer with CLOCK_REALTIME
because only absolute timers are affected by modifications of
CLOCK_REALTIME. So it's NOT an issue of a CLOCK_REALTIME modification via
settimeofday() or adjtimex().

> > It's a bug, but either a hardware or a BIOS bug and you are trying to
> > paper over it at the place where you observe the symptom, which is
> > obviously the wrong place because:
> >
> > 1) Any other time related function even in timerfd is affected as well
> >
> > 2) We do not cure symptoms, we cure the root cause. And clearly the root
> > cause hase not been explained and addressed.
> if we don't fix this in kernel, can we document this return value in
> timerfd read() man page?


You cannot fix a hardware problem by hacking around it at exactly one place
where you can observe it. If that problem exists on your machine, then any
other time related function is affected as well.

Are you going to submit patches against _ALL_ time{r} related syscalls to
fix^Wpaper over this? Either against the kernel or against the man pages?

As this is a 4 core Rangely, it has a properly synchronized TSC on all 4
cores which runs with constant frequency and is not affected by deeper

Here is the flow:



timer interrupt()
time = ktime_get()
expire timer time >= timer_time

now = ktime_get() In the failure case now < timer_time

i.e. time went backwards since the timer was expired. That's absolutely
unexpected behaviour and no, we are not papering over it.

Did you ever quantify how much time goes backwards in that case?

Is the timer expiry and the timerfd_read() on the same CPU or on different

Can you please provide a full dmesg from boot to after the point where this
failure happens?