Re: [PATCH] FS: timerfd: Fix unexpected return value of timerfd_read function.
From: Thomas Gleixner
Date: Fri Aug 16 2019 - 17:17:32 EST
Arul,
On Fri, 16 Aug 2019, Arul Jeniston wrote:
> Adding a few more data points...
Can you please trim your replies? It's annoying to have to search for the
meat of your mail by scrolling down several pages and paying attention
not to skip something useful buried in useless information.
> On Fri, Aug 16, 2019 at 10:25 PM Arul Jeniston <arul.jeniston@xxxxxxxxx> wrote:
> > On Fri, Aug 16, 2019 at 4:15 PM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
> >
> > We use CLOCK_REALTIME when creating the timerfd.
> > Can read() on timerfd return 0 when the clock is set to CLOCK_REALTIME?
As CLOCK_REALTIME is subject to being set by various mechanisms, yes. See
timerfd_clock_was_set(). If that's the case, your application is missing
something. But see below ...
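As a minimal sketch of what accounting for clock changes could look like in userspace (an assumption about the application, which the thread does not show), the Linux timerfd API lets a CLOCK_REALTIME timer armed with TFD_TIMER_ABSTIME | TFD_TIMER_CANCEL_ON_SET report a discontinuous change of the clock as a read() failure with ECANCELED. The helpers arm_realtime_timer() and wait_expirations() are hypothetical names.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdint.h>
#include <sys/timerfd.h>
#include <time.h>
#include <unistd.h>

#define NSEC_PER_SEC 1000000000L

/*
 * Hypothetical helper: create a CLOCK_REALTIME timerfd whose first expiry
 * is delay_ns from now (absolute) and which then repeats every interval_ns
 * (0 means one-shot). TFD_TIMER_CANCEL_ON_SET asks the kernel to report a
 * discontinuous change of CLOCK_REALTIME to read(). Returns the fd or -1.
 */
static int arm_realtime_timer(long delay_ns, long interval_ns)
{
    struct timespec now;
    struct itimerspec its = {0};
    int fd = timerfd_create(CLOCK_REALTIME, TFD_CLOEXEC);

    if (fd < 0)
        return -1;
    if (clock_gettime(CLOCK_REALTIME, &now) != 0)
        goto fail;

    /* Absolute first expiry, normalized so tv_nsec stays below 1s. */
    its.it_value.tv_sec = now.tv_sec + delay_ns / NSEC_PER_SEC;
    its.it_value.tv_nsec = now.tv_nsec + delay_ns % NSEC_PER_SEC;
    if (its.it_value.tv_nsec >= NSEC_PER_SEC) {
        its.it_value.tv_sec++;
        its.it_value.tv_nsec -= NSEC_PER_SEC;
    }
    its.it_interval.tv_sec = interval_ns / NSEC_PER_SEC;
    its.it_interval.tv_nsec = interval_ns % NSEC_PER_SEC;

    if (timerfd_settime(fd, TFD_TIMER_ABSTIME | TFD_TIMER_CANCEL_ON_SET,
                        &its, NULL) != 0)
        goto fail;
    return fd;
fail:
    close(fd);
    return -1;
}

/*
 * Hypothetical helper: block until the timer fires and return the
 * expiration count (>= 1). Returns -ECANCELED if CLOCK_REALTIME was set
 * (the caller should re-read the clock and re-arm), another negative errno
 * on failure, or 0 if read() delivered no count at all, which is the
 * symptom discussed in this thread.
 */
static int64_t wait_expirations(int fd)
{
    uint64_t ticks;
    ssize_t n = read(fd, &ticks, sizeof(ticks));

    if (n == (ssize_t)sizeof(ticks))
        return (int64_t)ticks;
    if (n < 0)
        return -errno;
    return 0;
}
```

A read() that fails with ECANCELED means the absolute expiry no longer matches the wall clock, so the sketch leaves re-arming to the caller. Treating a 0 return as a spurious wakeup is a defensive choice in the application, not a fix for an unreliable time source.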
> > We have an Intel Rangeley 4-CPU system running a Debian Stretch Linux
> > kernel. The current clock source is set to tsc. During our testing, we
> > observed that the time occasionally drifts backward. Through kernel
> > instrumentation, we observed that clocksource_delta() sometimes finds the
> > current time less than the last time and returns a 0 delta.
That has absolutely nothing to do with CLOCK_REALTIME. Your machine's TSC is
either going backwards or not synchronized between cores.
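For reference, here is a userspace model of clocksource_delta(), assuming the CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE variant found in kernel/time/timekeeping_internal.h in kernels of this era. A counter read that lands behind cycle_last makes the masked difference huge, with its top bit set, so the delta is clamped to 0. The reader then gets the time of the last timekeeping update, which can be earlier than a value an earlier reader already saw. The clamp hides a backwards counter; it does not correct it.

```c
#include <stdint.h>

/*
 * Model of clocksource_delta() with last-cycle validation: compute the
 * masked counter difference and, if its most significant bit (within the
 * mask) is set, assume the counter went backwards and return 0.
 */
static inline uint64_t clocksource_delta(uint64_t now, uint64_t last,
                                         uint64_t mask)
{
    uint64_t ret = (now - last) & mask;

    return (ret & ~(mask >> 1)) ? 0 : ret;
}
```

With a full 64-bit mask, as for the TSC, any backwards step is clamped to 0, while for a narrower counter the same check still lets a legitimate wraparound through (5 after 0xfffffffb on a 32-bit counter yields 10).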
Hint: Dell has a track record of BIOSes doing the wrong things to the TSC in
order to hide their 'value add' features stealing CPU time.
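One hedged way to look for the symptom from userspace, without kernel instrumentation, is to read CLOCK_MONOTONIC in a tight loop and count backward steps. This is a rough diagnostic sketch, not a TSC synchronization test: count_backward_steps() is a hypothetical helper, and a single thread only catches skew that shows up on the CPUs it happens to run on. A thorough check would compare readings taken by threads pinned to different cores.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <time.h>

/* Convert a timespec to nanoseconds. */
static int64_t ts_to_ns(const struct timespec *ts)
{
    return (int64_t)ts->tv_sec * 1000000000LL + ts->tv_nsec;
}

/*
 * Hypothetical diagnostic: read CLOCK_MONOTONIC `iterations` times and
 * count how often a reading is earlier than the previous one. On a
 * correctly working system the count should be 0. Returns the number of
 * backward steps, or -1 if the clock cannot be read.
 */
static long count_backward_steps(long iterations)
{
    struct timespec ts;
    int64_t prev, cur;
    long backward = 0;

    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
        return -1;
    prev = ts_to_ns(&ts);

    for (long i = 0; i < iterations; i++) {
        if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
            return -1;
        cur = ts_to_ns(&ts);
        if (cur < prev)
            backward++;
        prev = cur;
    }
    return backward;
}
```

Running several copies at once, or pinning copies to individual CPUs with taskset(1), increases the chance of hitting a core whose TSC disagrees with the others.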
> This causes the following code flow to return a time which is less
> than the previously fetched time.
> ktime_get()-->timekeeping_get_ns()-->timekeeping_get_delta()-->clocksource_delta()
ktime_get() is CLOCK_MONOTONIC and not CLOCK_REALTIME.
> Since ktime_get() returns a time which is less than the expiry time,
> hrtimer_forward_now() returns 0.
> This in turn causes timerfd_read() to return 0.
> Isn't that a bug?
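The arithmetic in the quoted chain can be modeled in userspace. This is a simplified sketch, assuming the periodic-timer branch of timerfd_read() adds hrtimer_forward_now() - 1 to the tick count already recorded by the expiry callback; model_hrtimer_forward() and model_timerfd_ticks() are hypothetical names, and the model ignores details such as the hrtimer resolution clamp.

```c
#include <stdint.h>

typedef int64_t ktime_t; /* nanoseconds, mirroring the kernel type */

/*
 * Simplified model of hrtimer_forward(): if now is before *expires the
 * timer has not expired and 0 is returned. Otherwise advance *expires by
 * whole intervals until it lies after now and return that count.
 */
static uint64_t model_hrtimer_forward(ktime_t *expires, ktime_t now,
                                      ktime_t interval)
{
    ktime_t delta = now - *expires;
    uint64_t orun;

    if (delta < 0)
        return 0;
    orun = (uint64_t)(delta / interval) + 1;
    *expires += (ktime_t)orun * interval;
    return orun;
}

/*
 * Model of the tick computation for a periodic timerfd: the expiry
 * callback already counted one tick (ctx_ticks >= 1, as timerfd_read()
 * only takes this path when ticks are pending), and re-arming adds the
 * overruns minus that one.
 */
static uint64_t model_timerfd_ticks(uint64_t ctx_ticks, ktime_t *expires,
                                    ktime_t now, ktime_t interval)
{
    return ctx_ticks + model_hrtimer_forward(expires, now, interval) - 1;
}
```

When now has slipped behind the expiry time, the forward step reports 0 overruns and the tick count collapses to 0, so read() copies nothing and returns 0. The model only reproduces the symptom; the cause lies in the time source.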
It's a bug, but either a hardware or a BIOS bug, and you are trying to paper
over it at the place where you observe the symptom, which is obviously the
wrong place because:
1) Any other time-related function, even in timerfd, is affected as well
2) We do not cure symptoms, we cure the root cause. And clearly the root
cause has not been explained and addressed.
Thanks,
tglx