Re: [PATCH] timekeeping: move multigrain ctime floor handling into timekeeper

From: Jeff Layton
Date: Thu Sep 12 2024 - 07:34:54 EST


On Thu, 2024-09-12 at 10:01 +0000, Arnd Bergmann wrote:
> On Wed, Sep 11, 2024, at 20:43, Jeff Layton wrote:
> >
> > I think we'd have to track this delta as an atomic value and cmpxchg
> > new values into place. The zeroing seems quite tricky to make race-
> > free.
> >
> > Currently, we fetch the floor value early in the process and if it
> > changes before we can swap a new one into place, we just take whatever
> > the new value is (since it's just as good). Since these are monotonic
> > values, any new value is still newer than the original one, so it's
> > fine. I'm not sure that still works if we're dealing with a delta that
> > is sliding upward and downward.
> >
> > Maybe it does though. I'll take a stab at this tomorrow and see how it
> > looks.
>
> Right, the only idea I had for this would be to atomically
> update a 64-bit tuple of the 32-bit sequence count and the
> 32-bit delta value in the timekeeper. That way I think the
> "coarse" reader would still get a correct value when running
> concurrently with both a fine-grained reader updating the count
> and the timer tick setting a new count.
>
> There are still a couple of problems:
>
> - this extends the timekeeper logic beyond what the seqlock
> semantics normally allow, and I can't prove that this actually
> works in all corner cases.
>
> - if the delta doesn't fit in a 32-bit value, there has to
> be another fallback mechanism.
>

That could be a problem. I was hoping the delta couldn't grow that
large between timer ticks, but I guess it can. In that case, the
fallback could be to just grab a new fine-grained timestamp on each
call until the timer ticks.
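
To make the overflow case concrete, here is a rough userspace sketch of
the packing with a fallback. Everything in it is hypothetical: the
layout, the function names, the DELTA_OVERFLOW sentinel, and the
assumption that the delta is in nanoseconds (in which case 32 bits
covers only about 4.29 seconds). Nothing like this exists in the
timekeeper today.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sentinel: the delta did not fit, so readers must fall back. */
#define DELTA_OVERFLOW UINT32_MAX

/* Pack a 32-bit sequence count and a 32-bit delta (assumed ns) into one word. */
static uint64_t pack_seq_delta(uint32_t seq, uint64_t delta_ns)
{
	uint32_t delta = delta_ns >= DELTA_OVERFLOW ? DELTA_OVERFLOW
						    : (uint32_t)delta_ns;

	return ((uint64_t)seq << 32) | delta;
}

static uint32_t unpack_seq(uint64_t word)
{
	return (uint32_t)(word >> 32);
}

/*
 * Returns true and fills *delta_ns if the stored delta is usable.
 * Returns false if it overflowed, in which case the caller would have
 * to fetch a fresh fine-grained timestamp instead.
 */
static bool unpack_delta(uint64_t word, uint32_t *delta_ns)
{
	uint32_t delta = (uint32_t)word;

	if (delta == DELTA_OVERFLOW)
		return false;
	*delta_ns = delta;
	return true;
}
```

The point is only that the overflow state has to be visible to coarse
readers in the same 64-bit word, so that they know to take the slow
path until the next tick resets things.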

> - This still requires an atomic64_cmpxchg() in the
> fine-grained ktime_get_real_ts64() replacement, which
> I think is what inode_set_ctime_current() needs today
> as well to ensure that the next coarse value is the
> highest one that has been read so far.
>

Yes. We really don't want to take the seqlock for write just to update
timestamps. I'd prefer to keep the floor-handling lock-free if
possible.
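
For reference, the lock-free pattern I mean looks roughly like the
sketch below. It is userspace C11 rather than kernel code, and the
names (ctime_floor, floor_advance) are made up for illustration; the
kernel version would use atomic64_try_cmpxchg() instead.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical global floor value, in nanoseconds. Only ever moves forward. */
static _Atomic int64_t ctime_floor;

/*
 * Try to move the floor from 'old' (the value sampled earlier) to 'now'
 * (a freshly fetched fine-grained time). If another task swapped in a
 * value first, return that one instead. Since the floor is monotonic,
 * the winner's value is still newer than 'old', so it is just as good.
 */
static int64_t floor_advance(int64_t old, int64_t now)
{
	int64_t expected = old;

	if (atomic_compare_exchange_strong(&ctime_floor, &expected, now))
		return now;

	/* Lost the race: 'expected' now holds the value that is in place. */
	return expected;
}
```

There is no retry loop on purpose. Losing the race just means someone
else already did the work, which is what stops being obviously true
once the value can move in both directions.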

> There is another idea that would completely replace
> your design with something /much/ simpler:
>
> - add a variant of ktime_get_real_ts64() that just
> sets a flag in the timekeeper to signify that a
> fine-grained time has been read since the last
> timer tick
> - add a variant of ktime_get_coarse_real_ts64()
> that returns either tk_xtime() if the flag is
> clear or calls ktime_get_real_ts64() if it's set
> - reset the flag in timekeeping_advance() and any other
> place that updates tk_xtime
>
> That way you avoid the atomic64_try_cmpxchg() in
> inode_set_ctime_current(), making that case faster,
> and avoid all overhead in coarse_ctime() unless you
> use both types during the same tick.
>

With the current code we get a fine-grained timestamp iff:

1/ the timestamps have been queried (a la I_CTIME_QUERIED)
2/ the current coarse-grained or floor time would not show a change in
the ctime

If we do what you're suggesting above, as soon as one task sets the
flag, anyone calling current_time() will end up getting a brand new
fine-grained timestamp, even when the current floor time would have
been fine.

That means a lot more calls into ktime_get_real_ts64(), at least until
the timer ticks, and would probably mean a lot of extra journal
transactions, since those timestamps would all be distinct from one
another and would need to go to disk more often.
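
A toy model of the flag scheme shows the effect. This is only a
userspace sketch: struct fake_tk and all the function names are
hypothetical stand-ins, and the "clock" is a counter that advances by
one on every fine-grained read.

```c
#include <stdbool.h>
#include <stdint.h>

struct fake_tk {
	int64_t fine_ns;	/* stand-in for the hardware clock */
	int64_t coarse_ns;	/* stand-in for tk_xtime(), updated at tick */
	bool fine_read;		/* a fine-grained time was read since the tick */
	unsigned int fine_calls; /* how often the expensive path was taken */
};

/* Fine-grained read: always distinct, and sets the flag. */
static int64_t get_fine(struct fake_tk *tk)
{
	tk->fine_read = true;
	tk->fine_calls++;
	return ++tk->fine_ns;
}

/* Coarse read: cheap only while nobody has read a fine-grained time. */
static int64_t get_coarse(struct fake_tk *tk)
{
	return tk->fine_read ? get_fine(tk) : tk->coarse_ns;
}

/* Timer tick: catch the coarse time up and clear the flag. */
static void tick(struct fake_tk *tk)
{
	tk->coarse_ns = tk->fine_ns;
	tk->fine_read = false;
}
```

After a single get_fine(), every get_coarse() until the next tick()
takes the fine-grained path and returns a distinct value. With the
floor approach, those later callers would all have shared one value.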
--
Jeff Layton <jlayton@xxxxxxxxxx>