Re: [RFC][PATCH 1/2] time: Add ktime_get_mono_raw_fast_ns()

From: John Stultz
Date: Wed Mar 18 2015 - 15:48:46 EST


On Tue, Mar 17, 2015 at 4:24 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> On Fri, Feb 20, 2015 at 11:49:49AM -0800, John Stultz wrote:
>> On Fri, Feb 20, 2015 at 6:29 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>> > @@ -489,7 +512,12 @@ static void timekeeping_update(struct ti
>> > memcpy(&shadow_timekeeper, &tk_core.timekeeper,
>> > sizeof(tk_core.timekeeper));
>> >
>> > - update_fast_timekeeper(&tk->tkr);
>> > + update_fast_timekeeper(&tk_fast_mono, &tk->tkr);
>> > +
>> > + tkr = tk->tkr;
>> > + tkr.mult = tk->tkr.clock->mult;
>> > + tkr.shift = tk->tkr.clock->shift;
>> > + update_fast_timekeeper(&tk_fast_mono_raw, &tkr);
>>
>> So this is sort of sneaky and subtle here, which will surely cause
>> problems later on. You're copying the original mult/shift pair into a
>> copy of the tkr, so you get similar results from timekeeping_get_ns()
>> as you'd want from timekeeping_get_ns_raw(). This results in multiple
>> ways of getting the raw clock.
>>
>> I think it would be better to either add a new tkr structure for the
>> raw clock in the timekeeper, so you can directly copy it over, or
>> extend the tkr structure so it can contain the raw values as well.
>>
>> Also, there's no real reason to have fast/non-fast versions of the raw
>> clock. Since it isn't affected by frequency changes, it can't have the
>> inconsistency issues the monotonic clock can see (which are documented
>> in the comment near ktime_get_mono_fast_ns()). So we can probably
>> condense these and avoid duplicative code.
>
> So I was looking at this again; and I think we do indeed need a second
> way to read mono_raw.
>
> So the immediate problem is the tk_core.seq loop around
> timekeeping_get_ns_raw(); if the NMI happens during the write side of
> that we're stuck.
>
> Now the reason we need that seqcount is because of tk->tkr.cycle_last,
> which, afaict, is the only bit that mono_raw needs that changes. With
> the possible exception of the suspend/resume paths; we also need to deal
> with NMIs requesting time during those.
>
> And we need this silly cycle_last business because of short clocks,
> which is something the generic code needs to deal with. And because we
> have a bit of an error margin on that we can indeed use a 'stale'
> cycle_last.

Sorry.. I'm feeling daft here, but this isn't making sense to me.

I'm suggesting we can use the same fast mode to replace the normal
one, since there's no frequency changes on the raw clock, so using a
stale cycle_last isn't problematic. For fast-wrapping clocks updating
cycle_last regualrly is important, but we should still be ok if we
grab the previous cycle_last mid-update.

In fact, looking at it, I'm starting to think we should even be
reusing the ktime_get_mono_fast_ns() (only under a conventional
seqlock read) for the normal ktime_get calls... Just so we can avoid
all this repetitive and slightly different logic.

> As to the above suggestions:
>
> - we cannot add another tkr structure to struct timekeeper because that
> would duplicate a number of fields, including the above mentioned
> cycle_last, and having two of those is bound to cause confusion; and,

I'm not sure if duplicating a few entries in the tkr structure is that
confusing compared to the subtle update trick you're using though...
Especially since we're duplicating it all in the fast_tk structures.
And since base_mono in the tkr could be just made to base, and would
allow us to drop the base_raw in the timekeeper. The
clock/read/mask/last would be duplicated, but if needed we could
rework the fast_tkr update to pull common elements from the timekeeper
if needed.

> - we cannot extend tkr because that would make it overflow the
> cacheline.
>
> So as it is, I do not see any other way than the already proposed patch;
> we even need the update from timekeeping_update() because we need an
> up-to-date cycle_last.
>
> So all I can really do is write a better changelog.

So.. there was still the bug w/ using tk->base_mono and
timekeeping_get_ns() rather then tk->base_raw and
timekeeping_get_ns_raw(), no? I get the tkr->mult/shift values are
faked so get_ns() is technically ok, but the base seems way wrong.

thanks
-john
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/