Re: [PATCH] time: Avoid signed overflow in timekeeping_delta_to_ns()

From: John Stultz
Date: Tue Nov 15 2016 - 20:10:58 EST


On Tue, Nov 15, 2016 at 2:03 PM, John Stultz <john.stultz@xxxxxxxxxx> wrote:
> On Tue, Nov 15, 2016 at 1:53 PM, Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>> On Mon, 14 Nov 2016, John Stultz wrote:
>>
>>> On Mon, Nov 14, 2016 at 11:42 AM, Chris Metcalf <cmetcalf@xxxxxxxxxxxx> wrote:
>>> > This bugfix was originally made in commit 35a4933a8959 ("time:
>>> > Avoid signed overflow in timekeeping_get_ns()"). When the code was
>>> > refactored in commit 6bd58f09e1d8 ("time: Add cycles to nanoseconds
>>> > translation") the signed overflow fix was lost. Re-introduce it.
>>> >
>>> > Signed-off-by: Chris Metcalf <cmetcalf@xxxxxxxxxxxx>
>>> > ---
>>> > I happened to be looking for an unrelated fix, found this code,
>>> > realized the tip code didn't match the fixed code, and
>>> > backtracked to where it had gone away.
>>> >
>>> > kernel/time/timekeeping.c | 3 +--
>>> > 1 file changed, 1 insertion(+), 2 deletions(-)
>>> >
>>> > diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
>>> > index 37dec7e3db43..57926bc7b7f3 100644
>>> > --- a/kernel/time/timekeeping.c
>>> > +++ b/kernel/time/timekeeping.c
>>> > @@ -304,8 +304,7 @@ static inline s64 timekeeping_delta_to_ns(struct tk_read_base *tkr,
>>> > {
>>> > s64 nsec;
>>> >
>>> > - nsec = delta * tkr->mult + tkr->xtime_nsec;
>>> > - nsec >>= tkr->shift;
>>> > + nsec = (delta * tkr->mult + tkr->xtime_nsec) >> tkr->shift;
>>>
>>> Ugh.
>>>
>>> So... I think this proves the original fix was *far* too subtle to
>>> maintain. So I think reintroducing it as-is doesn't protect us from
>>> undoing it. If the problem is really using the signed intermediate
>>> nsec value, we should get rid of that.
>>
>> As I told the other guy who submitted something similar: This is not really
>> helpful. It merily drags the overflow case out by a factor of 2.
>
> Well... So lost time (where a VM/gdb caused stall runs past the
> clocksource or causes an mult overflow) is a bit less problematic then
> getting a huge negative nsec value.
>
>> So we really need to figure out under which circumstances this can happen
>> and fixup either the callsites or detect the condition right there, which
>> I'd like to avoid for the hotpath.
>
> I get that catching the (delta > TOOBIG) case, but even then I'm not
> sure how we deal that condition in a way that results in anything
> meaningfully different from the less-problematic unsigned overflow
> (ie, capping it).

So I think I'm going to queue up Liav's fix here, as it has been in my
TOQUEUE folder for a bit longer.

Thomas: I know you didn't like it when it was originally submitted,
preferring to catch the case when it happens, but the signed shift is
more problematic. Additionally, the CONFIG_DEBUG_TIMEKEEPING checks
should already warn on the next tick when this case triggers (when the
offset is larger then max_cycles).

Sound ok?

thanks
-john