Re: [patch 1/6] timekeeping: Force unsigned clocksource to nanoseconds conversion

From: David Gibson
Date: Thu Dec 08 2016 - 18:47:49 EST

On Thu, Dec 08, 2016 at 08:49:32PM -0000, Thomas Gleixner wrote:
> The clocksource delta to nanoseconds conversion is using signed math, but
> the delta is unsigned. This makes the conversion space smaller than
> necessary and in case of a multiplication overflow the conversion can
> become negative. The conversion is done with scaled math:
> s64 nsec_delta = ((s64)clkdelta * clk->mult) >> clk->shift;
> Shifting a signed integer right obviously preserves the sign, which has
> interesting consequences:
> - Time jumps backwards
> - __iter_div_u64_rem(), which is used in one of the calling code paths,
>   will take forever to piecewise calculate the seconds/nanoseconds part.
> This has been reported by several people with different scenarios:
> David observed that when stopping a VM with a debugger:
> "It was essentially the stopped by debugger case. I forget exactly why,
> but the guest was being explicitly stopped from outside, it wasn't just
> scheduling lag. I think it was something in the vicinity of 10 minutes
> stopped."
> When lifting the stop the machine went dead.
> The stopped by debugger case is not really interesting, but nevertheless it
> would be a good thing not to die completely.
> But this was also observed on a live system by Liav:
> "When the OS is too overloaded, delta will get a high enough value for the
> msb of the sum delta * tkr->mult + tkr->xtime_nsec to be set, and so
> after the shift the nsec variable will gain a value similar to
> 0xffffffffff000000."
> Unfortunately this has been reintroduced recently with commit 6bd58f09e1d8
> ("time: Add cycles to nanoseconds translation"). It had been fixed a year
> ago already in commit 35a4933a8959 ("time: Avoid signed overflow in
> timekeeping_get_ns()").
> Though it's not surprising that the issue has been reintroduced, because the
> function itself and the whole call chain use s64 for the result and the
> propagation of it. The change in this recent commit is subtle:
> s64 nsec;
> - nsec = (d * m + n) >> s;
> + nsec = d * m + n;
> + nsec >>= s;
> d being of type cycles_t adds another level of obfuscation.
> This wouldn't have happened if the previous change to unsigned computation
> would have made the 'nsec' variable u64 right away and a follow up patch
> had cleaned up the whole call chain.
> There have been patches submitted which basically did a revert of the above
> patch, leaving everything else unchanged as signed. Back to square one. This
> spawned an admittedly pointless discussion about potential users which rely
> on the unsigned behaviour, until someone pointed out that it had been fixed
> before. The changelogs of said patches added further confusion, as they
> ended up making false claims about the consequences for eventual users which
> expect signed results.
> Despite delta being cycles_t, aka. u64, it's very well possible to hand in
> a signed negative value and the signed computation will happily return the
> correct result. But nobody actually sat down and analyzed the code which
> was added as a user after the probably unintended signed conversion.
> Though in sensitive code like this it's better to analyze it properly and
> make sure that nothing relies on this than to hunt the subtle wreckage half
> a year later. After analyzing all call chains it is clear that no caller can
> hand in a negative value (which actually would work due to the s64 cast)
> and rely on the signed math to do the right thing.
> Change the conversion function to unsigned math. The conversion of all call
> chains is done in a follow up patch.
> This solves the starvation issue, which was caused by the negative result,
> but it does not solve the underlying problem. It merely postpones it. When
> the timekeeper update is deferred long enough that the unsigned
> multiplication overflows, then time going backwards is observable again.
> Neither does it solve the issue of clocksources with a small counter width,
> which will wrap around possibly several times and cause random time stamps
> to be generated. But those are usually not found on systems used for
> virtualization, so this is likely a non-issue.
> I took the liberty to claim authorship for this simply because
> analyzing all callsites and writing the changelog took substantially
> more time than just making the simple s/s64/u64/ change and ignore the
> rest.
> Fixes: 6bd58f09e1d8 ("time: Add cycles to nanoseconds translation")
> Reported-by: David Gibson <david@xxxxxxxxxxxxxxxxxxxxx>
> Reported-by: Liav Rehana <liavr@xxxxxxxxxxxx>
> Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>

Reviewed-by: David Gibson <david@xxxxxxxxxxxxxxxxxxxxx>

> ---
> kernel/time/timekeeping.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
> --- a/kernel/time/timekeeping.c
> +++ b/kernel/time/timekeeping.c
> @@ -299,10 +299,10 @@ u32 (*arch_gettimeoffset)(void) = defaul
> static inline u32 arch_gettimeoffset(void) { return 0; }
> #endif
> -static inline s64 timekeeping_delta_to_ns(struct tk_read_base *tkr,
> +static inline u64 timekeeping_delta_to_ns(struct tk_read_base *tkr,
> cycle_t delta)
> {
> - s64 nsec;
> + u64 nsec;
> nsec = delta * tkr->mult + tkr->xtime_nsec;
> nsec >>= tkr->shift;

David Gibson | I'll have my music baroque, and my code
david AT | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
