Re: [BUG linux-4.9.x] xen hotplug cpu leads to 100% steal usage

From: Dongli Zhang
Date: Mon Mar 04 2019 - 21:16:17 EST

Hi Thomas,

On 3/2/19 7:43 AM, Thomas Gleixner wrote:
> On Thu, 28 Feb 2019, Dongli Zhang wrote:
>> The root cause is that the return type of jiffies_to_usecs() is 'unsigned int',
>> but not 'unsigned long'. As a result, the leading 32 bits are discarded.
> Errm. No. The root cause is that jiffies_to_usecs() is used for that in the
> first place. The function has been that way forever and all usage sites
> (except a broken dev_debug print in infiniband) feed delta values. Yes, it
> could have documentation....

Thank you very much for the explanation. It would help the developers clarify
the usage of jiffies_to_usecs() (which we should always feed with dealt value)
with comments above it.

Indeed, the input value in this bug is also a delta value. Because of the
special mechanisms used by xen to account steal clock, the initial delta value
is always very large, only when the new cpu is added after the VM is already up
for very long time.

Dongli Zhang

>> jiffies_to_usecs() is indirectly triggered by cputime_to_nsecs() at line 264.
>> If guest is already up for long time, the initial steal time for new vcpu might
>> be large and the leading 32 bits of jiffies_to_usecs() would be discarded.
>> So far, I have two solutions:
>> 1. Change the return type from 'unsigned int' to 'unsigned long' as in above
>> link and I am afraid it would bring side effect. The return type in latest
>> mainline kernel is still 'unsigned int'.
> Changing it to unsigned long would just solve the issue for 64bit.
> Thanks,
> tglx