Re: [PATCH] x86/xen: Add "xen_timer_slop" command line option

From: Boris Ostrovsky
Date: Wed Mar 27 2019 - 10:40:56 EST

On 3/27/19 6:00 AM, Ryan Thibodeaux wrote:
> On Tue, Mar 26, 2019 at 07:21:31PM -0400, Boris Ostrovsky wrote:
>> On 3/26/19 5:13 AM, Dario Faggioli wrote:
>>> On Mon, 2019-03-25 at 09:43 -0400, Boris Ostrovsky wrote:
>>>> On 3/25/19 8:05 AM, luca abeni wrote:
>>>>> The picture shows the latencies measured with an unpatched guest
>>>>> kernel
>>>>> and with a guest kernel having TIMER_SLOP set to 1000 (arbitrary
>>>>> small
>>>>> value :).
>>>>> All the experiments have been performed booting the hypervisor with
>>>>> a
>>>>> small timer_slop (the hypervisor's one) value. So, they show that
>>>>> decreasing the hypervisor's timer_slop is not enough to measure low
>>>>> latencies with cyclictest.
>>>> I have a couple of questions:
>>>> * Does it make sense to make this a tunable for other clockevent
>>>> devices
>>>> as well?
>>> So, AFAIUI, the thing is as follows. In clockevents_program_event(), we
>>> keep the delta between now and the next timer event within
>>> dev->max_delta_ns and dev->min_delta_ns:
>>> delta = min(delta, (int64_t) dev->max_delta_ns);
>>> delta = max(delta, (int64_t) dev->min_delta_ns);
>>> For Xen (well, for the Xen clock) we have:
>>> .max_delta_ns = 0xffffffff,
>>> .min_delta_ns = TIMER_SLOP,
>>> which means a guest can't ask for a timer to fire earlier than 100us
>>> ahead, which is a bit too coarse, especially on contemporary hardware.
>>> For "lapic_deadline" (which was what was in use in KVM guests, in our
>>> experiments) we have:
>>> lapic_clockevent.max_delta_ns = clockevent_delta2ns(0x7FFFFF, &lapic_clockevent);
>>> lapic_clockevent.min_delta_ns = clockevent_delta2ns(0xF, &lapic_clockevent);
>>> Which means max is 0x7FFFFF device ticks, and min is 0xF.
>>> clockevent_delta2ns() does the conversion from ticks to ns, basing on
>>> the results of the APIC calibration process. It calls cev_delta2ns()
>>> which does some scaling, shifting, divs, etc, and, at the very end,
>>> this:
>>> /* Deltas less than 1usec are pointless noise */
>>> return clc > 1000 ? clc : 1000;
>>> So, as Ryan is also saying, the actual minimum, in this case, depends
>>> on hardware, with a sanity check of "never below 1us" (which is quite
>>> smaller than 100us!)
>>> Of course, the actual granularity depends on hardware in the Xen case
>>> as well, but that is handled in Xen itself. And we have mechanisms in
>>> place in there to avoid timer interrupt storms (like, ahem, the Xen's
>>> 'timer_slop' boot parameter... :-P)
>>> And this is basically why I was also thinking we can/should lower the
>>> default value of TIMER_SLOP, here in the Xen clock implementation in
>>> Linux.
>> What do you think would be a sane value? 10us? Should we then still keep
>> this patch?
>> My concern would be that if we change the current value and it turns out
>> to be very wrong we'd then have no recourse.
>> -boris
> Speaking out of turn but as a participant in this thread, I would not
> assume to change the default value for all cases without significant
> testing by the community, touching a variety of configurations.
> It feels like changing the default has a non-trivial amount of
> unknowns that would need to be addressed.
> Not surprisingly, I am biased to the approach of my patch which
> does not change the default but offers flexibility to all.

If we are to change the default it would be good to at least collect
some data on distribution of delta values in
clockevents_program_event(). But as I said, I'd keep the patch.

Also, as far as the comment describing TIMER_SLOP, I agree that it is
rather misleading.

I can replace it with /* Minimum amount of time until next clock event
fires */, IÂ can do it while committing so no need to resend.