Re: [PATCH v2 01/14] KVM: x86: change PIT discard tick policy

From: Paolo Bonzini
Date: Thu Feb 25 2016 - 08:38:25 EST

On 19/02/2016 15:44, Radim Krčmář wrote:
>> The resulting injections are:
>>> - for catchup, which QEMU calls slew: 0, 42, 51, 60, 80.

I think we agree that 0, 42, 43, 60, 80 is also a "catchup"/"slew"
policy. So we can change QEMU's kvm-i8254 to accept "slew" and warn if
"delay" is given.

In fact "slew" means "a large number or quantity of something" and
indeed that's a good word to characterize kvm-i8254's reinjection behavior.

>>> - for merge: 0, 20 (in IRR, delivered at 42), 60, 80.
>>> For delay I *think* it would be 0, 42, 62, 82, 102.
> I could call this "delay".
> Continue to deliver ticks at the normal rate. The guest time will be
> delayed due to the late tick
> At 82 time units, the guest thinks it's 60, so the guest will do
> everything late. (Leading us to call it delayed?!)

Yup, this was my reasoning.

> Few examples of "delay" that I find easier to accept:
> 0, 60, 80.

This is "discard".

> 0, 42, 60, 80. Because we haven't missed the tick at 20, it just took
> a while to be delivered. (Semantics ...)

This is not discard. Ideally, the timer devices would get information
from the interrupt controller that lets them know when a tick cannot be
delivered. In that case they would do one of the following:

- save it for later (catchup)

- drop it for good (which is discard, and not the same as stashing it in

- pause the timer (delay)

- coalesce the ticks into one late tick (merge)
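
To make the four policies above concrete, here is a rough Python sketch
that reproduces the injection sequences from this thread. It is only a
toy model, not how QEMU or KVM implement anything. The function name
simulate(), its parameters, and the one-time-unit gap between catchup
injections are all assumptions made for illustration; the model assumes
a period of 20, that the tick at 0 is delivered, and that no tick can be
delivered again until time 42.

```python
def simulate(policy, period=20, n_ticks=5, blocked_until=42, catchup_gap=1):
    """Return injection times for nominal ticks at 0, period, 2*period, ...

    Toy model (all names hypothetical): the tick at 0 is delivered, then
    the guest cannot take the interrupt until `blocked_until`.
    """
    due = [i * period for i in range(n_ticks)]
    # Ticks that became due while delivery was impossible.
    missed = [t for t in due if 0 < t < blocked_until]
    on_time = [t for t in due if t == 0 or t >= blocked_until]

    if policy == "discard":
        # Drop the missed ticks for good.
        return on_time
    if policy == "merge":
        # Coalesce all missed ticks into one late tick.
        late = [blocked_until] if missed else []
        return sorted(on_time + late)
    if policy == "catchup":
        # Save missed ticks and reinject them back to back
        # (the gap between reinjections is an assumption).
        late = [blocked_until + i * catchup_gap for i in range(len(missed))]
        return sorted(on_time + late)
    if policy == "delay":
        # Pause the timer: every tick from the first missed one onwards
        # is shifted by the time the timer spent paused.
        if not missed:
            return due
        shift = blocked_until - missed[0]
        return [t if t < missed[0] else t + shift for t in due]
    raise ValueError("unknown policy: " + policy)


for name in ("catchup", "discard", "delay", "merge"):
    print(name, simulate(name))
```

With the defaults, catchup comes out as 0, 42, 43, 60, 80 rather than
QEMU's 0, 42, 51, 60, 80; the only difference is how quickly the backlog
is reinjected.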

> Terminology does suck. (Maybe it stems from the fact that QEMU talks
> about lost ticks, but libvirt about ticks?)
> Nevertheless, I don't think that libvirt "merge" covers what PIT does in
> KVM or real hardware.
> Merge the missed tick(s) into one tick and inject. The guest time may
> be delayed, depending on how the OS reacts to the merging of ticks
> No merging is happening in KVM or real hardware: every tick is exactly
> one tick, so the guest cannot tell that we missed some ticks and the
> time is delayed. If a tick made it into clear IRR, it's not missed.

The libvirt documentation is written from the point of view of the
guest, ignoring whether the late tick is recorded in some guest-visible
register (IRR) or in the processor state (as is the case for NMI) or in
the timer device state or who knows where.

Therefore, it _also_ happens that thanks to IRR and NMI latching you can
implement "merge" without having that kind of relationship between the
timer device and the interrupt controller. But libvirt doesn't care,
all the <timer> element wants is the guest-visible behavior in terms of
when the ticks are delivered.

This was my reasoning when proposing to change QEMU regarding "discard"
as well:

- RTC would warn for "discard" and accept "merge"

- kvm-i8254 would warn for "discard", and accept "merge" if the
capability says that your patch is in. The idea is that "discard" is
such a bad policy that "merge" should fail if the hypervisor would cause
the watchdog to hang.

> In the example:
>>> - for merge: 0, 20 (in IRR, delivered at 42), 60, 80.
> at 80, the guest thinks it's 60.
> I think that merge might do: 0, 42, 60, 80.
> But the tick at 42 is counted as two ticks (20, 40) in the guest.

Yes, merge is a good policy if the guest can somehow realize that 42
stood for (20, 40). Discard would be a good policy too if the guest can
somehow realize that 60 stood for (20, 40, 60) but it has the problem
that NMIs don't do EOIs.
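
The "guest thinks it's 60" arithmetic can be checked with a small Python
sketch. It assumes a naive guest that advances its clock by exactly one
period per tick and has no way of learning that a tick stood for several;
guest_time() is a hypothetical helper, and the injection lists are the
ones quoted in this thread.

```python
def guest_time(injections, at, period=20):
    """Time a naive guest believes it is at wall-clock time `at`.

    Assumption: the first tick marks time 0 and every later tick
    advances the guest clock by exactly one period.
    """
    seen = sum(1 for t in injections if t <= at)
    return max(seen - 1, 0) * period


merge = [0, 42, 60, 80]
discard = [0, 60, 80]
delay = [0, 42, 62, 82, 102]

print(guest_time(merge, at=80))    # 60: the tick at 42 counted once
print(guest_time(discard, at=80))  # 40: the ticks at 20 and 40 are gone
print(guest_time(delay, at=82))    # 60: everything happens late
```

If the guest could tell that 42 stood for (20, 40), or that 60 stood for
(20, 40, 60), it would add the missing periods itself and end up at 80
in both the merge and the discard case.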

> The main problem of this interpretation is that discard is a subset of
> merge:
>>> - for discard: 0, 60, 80.
> The tick at 60 has to be counted as three ticks (20, 40, 60).

Why is it a problem?