On Thu, 13 Jul 2023 10:50:36 +0800
Wang Jianchao <jianchwa@xxxxxxxxxxx> wrote:
On 2023.07.13 02:14, Zhi Wang wrote:
On Fri, 7 Jul 2023 14:17:58 +0800
Wang Jianchao <jianchwa@xxxxxxxxxxx> wrote:
Hi
This patchset attempts to introduce a new pv feature, lazy tscdeadline.
Every time the guest writes MSR_IA32_TSC_DEADLINE, a vm-exit occurs
and the host side handles it. However, many of these vm-exits are
unnecessary because the timer is often overwritten before it expires.
v : write to msr of tsc deadline
| : timer armed by tsc deadline
v v v v v | | | | |
---------------------------------------> Time
The timer armed by an msr write is overwritten before it expires, so the
vm-exit caused by it is wasted. The lazy tscdeadline works as follows,
v v v v v | |
---------------------------------------> Time
'- arm -'
Interesting patch.
I am a little bit confused by the chart above. The MSR writes, which are
said to cause the VM exits, are not reduced in the lazy tscdeadline
chart; only the number of arms gets smaller. And the benefit of lazy
tscdeadline is said to come from "fewer vm exits". Maybe it is better
to improve the chart a little bit to help people get the idea
easily?
Thanks so much for your comment and sorry for my poor chart.
You don't have to say sorry here. :) Save it for later when you actually
break something.
Let me try to rework the chart.
Before this patch, every time the guest starts or modifies an hrtimer, we need to write the msr of tsc deadline,
a vm-exit occurs and the host arms a hv or sw timer for it.
w: write msr
x: vm-exit
t: hv or sw timer
Guest
w
---------------------------------------> Time
Host x t
However, in some workloads that need to set up timers frequently, the msr of tscdeadline is usually overwritten
many times before the timer expires. And every time we modify the tscdeadline, a vm-exit occurs.
1. write to msr with t0
Guest
w0
----------------------------------------> Time
Host x0 t0
2. write to msr with t1
Guest
w1
------------------------------------------> Time
Host x1 t0->t1
3. write to msr with t2
Guest
w2
------------------------------------------> Time
Host x2 t1->t2
4. write to msr with t3
Guest
w3
------------------------------------------> Time
Host x3 t2->t3
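The four steps above can be modelled with a small sketch. This is only an illustration, not code from the patch: `struct baseline_vcpu` and `baseline_write_deadline()` are hypothetical names, and using 0 for "no timer armed" is an assumption of the model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the current (non-lazy) behaviour. */
struct baseline_vcpu {
    uint64_t host_timer;    /* deadline of the host hv/sw timer, 0 = none (assumption) */
    unsigned int vm_exits;  /* number of trapped msr writes */
};

/* Every guest write to the tsc deadline msr traps and re-arms the host timer. */
static void baseline_write_deadline(struct baseline_vcpu *v, uint64_t deadline)
{
    assert(v != NULL);
    v->vm_exits++;              /* x0, x1, x2, x3 in the charts */
    v->host_timer = deadline;   /* t0 -> t1 -> t2 -> t3 */
}
```

Writing four deadlines in a row through `baseline_write_deadline()` leaves `vm_exits` at 4 while only the last deadline is still armed, which is the waste described above.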
What this patch wants to do is to eliminate the vm-exits x1, x2 and x3, as follows.
Firstly, we have two fields shared between guest and host, like other pv features, namely,
- armed, the value of tscdeadline that has a timer in host side, only updated by __host__ side
- pending, the next value of tscdeadline, only updated by __guest__ side
1. write to msr with t0
armed : t0
pending : t0
Guest
w0
----------------------------------------> Time
Host x0 t0
a vm-exit occurs and the host arms a timer for t0
2. write to msr with t1
armed : t0
pending : t1
Guest
w1
------------------------------------------> Time
Host t0
the tsc deadline value that has been armed, namely t0, is smaller than t1, so there is no need to write
to the msr; just update pending
3. write to msr with t2
armed : t0
pending : t2
Guest
w2
------------------------------------------> Time
Host t0
Similar to step 2, just update the pending field with t2, no vm-exit
4. write to msr with t3
armed : t0
pending : t3
Guest
w3
------------------------------------------> Time
Host t0
Similar to step 2, just update the pending field with t3, no vm-exit
5. t0 expires, arm t3
armed : t3
pending : t3
Guest
------------------------------------------> Time
Host t0 ------> t3
When t0 fires, the host checks the pending field and re-arms a timer based on it.
Here is the core idea of this patch ;)
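Steps 1 to 5 can be sketched roughly as below. This is a minimal model under stated assumptions, not the patch itself: the struct layout, all function names, and the use of 0 for "no timer armed" are hypothetical, and what the host does when nothing is pending at expiry is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout of the two fields shared between guest and host. */
struct lazy_tscdeadline {
    uint64_t armed;    /* deadline that has a host timer; updated by host only */
    uint64_t pending;  /* next deadline wanted by the guest; updated by guest only */
};

struct lazy_vcpu {
    struct lazy_tscdeadline shared;
    unsigned int vm_exits;
};

/* Host side: the msr write trapped, arm a timer for the new deadline. */
static void host_handle_msr_write(struct lazy_vcpu *v, uint64_t deadline)
{
    v->vm_exits++;
    v->shared.armed = deadline;
}

/* Guest side: returns true if the msr really had to be written. */
static bool guest_set_deadline(struct lazy_vcpu *v, uint64_t deadline)
{
    assert(v != NULL);
    v->shared.pending = deadline;

    /* An earlier timer is already armed and will fire first: skip the msr write. */
    if (v->shared.armed != 0 && v->shared.armed <= deadline)
        return false;

    host_handle_msr_write(v, deadline);
    return true;
}

/* Host side: the armed timer fired at 'now'; re-arm from pending if it is later. */
static void host_timer_fired(struct lazy_vcpu *v, uint64_t now)
{
    uint64_t pending = v->shared.pending;

    if (pending > now)
        v->shared.armed = pending;  /* step 5: t0 ------> t3 */
    else
        v->shared.armed = 0;        /* nothing later is pending (model only) */
}
```

With deadlines t0 < t1 < t2 < t3 written in that order, only the first call traps in this model; the later three just update `pending`, and `host_timer_fired()` at t0 moves `armed` to t3.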
That's much better. Please keep this in the cover letter in the next RFC.
My concern about this approach is: it might slightly affect timing-
sensitive workloads in the guest, as the approach merges the deadline
interrupts. The guest might see fewer deadline interrupts than before. It
might be better to have a comparison of the number of deadline interrupts
in the cover letter.