Re: [PATCH] KVM: arm/arm64: don't set vtimer->cnt_ctl in kvm_arch_timer_handler

From: Jia He
Date: Thu Dec 14 2017 - 00:35:47 EST


Hi


On 12/14/2017 12:57 PM, Jia He Wrote:
Hi Christoffer

I have tried your newer level-mapped-v7 branch, but bug is still there.

There is no special load in both host and guest. The guest (kernel 4.14) is often hanging when booting

the guest kernel log

[ OK ] Reached target Remote File Systems.
Starting File System Check on /dev/mapper/fedora-root...
[ OK ] Started File System Check on /dev/mapper/fedora-root.
Mounting /sysroot...
[ 2.670764] SGI XFS with ACLs, security attributes, no debug enabled
[ 2.678180] XFS (dm-0): Mounting V5 Filesystem
[ 2.740364] XFS (dm-0): Ending clean mount
[ OK ] Mounted /sysroot.
[ OK ] Reached target Initrd Root File System.
Starting Reload Configuration from the Real Root...
[ 61.288215] INFO: rcu_sched detected stalls on CPUs/tasks:
[ 61.290791] 1-...!: (0 ticks this GP) idle=574/0/0 softirq=5/5 fqs=1
[ 61.293664] (detected by 0, t=6002 jiffies, g=-263, c=-264, q=39760)
[ 61.296480] Task dump for CPU 1:
[ 61.297938] swapper/1 R running task 0 0 1 0x00000020
[ 61.300643] Call trace:
[ 61.301260] __switch_to+0x6c/0x78
[ 61.302095] cpu_number+0x0/0x8
[ 61.302867] rcu_sched kthread starved for 6000 jiffies! g18446744073709551353 c18446744073709551352 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x402 ->cpu=1
[ 61.305941] rcu_sched I 0 8 2 0x00000020
[ 61.307250] Call trace:
[ 61.307854] __switch_to+0x6c/0x78
[ 61.308693] __schedule+0x268/0x8f0
[ 61.309545] schedule+0x2c/0x88
[ 61.310325] schedule_timeout+0x84/0x3b8
[ 61.311278] rcu_gp_kthread+0x4d4/0x7d8
[ 61.312213] kthread+0x134/0x138
[ 61.313001] ret_from_fork+0x10/0x1c

Maybe my previous patch is not perfect enough, thanks for your comments.

I digged it futher more, do you think below code logic is possibly problematic?


vtimer_save_stateÂÂÂÂÂÂÂÂÂÂ (vtimer->loaded = false, cntv_ctl is 0)

kvm_arch_timer_handlerÂÂÂÂÂÂÂÂ(read cntv_ctl and set vtimer->cnt_ctl = 0)

vtimer_restore_state     Â (write vtimer->cnt_ctl to cntv_ctl, then cntv_ctl will

ÂÂÂ ÂÂÂ ÂÂÂ ÂÂÂ ÂÂ Â Â be 0 forever)
sorry, adjust the format, make it easy for reading:

vtimer_save_stateÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ (vtimer->loaded = false, cntv_ctl is 0)
kvm_arch_timer_handlerÂÂÂÂÂÂÂÂ(read cntv_ctl and set vtimer->cnt_ctl = 0)
vtimer_restore_state     Â (write vtimer->cnt_ctl to cntv_ctl, then cntv_ctl will be 0 forever)

--
Cheers,
Jia



If above analysis is reasonable, how about below patch? already tested in my arm64 server.

diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
index f9555b1..ee6dd3f 100644
--- a/virt/kvm/arm/arch_timer.c
+++ b/virt/kvm/arm/arch_timer.c
@@ -99,7 +99,7 @@ static irqreturn_t kvm_arch_timer_handler(int irq, void *dev_id)
ÂÂÂÂÂÂÂ }
ÂÂÂÂÂÂÂ vtimer = vcpu_vtimer(vcpu);

-ÂÂÂÂÂÂ if (!vtimer->irq.level) {
+ÂÂÂÂÂÂ if (vtimer->loaded && !vtimer->irq.level) {
ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ vtimer->cnt_ctl = read_sysreg_el0(cntv_ctl);
ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ if (kvm_timer_irq_can_fire(vtimer))
ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ kvm_timer_update_irq(vcpu, true, vtimer);

Cheers,

Jia



On 12/13/2017 5:18 PM, Christoffer Dall Wrote:
On Tue, Dec 12, 2017 at 11:00:07PM -0800, Jia He wrote:
In our Armv8a server (qualcomm Amberwing, non VHE), after applying
Christoffer's timer optimizing patchset(Optimize arch timer register
handling), the guest is hang during kernel booting.

The error root cause might be as follows:
1. in kvm_arch_timer_handler, it reset vtimer->cnt_ctl with current
cntv_ctl register value. And then it missed some cases to update timer's
irq (irq.level) when kvm_timer_irq_can_fire() is false
Why should it set the irq level to true when the timer cannot fire?

2. It causes kvm_vcpu_check_block return 0 instead of -EINTR
ÂÂÂÂkvm_vcpu_check_block
ÂÂÂÂÂÂÂ kvm_cpu_has_pending_timer
ÂÂÂÂÂÂÂÂÂÂÂ kvm_timer_is_pending
ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ kvm_timer_should_fire
3. Thus, the kvm hyp code can not break the loop in kvm_vcpu_block (halt
poll process) and the guest is hang forever
This is just a polling loop which will expire after some time, so it
shouldn't halt the guest indefinitely, but merely slow it down for some
while, if we have a bug. Is that the behavior you're seeing or are you
seeing the guest coming to a complete halt?

Fixes: b103cc3f10c0 ("KVM: arm/arm64: Avoid timer save/restore in vcpu entry/exit")
Signed-off-by: Jia He <jia.he@xxxxxxxxxxxxxxxx>
---
 virt/kvm/arm/arch_timer.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
index f9555b1..bb86433 100644
--- a/virt/kvm/arm/arch_timer.c
+++ b/virt/kvm/arm/arch_timer.c
@@ -100,7 +100,6 @@ static irqreturn_t kvm_arch_timer_handler(int irq, void *dev_id)
ÂÂÂÂÂ vtimer = vcpu_vtimer(vcpu);
 Â if (!vtimer->irq.level) {
-ÂÂÂÂÂÂÂ vtimer->cnt_ctl = read_sysreg_el0(cntv_ctl);
This fix is clearly not correct, as it would prevent forwarding timer
interrupts in some cases.

ÂÂÂÂÂÂÂÂÂ if (kvm_timer_irq_can_fire(vtimer))
ÂÂÂÂÂÂÂÂÂÂÂÂÂ kvm_timer_update_irq(vcpu, true, vtimer);
ÂÂÂÂÂ }
--
2.7.4

I actually don't see how the above scenario you painted can happen.

If you're in the polling loop, that means that the timer state is loaded
on the vcpu, and that means you can take interrupts from the timer, and
when you take interrupts, you will set the irq.level.

And here's the first bit of logic in kvm_timer_is_pending():

ÂÂÂÂif (vtimer->irq.level || ptimer->irq.level)
ÂÂÂÂÂÂÂ return true;

So that would break the loop.

I'm not able to reproduce on my side with a non-VHE platform.

What is the workload you're running to reproduce this, and what is the
exact kernel tree and kernel configuration you're using?

Thanks,
-Christoffer