Re: [PATCH] rcu/tree: consider time a VM was suspended

From: Sergey Senozhatsky
Date: Thu May 20 2021 - 18:34:51 EST


On (21/05/20 07:57), Paul E. McKenney wrote:
> >
> > Sounds good. I can cook a patch and run some tests.
> > Or do you want to send a patch?
>
> Given that you have the test setup, things might go faster if you do
> the patch, especially taking timezones into consideration. Of course,
> if you run into difficulties, you know where to find me.

OK. Sounds good to me.

> > While VCPU-2 has PVCLOCK_GUEST_STOPPED set (resuming) and is in
> > check_cpu_stall(), the VCPU-3 is executing:
> >
> > apic_timer_interrupt()
> > tick_irq_enter()
> > tick_do_update_jiffies64()
> > do_timer()
>
> OK, but the normal grace period time is way less than one second, and
> the stall timeout in mainline is 21 seconds, so that would be a -lot-
> of jiffies of skew. Or does the restarting really take that long a time?

That's a good question. I see a huge jiffies spike in the logs.
I suspect that resuming a VM can take some time, especially on a "not
powerful at all" overcommitted host (more virtual CPUs than physical
ones).
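
To illustrate the concern, here is a rough userspace sketch, not kernel
code. It assumes HZ=1000 and uses the 21 second mainline timeout mentioned
above; after() mirrors the idea behind the kernel's time_after(), and
stall_detected() is a hypothetical stand-in for the deadline comparison in
check_cpu_stall(). It only shows how one large jiffies update on another
VCPU can push a young grace period past the stall deadline.

```c
#include <stdbool.h>

#define HZ                  1000UL  /* assumed tick rate */
#define STALL_TIMEOUT_SECS  21UL    /* mainline default quoted above */

/* Wraparound-safe "a is after b", same idea as the kernel's time_after(). */
static bool after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/*
 * Hypothetical stand-in for the deadline test in check_cpu_stall():
 * report a stall once "now" is past grace-period start plus the timeout.
 */
static bool stall_detected(unsigned long now, unsigned long gp_start)
{
	return after(now, gp_start + STALL_TIMEOUT_SECS * HZ);
}
```

With a grace period started at jiffies 5000, stall_detected(5100, 5000) is
false. If another VCPU then accounts for, say, 30 seconds of suspended time
in one go, stall_detected(5100 + 30 * HZ, 5000) is true, even though the
guest itself only ran for about 100 ms of that grace period.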