Re: [PATCH] rcu/tree: consider time a VM was suspended

From: Sergey Senozhatsky
Date: Mon May 17 2021 - 21:41:30 EST


On (21/05/17 09:23), Paul E. McKenney wrote:
> On Sun, May 16, 2021 at 07:27:16PM +0900, Sergey Senozhatsky wrote:
> > Soft watchdog timer function checks if a virtual machine
> > was suspended and hence what looks like a lockup in fact
> > is a false positive.
> >
> > This is what kvm_check_and_clear_guest_paused() does: it
> > tests guest PVCLOCK_GUEST_STOPPED (which is set by the host)
> > and if it's set then we need to touch all watchdogs and bail
> > out.
> >
> > Watchdog timer function runs from IRQ, so PVCLOCK_GUEST_STOPPED
> > check works fine.
> >
> > There is, however, one more watchdog that runs from IRQ, so
> > watchdog timer fn races with it, and that watchdog is not aware
> > of PVCLOCK_GUEST_STOPPED - RCU stall detector.
> >
> > apic_timer_interrupt()
> > smp_apic_timer_interrupt()
> > hrtimer_interrupt()
> > __hrtimer_run_queues()
> > tick_sched_timer()
> > tick_sched_handle()
> > update_process_times()
> > rcu_sched_clock_irq()
> >
> > This triggers RCU stalls on our devices during VM resume.
> >
> > If tick_sched_handle()->rcu_sched_clock_irq() runs on a VCPU
> > before watchdog_timer_fn()->kvm_check_and_clear_guest_paused()
> > then there is nothing on this VCPU that touches watchdogs and
> > RCU reads stale gp stall timestamp and new jiffies value, which
> > makes it think that RCU has stalled.
> >
> > Make RCU stall watchdog aware of PVCLOCK_GUEST_STOPPED and
> > don't report RCU stalls when we resume the VM.
>
> Good point!
>
> But if I understand your patch correctly, if the virtual machine is
> stopped at any point during a grace period, even if only for a short time,
> stall warnings are suppressed for that grace period forever, courtesy of
> the call to rcu_cpu_stall_reset(). So, if something really is stalling,
> and the virtual machine just happens to stop for a few milliseconds, the
> stall warning is completely suppressed. Which would make it difficult
> to debug the underlying stall condition.
>
> Is it possible to provide RCU with information on the duration of the
> virtual-machine stoppage so that RCU could adjust the timing of the
> stall warning? Maybe by having something like rcu_cpu_stall_reset()
> that takes the duration of the stoppage in jiffies?

Good questions!

And I think I've some bad news and some good news.

As far as I can tell, none of the PVCLOCK_GUEST_STOPPED handlers take
the stoppage duration into consideration. For instance, as soon as
watchdog timer IRQ detects a potential softlockup it checks
PVCLOCK_GUEST_STOPPED and touches all watchdogs, including RCU:

watchdog_timer_fn()
kvm_check_and_clear_guest_paused()
pvclock_touch_watchdogs()
rcu_cpu_stall_reset() // + the remaining watchdogs

But things get more complex.

pvclock_clocksource_read() also checks PVCLOCK_GUEST_STOPPED and calls
pvclock_touch_watchdogs(). And this path is executed rather often.

For instance,

apic_timer_interrupt()
smp_apic_timer_interrupt()
hrtimer_interrupt()
__hrtimer_run_queues()
hrtimer_wakeup()
try_to_wake_up()
update_rq_clock()
sched_clock_cpu()
sched_clock()
kvm_sched_clock_read()
kvm_clock_read()
pvclock_clocksource_read()
pvclock_touch_watchdogs()
rcu_cpu_stall_reset() // + the remaining watchdogs

Or

do_IRQ
irq_exit
sched_clock_cpu
sched_clock
kvm_sched_clock_read
kvm_clock_read
pvclock_clocksource_read
pvclock_touch_watchdogs
rcu_cpu_stall_reset() // + the remaining watchdogs

And so on...

You may wonder what are the good news then.

Well. I'd say that my patch (is not beautiful but it) does not add
a lot of additional or new damage. And it still fixes the valid race
condition, as far as I'm concerned.

I think we need to rework how pvclock_touch_watchdogs() does things
internally, basically what you suggested, and this can be a separate
effort.