Re: [PATCH] rcu/tree: consider time a VM was suspended

From: Paul E. McKenney
Date: Tue May 18 2021 - 19:15:18 EST


On Tue, May 18, 2021 at 10:41:21AM +0900, Sergey Senozhatsky wrote:
> On (21/05/17 09:23), Paul E. McKenney wrote:
> > On Sun, May 16, 2021 at 07:27:16PM +0900, Sergey Senozhatsky wrote:
> > > Soft watchdog timer function checks if a virtual machine
> > > was suspended and hence what looks like a lockup in fact
> > > is a false positive.
> > >
> > > This is what kvm_check_and_clear_guest_paused() does: it
> > > tests guest PVCLOCK_GUEST_STOPPED (which is set by the host)
> > > and if it's set then we need to touch all watchdogs and bail
> > > out.
> > >
> > > Watchdog timer function runs from IRQ, so PVCLOCK_GUEST_STOPPED
> > > check works fine.
> > >
> > > There is, however, one more watchdog that runs from IRQ, so
> > > watchdog timer fn races with it, and that watchdog is not aware
> > > of PVCLOCK_GUEST_STOPPED - RCU stall detector.
> > >
> > > apic_timer_interrupt()
> > > smp_apic_timer_interrupt()
> > > hrtimer_interrupt()
> > > __hrtimer_run_queues()
> > > tick_sched_timer()
> > > tick_sched_handle()
> > > update_process_times()
> > > rcu_sched_clock_irq()
> > >
> > > This triggers RCU stalls on our devices during VM resume.
> > >
> > > If tick_sched_handle()->rcu_sched_clock_irq() runs on a VCPU
> > > before watchdog_timer_fn()->kvm_check_and_clear_guest_paused()
> > > then there is nothing on this VCPU that touches watchdogs and
> > > RCU reads stale gp stall timestamp and new jiffies value, which
> > > makes it think that RCU has stalled.
> > >
> > > Make RCU stall watchdog aware of PVCLOCK_GUEST_STOPPED and
> > > don't report RCU stalls when we resume the VM.
> >
> > Good point!
> >
> > But if I understand your patch correctly, if the virtual machine is
> > stopped at any point during a grace period, even if only for a short time,
> > stall warnings are suppressed for that grace period forever, courtesy of
> > the call to rcu_cpu_stall_reset(). So, if something really is stalling,
> > and the virtual machine just happens to stop for a few milliseconds, the
> > stall warning is completely suppressed. Which would make it difficult
> > to debug the underlying stall condition.
> >
> > Is it possible to provide RCU with information on the duration of the
> > virtual-machine stoppage so that RCU could adjust the timing of the
> > stall warning? Maybe by having something like rcu_cpu_stall_reset()
> > that takes the duration of the stoppage in jiffies?
>
> Good questions!
>
> And I think I've some bad news and some good news.
>
> As far as I can tell, none of the PVCLOCK_GUEST_STOPPED handlers take
> the stoppage duration into consideration. For instance, as soon as
> watchdog timer IRQ detects a potential softlockup it checks
> PVCLOCK_GUEST_STOPPED and touches all watchdogs, including RCU:
>
> watchdog_timer_fn()
> kvm_check_and_clear_guest_paused()
> pvclock_touch_watchdogs()
> rcu_cpu_stall_reset() // + the remaining watchdogs
>
> But things get more complex.
>
> pvclock_clocksource_read() also checks PVCLOCK_GUEST_STOPPED and calls
> pvclock_touch_watchdogs(). And this path is executed rather often.
>
> For instance,
>
> apic_timer_interrupt()
> smp_apic_timer_interrupt()
> hrtimer_interrupt()
> __hrtimer_run_queues()
> hrtimer_wakeup()
> try_to_wake_up()
> update_rq_clock()
> sched_clock_cpu()
> sched_clock()
> kvm_sched_clock_read()
> kvm_clock_read()
> pvclock_clocksource_read()
> pvclock_touch_watchdogs()
> rcu_cpu_stall_reset() // + the remaining watchdogs
>
> Or
>
> do_IRQ
> irq_exit
> sched_clock_cpu
> sched_clock
> kvm_sched_clock_read
> kvm_clock_read
> pvclock_clocksource_read
> pvclock_touch_watchdogs
> rcu_cpu_stall_reset() // + the remaining watchdogs
>
> And so on...
>
> You may wonder what are the good news then.
>
> Well. I'd say that my patch (is not beautiful but it) does not add
> a lot of additional or new damage. And it still fixes the valid race
> condition, as far as I'm concerned.

I have tentatively pulled it in for review and testing.

> I think we need to rework how pvclock_touch_watchdogs() does things
> internally, basically what you suggested, and this can be a separate
> effort.

In the shorter term... PVCLOCK_GUEST_STOPPED is mostly for things like
guest migration and debugger breakpoints, correct? Either way, I am
wondering if rcu_cpu_stall_reset() should take a lighter touch. Right
now, it effectively disables all stalls for the current grace period.
Why not make it restart the stall timeout when the stoppage is detected?

The strange thing is that unless something is updating the jiffies counter
to make it look like the system was up during the stoppage time interval,
there should be no reason to tell RCU anything. Is the jiffies counter
updated in this manner? (Not seeing it right offhand, but I don't claim
to be familiar with this code.)

Thanx, Paul