Re: [PATCHv2 1/2] rcu/tree: handle VM stoppage in stall detection

From: Paul E. McKenney
Date: Thu Jul 15 2021 - 09:32:48 EST


On Thu, Jul 15, 2021 at 06:09:45PM +0900, Sergey Senozhatsky wrote:
> On (21/05/22 00:56), Sergey Senozhatsky wrote:
> > Soft watchdog timer function checks if a virtual machine
> > was suspended and hence what looks like a lockup in fact
> > is a false positive.
> >
> > This is what kvm_check_and_clear_guest_paused() does: it
> > tests guest PVCLOCK_GUEST_STOPPED (which is set by the host)
> > and if it's set then we need to touch all watchdogs and bail
> > out.
> >
> > Watchdog timer function runs from IRQ, so PVCLOCK_GUEST_STOPPED
> > check works fine.
> >
> > There is, however, one more watchdog that runs from IRQ, so
> > watchdog timer fn races with it, and that watchdog is not aware
> > of PVCLOCK_GUEST_STOPPED - RCU stall detector.
> >
> > apic_timer_interrupt()
> >  smp_apic_timer_interrupt()
> >   hrtimer_interrupt()
> >    __hrtimer_run_queues()
> >     tick_sched_timer()
> >      tick_sched_handle()
> >       update_process_times()
> >        rcu_sched_clock_irq()
> >
> > This triggers RCU stalls on our devices during VM resume.
> >
> > If tick_sched_handle()->rcu_sched_clock_irq() runs on a VCPU
> > before watchdog_timer_fn()->kvm_check_and_clear_guest_paused()
> > then there is nothing on this VCPU that touches watchdogs and
> > RCU reads stale gp stall timestamp and new jiffies value, which
> > makes it think that RCU has stalled.
> >
> > Make RCU stall watchdog aware of PVCLOCK_GUEST_STOPPED and
> > don't report RCU stalls when we resume the VM.
>
> Hello Paul,
>
> I've noticed that this patch set didn't make it to Linus's tree.
> Was it intentional?

This patch (and the 18 preceding it) didn't make the cutoff for the
just-past merge window. If this patch is urgent, please let me know
and I can push it, with luck by the end of next week.

If that one is urgent, are these two also?

817690fd18af ("rcu: Do not disable GP stall detection in rcu_cpu_stall_reset()")
9ed9bf0d17cd ("rcu: Start timing stall repetitions after warning complete")

If so, it is better to handle them as a group than separately.

The cutoff for a given merge window is normally shortly after the close
of the previous merge window. This time, I am a bit slow creating
branches, but the cutoff for the v5.15 merge window should be by the
end of the week. This is a bit more lag than most subsystems, but
this is, after all, RCU.

As always, if a given commit is urgent, please let me know and I
will see what I can do to fast-track it.

For reference:

https://mirrors.edge.kernel.org/pub/linux/kernel/people/paulmck/rcutodo.html

Again, if this one needs to hit mainline before the v5.15 merge
window, please let me know.

Thanx, Paul