Re: [PATCH v11 2/2] misc: Add a mechanism to detect stalls on guest vCPUs

From: Sebastian Ene
Date: Fri Jul 08 2022 - 10:01:54 EST


On Fri, Jul 08, 2022 at 03:47:23PM +0200, Greg Kroah-Hartman wrote:
> On Fri, Jul 08, 2022 at 11:23:45AM +0000, Sebastian Ene wrote:
> > This driver creates per-cpu hrtimers which are required to do the
> > periodic 'pet' operation. On a conventional watchdog-core driver, the
> > userspace is responsible for delivering the 'pet' events by writing to
> > the particular /dev/watchdogN node. In this case we require a strong
> > thread affinity to be able to account for lost time on a per-vCPU basis.
> >
> > This part of the driver is the 'frontend' which is responsible for
> > delivering the periodic 'pet' events, configuring the virtual peripheral
> > and listening for cpu hotplug events. The other part of the driver is
> > an emulated MMIO device which is part of the KVM virtual machine
> > monitor and this part accounts for lost time by looking at the
> > /proc/{}/task/{}/stat entries.
> >
> > Reviewed-by: Will Deacon <will@xxxxxxxxxx>
> > Signed-off-by: Sebastian Ene <sebastianene@xxxxxxxxxx>
> > ---
> > drivers/misc/Kconfig | 14 ++
> > drivers/misc/Makefile | 1 +
> > drivers/misc/vcpu_stall_detector.c | 223 +++++++++++++++++++++++++++++
> > 3 files changed, 238 insertions(+)
> > create mode 100644 drivers/misc/vcpu_stall_detector.c
> >
> > diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
> > index 41d2bb0ae23a..d5b7610459f7 100644
> > --- a/drivers/misc/Kconfig
> > +++ b/drivers/misc/Kconfig
> > @@ -483,6 +483,20 @@ config OPEN_DICE
> >
> > If unsure, say N.
> >
> > +config VCPU_STALL_DETECTOR
> > + tristate "Guest vCPU stall detector"
> > + select LOCKUP_DETECTOR

Hi Greg,

>
> This should be a "depends on", not a select, right? This got enabled on
> my build when I didn't want it to, and trying to track down why it was
> enabled would be a pain for people.

Thanks for noticing it! I think we can remove this select completely:
it was only needed in v9 for the `watchdog_cpumask`, and we are not
using that anymore.
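
A minimal sketch of how the entry could look with the select dropped,
assuming nothing else in the entry changes (the rest of the entry is
not quoted in this mail and is left out here):

```
config VCPU_STALL_DETECTOR
	tristate "Guest vCPU stall detector"
	# "select LOCKUP_DETECTOR" removed, so enabling or disabling this
	# option no longer forces LOCKUP_DETECTOR on in the build.
```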

>
> thanks,
>
> greg k-h

Thanks,
Seb