Re: [PATCH v4 0/2] Detect stalls on guest vCPUS
From: Sebastian Ene
Date: Mon May 02 2022 - 02:03:48 EST
On Fri, Apr 29, 2022 at 03:25:45PM -0500, Rob Herring wrote:
> On Fri, Apr 29, 2022 at 08:30:29AM +0000, Sebastian Ene wrote:
> > This adds a mechanism to detect stalls on the guest vCPUs by creating a
> > per CPU hrtimer which periodically 'pets' the host backend driver.
> > On a conventional watchdog-core driver, the userspace is responsible for
> > delivering the 'pet' events by writing to the particular /dev/watchdogN node.
> > In this case we require a strong thread affinity to be able to
> > account for lost time on a per vCPU basis.
> >
> > This device driver acts as a soft lockup detector by relying on the host
> > backend driver to measure the elapsed time between subsequent 'pet' events.
> > If the elapsed time doesn't match an expected value, the backend driver
> > decides that the guest vCPU is locked and resets the guest. The host
> > backend driver takes into account the time that the guest is not
> > running. The communication with the backend driver is done through MMIO
> > and the register layout of the virtual watchdog is described as part of
> > the backend driver changes.
> >
> > The host backend driver is implemented as part of:
> > https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/3548817
> >
> > Changelog v4:
> > - rename the source from vm-wdt.c -> vm-watchdog.c
> > - convert all the error logging calls from pr_* to dev_* calls
> > - rename the DTS node "clock" to "clock-frequency"
Hi,
>
> Why do I have a v4 now when the discussion on v3 is not concluded? Give
> folks some time to respond. We're busy drinking from the firehose.
>
I am trying to address the issues incrementally, keeping a weekly cadence.
Any feedback on this is welcome.
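
For anyone catching up on the thread, the backend check described in the
cover letter can be modelled roughly as below. This is only a simplified
sketch and not the crosvm code: the struct, the function names and the
"run time exceeds a timeout" rule are assumptions made for illustration,
and the real register layout is defined by the backend change linked in
the cover letter.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-vCPU state kept by the host backend (illustrative names). */
struct vcpu_wdt {
	uint64_t last_pet_ns;    /* host timestamp of the last 'pet' event */
	uint64_t not_running_ns; /* time the vCPU was not running since then */
	uint64_t timeout_ns;     /* assumed allowed run time between pets */
};

/* Record a 'pet' from the guest and restart the accounting window. */
static void vcpu_wdt_pet(struct vcpu_wdt *w, uint64_t now_ns)
{
	w->last_pet_ns = now_ns;
	w->not_running_ns = 0;
}

/* Account for time during which the guest vCPU was not running. */
static void vcpu_wdt_account_not_running(struct vcpu_wdt *w, uint64_t ns)
{
	w->not_running_ns += ns;
}

/*
 * Return true when the vCPU has actually run for longer than the timeout
 * without petting. Time spent not running is subtracted first, so a
 * descheduled or paused guest is not reported as stalled.
 */
static bool vcpu_wdt_stalled(const struct vcpu_wdt *w, uint64_t now_ns)
{
	uint64_t elapsed = now_ns - w->last_pet_ns;
	uint64_t ran = elapsed > w->not_running_ns ?
		       elapsed - w->not_running_ns : 0;

	return ran > w->timeout_ns;
}
```

In this model the guest side only has to call the equivalent of
vcpu_wdt_pet() from its per-CPU hrtimer. All stall decisions stay on the
host, which is the only side that knows how long the vCPU was not running.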
Thanks,
Seb
> Rob