Re: [PATCH 0/4] Avoid soft lockup message when KVM is stopped by host

From: Ryan Harper
Date: Tue Aug 30 2011 - 10:43:34 EST


* Marcelo Tosatti <mtosatti@xxxxxxxxxx> [2011-08-30 09:40]:
> On Tue, Aug 30, 2011 at 09:12:17AM -0500, Ryan Harper wrote:
> > * Marcelo Tosatti <mtosatti@xxxxxxxxxx> [2011-08-30 07:35]:
> > > On Mon, Aug 29, 2011 at 05:27:11PM -0600, Eric B Munson wrote:
> > > > Currently, when qemu stops a guest kernel that guest will issue a soft lockup
> > > > message when it resumes. This set provides the ability for qemu to communicate
> > > > to the guest that it has been stopped. When the guest hits the watchdog on
> > > > resume it will check if it was suspended before issuing the warning.
> > > >
> > > > Eric B Munson (4):
> > > > Add flag to indicate that a vm was stopped by the host
> > > > Add functions to check if the host has stopped the vm
> > > > Add generic stubs for kvm stop check functions
> > > > Add check for suspended vm in softlockup detector
> > > >
> > > >  arch/x86/include/asm/pvclock-abi.h |    1 +
> > > >  arch/x86/include/asm/pvclock.h     |    2 ++
> > > >  arch/x86/kernel/kvmclock.c         |   14 ++++++++++++++
> > > >  include/asm-generic/pvclock.h      |   14 ++++++++++++++
> > > >  kernel/watchdog.c                  |   12 ++++++++++++
> > > >  5 files changed, 43 insertions(+), 0 deletions(-)
> > > > create mode 100644 include/asm-generic/pvclock.h
> > > >
> > > > --
> > > > 1.7.4.1
> > >
> > > How is the host supposed to set this flag?
> > >
> > > As mentioned previously, if you save/restore the offset added to
> > > kvmclock on stop/cont (and the TSC MSR, forgot to mention that), no
> > > paravirt infrastructure is required. Which means the issue is also fixed
> > > for older guests.
> >
> > How is saving the offset going to prevent a large jump from triggering
> > the softlock message? Won't we still have not touched the watchdog for
> > that long period of time?
>
> The idea is to adjust the offset and the TSC value so that the guest
> does not notice the actual elapsed time. This is what happens when a
> guest is migrated or saved/restored, for example.

But that's within a rather short period of time. If we stop the guest
for a day, I don't think what you are suggesting can work without time
being wrong in the guest, and as soon as it's corrected and we jump
forward by more than the watchdog timeout, won't we trigger it again?
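
To make the question concrete, here is a toy userspace model of the two
approaches being discussed. It is only a sketch, not kernel code: the
struct, the function names and the 20 second threshold are all made up
for illustration, and it assumes the watchdog reads the same clock that
later gets corrected, which is exactly the part I am unsure about.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical timeout for this model, in seconds. */
#define WATCHDOG_THRESH_SEC 20

/* Toy guest: every name here is an assumption, none is a kernel symbol. */
struct guest_model {
    int64_t host_clock;   /* host time, seconds */
    int64_t offset;       /* subtracted from host_clock to form guest time */
    int64_t last_touch;   /* guest time when the watchdog was last touched */
    bool    host_stopped; /* models the "stopped by host" flag in the series */
};

static int64_t guest_now(const struct guest_model *g)
{
    return g->host_clock - g->offset;
}

/*
 * Host stops the guest for `secs` seconds.
 * hide_gap: grow the offset so the guest never sees the elapsed time.
 * set_flag: tell the guest it was stopped (the paravirt approach).
 */
static void stop_guest(struct guest_model *g, int64_t secs,
                       bool hide_gap, bool set_flag)
{
    assert(secs >= 0);
    g->host_clock += secs;
    if (hide_gap)
        g->offset += secs;
    if (set_flag)
        g->host_stopped = true;
}

/* Guest time is later stepped forward by `secs` to match reality. */
static void correct_guest_clock(struct guest_model *g, int64_t secs)
{
    g->offset -= secs;
}

/* Returns true if this check would print a soft lockup warning. */
static bool watchdog_check(struct guest_model *g)
{
    int64_t now = guest_now(g);
    bool stale = now - g->last_touch > WATCHDOG_THRESH_SEC;

    if (stale && g->host_stopped) {
        /* Stopped by the host: consume the flag and touch the watchdog. */
        g->host_stopped = false;
        g->last_touch = now;
        return false;
    }
    return stale;
}
```

In this model a one day stop with neither mechanism warns, the flag
suppresses the warning, and hiding the gap suppresses it only until
correct_guest_clock() steps guest time forward by the hidden amount.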


--
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ryanh@xxxxxxxxxx