Re: [PATCH 5/9] KVM: x86/mmu: Convert "runtime" WARN_ON() assertions to WARN_ON_ONCE()

From: Sean Christopherson
Date: Fri May 12 2023 - 19:19:00 EST

On Fri, May 12, 2023, David Matlack wrote:
> On Thu, May 11, 2023 at 04:59:13PM -0700, Sean Christopherson wrote:
> > Convert all "runtime" assertions, i.e. assertions that can be triggered
> > while running vCPUs, from WARN_ON() to WARN_ON_ONCE(). Every WARN in the
> > MMU that is tied to running vCPUs, i.e. not contained to loading and
> > initializing KVM, is likely to fire _a lot_ when it does trigger. E.g. if
> > KVM ends up with a bug that causes a root to be invalidated before the
> > page fault handler is invoked, pretty much _every_ page fault VM-Exit
> > triggers the WARN.
> >
> > If a WARN is triggered frequently, the resulting spam usually causes a lot
> > of damage of its own, e.g. consumes resources to log the WARN and pollutes
> > the kernel log, often to the point where other useful information can be
> > lost. In many cases, the damage caused by the spam is actually worse than
> > the bug itself, e.g. KVM can almost always recover from an unexpectedly
> > invalid root.
> >
> > On the flip side, warning every time is rarely helpful for debug and
> > triage, i.e. a single splat is usually sufficient to point a debugger in
> > the right direction, and automated testing, e.g. syzkaller, typically runs
> > with panic_on_warn=1, i.e. will never get past the first WARN anyways.
>
> On the topic of syzkaller, we should get them to test with
> CONFIG_KVM_PROVE_MMU once it's available.


> > Lastly, when an assertion fails multiple times, the stack traces in KVM
> > are almost always identical, i.e. the full splat only needs to be captured
> > once. And _if_ there is value in capturing information about the failed
> > assert, a ratelimited printk() is sufficient and less likely to rack up a
> > large amount of collateral damage.
>
> These are all good arguments and I think they apply to KVM_MMU_WARN_ON()
> as well. Should we convert that to _ONCE() too?

Already done in this patch :-) I didn't call it out because that warn also falls
under the "runtime assertions" umbrella.
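
As a rough illustration of the spam argument from the changelog, here is a
userspace sketch (not kernel code) that contrasts a warn-every-time assertion
with a warn-once assertion. Every name in it (sketch_warn_on(),
sketch_warn_on_once(), simulate_faults(), the counters) is hypothetical and
only approximates the behavior of WARN_ON() and WARN_ON_ONCE(); the kernel
macros keep the "already warned" flag per call site and dump a full splat,
neither of which is modeled here beyond a single stderr line.

```c
#include <stdbool.h>
#include <stdio.h>

/* Counters so the number of emitted "splats" can be observed. */
int splats_every;
int splats_once;

/* Hypothetical stand-in for WARN_ON(): reports every time cond is true. */
static bool sketch_warn_on(bool cond, const char *what)
{
	if (cond) {
		splats_every++;
		fprintf(stderr, "WARNING (every time): %s\n", what);
	}
	return cond;
}

/*
 * Hypothetical stand-in for WARN_ON_ONCE(): reports only the first time
 * cond is true, but still returns the condition so callers can bail out.
 */
static bool sketch_warn_on_once(bool cond, bool *warned, const char *what)
{
	if (cond && !*warned) {
		*warned = true;
		splats_once++;
		fprintf(stderr, "WARNING (once): %s\n", what);
	}
	return cond;
}

/* Simulate nr_faults page faults that all trip the same assertion. */
void simulate_faults(int nr_faults)
{
	/* Stand-in for the per-call-site flag a once-style macro would keep. */
	static bool warned;

	for (int i = 0; i < nr_faults; i++) {
		bool root_invalid = true;	/* pretend the bug is persistent */

		sketch_warn_on(root_invalid, "unexpectedly invalid root");
		sketch_warn_on_once(root_invalid, &warned,
				    "unexpectedly invalid root");
	}
}
```

Calling simulate_faults(1000) from a fresh process bumps splats_every to 1000
and splats_once to 1, i.e. the once-style variant still flags the bug without
flooding the log.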

diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index bb1649669bc9..cfe925fefa68 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -9,7 +9,7 @@
 #undef MMU_DEBUG

 #ifdef MMU_DEBUG
-#define KVM_MMU_WARN_ON(x) WARN_ON(x)
+#define KVM_MMU_WARN_ON(x) WARN_ON_ONCE(x)
 #else
 #define KVM_MMU_WARN_ON(x) do { } while (0)