Re: [PATCH] kvm: x86: Print "disabled by bios" only once per host

From: Sean Christopherson
Date: Wed Feb 19 2020 - 11:18:30 EST


On Wed, Feb 19, 2020 at 12:19:01PM +0100, Erwan Velu wrote:
> On 18/02/2020 19:48, Sean Christopherson wrote:
> [...]
> >Fix userspace to only do the "add" on one CPU.
> >
> >Changing kvm_arch_init() to use pr_err_once() for the disabled_by_bios()
> >case "works", but it's effectively a hack to work around a flawed userspace.
>
> I'll see with the user space tool to sort this out.
>
> I'm also wondering how "wrong" what they do really is: udevadm trigger
> generates 3500 "uevent add" events on my system, and kvm is the only thing
> I noticed printing this noisy message.
>
> So as the print once isn't that "wrong" either, this simple patch would
> avoid polluting the kernel logs.
>
>
> So my proposal would be
>
> - have this simple patch on the kernel side to avoid having userspace apps
> polluting logs
>
> - contacting the udev people to see the rationale & fix it too: I'll do that
>
>
> As you said, once probed, there is no need reprinting the same message again
> as the situation cannot have changed.

For this exact scenario, on Intel/VMX, this is mostly true. But, the MSR
check for AMD/SVM has a disable bit that takes effect irrespective of the
MSR's locked bit, i.e. SVM could theoretically change state without any
super special behavior.

Even on Intel, the state can potentially change, especially on a system
with a misbehaving BIOS. FEATURE_CONTROL is cleared on CPU RESET, e.g. VMX
enabling could change if BIOS "forgets" to reinitialize the MSR upon waking
from S3 (suspend). Things get really weird if we consider the case where
BIOS leaves the MSR unlocked after S3, the user manually writes the MSR,
and then it gets cleared again on a different S3 transition.

SVM is even more sensitive because VM_CR is cleared on INIT, not just RESET.
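
To make the VMX vs. SVM difference concrete, here is a rough userspace
model of the two checks. It is a sketch, not the KVM code: the function
names and macros are made up for illustration, the MSR values are passed in
rather than read with rdmsr, and the TXT/SMX cases of the VMX check are
ignored. The bit positions are the architectural ones as I understand them
(lock = bit 0 and VMX-outside-SMX = bit 2 of FEATURE_CONTROL, SVMDIS =
bit 4 of VM_CR).

```c
#include <stdbool.h>
#include <stdint.h>

/* Local, illustrative names for bits of IA32_FEATURE_CONTROL. */
#define FEAT_CTL_LOCKED                  (1ULL << 0)
#define FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX (1ULL << 2)

/* Local, illustrative name for the SVMDIS bit of the AMD VM_CR MSR. */
#define VM_CR_SVM_DISABLE                (1ULL << 4)

/*
 * Hypothetical model: VMX is "disabled by BIOS" only if the MSR is locked
 * with the enable bit clear. An unlocked MSR is not reported as disabled,
 * because software can still set the enable bit itself.
 */
static bool vmx_disabled_by_bios_model(uint64_t feature_control)
{
    if (!(feature_control & FEAT_CTL_LOCKED))
        return false;
    return !(feature_control & FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX);
}

/*
 * Hypothetical model: the SVM disable bit is honored on its own, with no
 * dependency on a lock bit.
 */
static bool svm_disabled_by_bios_model(uint64_t vm_cr)
{
    return (vm_cr & VM_CR_SVM_DISABLE) != 0;
}
```

In this model a FEATURE_CONTROL value of 0 (the post-RESET state) is not
"disabled by BIOS", whereas a locked value without the enable bit is, which
is how the answer can differ before and after an S3 cycle on a system whose
BIOS doesn't restore the MSR.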

> As we are on the preliminary return code path (to return an EOPNOTSUPP), I
> would vote for protecting the print against multiple calls/prints.
>
> The kernel patch isn't impacting anyone (in a regular case) and just avoids
> pollution.
>
> Would you agree on that?

Sadly, no. Don't get me wrong, I completely agree that, ideally, KVM would
not spam the log, even when presented with a misbehaving userspace.

My hesitation about changing the error message to pr_err_once() isn't so
much about right versus wrong as it is about creating misleading and
potentially confusing code in KVM, and setting a precedent that I don't
think we want to carry forward.

E.g. the _once() doesn't hold true if the kvm module is unloaded and
reloaded, and other error messages, such as the basic CPU support check,
would still be unlimited. The basic CPU support message definitely should
*not* be _once() as that would squash messages when loading the wrong
vendor module.

As for setting a precedent, moving the error message to the vendor module
or making kvm a monolithic module would "break" the _once() behavior.

And, the current systemd behavior is actually quite dangerous, e.g. on a
misconfigured system where SVM is enabled on a subset of CPUs, probing KVM
on every CPU essentially guarantees that KVM will be loaded on a broken
system. In that case, I think we actually want the spam. Note, as of
kernel 5.6, this doesn't apply to VMX as kvm_intel will no longer load on a
misconfigured system since FEATURE_CONTROL configuration is incorporated
into the per-CPU checks.
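
A toy model of why the per-CPU probing is dangerous, assuming each load
attempt runs the disabled-by-BIOS check only on whichever CPU it happens to
land on. The helper names and the per-CPU array of VM_CR values are
hypothetical, purely for illustration:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local, illustrative name for the SVMDIS bit of the AMD VM_CR MSR. */
#define VM_CR_SVM_DISABLE (1ULL << 4)

/* True if a load attempt landing on at least one CPU would pass the check. */
static bool some_cpu_passes(const uint64_t *vm_cr, size_t ncpus)
{
    for (size_t i = 0; i < ncpus; i++)
        if (!(vm_cr[i] & VM_CR_SVM_DISABLE))
            return true;
    return false;
}

/* True only if SVM is usable on every CPU, i.e. the system is sane. */
static bool all_cpus_pass(const uint64_t *vm_cr, size_t ncpus)
{
    for (size_t i = 0; i < ncpus; i++)
        if (vm_cr[i] & VM_CR_SVM_DISABLE)
            return false;
    return true;
}
```

With SVM disabled on three of four CPUs, some_cpu_passes() is true while
all_cpus_pass() is false: retrying the load once per CPU finds the one good
CPU, and the failed attempts on the others are the only hint that something
is off.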

All of that being said, what about converting all of the error messages to
pr_err_ratelimited()? That would take the edge off this particular problem,
wouldn't create inconsistencies between error messages, and wouldn't
completely squash error messages in corner case scenarios on misconfigured
systems.
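
To illustrate the difference in behavior, a simplified userspace model of
"once" vs. "ratelimited" printing. This is not the kernel implementation:
the struct and function names are invented, time is passed in as plain
seconds, and the window handling is cruder than the real ratelimit code.
The 5 second / 10 message values mirror what I believe the kernel defaults
to be (DEFAULT_RATELIMIT_INTERVAL and DEFAULT_RATELIMIT_BURST).

```c
#include <stdbool.h>

/* Hypothetical model of a _once() print: a single sticky flag. */
struct once_model {
    bool printed;
};

static bool once_allow(struct once_model *s)
{
    if (s->printed)
        return false;
    s->printed = true;
    return true;
}

/* Hypothetical model of a ratelimited print: up to 'burst' per window. */
struct ratelimit_model {
    long interval;  /* window length, in seconds */
    int burst;      /* messages allowed per window */
    long begin;     /* start of the current window */
    int printed;    /* messages emitted in the current window */
    bool started;   /* false until the first call */
};

static bool ratelimit_allow(struct ratelimit_model *rs, long now)
{
    /* Open a new window on first use or once the old one has expired. */
    if (!rs->started || now - rs->begin > rs->interval) {
        rs->started = true;
        rs->begin = now;
        rs->printed = 0;
    }
    if (rs->printed >= rs->burst)
        return false;
    rs->printed++;
    return true;
}
```

With interval = 5 and burst = 10, a burst of 3500 calls at the same instant
gets 10 messages through, and a call after the window expires prints again.
The once model stays silent forever after the first message, which is the
corner case squashing I'd like to avoid.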