Re: [PATCH 1/3 V7] KVM, SEV: Add support for SEV intra host migration

From: Sean Christopherson
Date: Thu Sep 09 2021 - 21:40:12 EST


On Thu, Sep 09, 2021, Marc Orr wrote:
> > > +int svm_vm_migrate_from(struct kvm *kvm, unsigned int source_fd)
> > > +{
> > > + struct kvm_sev_info *dst_sev = &to_kvm_svm(kvm)->sev_info;
> > > + struct file *source_kvm_file;
> > > + struct kvm *source_kvm;
> > > + int ret;
> > > +
> > > + ret = svm_sev_lock_for_migration(kvm);
> > > + if (ret)
> > > + return ret;
> > > +
> > > + if (!sev_guest(kvm) || sev_es_guest(kvm)) {
> > > + ret = -EINVAL;
> > > + pr_warn_ratelimited("VM must be SEV enabled to migrate to.\n");
> >
> > Linux generally doesn't log user errors to dmesg. They can be helpful during
> > development, but aren't actionable and thus are of limited use in production.
>
> Ha. I had suggested adding the logs when I reviewed these patches
> (maybe before Peter posted them publicly). My rationale is that if I'm
> looking at a crash in production, and all I have is a stack trace and
> the error code, then I can narrow the failure down to this function,
> but once the function starts returning the same error code in multiple
> places now it's non-trivial for me to deduce exactly which condition
> caused the crash. Having these logs makes it trivial. However, if this
> is not the preferred Linux style then so be it.

I don't necessarily disagree, but none of these error conditions should so much
as sniff production. E.g. if userspace invokes this on a !KVM fd or on a non-SEV
source, or before guest_state_protected=true, then userspace has bigger problems.
Ditto if the dest isn't an actual KVM VM or doesn't meet whatever SEV-enabled/disabled
criteria we end up with.

The mismatch in online_vcpus is the only one where I could reasonably see a bug
escaping to production, e.g. due to an orchestration layer mixup.

For all of these conditions, userspace _must_ be aware of the conditions for success,
and except for guest_state_protected=true, userspace has access to what state it
sent into KVM, e.g. it shouldn't be difficult for userspace to dump the relevant bits
from the src and dst without any help from the kernel.

If userspace really needs kernel help to differentiate what's up, I'd rather use
distinct errors for online_vcpus and guest_state_protected, e.g. -E2BIG isn't
too big of a stretch for the online_vcpus mismatch.
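
To make the distinct-errno idea concrete, here is a rough userspace sketch, not
the actual KVM code. struct vm_model and migrate_precheck() are hypothetical
names, the requirement that both sides be SEV-enabled is an assumption taken
from the quoted check, and -EBUSY for guest_state_protected is only a
placeholder; -E2BIG is the one code actually suggested above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of the state the checks look at. */
struct vm_model {
	bool sev_enabled;
	bool guest_state_protected;
	int online_vcpus;
};

/*
 * Return 0 if migration could proceed, else a negative errno.  Only the
 * two conditions userspace can't trivially diagnose get a distinct code.
 */
static int migrate_precheck(const struct vm_model *src,
			    const struct vm_model *dst)
{
	assert(src != NULL && dst != NULL);

	/* Plain userspace bugs: keep the generic error. */
	if (!src->sev_enabled || !dst->sev_enabled)
		return -EINVAL;

	/* Placeholder errno, picked only for illustration. */
	if (!src->guest_state_protected)
		return -EBUSY;

	/* vCPU count mismatch, using the -E2BIG suggested above. */
	if (src->online_vcpus != dst->online_vcpus)
		return -E2BIG;

	return 0;
}
```

With something along these lines, userspace could tell the cases apart from
the returned errno alone, with no dmesg output needed.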