Re: [PATCH v4 2/2] KVM: move vcpu id checking to archs

From: Greg Kurz
Date: Fri Apr 22 2016 - 05:25:58 EST


Hi Radim !

On Thu, 21 Apr 2016 19:36:11 +0200
Radim KrÄmÃÅ <rkrcmar@xxxxxxxxxx> wrote:

> 2016-04-21 18:45+0200, Greg Kurz:
> > On Thu, 21 Apr 2016 18:00:19 +0200
> > Radim KrÄmÃÅ <rkrcmar@xxxxxxxxxx> wrote:
> >> 2016-04-21 16:20+0200, Greg Kurz:
> >> > Commit 338c7dbadd26 ("KVM: Improve create VCPU parameter (CVE-2013-4587)")
> >> > introduced a check to prevent potential kernel memory corruption in case
> >> > the vcpu id is too great.
> >> >
> >> > Unfortunately this check assumes vcpu ids grow in sequence with a common
> >> > difference of 1, which is wrong: archs are free to use vcpu id as they fit.
> >> > For example, QEMU originated vcpu ids for PowerPC cpus running in boot3s_hv
> >> > mode, can grow with a common difference of 2, 4 or 8: if KVM_MAX_VCPUS is
> >> > 1024, guests may be limited down to 128 vcpus on POWER8.
> >> >
> >> > This means the check does not belong here and should be moved to some arch
> >> > specific function: kvm_arch_vcpu_create() looks like a good candidate.
> >> >
> >> > ARM and s390 already have such a check.
> >> >
> >> > I could not spot any path in the PowerPC or common KVM code where a vcpu
> >> > id is used as described in the above commit: I believe PowerPC can live
> >> > without this check.
> >> >
> >> > In the end, this patch simply moves the check to MIPS and x86. While here,
> >> > we also update the documentation to dissociate vcpu ids from the maximum
> >> > number of vcpus per virtual machine.
> >> >
> >> > Acked-by: James Hogan <james.hogan@xxxxxxxxxx>
> >> > Acked-by: Cornelia Huck <cornelia.huck@xxxxxxxxxx>
> >> > Signed-off-by: Greg Kurz <gkurz@xxxxxxxxxxxxxxxxxx>
> >> > ---
> >> > v4: - updated subject for more clarity on what the patch does
> >> > - added James's and Connie's A-b tags
> >> > - updated documentation
> >> >
> >> > Documentation/virtual/kvm/api.txt | 7 +++----
> >> > arch/mips/kvm/mips.c | 7 ++++++-
> >> > arch/x86/kvm/x86.c | 3 +++
> >> > virt/kvm/kvm_main.c | 3 ---
> >> > 4 files changed, 12 insertions(+), 8 deletions(-)
> >> >
> >> > diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> >> > index 4d0542c5206b..486a1d783b82 100644
> >> > --- a/Documentation/virtual/kvm/api.txt
> >> > +++ b/Documentation/virtual/kvm/api.txt
> >> > @@ -199,11 +199,10 @@ Type: vm ioctl
> >> > Parameters: vcpu id (apic id on x86)
> >> > Returns: vcpu fd on success, -1 on error
> >> >
> >> > -This API adds a vcpu to a virtual machine. The vcpu id is a small integer
> >> > -in the range [0, max_vcpus).
> >> > +This API adds a vcpu to a virtual machine. The vcpu id is a positive integer.
> >>
> >> Userspace won't be able to tell if KVM_CREATE_VCPU failed because it
> >> provided too high vcpu_id to an old KVM or because new KVM failed in
> >> other areas. Not a huge problem (because I expect that userspace will
> >> die on both), but a new KVM_CAP would be able to disambiguate it.
> >>
> >> Toggleable capability doesn't seem necessary and only PowerPC changes,
> >> so the capability could be arch specific ... I think that a generic one
> >> makes more sense, though.
> >>
> >
> > I'm not sure userspace can disambiguate all the cases where KVM_CREATE_VCPU
> > returns EINVAL already... and, FWIW, QEMU simply exits if it gets an error.
>
> Yes, userspace cannot disambiguate, but would have the option of not
> doing something that is destined to fail, like with KVM_CAP_MAX_VCPU.
>

It makes sense indeed.

> > So I understand your concern but would we have a user for this ?
>
> I think so, new userspace on pre-patch KVM is the most likely one.
>
> Userspace cannot tell that KVM doesn't support the extension and
> behaving like on patched KVM would result in a failure with cryptic
> error message, because KVM only returns EINVAL.
>

This is already the case with or without the patch... which only changes
things for PowerPC userspace. And in the case of QEMU, we're already
violating the spec with the way we compute vcpu ids.

> Btw. PowerPC QEMU tries vcpu_id >= KVM_MAX_VCPUS and fails, instead of
> recognizing that the user wanted too much?
>

No. The error is caught in generic code and QEMU exits for all archs.

And BTW, how would QEMU guess that vcpu id is too high ? I see at
least three paths that can return EINVAL...

> >> Userspace also doesn't know the vcpu id limit anymore, and it might
> >> care. What do you think about returning the arch-specific limit (or the
> >> highest positive integer) as int in KVM_CAP_MAX_VCPU_ID?
> >>
> >
> > This is partly true: only arch agnostic code would be lost.
> >
> > Moreover this is a problem for powerpc only at the moment and userspace code
> > can compute the vcpu_id limit out of KVM_CAP_MAX_VCPUS and KVM_CAP_PPC_SMT.
>
> How would that work on KVM without this patch?
>

It doesn't work for PowerPC :)

KVM_CAP_MAX_VCPUS indicates we can can start, say, 1024 vcpus and
KVM_CAP_PPC_SMT indicates the host has 8 threads per core.

KVM_CREATE_VCPU returns EINVAL when we start the 128th one because
it has vcpu_id == 128 * 8 == 1024.

Of course we can patch QEMU to restrict the maximum number of vcpus
to MAX_VCPUS / PPC_SMT for PowerPC, but it would be infortunate since
KVM for PowerPC is sized to run MAX_VCPUS... :\

> > For other architectures, it is simply KVM_MAX_VCPUS.
>
> (Other architectures would not implement the capability.)
>

So this would be KVM_CAP_PPC_MAX_VCPU_ID ?

> >> I think this would also clarify the connection between VCPU limit and
> >> VCPU_ID limit. Or is a boolean cap better?
> >>
> >
> > Well, I'm not fan of adding a generic API to handle a corner case...
>
> I don't like it either, but I think that introducing the capability is
> worth avoided problems.
>

I admit that having separate capabilities for the number of vcpus and the
maximum vcpu id fixes the confusion once and for all.

> > maybe later
> > if we have other scenarios where vcpu ids need to cross the limit ?
>
> x86 is going to have that soon too -- vcpu_id will be able to range from
> 0 to 2^32-1 (or 2^31), but MAX_CPUS related data structures probably
> won't be improved to actually scale, so MAX_CPUS will remain lower.
>

Do you have some pointers to share so that we can see the broader picture ?

Thanks !

--
Greg