Re: [REPOST PATCH] arm/arm64: KVM: Add PSCI version selection API

From: Christoffer Dall
Date: Mon Apr 09 2018 - 08:30:51 EST


On Thu, Mar 15, 2018 at 07:26:48PM +0000, Marc Zyngier wrote:
> On 15/03/18 19:13, Peter Maydell wrote:
> > On 15 March 2018 at 19:00, Marc Zyngier <marc.zyngier@xxxxxxx> wrote:
> >> On 06/03/18 09:21, Andrew Jones wrote:
> >>> On Mon, Mar 05, 2018 at 04:47:55PM +0000, Peter Maydell wrote:
> >>>> On 2 March 2018 at 11:11, Marc Zyngier <marc.zyngier@xxxxxxx> wrote:
> >>>>> On Fri, 02 Mar 2018 10:44:48 +0000,
> >>>>> Auger Eric wrote:
> >>>>>> I understand the get/set is called as part of the migration process.
> >>>>>> So my understanding is the benefit of this series is migration fails in
> >>>>>> those cases:
> >>>>>>
> >>>>>> >=0.2 source -> 0.1 destination
> >>>>>> 0.1 source -> >=0.2 destination
> >>>>>
> >>>>> It also fails in the case where you migrate a 1.0 guest to something
> >>>>> that cannot support it.
> >>>>
> >>>> I think it would be useful if we could write out the various
> >>>> combinations of source, destination and what we expect/want to
> >>>> have happen. My gut feeling here is that we're sacrificing
> >>>> exact migration compatibility in favour of having the guest
> >>>> automatically get the variant-2 mitigations, but it's not clear
> >>>> to me exactly which migration combinations that's intended to
> >>>> happen for. Marc?
> >>>>
> >>>> If this wasn't a mitigation issue the desired behaviour would be
> >>>> straightforward:
> >>>> * kernel should default to 0.2 on the basis that
> >>>>   that's what it did before
> >>>> * new QEMU version should enable 1.0 by default for virt-2.12
> >>>>   and 0.2 for virt-2.11 and earlier
> >>>> * PSCI version info shouldn't appear in migration stream unless
> >>>>   it's something other than 0.2
> >>>> But that would leave some setups (which?) unnecessarily without the
> >>>> mitigation, so we're not doing that. The question is, exactly
> >>>> what *are* we aiming for?
> >>>
> >>> The reason Marc dropped this patch from the series it was first introduced
> >>> in was because we didn't have the aim 100% understood. We want the
> >>> mitigation by default, but also to have the least chance of migration
> >>> failure, and when we must fail (because we're not doing the
> >>> straightforward approach listed above, which would prevent failures), then
> >>> we want to fail with the least amount of damage to the user.
> >>>
> >>> I experimented with a couple different approaches and provided tables[1]
> >>> with my results. I even recommended an approach, but I may have changed
> >>> my mind after reading Marc's follow-up[2]. The thread continues from
> >>> there as well with follow-ups from Christoffer, Marc, and myself. Anyway,
> >>> Marc did this repost for us to debate it and work out the best approach
> >>> here.
> >> It doesn't look like we've made much progress on this, which makes me
> >> think that we probably don't need anything of the like.
> >
> > I was waiting for a better explanation from you of what we're trying to
> > achieve. If you want to take the "do nothing" approach then a list
> > of what migrations succeed/fail/break in that case would also
> > be useful.
> >
> > (I am somewhat lazily trying to avoid having to spend time reverse
> > engineering the "what are we trying to do and what effects are
> > we accepting" parts from the patch and the code that's already gone
> > into the kernel.)
>
> OK, let me (re)state the problem:
>
> For a guest that requests PSCI 0.2 (i.e. all guests from the past 4 or 5
> years), we now silently upgrade the PSCI version to 1.0 allowing the new
> SMCCC to be discovered, and the ARCH_WORKAROUND_1 service to be called.
>
> Things get funny, especially with migration (and the way QEMU works).
>
> If we "do nothing":
>
> (1) A guest migrating from an "old" host to a "new" host will silently
> see its PSCI version upgraded. Not a big deal in my opinion, as 1.0 is a
> strict superset of 0.2 (apart from the version number...).
>
> (2) A guest migrating from a "new" host to an "old" host will silently
> lose its Spectre v2 mitigation. That's quite a big deal.
>
> (3, not related to migration) A guest having a hardcoded knowledge of
> PSCI 0.2 will see that we've changed something, and may decide to catch
> fire. Oh well.
>
> If we take this patch:
>
> (1) still exists

No problem, IMHO.

>
> (2) will now fail to migrate. I see this as a feature.

Yes, I agree. This is actually the most important reason for doing
anything beyond what's already merged.

>
> (3) can be worked around by setting the "PSCI version pseudo register"
> to 0.2.

Nice to have, but we're probably not expecting this to be of major
concern. I initially thought it was a nice debugging feature as well,
but that may be a ridiculous point.
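
For reference, the value such a register would carry follows the
PSCI_VERSION encoding: major version in bits [31:16], minor version in
bits [15:0]. A minimal sketch of that encoding (the helper names are
made up for illustration and are not kernel or QEMU symbols):

```python
# Illustrative helpers only; these names are hypothetical.
PSCI_MAJOR_SHIFT = 16
PSCI_MINOR_MASK = 0xFFFF


def psci_version(major, minor):
    """Pack a PSCI version: major in bits [31:16], minor in bits [15:0]."""
    return (major << PSCI_MAJOR_SHIFT) | (minor & PSCI_MINOR_MASK)


def psci_major(value):
    """Extract the major version from a packed PSCI version."""
    return value >> PSCI_MAJOR_SHIFT


def psci_minor(value):
    """Extract the minor version from a packed PSCI version."""
    return value & PSCI_MINOR_MASK


PSCI_0_2 = psci_version(0, 2)  # what a guest pinned to 0.2 would see
PSCI_1_0 = psci_version(1, 0)  # what the silent upgrade exposes
```

So pinning a guest to 0.2 would mean writing 0x00000002 rather than
0x00010000.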

>
> These are the main things I can think of at the moment.

So I think we should merge this patch.

If userspace then wants to support "migrate from explicitly set v0.2 new
kernel to old kernel", then it must add specific support to filter out
the register from the register list; not that I think anyone will need
that or bother to implement it.
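
To make that concrete, here is a rough model of the filtering, not
actual QEMU code. It assumes the destination refuses a restore that
contains a register it does not know about; the register key, function
names and that restore behaviour are all assumptions for illustration:

```python
# Toy model; every name here is hypothetical.
PSCI_VERSION_REG = "psci_version"  # placeholder, not a real KVM register ID
PSCI_0_2 = 0x00000002
PSCI_1_0 = 0x00010000


def outgoing_regs(regs, filter_psci_0_2=False):
    """Return the registers to place in the migration stream.

    With filter_psci_0_2 set, the PSCI version register is dropped,
    but only when it was explicitly set to 0.2.
    """
    out = dict(regs)
    if filter_psci_0_2 and out.get(PSCI_VERSION_REG) == PSCI_0_2:
        del out[PSCI_VERSION_REG]
    return out


def restore(stream, dest_known_regs):
    """Model a destination that rejects registers it does not know."""
    unknown = [reg for reg in stream if reg not in dest_known_regs]
    if unknown:
        raise RuntimeError("unknown registers: %s" % ", ".join(unknown))
    return dict(stream)
```

In this model a 1.0 guest always fails to land on an old kernel, which
is case (2) above, while a guest explicitly set to 0.2 gets through
only if userspace opted in to the filtering.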

In other words, I think you should merge this:

Reviewed-by: Christoffer Dall <cdall@xxxxxxxxxx>