Re: [PATCH v2 09/18] arm64: KVM: enable conditional save/restore full SPE profiling buffer controls

From: Andrew Murray
Date: Thu Jan 09 2020 - 06:25:10 EST


On Thu, Jan 09, 2020 at 11:23:37AM +0000, Andrew Murray wrote:
> On Wed, Jan 08, 2020 at 01:10:21PM +0000, Will Deacon wrote:
> > On Wed, Jan 08, 2020 at 12:36:11PM +0000, Marc Zyngier wrote:
> > > On 2020-01-08 11:58, Will Deacon wrote:
> > > > On Wed, Jan 08, 2020 at 11:17:16AM +0000, Marc Zyngier wrote:
> > > > > On 2020-01-07 15:13, Andrew Murray wrote:
> > > > > > Looking at the vcpu_load and related code, I don't see a way of saying
> > > > > > 'don't schedule this VCPU on this CPU' or bailing in any way.
> > > > >
> > > > > That would actually be pretty easy to implement. In vcpu_load(), check
> > > > > that that the CPU physical has SPE. If not, raise a request for that
> > > > > vcpu.
> > > > > In the run loop, check for that request and abort if raised, returning
> > > > > to userspace.
>
> I hadn't really noticed the kvm_make_request mechanism - however it's now
> clear how this could be implemented.
>
> This approach gives responsibility for which CPUs should be used to userspace
> and if userspace gets it wrong then the KVM_RUN ioctl won't do very much.
>
>
> > > > >
> > > > > Userspace can always check /sys/devices/arm_spe_0/cpumask and work out
> > > > > where to run that particular vcpu.
> > > >
> > > > It's also worth considering systems where there are multiple
> > > > implementations
> > > > of SPE in play. Assuming we don't want to expose this to a guest, then
> > > > the
> > > > right interface here is probably for userspace to pick one SPE
> > > > implementation and expose that to the guest.
>
> If I understand correctly then this implies the following:
>
> - If the host userspace indicates it wants support for SPE in the guest (via
> KVM_SET_DEVICE_ATTR at start of day) - then we should check in vcpu_load that
> the minimum version of SPE is present on the current CPU. 'minimum' because
> we don't know why userspace has selected the given cpumask.
>
> - Userspace can get it wrong, i.e. it can create a CPU mask with CPUs that
> have SPE with differing versions. If it does, and all CPUs have some form of
> SPE then errors may occur in the guest. Perhaps this is OK and userspace
> shouldn't get it wrong?

Actually this could be guarded against by emulating the ID_AA64DFR0_EL1 such to
cap the version to the minimum SPE version - if absolutely required.

Thanks,

Andrew Murray

>
>
> > > > That fits with your idea
> > > > above,
> > > > where you basically get an immediate exit if we try to schedule a vCPU
> > > > onto
> > > > a CPU that isn't part of the SPE mask.
> > >
> > > Then it means that the VM should be configured with a mask indicating
> > > which CPUs it is intended to run on, and setting such a mask is mandatory
> > > for SPE.
> >
> > Yeah, and this could probably all be wrapped up by userspace so you just
> > pass the SPE PMU name or something and it grabs the corresponding cpumask
> > for you.
> >
> > > > > > One solution could be to allow scheduling onto non-SPE VCPUs but wrap
> > > > > > the
> > > > > > SPE save/restore code in a macro (much like kvm_arm_spe_v1_ready) that
> > > > > > reads the non-sanitised feature register. Therefore we don't go bang,
> > > > > > but
> > > > > > we also increase the size of any black-holes in SPE capturing. Though
> > > > > > this
> > > > > > feels like something that will cause grief down the line.
> > > > > >
> > > > > > Is there something else that can be done?
> > > > >
> > > > > How does userspace deal with this? When SPE is only available on
> > > > > half of
> > > > > the CPUs, how does perf work in these conditions?
> > > >
> > > > Not sure about userspace, but the kernel driver works by instantiating
> > > > an
> > > > SPE PMU instance only for the CPUs that have it and then that instance
> > > > profiles for only those CPUs. You also need to do something similar if
> > > > you had two CPU types with SPE, since the SPE configuration is likely to
> > > > be
> > > > different between them.
> > >
> > > So that's closer to what Andrew was suggesting above (running a guest on a
> > > non-SPE CPU creates a profiling black hole). Except that we can't really
> > > run a SPE-enabled guest on a non-SPE CPU, as the SPE sysregs will UNDEF
> > > at EL1.
> >
> > Right. I wouldn't suggest the "black hole" approach for VMs, but it works
> > for userspace so that's why the driver does it that way.
> >
> > > Conclusion: we need a mix of a cpumask to indicate which CPUs we want to
> > > run on (generic, not-SPE related),
>
> If I understand correctly this mask isn't exposed to KVM (in the kernel) and
> KVM (in the kernel) is unware of how the CPUs that have KVM_RUN called are
> selected.
>
> Thus this implies the cpumask is a feature of KVM tool or QEMU that would
> need to be added there. (E.g. kvm_cmd_run_work would set some affinity when
> creating pthreads - based on a CPU mask triggered by setting the --spe flag)?
>
> Thanks,
>
> Andrew Murray
>
> > and a check for SPE-capable CPUs.
> > > If any of these condition is not satisfied, the vcpu exits for userspace
> > > to sort out the affinity.
> > >
> > > I hate heterogeneous systems.
> >
> > They hate you too ;)
> >
> > Will
> _______________________________________________
> kvmarm mailing list
> kvmarm@xxxxxxxxxxxxxxxxxxxxx
> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm