Re: [PATCH 1/2] x86,kvm: move qemu/guest FPU switching out to vcpu_run

From: Rik van Riel
Date: Wed Nov 15 2017 - 09:41:02 EST


On Wed, 2017-11-15 at 12:33 +0800, Wanpeng Li wrote:
> 2017-11-15 11:03 GMT+08:00 Rik van Riel <riel@xxxxxxxxxx>:
> > On Wed, 2017-11-15 at 08:47 +0800, Wanpeng Li wrote:
> > > 2017-11-15 5:54 GMT+08:00 <riel@xxxxxxxxxx>:
> > > > From: Rik van Riel <riel@xxxxxxxxxx>
> > > >
> > > > Currently, every time a VCPU is scheduled out, the host kernel
> > > > will first save the guest FPU/xstate context, then load the qemu
> > > > userspace FPU context, only to then immediately save the qemu
> > > > userspace FPU context back to memory. When scheduling in a VCPU,
> > > > the same extraneous FPU loads and saves are done.
> > > >
> > > > This could be avoided by moving from a model where the guest FPU
> > > > is loaded and stored with preemption disabled, to a model where
> > > > the qemu userspace FPU is swapped out for the guest FPU context
> > > > for the duration of the KVM_RUN ioctl.
> > >
> > > What will happen if CONFIG_PREEMPT is enabled?
> >
> > The scheduler will save the guest FPU context when a
> > VCPU thread is preempted, and restore it when it is
> > scheduled back in.
>
> I mean the case where all the involved processes use the FPU.
> Before the patch, if kernel preemption occurs:
>
> context_switch
>   -> prepare_task_switch
>        -> fire_sched_out_preempt_notifiers
>             -> kvm_sched_out
>                  -> kvm_arch_vcpu_put
>                       -> kvm_put_guest_fpu
>                            -> copy_fpregs_to_fpstate(&vcpu->arch.guest_fpu)
>                                 save xsave area to guest fpu buffer
>                            -> __kernel_fpu_end
>                                 -> copy_kernel_to_fpregs(&current->thread.fpu.state)
>                                      restore prev vCPU qemu userspace FPU to the xsave area
>   -> switch_to
>        -> __switch_to
>             -> switch_fpu_prepare
>                  -> copy_fpregs_to_fpstate => save xsave area to prev vCPU qemu userspace FPU
>             -> switch_fpu_finish
>                  -> copy_kernel_to_fpregs => restore next task FPU to xsave area
>
>
> After the patch:
>
> context_switch
>   -> prepare_task_switch
>        -> fire_sched_out_preempt_notifiers
>             -> kvm_sched_out
>
>   -> switch_to
>        -> __switch_to
>             -> switch_fpu_prepare
>                  -> copy_fpregs_to_fpstate => Oops
>                     This saves the xsave area to the prev vCPU qemu
> userspace FPU, but the guest FPU state is what is actually loaded in
> the xsave area, so you transfer the guest FPU state in the xsave area
> into the prev vCPU qemu userspace FPU.

When entering kvm_arch_vcpu_ioctl_run we save the qemu userspace
FPU context in &vcpu->arch.user_fpu, and we restore that before
leaving kvm_arch_vcpu_ioctl_run.
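
To make that flow concrete, here is a minimal userspace model of it.
This is a sketch, not the kernel code: hw_regs, struct task, struct vcpu
and every helper name below are made up for illustration, and an 8-byte
array stands in for the xsave area.

```c
#include <assert.h>
#include <string.h>

/* Toy stand-in for an xsave area; the real one is far larger. */
struct fpu_state { unsigned char regs[8]; };

/* Models the physical FPU registers of one CPU. */
static struct fpu_state hw_regs;

/* Hypothetical miniature versions of the task and vcpu structures. */
struct task { struct fpu_state thread_fpu; };
struct vcpu { struct fpu_state user_fpu, guest_fpu; };

static void fpu_fill(struct fpu_state *s, unsigned char v)
{
	memset(s->regs, v, sizeof(s->regs));
}

static int fpu_is(const struct fpu_state *s, unsigned char v)
{
	struct fpu_state want;

	fpu_fill(&want, v);
	return memcmp(s->regs, want.regs, sizeof(want.regs)) == 0;
}

/* Entry to the modeled KVM_RUN: stash userspace state, load guest state. */
static void load_guest_fpu(struct vcpu *v)
{
	v->user_fpu = hw_regs;
	hw_regs = v->guest_fpu;
}

/* Exit from the modeled KVM_RUN: stash guest state, restore userspace. */
static void put_guest_fpu(struct vcpu *v)
{
	v->guest_fpu = hw_regs;
	hw_regs = v->user_fpu;
}

/* Context switch: save whatever is in the registers, guest or user. */
static void sched_out(struct task *t) { t->thread_fpu = hw_regs; }
static void sched_in(struct task *t)  { hw_regs = t->thread_fpu; }

/* Returns 1 if userspace sees its own FPU state after a preempted run. */
static int preempted_run_keeps_user_fpu(void)
{
	struct task qemu_thread, other;
	struct vcpu v;

	fpu_fill(&hw_regs, 'U');          /* qemu userspace state is live */
	fpu_fill(&v.guest_fpu, 'G');
	fpu_fill(&other.thread_fpu, 'X');

	load_guest_fpu(&v);
	fpu_fill(&hw_regs, 'H');          /* guest modifies its FPU state */

	sched_out(&qemu_thread);          /* preempted inside KVM_RUN */
	sched_in(&other);
	sched_out(&other);
	sched_in(&qemu_thread);
	assert(fpu_is(&hw_regs, 'H'));    /* guest state survived */

	put_guest_fpu(&v);
	return fpu_is(&hw_regs, 'U') && fpu_is(&v.guest_fpu, 'H');
}
```

In this model the preemption path does write guest state into the
task's own save slot (thread_fpu), but the userspace state sits
untouched in user_fpu until put_guest_fpu() reloads it on the way out.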

Userspace should always see the userspace FPU context, no?

Am I overlooking anything?