Re: [PATCH v2 0/6] LoongArch: KVM: Set max VM supported FPU type with FPU exception

From: Huacai Chen

Date: Tue Jun 02 2026 - 00:38:41 EST


On Tue, Jun 2, 2026 at 9:23 AM Bibo Mao <maobibo@xxxxxxxxxxx> wrote:
>
>
>
> On 2026/6/1 9:52 PM, Huacai Chen wrote:
> > On Thu, Apr 9, 2026 at 8:14 PM Bibo Mao <maobibo@xxxxxxxxxxx> wrote:
> >>
> >>
> >>
> >> On 2026/4/8 3:47 PM, Huacai Chen wrote:
> >>> On Wed, Apr 8, 2026 at 9:04 AM Bibo Mao <maobibo@xxxxxxxxxxx> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On 2026/4/7 8:41 PM, Huacai Chen wrote:
> >>>>> On Mon, Mar 30, 2026 at 6:00 PM Huacai Chen <chenhuacai@xxxxxxxxxx> wrote:
> >>>>>>
> >>>>>> Hi, Bibo,
> >>>>>>
> >>>>>>
> >>>>>> On Mon, Mar 30, 2026 at 11:58 AM Bibo Mao <maobibo@xxxxxxxxxxx> wrote:
> >>>>>>>
> >>>>>>> In the FPU save and restore flow, the cost is the same for the different
> >>>>>>> FPU widths (8/16/32 bytes), in terms of both CPU cycles and cache line impact.
> >>>>>>>
> >>>>>>> This series enables the FPU with the max VM supported type. For example, if
> >>>>>>> the VM supports LASX instructions, enable the FPU with LASX type even on an FPU
> >>>>>>> exception. This avoids possible LSX/LASX exceptions later.
> >>>>>>>
> >>>>>>> With a context switch microbenchmark that may touch FPU and LASX, there is a 9%
> >>>>>>> improvement when halt_poll_ns is disabled. The command is
> >>>>>>> "./context --test=pipe" and the source code is located at:
> >>>>>>> https://github.com/bibo-mao/context_switch/blob/main/context.c
> >>>>>>>
> >>>>>>> Original    With patch    Improvement
> >>>>>>> 75232       82440         9%
> >>>>>>>
> >>>>>>> Signed-off-by: Bibo Mao <maobibo@xxxxxxxxxxx>
> >>>>>>> ---
> >>>>>>> v1 ... v2:
> >>>>>>> 1. Enable FPU with max VM supported FPU type, rather than max used type.
> >>>>>>> 2. Add new request bit KVM_REQ_LBT_LOAD for LBT restore
> >>>>>>> 3. Rename KVM_REQ_AUX_LOAD to KVM_REQ_FPU_LOAD
> >>>>>>> 4. Remove aux_ldtype and KVM_LARCH_LASX/KVM_LARCH_LSX
> >>>>>>> 5. Remove middle FPU state handling in kvm_own_lsx() and kvm_own_lasx(),
> >>>>>>> directly enable LSX or LASX from FPU none state.
> >>>>>>> ---
> >>>>>>> Bibo Mao (6):
> >>>>>>> LoongArch: KVM: Add separate KVM_REQ_LBT_LOAD request bit
> >>>>>>> LoongArch: KVM: Enable FPU with max VM supported FPU type
> >>>>>>> LoongArch: KVM: Rename KVM_REQ_AUX_LOAD with KVM_REQ_FPU_LOAD
> >>>>>> Patch-1 adds KVM_REQ_LBT_LOAD, so KVM_REQ_AUX_LOAD is then only for FPU;
> >>>>>> I think Patch-3 should be squashed into Patch-1.
> >>>>>>
> >>>>>>> LoongArch: KVM: Remove some middle FPU states
> >>>>>>> LoongArch: KVM: Use vm_guest_has_fpu API in kvm_lose_fpu()
> >>>>>>> LoongArch: KVM: Remove KVM_LARCH_LASX and KVM_LARCH_LSX
> >>>>>> Patch-5 removes the consumer side of KVM_LARCH_LASX / KVM_LARCH_LSX and
> >>>>>> Patch-6 removes the provider side of KVM_LARCH_LASX / KVM_LARCH_LSX, so
> >>>>>> I think Patch-6 should be squashed into Patch-5, too.
> >>>>> And could you please test the power consumption where there are many
> >>>>> VMs that only use FPU rather than LSX/LASX? As far as I know, the
> >>>> Hi Huacai,
> >>>>
> >>>> Thanks for reviewing this patch.
> >>>> Could you help me test the power consumption with this patch?
> >>>>> power consumption of LASX is significantly more than FPU, which is a
> >>>> Could you describe the detailed scenario where LASX instructions are
> >>>> frequently used by applications, or where LASX is enabled in EUEN?
> >>> The purpose is to evaluate power consumption before and after this series.
> >>>
> >>> So we can run 10 VMs (suppose there are 4 cores), with a workload
> >>> running in every VM that only uses FPU, no LSX or LASX.
> >>> Before this series, KVM only saves/restores FPU context; after this
> >>> series, KVM will save/restore LASX context.
> >> In theory there should be such a test, but it requires a dedicated Power
> >> Management team and testbed. Even if there is such a team, they probably care
> >> about the power consumption of DDR and the screen, and would not want to do
> >> such experiments for this patch.
> >>
> >> If someone volunteers to do this, I do not object; I also want to see
> >> the result. My guess is that there should not be much difference, since
> >> the FPU context switch is much smaller than the whole workload, unless the
> >> workload is just FPU context switching.
> > Any updates?
> It depends on Power Management experts.
>
> By cyclictest/fpubench test on 3C6000, the context switch cost is 3us,
> the FPU save cost is 7.6ns and the FPU restore cost is 7.8ns. The FPU switch cost
> is about 0.25% of the whole context switch. I do not know why power
> consumption is an important factor here.
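
As a rough check of the figures quoted above, the share of a 3us context
switch taken by the FPU save and restore can be worked out directly. This is
only a sketch of the arithmetic: the constant and function names are
hypothetical, and the inputs are just the three numbers from the quoted
cyclictest/fpubench run, not new measurements.

```python
# Figures quoted in the thread (cyclictest/fpubench on 3C6000).
CONTEXT_SWITCH_NS = 3000.0  # 3 us for the whole context switch
FPU_SAVE_NS = 7.6           # FPU save cost
FPU_RESTORE_NS = 7.8        # FPU restore cost


def share_percent(part_ns: float, whole_ns: float) -> float:
    """Return part_ns as a percentage of whole_ns."""
    return 100.0 * part_ns / whole_ns


# Share of one context switch taken by each operation alone.
save_share = share_percent(FPU_SAVE_NS, CONTEXT_SWITCH_NS)
restore_share = share_percent(FPU_RESTORE_NS, CONTEXT_SWITCH_NS)

# Share taken when both save and restore are counted together.
combined_share = share_percent(FPU_SAVE_NS + FPU_RESTORE_NS, CONTEXT_SWITCH_NS)

print(f"save only:      {save_share:.2f}%")
print(f"restore only:   {restore_share:.2f}%")
print(f"save + restore: {combined_share:.2f}%")
```

A single save or restore comes to roughly a quarter of a percent of the
context switch, which matches the quoted 0.25%; counting both together gives
roughly half a percent. These are time shares only and say nothing about
power draw.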
I'm talking about this:
https://lore.kernel.org/loongarch/b1b07f90-3db3-c3ff-2297-f8eaaa3f6faa@xxxxxxxxxxx/T/#m6da21ae2f4abfba362a2cd189cb4d8d09e33a6d7

Huacai

>
> Regards
> Bibo Mao
> >
> >
> > Huacai
> >>
> >> Regards
> >> Bibo Mao
> >>
> >>>
> >>> If the power consumption is similar, then this series is perfect.
> >>>
> >>> Huacai
> >>>
> >>>>
> >>>> Regards
> >>>> Bibo Mao
> >>>>> little similar to AVX512 (Linus said it is a power virus).
> >>>>>
> >>>>> Huacai
> >>>>>
> >>>>>>
> >>>>>>
> >>>>>> Huacai
> >>>>>>
> >>>>>>>
> >>>>>>> arch/loongarch/include/asm/kvm_host.h | 6 +--
> >>>>>>> arch/loongarch/kvm/exit.c | 21 +++-----
> >>>>>>> arch/loongarch/kvm/vcpu.c | 78 ++++++++-------------------
> >>>>>>> 3 files changed, 30 insertions(+), 75 deletions(-)
> >>>>>>>
> >>>>>>>
> >>>>>>> base-commit: 7aaa8047eafd0bd628065b15757d9b48c5f9c07d
> >>>>>>> --
> >>>>>>> 2.39.3
> >>>>>>>
> >>>>>>>
> >>>>
> >>>>
> >>
> >>
>