Re: [PATCH V3 2/2] LoongArch: KVM: fix "unreliable stack" issue

From: lixianglai
Date: Sun Dec 28 2025 - 22:57:26 EST


Hi Jinyang:
On 2025-12-27 09:27, Xianglai Li wrote:

Insert the appropriate UNWIND hint macros into kvm_exc_entry in the
assembly code to guide the generation of correct ORC table entries,
thereby fixing the timeout when loading the livepatch-sample module on
a physical machine running virtual machines with multiple vCPUs.

Fixing the above problem also brings an additional benefit: more call
stack information becomes available.

Stack information that can be obtained before the problem is fixed:
[<0>] kvm_vcpu_block+0x88/0x120 [kvm]
[<0>] kvm_vcpu_halt+0x68/0x580 [kvm]
[<0>] kvm_emu_idle+0xd4/0xf0 [kvm]
[<0>] kvm_handle_gspr+0x7c/0x700 [kvm]
[<0>] kvm_handle_exit+0x160/0x270 [kvm]
[<0>] kvm_exc_entry+0x100/0x1e0

Stack information that can be obtained after the problem is fixed:
[<0>] kvm_vcpu_block+0x88/0x120 [kvm]
[<0>] kvm_vcpu_halt+0x68/0x580 [kvm]
[<0>] kvm_emu_idle+0xd4/0xf0 [kvm]
[<0>] kvm_handle_gspr+0x7c/0x700 [kvm]
[<0>] kvm_handle_exit+0x160/0x270 [kvm]
[<0>] kvm_exc_entry+0x104/0x1e4
[<0>] kvm_enter_guest+0x38/0x11c
[<0>] kvm_arch_vcpu_ioctl_run+0x26c/0x498 [kvm]
[<0>] kvm_vcpu_ioctl+0x200/0xcf8 [kvm]
[<0>] sys_ioctl+0x498/0xf00
[<0>] do_syscall+0x98/0x1d0
[<0>] handle_syscall+0xb8/0x158

Cc: stable@xxxxxxxxxxxxxxx
Signed-off-by: Xianglai Li <lixianglai@xxxxxxxxxxx>
---
Cc: Huacai Chen <chenhuacai@xxxxxxxxxx>
Cc: WANG Xuerui <kernel@xxxxxxxxxx>
Cc: Tianrui Zhao <zhaotianrui@xxxxxxxxxxx>
Cc: Bibo Mao <maobibo@xxxxxxxxxxx>
Cc: Charlie Jenkins <charlie@xxxxxxxxxxxx>
Cc: Xianglai Li <lixianglai@xxxxxxxxxxx>
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Cc: Tiezhu Yang <yangtiezhu@xxxxxxxxxxx>

  arch/loongarch/kvm/switch.S | 28 +++++++++++++++++++---------
  1 file changed, 19 insertions(+), 9 deletions(-)

diff --git a/arch/loongarch/kvm/switch.S b/arch/loongarch/kvm/switch.S
index 93845ce53651..a3ea9567dbe5 100644
--- a/arch/loongarch/kvm/switch.S
+++ b/arch/loongarch/kvm/switch.S
@@ -10,6 +10,7 @@
  #include <asm/loongarch.h>
  #include <asm/regdef.h>
  #include <asm/unwind_hints.h>
+#include <linux/kvm_types.h>
    #define HGPR_OFFSET(x)        (PT_R0 + 8*x)
  #define GGPR_OFFSET(x)        (KVM_ARCH_GGPR + 8*x)
@@ -110,9 +111,9 @@
       * need to copy world switch code to DMW area.
       */
      .text
+    .p2align PAGE_SHIFT
      .cfi_sections    .debug_frame
  SYM_CODE_START(kvm_exc_entry)
-    .p2align PAGE_SHIFT
      UNWIND_HINT_UNDEFINED
      csrwr    a2,   KVM_TEMP_KS
      csrrd    a2,   KVM_VCPU_KS
@@ -170,6 +171,7 @@ SYM_CODE_START(kvm_exc_entry)
      /* restore per cpu register */
      ld.d    u0, a2, KVM_ARCH_HPERCPU
      addi.d    sp, sp, -PT_SIZE
+    UNWIND_HINT_REGS
        /* Prepare handle exception */
      or    a0, s0, zero
@@ -200,7 +202,7 @@ ret_to_host:
      jr      ra
    SYM_CODE_END(kvm_exc_entry)
-EXPORT_SYMBOL(kvm_exc_entry)
+EXPORT_SYMBOL_FOR_KVM(kvm_exc_entry)
    /*
   * int kvm_enter_guest(struct kvm_run *run, struct kvm_vcpu *vcpu)
@@ -215,6 +217,14 @@ SYM_FUNC_START(kvm_enter_guest)
      /* Save host GPRs */
      kvm_save_host_gpr a2
  +    /*
+     * The csr_era member variable of the pt_regs structure is required
+     * for unwinding orc to perform stack traceback, so we need to put
+     * pc into csr_era member variable here.
+     */
+    pcaddi    t0, 0
+    st.d    t0, a2, PT_ERA
Hi, Xianglai,

It should use `SYM_CODE_START` to mark `kvm_enter_guest` rather than
`SYM_FUNC_START`, since `SYM_FUNC_START` is used to mark "C-like"
asm functions.

Ok, I will use SYM_CODE_START to mark kvm_enter_guest in the next version.

I guess kvm_enter_guest is something like an exception handler, because
the last instruction is "ertn". So usually it should be marked with
UNWIND_HINT_REGS, where the last frame info can be found via "$sp".
However, since all the info is stored via "$a2", the mark should be
  `UNWIND_HINT sp_reg=ORC_REG_A2(???) type=UNWIND_HINT_TYPE_REGS`.
I don't know why the function-internal PC is saved here by `pcaddi t0, 0`,
and I think it has no meaning (exception handlers save the last PC by
reading CSR.ERA). `kvm_enter_guest` saves registers via "$a2"
("$sp" - PT_REGS) beyond the stack ("$sp"), which is dangerous if IE is
enabled. So I wonder whether there is really a stacktrace through this function?

The stack backtracking issue in switch.S is rather complex, because it
involves switching between CPU root mode and guest mode.
The real backtrace should be divided into two parts:
part 1:
    [<0>] kvm_enter_guest+0x38/0x11c
    [<0>] kvm_arch_vcpu_ioctl_run+0x26c/0x498 [kvm]
    [<0>] kvm_vcpu_ioctl+0x200/0xcf8 [kvm]
    [<0>] sys_ioctl+0x498/0xf00
    [<0>] do_syscall+0x98/0x1d0
    [<0>] handle_syscall+0xb8/0x158

part 2:
    [<0>] kvm_vcpu_block+0x88/0x120 [kvm]
    [<0>] kvm_vcpu_halt+0x68/0x580 [kvm]
    [<0>] kvm_emu_idle+0xd4/0xf0 [kvm]
    [<0>] kvm_handle_gspr+0x7c/0x700 [kvm]
    [<0>] kvm_handle_exit+0x160/0x270 [kvm]
    [<0>] kvm_exc_entry+0x104/0x1e4


In "part 1", after kvm_enter_guest executes, the CPU switches from
root mode to guest mode. Stack backtracking in this phase is indeed
very rare.

In "part 2", the CPU switches from guest mode back to root mode, and
most stack backtracking occurs during this phase.

To obtain the longest call chain, we save the PC in kvm_enter_guest to
pt_regs.csr_era, and after restoring the root-mode sp in kvm_exc_entry,
the ORC entry is re-established with "UNWIND_HINT_REGS". We then obtain
the following stack backtrace, as desired:

    [<0>] kvm_vcpu_block+0x88/0x120 [kvm]
    [<0>] kvm_vcpu_halt+0x68/0x580 [kvm]
    [<0>] kvm_emu_idle+0xd4/0xf0 [kvm]
    [<0>] kvm_handle_gspr+0x7c/0x700 [kvm]
    [<0>] kvm_handle_exit+0x160/0x270 [kvm]
    [<0>] kvm_exc_entry+0x104/0x1e4
    [<0>] kvm_enter_guest+0x38/0x11c
    [<0>] kvm_arch_vcpu_ioctl_run+0x26c/0x498 [kvm]
    [<0>] kvm_vcpu_ioctl+0x200/0xcf8 [kvm]
    [<0>] sys_ioctl+0x498/0xf00
    [<0>] do_syscall+0x98/0x1d0
    [<0>] handle_syscall+0xb8/0x158

Doing so effectively ignores the details of the CPU root-mode/guest-mode
switch. As for the danger you mentioned when IE is enabled: interrupts
are always disabled during the root-mode/guest-mode switch in
kvm_enter_guest and kvm_exc_entry.
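To illustrate the stitching described above (this is only a toy model in
Python, not the kernel's actual ORC unwinder; the function and variable
names are illustrative), the world-switch boundary behaves like a frame
whose resume PC is the value saved into pt_regs.csr_era, which lets the
unwinder continue from "part 2" into "part 1":

```python
# Toy model of how saving the PC into a pt_regs-style record lets an
# ORC-style unwinder stitch the guest-exit trace (part 2) onto the host
# call chain (part 1). Illustrative only, not kernel code.

PART2 = ["kvm_vcpu_block", "kvm_vcpu_halt", "kvm_emu_idle",
         "kvm_handle_gspr", "kvm_handle_exit", "kvm_exc_entry"]
PART1 = ["kvm_enter_guest", "kvm_arch_vcpu_ioctl_run", "kvm_vcpu_ioctl",
         "sys_ioctl", "do_syscall", "handle_syscall"]

def unwind(pt_regs_csr_era):
    """Walk part 2; at the world-switch boundary (kvm_exc_entry), an
    UNWIND_HINT_REGS-style hint tells the unwinder a full pt_regs sits
    on the stack, so it resumes from the saved csr_era."""
    trace = list(PART2)
    if pt_regs_csr_era is not None:
        # csr_era points into kvm_enter_guest (saved by `pcaddi t0, 0`),
        # so the walk continues through the host call chain.
        idx = PART1.index(pt_regs_csr_era)
        trace += PART1[idx:]
    return trace

# Before the fix: no hint, no saved PC -> trace stops at kvm_exc_entry.
print(unwind(None))
# After the fix: trace continues through the host call chain.
print(unwind("kvm_enter_guest"))
```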

Thanks!
Xianglai.

Jinyang


+
      addi.d    a2, a1, KVM_VCPU_ARCH
      st.d    sp, a2, KVM_ARCH_HSP
      st.d    tp, a2, KVM_ARCH_HTP
@@ -225,7 +235,7 @@ SYM_FUNC_START(kvm_enter_guest)
      csrwr    a1, KVM_VCPU_KS
      kvm_switch_to_guest
  SYM_FUNC_END(kvm_enter_guest)
-EXPORT_SYMBOL(kvm_enter_guest)
+EXPORT_SYMBOL_FOR_KVM(kvm_enter_guest)
    SYM_FUNC_START(kvm_save_fpu)
      fpu_save_csr    a0 t1
@@ -233,7 +243,7 @@ SYM_FUNC_START(kvm_save_fpu)
      fpu_save_cc    a0 t1 t2
      jr              ra
  SYM_FUNC_END(kvm_save_fpu)
-EXPORT_SYMBOL(kvm_save_fpu)
+EXPORT_SYMBOL_FOR_KVM(kvm_save_fpu)
    SYM_FUNC_START(kvm_restore_fpu)
      fpu_restore_double a0 t1
@@ -241,7 +251,7 @@ SYM_FUNC_START(kvm_restore_fpu)
      fpu_restore_cc       a0 t1 t2
      jr                 ra
  SYM_FUNC_END(kvm_restore_fpu)
-EXPORT_SYMBOL(kvm_restore_fpu)
+EXPORT_SYMBOL_FOR_KVM(kvm_restore_fpu)
    #ifdef CONFIG_CPU_HAS_LSX
  SYM_FUNC_START(kvm_save_lsx)
@@ -250,7 +260,7 @@ SYM_FUNC_START(kvm_save_lsx)
      lsx_save_data   a0 t1
      jr              ra
  SYM_FUNC_END(kvm_save_lsx)
-EXPORT_SYMBOL(kvm_save_lsx)
+EXPORT_SYMBOL_FOR_KVM(kvm_save_lsx)
    SYM_FUNC_START(kvm_restore_lsx)
      lsx_restore_data a0 t1
@@ -258,7 +268,7 @@ SYM_FUNC_START(kvm_restore_lsx)
      fpu_restore_csr  a0 t1 t2
      jr               ra
  SYM_FUNC_END(kvm_restore_lsx)
-EXPORT_SYMBOL(kvm_restore_lsx)
+EXPORT_SYMBOL_FOR_KVM(kvm_restore_lsx)
  #endif
    #ifdef CONFIG_CPU_HAS_LASX
@@ -268,7 +278,7 @@ SYM_FUNC_START(kvm_save_lasx)
      lasx_save_data  a0 t1
      jr              ra
  SYM_FUNC_END(kvm_save_lasx)
-EXPORT_SYMBOL(kvm_save_lasx)
+EXPORT_SYMBOL_FOR_KVM(kvm_save_lasx)
    SYM_FUNC_START(kvm_restore_lasx)
      lasx_restore_data a0 t1
@@ -276,7 +286,7 @@ SYM_FUNC_START(kvm_restore_lasx)
      fpu_restore_csr   a0 t1 t2
      jr                ra
  SYM_FUNC_END(kvm_restore_lasx)
-EXPORT_SYMBOL(kvm_restore_lasx)
+EXPORT_SYMBOL_FOR_KVM(kvm_restore_lasx)
  #endif
    #ifdef CONFIG_CPU_HAS_LBT