Re: [PATCH V3 2/2] LoongArch: KVM: fix "unreliable stack" issue
From: Huacai Chen
Date: Sat Apr 11 2026 - 10:30:01 EST
On Wed, Apr 8, 2026 at 9:26 AM Bibo Mao <maobibo@xxxxxxxxxxx> wrote:
>
>
>
> On 2025/12/30 1:53 PM, Jinyang He wrote:
> > On 2025-12-30 12:03, Bibo Mao wrote:
> >
> >>
> >>
> >> On 2025/12/30 11:36 AM, Jinyang He wrote:
> >>> On 2025-12-30 10:24, Bibo Mao wrote:
> >>>
> >>>>
> >>>>
> >>>> On 2025/12/29 6:41 PM, Jinyang He wrote:
> >>>>> On 2025-12-29 18:11, lixianglai wrote:
> >>>>>
> >>>>>> Hi Jinyang:
> >>>>>>>
> >>>>>>> On 2025-12-29 11:53, lixianglai wrote:
> >>>>>>>> Hi Jinyang:
> >>>>>>>>> On 2025-12-27 09:27, Xianglai Li wrote:
> >>>>>>>>>
> >>>>>>>>>> Insert the appropriate UNWIND macro definition into the
> >>>>>>>>>> kvm_exc_entry in
> >>>>>>>>>> the assembly function to guide the generation of correct ORC
> >>>>>>>>>> table entries,
> >>>>>>>>>> thereby solving the timeout problem of loading the
> >>>>>>>>>> livepatch-sample module
> >>>>>>>>>> on a physical machine running virtual machines with multiple vCPUs.
> >>>>>>>>>>
> >>>>>>>>>> While solving the above problems, we have gained an additional
> >>>>>>>>>> benefit,
> >>>>>>>>>> that is, we can obtain more call stack information
> >>>>>>>>>>
> >>>>>>>>>> Stack information that can be obtained before the problem is
> >>>>>>>>>> fixed:
> >>>>>>>>>> [<0>] kvm_vcpu_block+0x88/0x120 [kvm]
> >>>>>>>>>> [<0>] kvm_vcpu_halt+0x68/0x580 [kvm]
> >>>>>>>>>> [<0>] kvm_emu_idle+0xd4/0xf0 [kvm]
> >>>>>>>>>> [<0>] kvm_handle_gspr+0x7c/0x700 [kvm]
> >>>>>>>>>> [<0>] kvm_handle_exit+0x160/0x270 [kvm]
> >>>>>>>>>> [<0>] kvm_exc_entry+0x100/0x1e0
> >>>>>>>>>>
> >>>>>>>>>> Stack information that can be obtained after the problem is
> >>>>>>>>>> fixed:
> >>>>>>>>>> [<0>] kvm_vcpu_block+0x88/0x120 [kvm]
> >>>>>>>>>> [<0>] kvm_vcpu_halt+0x68/0x580 [kvm]
> >>>>>>>>>> [<0>] kvm_emu_idle+0xd4/0xf0 [kvm]
> >>>>>>>>>> [<0>] kvm_handle_gspr+0x7c/0x700 [kvm]
> >>>>>>>>>> [<0>] kvm_handle_exit+0x160/0x270 [kvm]
> >>>>>>>>>> [<0>] kvm_exc_entry+0x104/0x1e4
> >>>>>>>>>> [<0>] kvm_enter_guest+0x38/0x11c
> >>>>>>>>>> [<0>] kvm_arch_vcpu_ioctl_run+0x26c/0x498 [kvm]
> >>>>>>>>>> [<0>] kvm_vcpu_ioctl+0x200/0xcf8 [kvm]
> >>>>>>>>>> [<0>] sys_ioctl+0x498/0xf00
> >>>>>>>>>> [<0>] do_syscall+0x98/0x1d0
> >>>>>>>>>> [<0>] handle_syscall+0xb8/0x158
> >>>>>>>>>>
> >>>>>>>>>> Cc: stable@xxxxxxxxxxxxxxx
> >>>>>>>>>> Signed-off-by: Xianglai Li <lixianglai@xxxxxxxxxxx>
> >>>>>>>>>> ---
> >>>>>>>>>> Cc: Huacai Chen <chenhuacai@xxxxxxxxxx>
> >>>>>>>>>> Cc: WANG Xuerui <kernel@xxxxxxxxxx>
> >>>>>>>>>> Cc: Tianrui Zhao <zhaotianrui@xxxxxxxxxxx>
> >>>>>>>>>> Cc: Bibo Mao <maobibo@xxxxxxxxxxx>
> >>>>>>>>>> Cc: Charlie Jenkins <charlie@xxxxxxxxxxxx>
> >>>>>>>>>> Cc: Xianglai Li <lixianglai@xxxxxxxxxxx>
> >>>>>>>>>> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> >>>>>>>>>> Cc: Tiezhu Yang <yangtiezhu@xxxxxxxxxxx>
> >>>>>>>>>>
> >>>>>>>>>> arch/loongarch/kvm/switch.S | 28 +++++++++++++++++++---------
> >>>>>>>>>> 1 file changed, 19 insertions(+), 9 deletions(-)
> >>>>>>>>>>
> >>>>>>>>>> diff --git a/arch/loongarch/kvm/switch.S
> >>>>>>>>>> b/arch/loongarch/kvm/switch.S
> >>>>>>>>>> index 93845ce53651..a3ea9567dbe5 100644
> >>>>>>>>>> --- a/arch/loongarch/kvm/switch.S
> >>>>>>>>>> +++ b/arch/loongarch/kvm/switch.S
> >>>>>>>>>> @@ -10,6 +10,7 @@
> >>>>>>>>>> #include <asm/loongarch.h>
> >>>>>>>>>> #include <asm/regdef.h>
> >>>>>>>>>> #include <asm/unwind_hints.h>
> >>>>>>>>>> +#include <linux/kvm_types.h>
> >>>>>>>>>> #define HGPR_OFFSET(x) (PT_R0 + 8*x)
> >>>>>>>>>> #define GGPR_OFFSET(x) (KVM_ARCH_GGPR + 8*x)
> >>>>>>>>>> @@ -110,9 +111,9 @@
> >>>>>>>>>> * need to copy world switch code to DMW area.
> >>>>>>>>>> */
> >>>>>>>>>> .text
> >>>>>>>>>> + .p2align PAGE_SHIFT
> >>>>>>>>>> .cfi_sections .debug_frame
> >>>>>>>>>> SYM_CODE_START(kvm_exc_entry)
> >>>>>>>>>> - .p2align PAGE_SHIFT
> >>>>>>>>>> UNWIND_HINT_UNDEFINED
> >>>>>>>>>> csrwr a2, KVM_TEMP_KS
> >>>>>>>>>> csrrd a2, KVM_VCPU_KS
> >>>>>>>>>> @@ -170,6 +171,7 @@ SYM_CODE_START(kvm_exc_entry)
> >>>>>>>>>> /* restore per cpu register */
> >>>>>>>>>> ld.d u0, a2, KVM_ARCH_HPERCPU
> >>>>>>>>>> addi.d sp, sp, -PT_SIZE
> >>>>>>>>>> + UNWIND_HINT_REGS
> >>>>>>>>>> /* Prepare handle exception */
> >>>>>>>>>> or a0, s0, zero
> >>>>>>>>>> @@ -200,7 +202,7 @@ ret_to_host:
> >>>>>>>>>> jr ra
> >>>>>>>>>> SYM_CODE_END(kvm_exc_entry)
> >>>>>>>>>> -EXPORT_SYMBOL(kvm_exc_entry)
> >>>>>>>>>> +EXPORT_SYMBOL_FOR_KVM(kvm_exc_entry)
> >>>>>>>>>> /*
> >>>>>>>>>> * int kvm_enter_guest(struct kvm_run *run, struct kvm_vcpu
> >>>>>>>>>> *vcpu)
> >>>>>>>>>> @@ -215,6 +217,14 @@ SYM_FUNC_START(kvm_enter_guest)
> >>>>>>>>>> /* Save host GPRs */
> >>>>>>>>>> kvm_save_host_gpr a2
> >>>>>>>>>> + /*
> >>>>>>>>>> + * The csr_era member variable of the pt_regs structure
> >>>>>>>>>> is required
> >>>>>>>>>> + * for unwinding orc to perform stack traceback, so we
> >>>>>>>>>> need to put
> >>>>>>>>>> + * pc into csr_era member variable here.
> >>>>>>>>>> + */
> >>>>>>>>>> + pcaddi t0, 0
> >>>>>>>>>> + st.d t0, a2, PT_ERA
> >>>>>>>>> Hi, Xianglai,
> >>>>>>>>>
> >>>>>>>>> `kvm_enter_guest` should be marked with `SYM_CODE_START` rather than
> >>>>>>>>> `SYM_FUNC_START`, since `SYM_FUNC_START` is meant for "C-like"
> >>>>>>>>> asm functions.
> >>>>>>>>
> >>>>>>>> Ok, I will use SYM_CODE_START to mark kvm_enter_guest in the
> >>>>>>>> next version.
> >>>>>>>>
> >>>>>>>>> I guess kvm_enter_guest is something like an exception
> >>>>>>>>> handler because its last instruction is "ertn". So usually it
> >>>>>>>>> should be marked with UNWIND_HINT_REGS where the last frame info
> >>>>>>>>> can be found via "$sp". However, all the info is stored via "$a2",
> >>>>>>>>> so the mark should be
> >>>>>>>>> `UNWIND_HINT sp_reg=ORC_REG_A2(???) type=UNWIND_HINT_TYPE_REGS`.
> >>>>>>>>> I don't know why this function's internal PC is saved here by
> >>>>>>>>> `pcaddi t0, 0`, and I think it is meaningless (exception handlers
> >>>>>>>>> save the last PC by reading CSR.ERA). `kvm_enter_guest` saves
> >>>>>>>>> registers via "$a2" ("$sp" - PT_REGS), beyond the stack ("$sp"),
> >>>>>>>>> which is dangerous if IE is enabled. So I wonder whether there is
> >>>>>>>>> really a stacktrace through this function?
> >>>>>>>>>
> >>>>>>>> The stack backtracking issue in switch.S is rather complex
> >>>>>>>> because it involves the switching between cpu root-mode and
> >>>>>>>> guest-mode:
> >>>>>>>> Real stack backtracking should be divided into two parts:
> >>>>>>>> part 1:
> >>>>>>>> [<0>] kvm_enter_guest+0x38/0x11c
> >>>>>>>> [<0>] kvm_arch_vcpu_ioctl_run+0x26c/0x498 [kvm]
> >>>>>>>> [<0>] kvm_vcpu_ioctl+0x200/0xcf8 [kvm]
> >>>>>>>> [<0>] sys_ioctl+0x498/0xf00
> >>>>>>>> [<0>] do_syscall+0x98/0x1d0
> >>>>>>>> [<0>] handle_syscall+0xb8/0x158
> >>>>>>>>
> >>>>>>>> part 2:
> >>>>>>>> [<0>] kvm_vcpu_block+0x88/0x120 [kvm]
> >>>>>>>> [<0>] kvm_vcpu_halt+0x68/0x580 [kvm]
> >>>>>>>> [<0>] kvm_emu_idle+0xd4/0xf0 [kvm]
> >>>>>>>> [<0>] kvm_handle_gspr+0x7c/0x700 [kvm]
> >>>>>>>> [<0>] kvm_handle_exit+0x160/0x270 [kvm]
> >>>>>>>> [<0>] kvm_exc_entry+0x104/0x1e4
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> In "part 1", after executing kvm_enter_guest, the cpu switches
> >>>>>>>> from root-mode to guest-mode.
> >>>>>>>> In this case, stack backtracking is indeed very rare.
> >>>>>>>>
> >>>>>>>> In "part 2", the cpu switches from the guest-mode to the root-mode,
> >>>>>>>> and most of the stack backtracking occurs during this phase.
> >>>>>>>>
> >>>>>>>> To obtain the longest call chain, we save pc in kvm_enter_guest
> >>>>>>>> to pt_regs.csr_era,
> >>>>>>>> and after restoring the sp of the root-mode cpu in kvm_exc_entry,
> >>>>>>>> the ORC entry is re-established using "UNWIND_HINT_REGS",
> >>>>>>>> and then we obtain the following stack backtrace as we wanted:
> >>>>>>>>
> >>>>>>>> [<0>] kvm_vcpu_block+0x88/0x120 [kvm]
> >>>>>>>> [<0>] kvm_vcpu_halt+0x68/0x580 [kvm]
> >>>>>>>> [<0>] kvm_emu_idle+0xd4/0xf0 [kvm]
> >>>>>>>> [<0>] kvm_handle_gspr+0x7c/0x700 [kvm]
> >>>>>>>> [<0>] kvm_handle_exit+0x160/0x270 [kvm]
> >>>>>>>> [<0>] kvm_exc_entry+0x104/0x1e4
> >>>>>>> I found this might be a coincidence: correct behavior due to the
> >>>>>>> incorrect
> >>>>>>> UNWIND_HINT_REGS mark and unusual stack adjustment.
> >>>>>>>
> >>>>>>> First, kvm_enter_guest contains only a single branch
> >>>>>>> instruction, ertn.
> >>>>>>> It jumps in hardware directly to the CSR.ERA address, that is, into
> >>>>>>> kvm_exc_entry.
> >>>>>>>
> >>>>>>> At this point, the stack layout looks like this:
> >>>>>>> -------------------------------
> >>>>>>> frame from call to `kvm_enter_guest`
> >>>>>>> ------------------------------- <- $sp
> >>>>>>> PT_REGS
> >>>>>>> ------------------------------- <- $a2
> >>>>>>>
> >>>>>>> Then kvm_exc_entry adjusts the stack without saving any register (e.g.
> >>>>>>> $ra, $sp)
> >>>>>>> but is still marked UNWIND_HINT_REGS.
> >>>>>>> After the adjustment:
> >>>>>>> -------------------------------
> >>>>>>> frame from call to `kvm_enter_guest`
> >>>>>>> -------------------------------
> >>>>>>> PT_REGS
> >>>>>>> ------------------------------- <- $a2, new $sp
> >>>>>>>
> >>>>>>> During unwinding, when the unwinder reaches kvm_exc_entry,
> >>>>>>> it meets the mark of PT_REGS and correctly recovers
> >>>>>>> pc = regs.csr_era, sp = regs.sp, ra = regs.ra
> >>>>>>>
> >>>>>> Yes, here unwinder does work as you say.
> >>>>>>
> >>>>>>> a) Can we avoid "ertn" and use `jr reg` (or `jirl ra, reg, 0`)
> >>>>>>> instead, like a call?
> >>>>>> No, we need to rely on the 'ertn' instruction to restore PIE to CRMD.IE
> >>>>>> and, at the same time, to ensure that this is atomic;
> >>>>>> no other instruction is more appropriate than 'ertn'
> >>>>>> here.
> >>>>> You are right! I got it.
> >>>>>>
> >>>>>>> The kvm_exc_entry cannot return to kvm_enter_guest
> >>>>>>> if we use "ertn", so should kvm_enter_guest appear on the
> >>>>>>> stacktrace?
> >>>>>>>
> >>>>>>
> >>>>>> It is flexible. As I mentioned above, the cpu completes the switch
> >>>>>> from host-mode to guest mode through kvm_enter_guest,
> >>>>>> and then the switch from guest mode to host-mode through
> >>>>>> kvm_exc_entry. When we ignore the details of the host-mode
> >>>>>> and guest-mode switching in the middle, we can understand that the
> >>>>>> host cpu has completed kvm_enter_guest->kvm_exc_entry.
> >>>>>> From this perspective, I think it can exist in the call stack, and
> >>>>>> at the same time, we have obtained the maximum call stack
> >>>>>> information.
> >>>>>>
> >>>>>>
> >>>>>>> b) Can we adjust $sp before entering kvm_exc_entry? Then we can mark
> >>>>>>> UNWIND_HINT_REGS at the beginning of kvm_exc_entry, which is something
> >>>>>>> like ret_from_kernel_thread_asm.
> >>>>>>>
> >>>>>> The following command can be used to dump the orc entries of the
> >>>>>> kernel:
> >>>>>> ./tools/objtool/objtool --dump vmlinux
> >>>>>>
> >>>>>> You can observe that not all orc entries are generated at the
> >>>>>> beginning of the function.
> >>>>>> For example:
> >>>>>> handle_tlb_protect
> >>>>>> ftrace_stub
> >>>>>> handle_reserved
> >>>>>>
> >>>>>> So it is unnecessary for us to move UNWIND_HINT_REGS in order
> >>>>>> to place it at the beginning of the function.
> >>>>>>
> >>>>>> If you have a better solution, could you provide an example of the
> >>>>>> modification?
> >>>>>> I can test the feasibility of the solution.
> >>>>>>
> >>>>> My wording "at the beginning of the function" was incorrect
> >>>>> (sorry).
> >>>>> The hint should be placed where all the stacktrace info is available.
> >>>>> Thanks for all the explanations; since I'm unfamiliar with kvm, I
> >>>>> need them to help my understanding.
> >>>>>
> >>>>> Can you try the following, which saves regs via $sp, sets a more precise
> >>>>> era in pt_regs, and adds more unwind hints?
> >>>>>
> >>>>>
> >>>>> diff --git a/arch/loongarch/kvm/switch.S b/arch/loongarch/kvm/switch.S
> >>>>> index f1768b7a6194..8ed1d7b72c54 100644
> >>>>> --- a/arch/loongarch/kvm/switch.S
> >>>>> +++ b/arch/loongarch/kvm/switch.S
> >>>>> @@ -14,13 +14,13 @@
> >>>>> #define GGPR_OFFSET(x) (KVM_ARCH_GGPR + 8*x)
> >>>>>
> >>>>> .macro kvm_save_host_gpr base
> >>>>> - .irp n,1,2,3,22,23,24,25,26,27,28,29,30,31
> >>>>> + .irp n,1,2,22,23,24,25,26,27,28,29,30,31
> >>>>> st.d $r\n, \base, HGPR_OFFSET(\n)
> >>>>> .endr
> >>>>> .endm
> >>>>>
> >>>>> .macro kvm_restore_host_gpr base
> >>>>> - .irp n,1,2,3,22,23,24,25,26,27,28,29,30,31
> >>>>> + .irp n,1,2,22,23,24,25,26,27,28,29,30,31
> >>>>> ld.d $r\n, \base, HGPR_OFFSET(\n)
> >>>>> .endr
> >>>>> .endm
> >>>>> @@ -88,6 +88,7 @@
> >>>>> /* Load KVM_ARCH register */
> >>>>> ld.d a2, a2, (KVM_ARCH_GGPR + 8 * REG_A2)
> >>>>>
> >>>>> +111:
> >>>>> ertn /* Switch to guest: GSTAT.PGM = 1, ERRCTL.ISERR = 0,
> >>>>> TLBRPRMD.ISTLBR = 0 */
> >>>>> .endm
> >>>>>
> >>>>> @@ -158,9 +159,10 @@ SYM_CODE_START(kvm_exc_entry)
> >>>>> csrwr t0, LOONGARCH_CSR_GTLBC
> >>>>> ld.d tp, a2, KVM_ARCH_HTP
> >>>>> ld.d sp, a2, KVM_ARCH_HSP
> >>>>> + UNWIND_HINT_REGS
> >>>>> +
> >>>>> /* restore per cpu register */
> >>>>> ld.d u0, a2, KVM_ARCH_HPERCPU
> >>>>> - addi.d sp, sp, -PT_SIZE
> >>>>>
> >>>>> /* Prepare handle exception */
> >>>>> or a0, s0, zero
> >>>>> @@ -184,10 +186,11 @@ SYM_CODE_START(kvm_exc_entry)
> >>>>> csrwr s1, KVM_VCPU_KS
> >>>>> kvm_switch_to_guest
> >>>>>
> >>>>> + UNWIND_HINT_UNDEFINED
> >>>>> ret_to_host:
> >>>>> - ld.d a2, a2, KVM_ARCH_HSP
> >>>>> - addi.d a2, a2, -PT_SIZE
> >>>>> - kvm_restore_host_gpr a2
> >>>>> + ld.d sp, a2, KVM_ARCH_HSP
> >>>>> + kvm_restore_host_gpr sp
> >>>>> + addi.d sp, sp, PT_SIZE
> >>>>> jr ra
> >>>>>
> >>>>> SYM_INNER_LABEL(kvm_exc_entry_end, SYM_L_LOCAL)
> >>>>> @@ -200,11 +203,15 @@ SYM_CODE_END(kvm_exc_entry)
> >>>>> * a0: kvm_run* run
> >>>>> * a1: kvm_vcpu* vcpu
> >>>>> */
> >>>>> -SYM_FUNC_START(kvm_enter_guest)
> >>>>> +SYM_CODE_START(kvm_enter_guest)
> >>>>> + UNWIND_HINT_UNDEFINED
> >>>>> /* Allocate space in stack bottom */
> >>>>> - addi.d a2, sp, -PT_SIZE
> >>>>> + addi.d sp, sp, -PT_SIZE
> >>>>> /* Save host GPRs */
> >>>>> - kvm_save_host_gpr a2
> >>>>> + kvm_save_host_gpr sp
> >>>>> + la.pcrel a2, 111f
> >>>>> + st.d a2, sp, PT_ERA
> >>>>> + UNWIND_HINT_REGS
> >>>>>
> >>>> Why is the label 111f more accurate? Suppose there is a hw
> >>>> breakpoint here and a backtrace is taken; what is the call trace
> >>>> then? Obviously the code at label 111f has not been executed yet.
> >>> Xianglai said marking it as regs can get more stack info, so I
> >>> marked UNWIND_HINT_REGS here, though it is not called. Removing
> >>> UNWIND_HINT_REGS would then forbid unwinding from here.
> >>> This function is called and would usually be marked as a "call",
> >>> but it is complicated because it switches the stack and uses `ertn` to
> >>> call another function.
> >>>
> >>>
> >>>>
> >>>> UNWIND_HINT_REGS is used for nested kernel stack, is that right?
> >>>> With nested interrupt and exception handlers on LoongArch kernel, is
> >>>> UNWIND_HINT_REGS used?
> >>>>
> >>>> SYM_CODE_START(ret_from_fork_asm)
> >>>> UNWIND_HINT_REGS
> >>>> move a1, sp
> >>>> bl ret_from_fork
> >>>> STACKLEAK_ERASE
> >>>> RESTORE_STATIC
> >>>> RESTORE_SOME
> >>>> RESTORE_SP_AND_RET
> >>>> SYM_CODE_END(ret_from_fork_asm)
> >>>> With this piece of code, what are the contents of pt_regs? In general it
> >>>> is reached from sys_clone and era is the user PC address, is that right? If so,
> >>>> what exactly is the hint used for at the beginning of ret_from_fork_asm?
> >>> The stacktrace shows the control flow where the PC will go back, so
> >>> it is right, because when the PC is in ret_from_fork_asm, it is already
> >>> another thread. The era means it will go back to user mode.
> >> The problem is that with a user-mode era the unwind ends in an error, and
> >> user_mode(regs) is not accurate. Here is the piece of code:
> >>     pc = regs->csr_era;
> >>     if (!__kernel_text_address(pc))
> >>             goto err;
> >> Would UNWIND_HINT_END_OF_STACK be better than UNWIND_HINT_REGS?
> >
> > You are right. And the reason why the current unwinder does not raise
> > an error is that in the ORC_TYPE_REGS case we handle it via user_mode(regs).
> Any progress on the UNWIND_HINT_REGS usage? Is nested exception unwinding
> supported now?
>
> Talking without any action does not seem to be the style of Loongson :)
From my point of view, Tiezhu's simple solution is acceptable...
Huacai
>
> Regards
> Bibo Mao
> >
> > Jinyang
> >
>