Re: [PATCH] riscv: entry: Fixup do_trap_break from kernel side

From: Vivian Wang

Date: Mon Jun 22 2026 - 06:25:58 EST

On 6/22/26 16:28, Peter Zijlstra wrote:
> On Sun, Jun 21, 2026 at 02:52:46AM -0400, Guo Ren wrote:
>> On Fri, Jun 19, 2026 at 04:54:53PM -0700, Kees Cook wrote:
>>> *thread necromancy*
>>>
>>> On Sat, Jul 01, 2023 at 10:57:07PM -0400, guoren@xxxxxxxxxx wrote:
>>>> From: Guo Ren <guoren@xxxxxxxxxxxxxxxxx>
>>>>
>>>> irqentry_nmi_enter/exit forces the current context into in_interrupt().
>>>> That makes the kernel panic outright, but kdb still needs "ebreak" to
>>>> debug the kernel.
>>>>
>>>> Moving irqentry_nmi_enter/exit to exception_enter/exit corrects
>>>> handle_break on the kernel side.
>>>>
>>>> Before the fixup:
>>>> $echo BUG > /sys/kernel/debug/provoke-crash/DIRECT
>>>> lkdtm: Performing direct entry BUG
>>>> ------------[ cut here ]------------
>>>> kernel BUG at drivers/misc/lkdtm/bugs.c:78!
>>>> [...]
>>>> Kernel panic - not syncing: Aiee, killing interrupt handler!
>>> This appears to still be unfixed. What's the blocker? The solutions in
>>> this thread seem to work...
>>>
>>> I'd like to be exercising an Oops path via KUnit (for KCFI), and riscv
>>> just instantly falls over instead of thread-killing on the exception.
>> Thanks for reviving this thread. At the time I didn’t fully understand
>> Peter’s point. We should only use the NMI path when the trap occurs with
>> interrupts disabled.
>> Here’s the updated fix:
>>
>> do_trap_break(struct pt_regs *regs)
>> ...
>>  		irqentry_exit_to_user_mode(regs);
>>  	} else {
>> -		irqentry_state_t state = irqentry_nmi_enter(regs);
>> +		if (regs->status & SR_IE) {
>> +			enum ctx_state prev_state = exception_enter();
>>
>> -		handle_break(regs);
>> +			handle_break(regs);
>>
>> -		irqentry_nmi_exit(regs, state);
>> +			exception_exit(prev_state);
>> +		} else {
>> +			irqentry_state_t state = irqentry_nmi_enter(regs);
>> +
>> +			handle_break(regs);
>> +
>> +			irqentry_nmi_exit(regs, state);
>> +		}
>>  	}
>>  }
>>
>> If you & Peter have no objection, I’ll post a v2.
> I still don't understand it. This cannot fix anything. Consider:
>
> EBREAK
> raw_spin_lock_irq(&your_lock)
> EBREAK
>
> So now the first 'works', but the second will crash. Additionally,
> having the EBREAK context differ so dramatically between invocations
> seems like a very bad deal to me.

To spell it out, the problem that needs fixing is:

-> BUG()
-> ebreak instruction
-> Breakpoint exception
-> do_trap_break()
-> irqentry_nmi_enter()
[ now in_nmi() / in_interrupt() ]
-> report_bug() returns BUG_TRAP_TYPE_BUG
-> die()
-> make_task_dead()
-> panic() because we're in_interrupt()

As a result, on riscv every kernel BUG() currently panic()s the entire
machine, instead of just killing the offending task.

How do you think this should be fixed? Here are some ideas, though I'm
not familiar with the generic entry code:

* Should we call irqentry_nmi_exit() before calling die() for BUG()?
* Should we change the GENERIC_BUG trap instruction to one that raises
  an illegal instruction exception instead? Then we could write a
  simpler handler that doesn't need to care about the probe stuff.

Vivian "dramforever" Wang