Re: [BUG] riscv: Kernel panic in free_initmem when driver triggers modprobe
From: Vivian Wang
Date: Thu Apr 02 2026 - 14:10:09 EST
On 4/3/26 01:45, Vivian Wang wrote:
> On 4/2/26 19:02, guibing wrote:
>
>> Hi all,
>>
>> I encountered a kernel panic on RISC-V 32-bit (rv32) during the boot
>> process, specifically inside free_initmem(). This happens when a
>> built-in ethernet driver triggers a firmware request that falls back
>> to calling /sbin/modprobe.
>>
>> I have created a minimal dummy driver (trigger_bug) to reproduce this
>> issue reliably on QEMU.
>>
>> Environment
>> Kernel: Linux 6.1.166
>> Architecture: RISC-V 32-bit (rv32)
>> Hardware: QEMU virt machine
>>
>> [...]
>>
>> [ 0.976583] Unable to handle kernel paging request at virtual address c0800000
>> [ 0.978314] Oops [#1]
>> [ 0.978461] Modules linked in:
>> [ 0.978823] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.1.166 #2
>> [ 0.979240] Hardware name: riscv-virtio,qemu (DT)
>> [ 0.979616] epc : __memset+0x58/0xf4
>> [ 0.979965] ra : free_reserved_area+0x166/0x1c0
>> [ 0.980229] epc : c0792ab0 ra : c016c33e sp : c1c43f30
>> [ 0.980488] gp : c18b8610 tp : c1c28000 t0 : c0800000
>> [ 0.980744] t1 : c0c12000 t2 : c18ba868 s0 : c1c43f50
>> [ 0.981000] s1 : 00000000 a0 : c0800000 a1 : cccccccc
>> [ 0.981248] a2 : 00001000 a3 : c0801000 a4 : 00000000
>> [ 0.981507] a5 : 80c00000 a6 : 00000800 a7 : 000000cc
>> [ 0.981758] s2 : c1121b18 s3 : 00000000 s4 : 00000000
>> [ 0.982009] s5 : 00000000 s6 : 00000000 s7 : 00000000
>> [ 0.982255] s8 : 00000000 s9 : 00000000 s10: 00000000
>> [ 0.982506] s11: 00000000 t3 : 00080400 t4 : c11a5b20
>> [ 0.982763] t5 : 000000ff t6 : c0000000
>> [ 0.982965] status: 00000120 badaddr: c0800000 cause: 0000000f
>> [ 0.983397] [<c0792ab0>] __memset+0x58/0xf4
>> [ 0.983755] [<c0003f42>] free_initmem+0x74/0x82
>> [ 0.983965] [<c079d2ea>] kernel_init+0x3a/0x106
>> [ 0.984208] [<c0003490>] ret_from_exception+0x0/0x16
>> [ 0.985116] ---[ end trace 0000000000000000 ]---
>> [ 0.985683] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
>> [ 0.986385] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---
>>
> I can no longer reproduce this on v6.7+. I believe this is fixed by commit
> 05942f780ac6 ("Merge patch series "riscv: Fix set_memory_XX() and
> set_direct_map_XX()""), probably specifically 629db01c64ff ("riscv:
> Don't use PGD entries for the linear mapping"), which is also backported
> to v6.6.36+. The series is at:
>
> https://lore.kernel.org/linux-riscv/20231108075930.7157-1-alexghiti@xxxxxxxxxxxx/
>
> Patch 1 doesn't apply to 6.1, which is probably why it wasn't
> backported, but you should be able to just do something like:
>
> diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
> index 50a1b6edd491..af326aa383bc 100644
> --- a/arch/riscv/mm/init.c
> +++ b/arch/riscv/mm/init.c
> @@ -671,6 +671,13 @@ void __init create_pgd_mapping(pgd_t *pgdp,
>
>  static uintptr_t __init best_map_size(phys_addr_t base, phys_addr_t size)
>  {
> +	/*
> +	 * On 32-bit, avoid PGD (i.e. PMD) sized mappings, since we are not
> +	 * going to propagate PGD level changes.
> +	 */
> +	if (!IS_ENABLED(CONFIG_64BIT))
> +		return PAGE_SIZE;
> +
>  	/* Upgrade to PMD_SIZE mappings whenever possible */
>  	if ((base & (PMD_SIZE - 1)) || (size & (PMD_SIZE - 1)))
>  		return PAGE_SIZE;
>
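
For illustration, the effect of the change above can be modelled in user
space. This is only a sketch: model_best_map_size(), its is_64bit parameter
(standing in for IS_ENABLED(CONFIG_64BIT)) and the pmd_size parameter are
names made up for the example, not kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 0x1000u /* 4 KiB base page */

/*
 * User-space model of the patched helper (hypothetical name).
 * is_64bit stands in for IS_ENABLED(CONFIG_64BIT); pmd_size stands in
 * for PMD_SIZE and must be a power of two.
 */
uint64_t model_best_map_size(bool is_64bit, uint64_t base, uint64_t size,
                             uint64_t pmd_size)
{
    /* On 32-bit, never pick a PGD (i.e. PMD) sized mapping. */
    if (!is_64bit)
        return MODEL_PAGE_SIZE;

    /* Upgrade to pmd_size only if base and size are both aligned to it. */
    if ((base & (pmd_size - 1)) || (size & (pmd_size - 1)))
        return MODEL_PAGE_SIZE;

    return pmd_size;
}
```

With is_64bit false the model returns the 4 KiB page size even for a
perfectly aligned region, so the linear map would be built entirely from
page-sized entries.
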
>> Cause Analysis
>>
>> During kernel boot, before free_initmem() is called, if any device
>> driver triggers execution of /sbin/modprobe, the current mm_struct
>> switches from init_mm to that of the user-space modprobe process.
>> This causes a page table context switch, which updates satp (the page
>> table base address register).
>>
>> The free_initmem() function calls set_memory_rw_nx() to change the page
>> attributes of the region starting at __init_begin to RW, non-X. However,
>> set_memory_rw_nx() operates on init_mm's page tables (swapper_pg_dir),
>> whereas the subsequent memset() in free_initmem_default(), which poisons
>> the memory region, translates the virtual address through satp, which
>> points at the page tables of the current process's mm_struct (the
>> modprobe process).
>>
> I couldn't immediately figure out what this cause analysis is getting
> at, but yes, it makes sense that PGD (i.e. PMD) level mappings not being
> propagated properly would cause mapping problems like this.
Okay, I get what you meant now. I think satp changing here is expected
since kernel_init() runs in the newly spawned PID 1, so this wasn't the
problem. The problem was keeping all the pgdirs in sync, which not using
PGD for the linear map should take care of.
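
To make the pgdir sync problem concrete, here is a rough user-space model of
a two-level page table. Everything in it (struct pgdir, fork_pgdir(),
set_perm(), write_allowed_after_fork()) is made up for illustration and is
not kernel code; it only assumes that a new pgdir gets its kernel entries by
copying them from the init one, and that the permission change is applied to
the init pgdir only.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PTRS_PER_TABLE 1024u
#define PAGE_SHIFT     12u
#define PGDIR_SHIFT    22u /* each top-level entry covers 4 MiB */
#define PERM_R 0x1u
#define PERM_W 0x2u
#define PERM_X 0x4u

struct pte_table { uint8_t perm[PTRS_PER_TABLE]; };

struct pgd_entry {
    bool is_leaf;            /* true: leaf mapping stored in the pgdir itself */
    uint8_t leaf_perm;       /* permissions when is_leaf */
    struct pte_table *table; /* next level when !is_leaf */
};

struct pgdir { struct pgd_entry e[PTRS_PER_TABLE]; };

/* A new pgdir copies the top-level entries by value (assumed behaviour). */
static void fork_pgdir(struct pgdir *dst, const struct pgdir *src)
{
    memcpy(dst, src, sizeof(*dst));
}

/* Change permissions by walking only the given pgdir. */
static void set_perm(struct pgdir *pgd, uint32_t va, uint8_t perm)
{
    struct pgd_entry *e = &pgd->e[va >> PGDIR_SHIFT];

    if (e->is_leaf)
        e->leaf_perm = perm;
    else if (e->table != NULL)
        e->table->perm[(va >> PAGE_SHIFT) & (PTRS_PER_TABLE - 1)] = perm;
}

static uint8_t lookup_perm(const struct pgdir *pgd, uint32_t va)
{
    const struct pgd_entry *e = &pgd->e[va >> PGDIR_SHIFT];

    if (e->is_leaf)
        return e->leaf_perm;
    if (e->table == NULL)
        return 0;
    return e->table->perm[(va >> PAGE_SHIFT) & (PTRS_PER_TABLE - 1)];
}

/*
 * Map one address read+exec in the init pgdir, copy the pgdir, then make
 * the address read+write in the init pgdir only. Returns whether a write
 * through the *copied* pgdir would be allowed.
 */
bool write_allowed_after_fork(bool use_leaf_pgd)
{
    static struct pgdir init_pgd, child_pgd;
    static struct pte_table table;
    const uint32_t va = 0xc0800000u;
    struct pgd_entry *e = &init_pgd.e[va >> PGDIR_SHIFT];

    memset(&init_pgd, 0, sizeof(init_pgd));
    memset(&table, 0, sizeof(table));

    if (use_leaf_pgd) {
        e->is_leaf = true;
        e->leaf_perm = PERM_R | PERM_X;
    } else {
        e->table = &table; /* page-sized mapping via a shared table */
        table.perm[(va >> PAGE_SHIFT) & (PTRS_PER_TABLE - 1)] = PERM_R | PERM_X;
    }

    fork_pgdir(&child_pgd, &init_pgd);
    set_perm(&init_pgd, va, PERM_R | PERM_W);

    return (lookup_perm(&child_pgd, va) & PERM_W) != 0;
}
```

In this model, write_allowed_after_fork(true) returns false because the
copied top-level leaf entry keeps the old permissions, while
write_allowed_after_fork(false) returns true because both pgdirs point at
the same lower-level table.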