Re: BISECTED: 'alloc_tag: populate memory for module tags as needed' crashes on boot.

From: Ben Greear
Date: Fri Dec 06 2024 - 19:50:21 EST


On 12/6/24 16:15, Suren Baghdasaryan wrote:
On Fri, Dec 6, 2024 at 2:55 PM Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote:

On Fri, Dec 6, 2024 at 2:43 PM Ben Greear <greearb@xxxxxxxxxxxxxxx> wrote:

On 12/6/24 14:03, Suren Baghdasaryan wrote:
On Fri, Dec 6, 2024 at 1:50 PM Ben Greear <greearb@xxxxxxxxxxxxxxx> wrote:

Hello Suren,

My system crashes on bootup, and I bisected to this commit.

0f9b685626daa2f8e19a9788625c9b624c223e45 is the first bad commit
commit 0f9b685626daa2f8e19a9788625c9b624c223e45
Author: Suren Baghdasaryan <surenb@xxxxxxxxxx>
Date: Wed Oct 23 10:07:57 2024 -0700

alloc_tag: populate memory for module tags as needed

The memory reserved for module tags does not need to be backed by physical
pages until there are tags to store there. Change the way we reserve this
memory to allocate only virtual area for the tags and populate it with
physical pages as needed when we load a module.

The crash looks like this:

BUG: unable to handle page fault for address: fffffbfff4041000
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 44d0e7067 P4D 44d0e7067 PUD 44d0e3067 PMD 10bb38067 PTE 0
Oops: Oops: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 UID: 0 PID: 319 Comm: systemd-udevd Not tainted 6.12.0-rc6+ #21
Hardware name: Default string Default string/SKYBAY, BIOS 5.12 02/15/2023
RIP: 0010:kasan_check_range+0xa5/0x190
Code: 8d 5a 07 4c 0f 49 da 49 c1 fb 03 45 85 db 0f 84 ce 00 00 00 45 89 db 4a 8d 14 d8 eb 0d 48 83 c0 08 48 39 d0 0f 84 b29
RSP: 0018:ffff88812c26f980 EFLAGS: 00010206
RAX: fffffbfff4041000 RBX: fffffbfff404101e RCX: ffffffff814ec29b
RDX: fffffbfff4041018 RSI: 00000000000000f0 RDI: ffffffffa0208000
RBP: fffffbfff4041000 R08: 0000000000000001 R09: fffffbfff404101d
R10: ffffffffa02080ef R11: 0000000000000003 R12: ffffffffa0208000
R13: ffffc90000dac7c8 R14: ffffc90000dac7e8 R15: dffffc0000000000
FS: 00007fe869216b40(0000) GS:ffff88841da00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: fffffbfff4041000 CR3: 0000000121e86002 CR4: 00000000003706f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
? __die+0x1f/0x60
? page_fault_oops+0x258/0x910
? dump_pagetable+0x690/0x690
? search_bpf_extables+0x22/0x250
? trace_page_fault_kernel+0x120/0x120
? search_bpf_extables+0x164/0x250
? kasan_check_range+0xa5/0x190
? fixup_exception+0x4d/0xc70
? exc_page_fault+0xe1/0xf0
? asm_exc_page_fault+0x22/0x30
? load_module+0x3d7b/0x7560
? kasan_check_range+0xa5/0x190
__asan_memcpy+0x38/0x60
load_module+0x3d7b/0x7560
? module_frob_arch_sections+0x30/0x30
? lockdep_lock+0xbe/0x1b0
? rw_verify_area+0x18d/0x5e0
? kernel_read_file+0x246/0x870
? __x64_sys_fspick+0x290/0x290
? init_module_from_file+0xd1/0x130
init_module_from_file+0xd1/0x130
? __ia32_sys_init_module+0xa0/0xa0
? lock_acquire+0x2d/0xb0
? idempotent_init_module+0x116/0x790
? do_raw_spin_unlock+0x54/0x220
idempotent_init_module+0x226/0x790
? init_module_from_file+0x130/0x130
? vm_mmap_pgoff+0x203/0x2e0
__x64_sys_finit_module+0xba/0x130
do_syscall_64+0x69/0x160
entry_SYSCALL_64_after_hwframe+0x4b/0x53
RIP: 0033:0x7fe869de327d
Code: 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 248
RSP: 002b:00007ffe34a828d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 0000557fa8f3f3f0 RCX: 00007fe869de327d
RDX: 0000000000000000 RSI: 00007fe869f4943c RDI: 0000000000000006
RBP: 00007fe869f4943c R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000006 R11: 0000000000000246 R12: 0000000000020000
R13: 0000557fa8f3f030 R14: 0000000000000000 R15: 0000557fa8f3d110
</TASK>
Modules linked in:
CR2: fffffbfff4041000
---[ end trace 0000000000000000 ]---
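
For reference, the faulting address can be related to the registers in the oops with a little arithmetic. The sketch below assumes generic KASAN on x86_64, where a shadow byte lives at (addr >> 3) + KASAN_SHADOW_OFFSET, and assumes the offset is 0xdffffc0000000000 (which is what R15 holds above). It also assumes RDI and RSI still hold the address and size passed to kasan_check_range at the time of the fault. The helper name kasan_shadow_addr is made up for illustration and is not a kernel function.

```python
# Sketch: map a kernel virtual address to its generic-KASAN shadow byte.
# Assumptions: x86_64, shadow scale shift of 3, and a shadow offset of
# 0xdffffc0000000000 (the value seen in R15 in the oops above).
KASAN_SHADOW_SCALE_SHIFT = 3
KASAN_SHADOW_OFFSET = 0xDFFFFC0000000000
MASK64 = (1 << 64) - 1  # emulate 64-bit unsigned wraparound


def kasan_shadow_addr(addr: int) -> int:
    """Hypothetical helper: shadow address for one kernel virtual address."""
    return ((addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET) & MASK64


dst = 0xFFFFFFFFA0208000  # RDI: address being checked (memcpy side in module space)
size = 0xF0               # RSI: number of bytes being checked

first = kasan_shadow_addr(dst)             # shadow of the first byte
last = kasan_shadow_addr(dst + size - 1)   # shadow of the last byte

print(hex(first))  # 0xfffffbfff4041000, same as CR2 and RAX
print(hex(last))   # 0xfffffbfff404101d, same as R09
```

If those assumptions hold, the not-present page is the KASAN shadow for an address in the module area rather than the module memory itself, which would be consistent with the crash only showing up when KASAN is enabled.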

I suspect you only hit this with an unlucky amount of debugging enabled. The kernel config I used
is found here:

http://www.candelatech.com/downloads/cfg-kasan-crash-regression.config

I will be happy to test fixes.

Hi Ben,
Thanks for reporting the issue. Do you have these recent fixes in your tree:

https://lore.kernel.org/all/20241130001423.1114965-1-surenb@xxxxxxxxxx/
https://lore.kernel.org/all/20241205170528.81000-1-hao.ge@xxxxxxxxx/

If not, could you please apply them and see if the issue is still happening?
Thanks,
Suren.

Hello Suren,

Thanks for the quick response. The first patch is already in latest Linus tree,

Hmm. Could you please double-check which tree you are using? I don't
see the first patch
(https://lore.kernel.org/all/20241130001423.1114965-1-surenb@xxxxxxxxxx/)
in Linus' tree. Maybe you are using linux-next?

Sorry, you are correct. I must have mangled something when trying to apply
the patch and I didn't look hard enough when patch said changes were already applied.

I can re-test this next week...and for reference, kernel boots fine when you disable
KASAN and other debugging.

Thanks,
Ben


--
Ben Greear <greearb@xxxxxxxxxxxxxxx>
Candela Technologies Inc http://www.candelatech.com