Re: [tip:sched/core] [sched] af7f588d8f: WARNING:at_kernel/sched/core.c:#sched_mm_cid_after_execve
From: Borislav Petkov
Date: Mon Jan 02 2023 - 09:05:20 EST
On Fri, Dec 30, 2022 at 08:46:25AM -0500, Mathieu Desnoyers wrote:
> void sched_mm_cid_after_execve(struct task_struct *t)
> {
> 	struct mm_struct *mm = t->mm;
> 	unsigned long flags;
>
> 	WARN_ON_ONCE((t->flags & PF_KTHREAD) || !t->mm);
Yeah, it is that check, and it reproduces trivially in my guest here, so much
so that I can't even boot current tip/master in it due to the constant flood
from it.
Also, there's a null ptr deref there:
[ 1.694051] Initialise system trusted keyrings
[ 1.694915] ------------[ cut here ]------------
[ 1.695689] BUG: kernel NULL pointer dereference, address: 000000000000005c
[ 1.695714] #PF: supervisor write access in kernel mode
[ 1.695721] #PF: error_code(0x0002) - not-present page
[ 1.695728] PGD 0 P4D 0
[ 1.695739] Oops: 0002 [#1] PREEMPT SMP
[ 1.695747] CPU: 0 PID: 126 Comm: kworker/u32:1 Not tainted 6.2.0-rc2+ #2
[ 1.695754] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[ 1.695760] RIP: 0010:_raw_spin_lock+0x17/0x30
[ 1.702127] WARNING: CPU: 13 PID: 115 at kernel/sched/core.c:11346 sched_mm_cid_after_execve+0xd5/0xf0
[ 1.699309] Code: 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 65 ff 05 c8 ea 64 7e 31 c0 ba 01 00 00 00 <f0> 0f b1 17 75 05 c3 cc cc cc cc 89 c6 e9 97 00 00 00 0f 1f 80 00
[ 1.702857] Modules linked in:
[ 1.699309] RSP: 0018:ffffc900004afe78 EFLAGS: 00010046
[ 1.703670]
[ 1.699309]
[ 1.704665] CPU: 13 PID: 115 Comm: kworker/u32:0 Not tainted 6.2.0-rc2+ #2
[ 1.699309] RAX: 0000000000000000 RBX: ffff88800d323d00 RCX: 0000000000000000
[ 1.699309] RDX: 0000000000000001 RSI: ffff88800d323d00 RDI: 000000000000005c
[ 1.699309] RBP: 000000000000005c R08: 0000000000000064 R09: ffffc900004afb30
[ 1.699309] R10: 0000000000000000 R11: fffffffffffffffe R12: 0000000000000246
[ 1.699309] R13: 0000000000000000 R14: 00000000fffffffe R15: ffff88800d323d00
[ 1.699309] FS: 0000000000000000(0000) GS:ffff88807da00000(0000) knlGS:0000000000000000
[ 1.699309] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1.699309] CR2: 000000000000005c CR3: 000000000220a000 CR4: 00000000003506f0
[ 1.699309] Call Trace:
[ 1.699309] <TASK>
[ 1.699309] sched_mm_cid_after_execve+0x52/0xf0
[ 1.706650] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[ 1.699309] bprm_execve+0x323/0x600
[ 1.707390] RIP: 0010:sched_mm_cid_after_execve+0xd5/0xf0
[ 1.699309] kernel_execve+0x15f/0x1c0
[ 1.707967] Code: 00 00 74 04 f0 80 0b 02 48 8b 1c 24 48 8b 6c 24 08 4c 8b 64 24 10 4c 8b 6c 24 18 4c 8b 74 24 20 48 83 c4 28 c3 cc cc cc cc 90 <0f> 0b 90 e9 65 ff ff ff 41 be ff ff ff ff eb 9d 66 66 2e 0f 1f 84
[ 1.699309] call_usermodehelper_exec_async+0xd1/0x190
[ 1.708882] RSP: 0018:ffffc90000457e80 EFLAGS: 00010246
[ 1.699309] ? __pfx_call_usermodehelper_exec_async+0x10/0x10
[ 1.709839]
[ 1.699309] ret_from_fork+0x2c/0x50
[ 1.710739] RAX: fffffffffffffffe RBX: ffff88800cad8f40 RCX: 0000000000000000
[ 1.699309] </TASK>
[ 1.714247] RDX: ffffc90000457dc8 RSI: ffff88800cad8f40 RDI: ffff88800cad8f40
[ 1.699309] Modules linked in:
[ 1.715270] RBP: ffff88800dd35400 R08: 0000000000000064 R09: ffffc90000457b30
[ 1.699309] CR2: 000000000000005c
[ 1.699309] ---[ end trace 0000000000000000 ]---
... flood of the above...
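As an aside on reading that oops: the faulting address (CR2, and RDI on entry to _raw_spin_lock) is 0x5c, which is what you'd expect when code takes the address of a member through a NULL struct pointer. A minimal userspace sketch of the arithmetic, assuming a made-up layout: struct fake_mm and lock_address() are hypothetical, and the real struct mm_struct layout is not shown in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical layout, NOT the real struct mm_struct: 0x5c bytes of
 * unrelated fields followed by a lock word.
 */
struct fake_mm {
	unsigned char other_fields[0x5c];
	int lock;
};

/*
 * Address that "&mm->lock" evaluates to for a given base address.
 * With a NULL base (0) the result is simply the member offset, which
 * is the small address that then shows up as the faulting address.
 */
uintptr_t lock_address(uintptr_t mm_base)
{
	return mm_base + offsetof(struct fake_mm, lock);
}
```

With this layout, lock_address(0) comes out as 0x5c, i.e. a small non-zero fault address like the one above is still a NULL pointer dereference, just at a member offset.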
> is too strict. AFAIU the usermodehelper thread is a kernel thread, which
> happens to have a non-NULL mm after execve. We want to allow usermodehelper
> threads to use rseq, so I think the appropriate approach here would be to
> just warn if !t->mm:
>
> WARN_ON_ONCE(!t->mm);
You need at least this to avoid the null ptr deref too:
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 048ec2417990..5c920c94a6b2 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -11340,10 +11340,13 @@ void sched_mm_cid_before_execve(struct task_struct *t)
 void sched_mm_cid_after_execve(struct task_struct *t)
 {
-	struct mm_struct *mm = t->mm;
+	struct mm_struct *mm;
 	unsigned long flags;
-	WARN_ON_ONCE((t->flags & PF_KTHREAD) || !t->mm);
+	if (WARN_ON_ONCE(!t->mm))
+		return;
+
+	mm = t->mm;
 	local_irq_save(flags);
 	t->mm_cid = mm_cid_get(mm);
---
which gives the below.
I'm not sure, though, what the rules are for those kworker threads and them
having a ->mm...
[ 1.734104] ------------[ cut here ]------------
[ 1.734144] Initialise system trusted keyrings
[ 1.734553] WARNING: CPU: 9 PID: 109 at kernel/sched/core.c:11346 sched_mm_cid_after_execve+0xcb/0xe0
[ 1.752756] workingset: timestamp_bits=61 max_order=19 bucket_order=0
[ 1.754187] Modules linked in:
[ 1.768160] 9p: Installing v9fs 9p2000 file system support
[ 1.768640]
[ 1.768876] Key type asymmetric registered
[ 1.769048] CPU: 9 PID: 109 Comm: kworker/u32:1 Not tainted 6.2.0-rc2+ #9
[ 1.769207] Asymmetric key parser 'x509' registered
[ 1.769397] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[ 1.769651] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 249)
[ 1.769833] RIP: 0010:sched_mm_cid_after_execve+0xcb/0xe0
[ 1.770162] io scheduler mq-deadline registered
[ 1.770462] Code: 00 00 74 04 f0 80 0b 02 48 8b 1c 24 48 8b 6c 24 08 4c 8b 64 24 10 4c 8b 6c 24 18 4c 8b 74 24 20 48 83 c4 28 c3 cc cc cc cc 90 <0f> 0b 90 eb d9 41 be ff ff ff ff eb a0 0f 1f 84 00 00 00 00 00 90
[ 1.810713] RSP: 0018:ffffc90000427e80 EFLAGS: 00010246
[ 1.823527] RAX: fffffffffffffffe RBX: ffff88800cb88000 RCX: 0000000000000000
[ 1.824425] RDX: ffffc90000427dc8 RSI: ffff88800cb88000 RDI: ffff88800cb88000
[ 1.825266] acpiphp_ibm: ibm_acpiphp_init: acpi_walk_namespace failed
[ 1.825564] RBP: ffff88800d2d8200 R08: 0000000000000064 R09: ffffc90000427b30
[ 1.825914] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input0
[ 1.826068] R10: 0000000000000000 R11: fffffffffffffffe R12: fffffffffffffffe
[ 1.839784] ACPI: button: Power Button [PWRF]
[ 1.840327] R13: 0000000000000000 R14: 00000000fffffffe R15: ffff88800cb88000
[ 1.855532] FS: 0000000000000000(0000) GS:ffff88807dc40000(0000) knlGS:0000000000000000
[ 1.856681] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1.857403] CR2: 0000000000000000 CR3: 000000000220a000 CR4: 00000000003506e0
[ 1.858264] Call Trace:
[ 1.858643] <TASK>
[ 1.871528] bprm_execve+0x323/0x600
[ 1.872027] kernel_execve+0x15f/0x1c0
[ 1.872505] call_usermodehelper_exec_async+0xd1/0x190
[ 1.873120] ? __pfx_call_usermodehelper_exec_async+0x10/0x10
[ 1.873800] ret_from_fork+0x2c/0x50
[ 1.874259] </TASK>
[ 1.874582] ---[ end trace 0000000000000000 ]---
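For illustration, the shape of the change in the diff above (warn once, then bail out before anything touches the pointer) can be modelled in plain userspace C. This is only a sketch under assumptions: the two structs are cut-down stand-ins, warn_on_once() is a hypothetical helper approximating the kernel's WARN_ON_ONCE(), mm_cid_get() here is a dummy, and the irq save/restore is left out entirely.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Cut-down stand-ins for the kernel structures (assumption). */
struct mm_struct { int next_cid; };
struct task_struct { struct mm_struct *mm; int mm_cid; };

/* Number of warnings actually printed, so callers can observe "once". */
int warn_count;

/*
 * Hypothetical approximation of WARN_ON_ONCE(): prints at most once
 * per call site (tracked by *warned) and returns the condition so it
 * can be used directly in an if ().
 */
static bool warn_on_once(bool cond, bool *warned, const char *what)
{
	if (cond && !*warned) {
		*warned = true;
		warn_count++;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}

/* Dummy allocator; the point is only that it dereferences mm. */
static int mm_cid_get(struct mm_struct *mm)
{
	return mm->next_cid++;
}

void sched_mm_cid_after_execve_sketch(struct task_struct *t)
{
	static bool warned;
	struct mm_struct *mm;

	/* Return before mm is ever dereferenced. */
	if (warn_on_once(!t->mm, &warned, "!t->mm"))
		return;

	mm = t->mm;
	t->mm_cid = mm_cid_get(mm);
}
```

Calling sched_mm_cid_after_execve_sketch() on a task whose mm is NULL prints the warning the first time and leaves mm_cid untouched; the warning-only variant would instead go on to dereference the NULL pointer.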
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette