Re: BUG: kernel NULL pointer dereference from check_preempt_wakeup()

From: Peter Zijlstra
Date: Fri Jun 05 2020 - 09:15:36 EST


On Thu, Jun 04, 2020 at 03:54:45PM -0700, Paul E. McKenney wrote:

> BUG: kernel NULL pointer dereference, address: 0000000000000150
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> PGD 0 P4D 0
> Oops: 0000 [#1] PREEMPT SMP PTI
> CPU: 9 PID: 196 Comm: rcu_torture_rea Not tainted 5.7.0+ #3923
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.11.0-2.el7 04/01/2014
> RIP: 0010:check_preempt_wakeup+0xb1/0x180
> Code: 83 ea 01 48 8b 9b 48 01 00 00 39 d0 75 f2 48 39 bb 50 01 00 00 75 05 48 85 ff 75 29 48 8b ad 48 01 00 00 48 8b 9b 48 01 00 00 <48> 8b bd 50 01 00 00 48 39 bb 50 01 00 00 0f 95 c2 48 85 ff 0f 94

That is:

All code
========
0: 83 ea 01 sub $0x1,%edx
3: 48 8b 9b 48 01 00 00 mov 0x148(%rbx),%rbx
a: 39 d0 cmp %edx,%eax
c: 75 f2 jne 0x0
e: 48 39 bb 50 01 00 00 cmp %rdi,0x150(%rbx)
15: 75 05 jne 0x1c
17: 48 85 ff test %rdi,%rdi
1a: 75 29 jne 0x45
1c: 48 8b ad 48 01 00 00 mov 0x148(%rbp),%rbp
23: 48 8b 9b 48 01 00 00 mov 0x148(%rbx),%rbx
2a:* 48 8b bd 50 01 00 00 mov 0x150(%rbp),%rdi <-- trapping instruction
31: 48 39 bb 50 01 00 00 cmp %rdi,0x150(%rbx)
38: 0f 95 c2 setne %dl
3b: 48 85 ff test %rdi,%rdi
3e: 0f .byte 0xf
3f: 94 xchg %eax,%esp

> RSP: 0018:ffffaccdc02ecd38 EFLAGS: 00010006
> RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffffafa0bc20
> RDX: 0000000000000000 RSI: ffff946b5df50000 RDI: ffff946b5f469340
> RBP: 0000000000000000 R08: ffff946b5df80d00 R09: 0000000000000001

And you have RBP == NULL and RBX == NULL

Now, my compiler generates very similar code for this function and tells
me this is:

check_preempt_wakeup()
find_matching_se()
is_same_group()
if (se->cfs_rq == pse->cfs_rq) <-- *BOOM*

and pahole gives us (for struct sched_entity):

struct sched_entity * parent; /* 0x148 0x8 */
struct cfs_rq * cfs_rq; /* 0x150 0x8 */

apparently both your @se and @pse are NULL.

Which shouldn't be possible..

I also don't see a relation to my recent changes here:

> Call Trace:
> <IRQ>
> check_preempt_curr+0x5d/0x90
> ttwu_do_wakeup.isra.93+0xf/0x100
> sched_ttwu_pending+0x66/0x90
> smp_call_function_single_interrupt+0x2d/0xf0
> call_function_single_interrupt+0xf/0x20

Since I would expect that to blow up much earlier, like @p == NULL or
something along those lines.

Could you perhaps try something like the below ? It would splat when we
run ouf of hierarchy to ascend, which is the only semi sane scenario for
ending up where you are.

Are you actively using cgroups or just whatever systemd decides to gift
you? Can you perhaps dump /proc/sched_debug while your test is running?

---

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 35f4cc024dcfc..7aace0a5921e9 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -416,8 +416,6 @@ static inline struct sched_entity *parent_entity(struct sched_entity *se)
static void
find_matching_se(struct sched_entity **se, struct sched_entity **pse)
{
- int se_depth, pse_depth;
-
/*
* preemption test can be made between sibling entities who are in the
* same cfs_rq i.e who have a common parent. Walk up the hierarchy of
@@ -425,23 +423,22 @@ find_matching_se(struct sched_entity **se, struct sched_entity **pse)
* parent.
*/

- /* First walk up until both entities are at same depth */
- se_depth = (*se)->depth;
- pse_depth = (*pse)->depth;
-
- while (se_depth > pse_depth) {
- se_depth--;
- *se = parent_entity(*se);
- }
-
- while (pse_depth > se_depth) {
- pse_depth--;
- *pse = parent_entity(*pse);
- }
-
while (!is_same_group(*se, *pse)) {
- *se = parent_entity(*se);
- *pse = parent_entity(*pse);
+ int se_depth = (*se)->depth;
+ int pse_depth = (*pse)->depth;
+
+ if (se_depth <= pse_depth) {
+ struct sched_entity *parent = parent_entity(*pse);
+ if (WARN_ON_ONCE(!parent))
+ return;
+ *pse = parent;
+ }
+ if (se_depth >= pse_depth) {
+ struct sched_entity *parent = parent_entity(*se);
+ if (WARN_ON_ONCE(!parent))
+ return;
+ *se = parent_entity(*se);
+ }
}
}