Re: Crash in fair scheduler

From: Schmid, Carsten
Date: Thu Dec 05 2019 - 05:56:20 EST

Next message: Gaurav Kohli: "[PATCH v0] irqchip/gic-v3: Avoid check of lpi configuration for non existent cpu"
Previous message: Dave Young: "Re: [PATCH] x86/efi: update e820 about reserved EFI boot services data to fix kexec breakage"
In reply to: Peter Zijlstra: "Re: Crash in fair scheduler"
Next in thread: Davidlohr Bueso: "Re: Crash in fair scheduler"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

> From: Peter Zijlstra [mailto:peterz@xxxxxxxxxxxxx]

>
> Exactly.
>
>
> I suppose one approach is to add code to both __enqueue_entity() and
> __dequeue_entity() that compares ->rb_leftmost to the result of
> rb_first(). That'd incur some overhead but it'd double check the logic.
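
For reference, the double check suggested above could look roughly like the
following user-space sketch. It is only an illustration under assumptions:
struct node, struct cached_tree, tree_first(), tree_insert() and
leftmost_is_consistent() are hypothetical stand-ins (a plain unbalanced
binary search tree, not the kernel rbtree), and the real check would sit in
__enqueue_entity() and __dequeue_entity().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical user-space stand-in for a tree node ordered by key. */
struct node {
	long key;
	struct node *left, *right;
};

/* Hypothetical stand-in for a tree root with a cached leftmost pointer. */
struct cached_tree {
	struct node *root;
	struct node *leftmost;
};

/* Walk left from the root; this models what rb_first() computes. */
static struct node *tree_first(const struct cached_tree *t)
{
	struct node *n = t->root;

	if (!n)
		return NULL;
	while (n->left)
		n = n->left;
	return n;
}

/* Plain (unbalanced) insert that keeps the leftmost cache up to date. */
static void tree_insert(struct cached_tree *t, struct node *new)
{
	struct node **link = &t->root;
	bool is_leftmost = true;

	new->left = new->right = NULL;
	while (*link) {
		if (new->key < (*link)->key) {
			link = &(*link)->left;
		} else {
			link = &(*link)->right;
			is_leftmost = false;
		}
	}
	*link = new;
	if (is_leftmost)
		t->leftmost = new;
}

/* The double check: true if the cached pointer agrees with the tree. */
static bool leftmost_is_consistent(const struct cached_tree *t)
{
	struct node *first = tree_first(t);

	if (t->leftmost != first) {
		fprintf(stderr, "leftmost mismatch: cached=%p first=%p\n",
			(void *)t->leftmost, (void *)first);
		return false;
	}
	return true;
}
```

Calling leftmost_is_consistent() after every insert and erase costs one walk
down the left spine each time, which is the overhead mentioned above.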

As this happened only once and there is no reproducer, I would prefer an
approach that checks for exactly this case in the code path where it crashed.
Something like this (pseudo-code):

simple:
....

do {
	se = pick_next_entity(..)
	if (unlikely(!se)) { /* here we check for the issue */
		write warning and some useful data to dmesg
		if (cur_rq->rb_leftmost == NULL) { /* our case */
			set cur_rq->rb_leftmost to itself as mentioned in the discussion
			se = pick_next_entity(..) /* should now return a valid pointer */
		} else { /* another case happened, unknown */
			write warning to dmesg UNKNOWN
			panic() /* not known what to do here, would crash anyway. */
		}
	}
	set_next_entity(se, ..)
	cfs_rq = group_cfs_rq(...)
} while (cfs_rq);
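
To make the control flow concrete, here is a small user-space model of the
guarded pick. It is a sketch under assumptions, not kernel code: struct
entity, struct runqueue, pick_next(), first_entity() and pick_next_checked()
are hypothetical names, and the repair step recomputes the leftmost pointer
by walking the tree, which is only one possible reading of the repair step.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a scheduling entity in a search tree. */
struct entity {
	long vruntime;
	struct entity *left, *right;
};

/* Hypothetical stand-in for a run queue with a cached leftmost entity. */
struct runqueue {
	struct entity *root;
	struct entity *leftmost;
	unsigned int nr_running;
};

/* Models the fast path: trust the cached leftmost pointer. */
static struct entity *pick_next(const struct runqueue *rq)
{
	return rq->leftmost;
}

/* Recompute the leftmost entity by walking down the left spine. */
static struct entity *first_entity(const struct runqueue *rq)
{
	struct entity *se = rq->root;

	while (se && se->left)
		se = se->left;
	return se;
}

/* Guarded pick: log, repair the known case, give up on anything else. */
static struct entity *pick_next_checked(struct runqueue *rq)
{
	struct entity *se = pick_next(rq);

	if (se)
		return se;

	/* Unexpected: dump what we know so the log is useful later. */
	fprintf(stderr, "pick returned NULL: nr_running=%u root=%p leftmost=%p\n",
		rq->nr_running, (void *)rq->root, (void *)rq->leftmost);

	if (!rq->leftmost && rq->root) {
		/* Known case: stale cache, tree still has entities. */
		rq->leftmost = first_entity(rq);
		return pick_next(rq);
	}

	/* Unknown case: the pseudo-code would panic() here. */
	return NULL;
}
```

The check only runs when the fast path returns NULL, so the normal case pays
for one extra branch.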

This will definitely not fix the cause of rb_leftmost being NULL, but we can't
tell where that happened at all, so we are fumbling in the dark.
Maybe the data written to dmesg will help to diagnose further if the issue
happens again.
And this will not affect performance much, which I have to take care of too.

Thanks for all your suggestions.
Carsten
