Re: [BUG RT] dump-capture kernel not executed for panic in interrupt context
From: peterz
Date: Mon Sep 07 2020 - 07:43:48 EST
On Sat, Aug 22, 2020 at 07:49:28PM -0400, Steven Rostedt wrote:
> From this email:
>
> > The problem happens when that owner is the idle task, this can happen
> > when the irq/softirq hits the idle task, in that case the contending
> > mutex_lock() will try and PI boost the idle task, and that is a big
> > no-no.
>
> What's wrong with priority boosting the idle task? It's not obvious,
> and I can't find comments in the code saying it would be bad.
> The idle task is not mentioned at all in rtmutex.c and not mentioned in
> kernel/locking except for some comments about RCU in lockdep.
There used to be a giant BUG() and a comment somewhere in the PI code, I
think... but that's a vague memory.
> I see that in the idle code the prio_change method does a BUG(), but
> there's no comment to say why it does so.
>
> The commit that added that BUG() doesn't explain why it can't happen:
>
> a8941d7ec8167 ("sched: Simplify the idle scheduling class")
That's like almost a decade ago ...
> I may have once known the rationale behind all this, but it's been a
> long time since I worked on the PI code, and it's out of my cache.
I suffer much the same problem.
So conceptually there's the problem that idle must always be runnable,
and the moment you boost it, it becomes subject to a different
scheduling class.
Imagine, for example, what happens when we boost it to RT and it then
becomes subject to throttling. What are we going to run when the idle
task is no longer eligible to run?
(it might all work out by accident, but ISTR we had a whole bunch of fun
in the earlier days of RT due to things like that)
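To make the throttling scenario concrete, here is a toy model in C. It is a hypothetical sketch, not kernel code: the class enum, struct task, struct runqueue, pick_next_task() and pi_boost() below are made-up, heavily simplified stand-ins (pick_next_task() only shares its name with the real scheduler function). It assumes a CPU whose only runnable task is idle, an RT interrupt thread that blocks on a mutex owned by idle, and RT bandwidth that then runs out. Unlike the real kernel, which picks rq->idle directly from the idle class, this model looks tasks up by their current class, so it shows the conceptual hole rather than what the kernel would actually do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model: classes are consulted from highest to lowest. */
enum sched_class_id { CLASS_RT, CLASS_FAIR, CLASS_IDLE, NR_CLASSES };

struct task {
    const char *name;
    enum sched_class_id cls;   /* class the task is currently scheduled in */
    bool runnable;
};

struct runqueue {
    struct task *tasks;
    size_t nr_tasks;
    bool rt_throttled;         /* RT bandwidth exhausted for this period */
};

/* A class may run unless it is RT and RT is currently throttled. */
static bool class_eligible(const struct runqueue *rq, enum sched_class_id c)
{
    return !(c == CLASS_RT && rq->rt_throttled);
}

/* Return the first runnable task of the highest eligible class,
 * or NULL if no eligible class has a runnable task. */
static struct task *pick_next_task(struct runqueue *rq)
{
    for (int c = CLASS_RT; c < NR_CLASSES; c++) {
        if (!class_eligible(rq, (enum sched_class_id)c))
            continue;
        for (size_t i = 0; i < rq->nr_tasks; i++) {
            struct task *t = &rq->tasks[i];
            if (t->runnable && t->cls == (enum sched_class_id)c)
                return t;
        }
    }
    return NULL;
}

/* Toy priority inheritance: move the lock owner into the blocked
 * waiter's class if that class is higher (lower enum value). */
static void pi_boost(struct task *owner, const struct task *waiter)
{
    assert(owner != NULL && waiter != NULL);
    if (waiter->cls < owner->cls)
        owner->cls = waiter->cls;
}
```

With only idle runnable, pick_next_task() returns idle; after pi_boost() moves idle into CLASS_RT and rt_throttled is set, the RT class is skipped and the idle class no longer holds any task, so pick_next_task() returns NULL: there is nothing left to run.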