Re: [RFC][PATCH 0/4] printk: introduce printing kernel thread

From: Petr Mladek
Date: Fri Mar 24 2017 - 10:43:33 EST


On Fri 2017-03-24 10:59:36, Sergey Senozhatsky wrote:
> On (03/23/17 09:51), Peter Zijlstra wrote:
> [..]
> > > > sysrq runs from interrupt context, right? Should be able to do wakeups.
> > >
> > > what I thought about was -
> > > what if there are 'misbehaving' higher prio tasks all the time?
> > > the existing sysrq would attempt to do printing from irq context
> > > so it doesn't care about run queues.
> > >
> > > does it make sense to you?
> >
> > Ah, that's what you meant. Yeah, dunno, I'm still unconvinced about the
> > whole printk thread thing.
>
> I see your point.
> but I can't think of alternatives that would fix all those lockups and
> stalls and at the same time have better guarantees than printk_kthread.
>
>
> > Also those function names are horrifically long.
>
> right. not happy with the naming either.
>
> so what I'm thinking about right now is:
>
> we have that thing which we call "old printk" mode, which is not
> really informative. and my proposal is to rename "old" mode and use
> "printk rescue" mode instead. because we switch to that mode when
> we are trying to "rescue" kernel logs. so the API can be something
> like
> printk_rescue_on()
> printk_rescue_off()

Sounds good to me. A slight problem is that off() does not actually
stop the mode when the calls are nested.

Just one more attempt inspired by this:

printk_emergency_begin()
printk_emergency_end()

Note that we also enter this mode automatically when
a pr_emerg() message is printed.

But I am fine with any of the generic names mentioned.
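
To illustrate the nesting concern, here is a minimal userspace sketch of
counted begin()/end() calls. The counter name and the
printk_in_emergency() helper are made up for illustration and are not
from the patch set; a real kernel version would presumably need an
atomic or per-CPU counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical nesting depth; a plain int is enough for this sketch. */
static int printk_emergency_depth;

/* Enter emergency mode; may be called while already in the mode. */
void printk_emergency_begin(void)
{
	printk_emergency_depth++;
}

/* Leave one nesting level; the mode stays on until the depth hits 0. */
void printk_emergency_end(void)
{
	assert(printk_emergency_depth > 0);	/* unbalanced end() is a bug */
	printk_emergency_depth--;
}

/* Hypothetical helper: true while at least one begin() is still open. */
bool printk_in_emergency(void)
{
	return printk_emergency_depth > 0;
}
```

With this scheme an inner end() only drops one level, which is why
begin()/end() describes the behaviour better than on()/off().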

>
> --- random thoughts ---
>
> another thing that bothers me a bit is that we need to place those
> printk_rescue_on/printk_rescue_off switches all over the kernel.
> sort of a root cause [in some of the cases] here is the fact that
> we don't have any feedback from printk_kthread in vprintk_emit():
> does printk_kthread make any progress?
> do we flush messages to the serial console?
> etc.
>
> and we've got everything we need to have such a feedback in
> vprintk_emit():
>
> a) console is not suspended so console_unlock() can call console drivers
> b) printk_kthread != NULL
> c) we are not in enforced rescue/emergency mode
> d) `log_next_seq' moves forward (always `true', we are in vprintk_emit())
> e) `console_seq' stands still
>
> so we can have an automatic rescue mode fallback in vprintk_emit().
> if (a)-(e) are true then we give up on waking up printk_kthread,
> switch to rescue mode and attempt to console_trylock() directly from
> vprintk_emit(). the part that sucks here is that we need to give
> printk_kthread some time to catch up. for instance, if (e) is true
> for the past 50 invocations of vprintk_emit(), IOW:
>
> - we added 50 lines to printk
> - none have been printed on the serial console
>
> then we
> - declare rescue
> - do console_trylock() instead of wake_up() //unless in deferred vprintk_emit()

I am not sure if we are able to distinguish a flood of messages
from a real emergency situation.

If we start flushing messages directly when there is a flood
of messages, we will bring back the original problem with soft
lockups.

Well, there are only a handful of annotated locations at the moment.
I would start thinking about automatic detection once we have
more of them and more data for a good heuristic.
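
For reference, the quoted (a)-(e) check could be modelled roughly like
this in userspace. The struct, the field names, and
printk_stall_detected() are assumptions made up for this sketch; only
the conditions and the "50 invocations" figure come from the proposal
above.

```c
#include <stdbool.h>
#include <stdint.h>

#define STALL_LIMIT 50	/* the "50 invocations" example from the proposal */

/* Hypothetical snapshot of the state that vprintk_emit() can see. */
struct printk_state {
	bool console_suspended;		/* (a) must be false */
	bool have_kthread;		/* (b) printk_kthread != NULL */
	bool forced_emergency;		/* (c) must be false */
	uint64_t log_next_seq;		/* (d) advances on every message */
	uint64_t console_seq;		/* (e) advanced by console_unlock() */
	uint64_t last_console_seq;	/* console_seq seen at the last check */
	unsigned int stalled;		/* consecutive calls with no progress */
};

/*
 * Call once per stored message. Returns true when (a)-(e) have held for
 * STALL_LIMIT consecutive messages, i.e. when the proposal would give up
 * on the wake_up() and try console_trylock() directly.
 */
bool printk_stall_detected(struct printk_state *s)
{
	s->log_next_seq++;			/* (d) */

	if (s->console_suspended || !s->have_kthread || s->forced_emergency) {
		s->stalled = 0;			/* heuristic does not apply */
		return false;
	}

	if (s->console_seq != s->last_console_seq) {
		s->last_console_seq = s->console_seq;
		s->stalled = 0;			/* the kthread made progress */
		return false;
	}

	return ++s->stalled >= STALL_LIMIT;	/* (e) console_seq stood still */
}
```

The sketch only counts messages, so 50 lines logged in a burst before
the kthread gets scheduled look exactly like 50 lines logged while the
kthread is stuck. That is the ambiguity I am worried about.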

I would still like to see a kernel parameter/sysfs knob
that would allow forcing the rescue/emergency mode all
the time ;-)

Best Regards,
Petr