Re: [RFC][PATCHv3 2/5] printk: introduce printing kernel thread

From: Petr Mladek
Date: Fri Jun 30 2017 - 09:16:35 EST


On Fri 2017-06-30 16:01:31, Sergey Senozhatsky wrote:
> we are doing our best in order to avoid lockups caused by console_unlock(),
> but the top priority remains messages print out. If we can't guarantee that
> anything will take over and print the messages, we continue printing from
> the current process, even though it may result in lockups.
>
> this is based on my own experience with the previous "wake_up and forget
> about it" async printk patch set (v12) which we rolled out to our fleet
> of developers' boards;

Well, v12 completely failed when there was a sudden death. Also
printk_kthread slept with console_lock() taken. Therefore it was
much less effective during printk() floods.


> responses we received from the community; and
> somehow it also aligned with the recent Linus' reply
>
> : If those two things aren't the absolutely primary goals, the whole
> : thing is pointless to even discuss. No amount of cool features,
> : performance, or theoretical deadlock avoidance matters ONE WHIT
> : compared to the two things above.
>
> // the two things above were -- messages on the screen and dmesg.

Sure, this sounds cool, but things are not black and white.

We have had the offload patches in SUSE for years because some big
machines did not boot without them. On the other hand, AFAIK,
we have zero bug reports about messages lost due to a flood
of messages. To be fair, we often look into crash dumps for
these messages.

I remember that you mentioned losing messages in several threads.
I wonder if it is caused by a different configuration, use case,
or extra patches. Anyway, it might suggest that you push the
printk() system (buffers, throughput) to the edge of its
capacity, or even beyond.


Anyway, the handshake during offloading might be pretty
problematic. To be honest, I do not have much experience
with it. I have shared some of my fears in the other mail [1].
Jan Kara spent a lot of time on this and could probably
say more.
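A minimal userspace sketch of one way such a handshake can avoid
the "both stopped printing" and "nobody took over" races. It
assumes a single atomic state word shared by the printing task and
the printing kthread, and it is NOT taken from any printk patch. All
names (handoff_request(), handoff_try_claim(), handoff_withdraw(),
handoff_release()) are hypothetical. The point is that exactly one
compare-and-swap wins, so if nobody claims the request, the caller
finds out and keeps printing, matching the fallback policy quoted
above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical handoff states; illustrative only. */
enum handoff_state { HANDOFF_NONE, HANDOFF_REQUESTED, HANDOFF_CLAIMED };

static _Atomic int handoff = HANDOFF_NONE;

/* Printing task: ask the kthread to take over console output. */
static void handoff_request(void)
{
	atomic_store(&handoff, HANDOFF_REQUESTED);
}

/* Kthread: claim printing only while a request is still pending.
 * Returns true if the kthread now owns printing. */
static bool handoff_try_claim(void)
{
	int expected = HANDOFF_REQUESTED;

	return atomic_compare_exchange_strong(&handoff, &expected,
					      HANDOFF_CLAIMED);
}

/* Printing task, after waiting a bounded time (not modelled here):
 * withdraw the request. Returns true if nobody claimed it, i.e.
 * the caller must keep printing itself. */
static bool handoff_withdraw(void)
{
	int expected = HANDOFF_REQUESTED;

	if (atomic_compare_exchange_strong(&handoff, &expected,
					   HANDOFF_NONE))
		return true;	/* no taker: continue printing */
	/* The CAS failed, so the kthread must have claimed it. */
	assert(expected == HANDOFF_CLAIMED);
	return false;		/* kthread took over: safe to stop */
}

/* Kthread: done printing, reset the state for the next handoff. */
static void handoff_release(void)
{
	atomic_store(&handoff, HANDOFF_NONE);
}
```

Because the withdraw and the claim race on the same CAS, a kthread
that wakes up too late cannot grab a request the caller has already
given up on.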

Maybe we could look into the throttling path instead. Slowing
down massive printk() callers looks necessary when things get
out of control.

I wonder if I could add a counter to task_struct.
It might be configurable. I am not sure if people would
want it enabled on production systems, where the message
level should be lower anyway.

[1] https://lkml.kernel.org/r/20170630115457.GE23069@xxxxxxxxxxxxxxx

Best Regards,
Petr