Re: [PATCH] printk: stop spining waiter when console resume to flush prb

From: luojiaxing
Date: Mon May 10 2021 - 04:29:24 EST



On 2021/5/8 0:13, Petr Mladek wrote:
On Thu 2021-05-06 23:07:19, Sergey Senozhatsky wrote:
Thanks for Cc-ing Petr

On (21/05/06 15:39), Petr Mladek wrote:
Many printk messages might get accumulated when consoles were suspended.
They are proceed when console_unlock() is called in resume_console().

The possibility to pass the console lock owner was added to reduce the risk
of softlockup when too many messages were handled in an atomic context.

Now, resume_console() is always in a preemptible context that is safe
to handle all accumulated messages. The possibility to pass the console
lock owner actually makes things worse. The new owner might be in an atomic
context and might cause softlockup when processing all messages accumulated
when the console was suspended.

Create new console_unlock_preemptible() that will not allow to pass
the console lock owner. As a result, all accumulated messages will
be proceed in the safe preemptible process.
If we have a lot of pending messages in the logbuf, then there is
something chatty - some context (task, irq) or maybe several contexts.
And those contexts can continue adding messages, while we print them
_exclusively_ from preemptible context only. without ever throttling down
printk() callers - something that console_owner spinning and handover
does for us. And those printk() callers can even preempt
console_unlock_preemptible() and cause delays and lost messages.
Luo, please, correct me if I am wrong.


Hi, Petr, I reply the test result on your first reply for this patch.

Please check it when you are free.



This patch a about one well defined scenario. The messages are
accumulated between suspend_console() and resume_console().
It is "small" part of the system hibernation. And we need
to get them out now. There might be many if something special
was debugged.

I am pretty sure that Luo did not see any flood of messages:

+ Flood in more contexts would be balanced by switching
the console_owner.

+ Flood in one context would be naturally throttled because
this context will become the console_owner.

In each case, these messages would be generated after
console_resume(). Luo's original patch was explicitly talking
about messages accumulated during the suspend.

Luo, could you please provide some log showing the problem?


Sure, But, it is not easy to find that the printk got such problem in the user's logs.

So I'm attaching a log of the simulation test. You can clearly see that the driver thread calling dev_info() is blocked.


My test method is as follows:
Kernel thread A causes the console to enter suspend and then resume it 10 seconds later.
Kernel thread B repeatedly invokes dev_info() for 15 seconds after the console suspend.


Part of dmesg log I save before:

[  288.013869] hisi_sas_v3_hw 0000:b4:02.0: print test
[  288.013870] hisi_sas_v3_hw 0000:b4:02.0: print test
[  288.013871] hisi_sas_v3_hw 0000:b4:02.0: print test
[  288.013872] hisi_sas_v3_hw 0000:b4:02.0: print test
[  288.013873] hisi_sas_v3_hw 0000:b4:02.0: print test
[  288.013874] hisi_sas_v3_hw 0000:b4:02.0: print test
[  288.013875] hisi_sas_v3_hw 0000:b4:02.0: print test
[  288.013876] hisi_sas_v3_hw 0000:b4:02.0: print test
[  288.013877] hisi_sas_v3_hw 0000:b4:02.0: print test
[  289.670256] hisi_sas_v3_hw 0000:b4:02.0: print test
[  299.286325] hisi_sas_v3_hw 0000:b4:02.0: print test
[  299.291198] hisi_sas_v3_hw 0000:b4:02.0: print test
[  299.296063] hisi_sas_v3_hw 0000:b4:02.0: print test
[  299.300924] hisi_sas_v3_hw 0000:b4:02.0: print test
[  299.305787] hisi_sas_v3_hw 0000:b4:02.0: print test

The log shows that when thread B invokes dev_info, dev_info is blocked for several seconds.


Thanks

Jiaxing



Best Regards,
Petr

.