Re: [PATCH] printk: stop spining waiter when console resume to flush prb

From: Petr Mladek
Date: Mon May 10 2021 - 05:50:39 EST


On Mon 2021-05-10 16:29:11, luojiaxing wrote:
>
> On 2021/5/8 0:13, Petr Mladek wrote:
> > On Thu 2021-05-06 23:07:19, Sergey Senozhatsky wrote:
> > > Thanks for Cc-ing Petr
> > >
> > > On (21/05/06 15:39), Petr Mladek wrote:
> > > > Many printk messages might get accumulated when consoles were suspended.
> > > > They are proceed when console_unlock() is called in resume_console().
> > > >
> > > > The possibility to pass the console lock owner was added to reduce the risk
> > > > of softlockup when too many messages were handled in an atomic context.
> > > >
> > > > Now, resume_console() is always in a preemptible context that is safe
> > > > to handle all accumulated messages. The possibility to pass the console
> > > > lock owner actually makes things worse. The new owner might be in an atomic
> > > > context and might cause softlockup when processing all messages accumulated
> > > > when the console was suspended.
> > > >
> > > > Create new console_unlock_preemptible() that will not allow to pass
> > > > the console lock owner. As a result, all accumulated messages will
> > > > be proceed in the safe preemptible process.
> > > If we have a lot of pending messages in the logbuf, then there is
> > > something chatty - some context (task, irq) or maybe several contexts.
> > > And those contexts can continue adding messages, while we print them
> > > _exclusively_ from preemptible context only. without ever throttling down
> > > printk() callers - something that console_owner spinning and handover
> > > does for us. And those printk() callers can even preempt
> > > console_unlock_preemptible() and cause delays and lost messages.
> > Luo, please, correct me if I am wrong.
>
>
> Hi, Petr, I reply the test result on your first reply for this patch.
>
> Please check it when you are free.
>
>
> >
> > This patch a about one well defined scenario. The messages are
> > accumulated between suspend_console() and resume_console().
> > It is "small" part of the system hibernation. And we need
> > to get them out now. There might be many if something special
> > was debugged.
> >
> > I am pretty sure that Luo did not see any flood of messages:
> >
> > + Flood in more contexts would be balanced by switching
> > the console_owner.
> >
> > + Flood in one context would be naturally throttled because
> > this context will become the console_owner.
> >
> > In each case, these messages would be generated after
> > console_resume(). Luo's original patch was explicitly talking
> > about messages accumulated during the suspend.
> >
> > Luo, could you please provide some log showing the problem?
>
>
> Sure, But, it is not easy to find that the printk got such problem in the
> user's logs.
>
> So I'm attaching a log of the simulation test. You can clearly see that the
> driver thread calling dev_info() is blocked.
>
>
> My test method is as follows:
> Kernel thread A causes the console to enter suspend and then resume it 10
> seconds later.
> Kernel thread B repeatedly invokes dev_info() for 15 seconds after the
> console suspend.
>
>
> Part of dmesg log I save before:
>
> [  288.013869] hisi_sas_v3_hw 0000:b4:02.0: print test
> [  288.013870] hisi_sas_v3_hw 0000:b4:02.0: print test
> [  288.013871] hisi_sas_v3_hw 0000:b4:02.0: print test
> [  288.013872] hisi_sas_v3_hw 0000:b4:02.0: print test
> [  288.013873] hisi_sas_v3_hw 0000:b4:02.0: print test
> [  288.013874] hisi_sas_v3_hw 0000:b4:02.0: print test
> [  288.013875] hisi_sas_v3_hw 0000:b4:02.0: print test
> [  288.013876] hisi_sas_v3_hw 0000:b4:02.0: print test
> [  288.013877] hisi_sas_v3_hw 0000:b4:02.0: print test
> [  289.670256] hisi_sas_v3_hw 0000:b4:02.0: print test
> [  299.286325] hisi_sas_v3_hw 0000:b4:02.0: print test
> [  299.291198] hisi_sas_v3_hw 0000:b4:02.0: print test
> [  299.296063] hisi_sas_v3_hw 0000:b4:02.0: print test
> [  299.300924] hisi_sas_v3_hw 0000:b4:02.0: print test
> [  299.305787] hisi_sas_v3_hw 0000:b4:02.0: print test
>
> The log shows that when thread B invokes dev_info, dev_info is blocked for
> several seconds.

This works as expected. It is not ideal but it is the best we
can do with the current printk() design. kthread B causes flood
of messages and it pays the price for it. The console work
is a nature throttling.

The only problem is that it becomes the console_owner too late.
It spends long time by flushing the accumulated messages. But
they are its own messages after all.

The only real solution is to pass the console work to a separate
kthread and do not block any printk() caller.


By other words. The above test case does not justify the need
of the patch.

The only justification for the patch would be when you see many
messages accumulated between suspend_console() and resume_console()
in the real life.

Best Regards,
Petr