Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
From: Steven Rostedt
Date: Tue Jan 23 2018 - 11:13:40 EST
On Tue, 23 Jan 2018 07:43:47 -0800
Tejun Heo <tj@xxxxxxxxxx> wrote:
> So, at least in the case that we were seeing, it isn't that black and
> white. printk keeps causing printks but only because printk buffer
> flushing is preventing the printk'ing context from making forward
> progress. The key problem there is that a flushing context may get
> pinned flushing indefinitely and using a separate context does solve
> the problem.
>
Does it?
From what I understand, there's an issue with one of the printk
consoles, due to memory pressure or whatnot. Then a printk happens
recursively within a printk. It gets put into the safe buffer and an
irq_work is queued to print it out.
The issue you describe is that when printk re-enables interrupts,
the irq work triggers and loads the log buffer with the safe buffer, and
then the printk sees the new data added and continues to print, and
hence never leaves this printk.
Your solution is to delay the flushing of the safe buffer to another
thread (work queue), which I also have issues with, because you break
the "get printks out ASAP" mantra. Then the work queue comes in and
flushes the printks. And since the printks cause printks, we continue
to spam the machine, but hey, we are making forward progress.
Again, this is treating the symptom and not solving the problem.
I really hate delaying printks to another thread, unless we can
guarantee that that thread is ready to go immediately (basically
spinning on a run queue waiting to print). Because if the system is
having issues (which is the main reason for printks to happen), there's
no guarantee that a work queue or another thread will ever schedule,
and the safe printk buffer never gets out to the consoles.
I'd much rather have throttling when recursive printks are detected.
Make it 100 lines to print if you want, but then throttle, because
once you have 100 lines or so, you will know that printks are causing
printks, and you don't give a crap about the repetition. Allow
one flushing of the printk safe buffers, and then if it happens again,
throttle it.
Both methods can lose important data. I believe the throttling of
recursive printks, after 100 prints or whatever, will be the least
likely to lose important data, because printks caused by printks will
just keep repeating the same data, and we don't care about repeats. But
delaying the flushing could very well lose the important data that
explains what caused a lockup.
-- Steve