Re: [PATCH v2 3/4] printk: Avoid livelock with heavy printk during panic

From: Stephen Brennan
Date: Thu Jan 27 2022 - 11:19:52 EST


On 1/27/22 06:50, Petr Mladek wrote:
On Wed 2022-01-26 15:02:35, Stephen Brennan wrote:
During panic(), if another CPU is writing heavily the kernel log (e.g.
via /dev/kmsg), then the panic CPU may livelock writing out its messages
to the console. Note when too many messages are dropped during panic and
suppress further printk, except from the panic CPU. This could result in
some important messages being dropped. However, messages are already
being dropped, so this approach at least prevents a livelock.

Signed-off-by: Stephen Brennan <stephen.s.brennan@xxxxxxxxxx>
---

Notes:
v2: Add pr_warn when we suppress printk on non-panic CPU

kernel/printk/printk.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 20b4b71a1a07..18107db118d4 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -93,6 +93,12 @@ EXPORT_SYMBOL_GPL(console_drivers);
*/
int __read_mostly suppress_printk;
+/*
+ * During panic, heavy printk by other CPUs can delay the
+ * panic and risk deadlock on console resources.
+ */
+int __read_mostly suppress_panic_printk;
+
#ifdef CONFIG_LOCKDEP
static struct lockdep_map console_lock_dep_map = {
.name = "console_lock"
@@ -2228,6 +2234,10 @@ asmlinkage int vprintk_emit(int facility, int level,
if (unlikely(suppress_printk))
return 0;
+ if (unlikely(suppress_panic_printk) &&
+ atomic_read(&panic_cpu) != raw_smp_processor_id())
+ return 0;
+
if (level == LOGLEVEL_SCHED) {
level = LOGLEVEL_DEFAULT;
in_sched = true;
@@ -2613,6 +2623,7 @@ void console_unlock(void)
{
static char ext_text[CONSOLE_EXT_LOG_MAX];
static char text[CONSOLE_LOG_MAX];
+ static int panic_console_dropped;
unsigned long flags;
bool do_cond_resched, retry;
struct printk_info info;
@@ -2667,6 +2678,11 @@ void console_unlock(void)
if (console_seq != r.info->seq) {
console_dropped += r.info->seq - console_seq;
console_seq = r.info->seq;
+ if (panic_in_progress() && panic_console_dropped++ > 10) {
+ suppress_panic_printk = 1;
+ pr_warn("Too many dropped messages. "
+ "Supress messages on non-panic CPUs to prevent livelock.\n");

It looks like the message might be printed more times when
panic_console_dropped++ > 10.

In theory, no message can be lost after we disable printk on another
CPUs. But the code might evolve in the future. Let's make it
more error-proof.

We could use (panic_console_dropped++ == 10) or pr_warn_once() or
both.

I prefer using pr_warn_once() because it looks the most error-proof.


Nit: printk() has exceptions from the 80 chars/line rule.
The message string should be on a single line. It helps
to find it using "git grep". I think that even checkpatch.pl
handles this correctly.


With pr_warn_once() and message in single line:

Reviewed-by: Petr Mladek <pmladek@xxxxxxxx>

Best Regards,
Petr


PS: I could fix the two problems when pushing to git. But there
is still time to send v3. I have vacation the following week
with limited internet access. I am not going to rush it into
linux before I leave, ...

Thank you Petr, I will go ahead and resolve things in this patch, and the others in a v3 soon. Regardless, no need to rush, please enjoy your vacation!

Thanks for all the review and guidance!
Stephen