[PATCH] printk: Avoid softlockup reports during heavy printk traffic

From: Jan Kara
Date: Mon Jan 21 2013 - 14:30:20 EST


A CPU can be caught in console_unlock() for a long time (tens of seconds are
reported by our customers) when other CPUs are using printk heavily and serial
console makes printing slow. This triggers softlockup warnings because
interrupts are disabled for the whole time console_unlock() runs (e.g.
vprintk() calls console_unlock() with interrupts disabled).

It doesn't trigger hardlockup reports because serial drivers already call
touch_nmi_watchdog() but that's not enough in some cases. For example if some
process uses on_each_cpu(), smp_call_function() will block until IPI on the
printing CPU is processed and that doesn't happen for tens of seconds so the
CPU where on_each_cpu() is executing is locked up as well.

Postponing the printing to a worker thread was deemed too risky (see discussion
in the thread starting with https://lkml.org/lkml/2013/1/15/438) so here we
take a less risky approach and just silence all the watchdogs. We have to be
careful not to make them completely useless by touching them too often so we
touch the watchdogs only every 1000 characters we print in one console_unlock()
call. That should really happen only during boot, when printing sysrq-t output,
or in similarly extreme situations.

Signed-off-by: Jan Kara <jack@xxxxxxx>
---
kernel/printk.c | 23 +++++++++++++++++++++++
1 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/kernel/printk.c b/kernel/printk.c
index 357f714..1cbae2b 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -2043,6 +2043,7 @@ void console_unlock(void)
unsigned long flags;
bool wake_klogd = false;
bool retry;
+ unsigned int printed = 0;

if (console_suspended) {
up(&console_sem);
@@ -2103,8 +2104,30 @@ skip:

stop_critical_timings(); /* don't trace print latency */
call_console_drivers(level, text, len);
+ printed += len;
start_critical_timings();
local_irq_restore(flags);
+
+ /*
+ * If we already printed too much, touch watchdogs. The limit
+ * is set high enough so that it happens really rarely (and
+ * printk traffic doesn't render lockup watchdogs useless) but
+ * low enough so that even slower serial console doesn't cause
+ * lockup reports. 9600 baud serial console prints 1000 chars
+ * in about 1 second...
+ */
+ if (printed > 1000) {
+ touch_nmi_watchdog();
+ /*
+ * This is needed because other CPUs may be waiting on
+ * this CPU to process IPI so they are stalled as well.
+ */
+ touch_all_softlockup_watchdogs();
+ local_irq_save(flags);
+ rcu_cpu_stall_reset();
+ local_irq_restore(flags);
+ printed = 0;
+ }
}
console_locked = 0;
mutex_release(&console_lock_dep_map, 1, _RET_IP_);
--
1.7.1


--FkmkrVfFsRoUs1wW--
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/