[RFC][PATCHv5 02/13] printk: introduce printing kernel thread

From: Sergey Senozhatsky
Date: Mon Aug 14 2017 - 23:00:08 EST


printk() is quite complex internally and, basically, it does two
slightly independent things:
a) adds a new message to a kernel log buffer (log_store())
b) prints kernel log messages to serial consoles (console_unlock())

while (a) is guaranteed to be executed by printk(), (b) is not, for a
variety of reasons, and, unlike log_store(), it comes at a price:

1) console_unlock() attempts to flush all pending kernel log messages
to the console and it can loop indefinitely.

2) while console_unlock() is executed on one particular CPU, printing
pending kernel log messages, other CPUs can simultaneously append new
messages to the kernel log buffer.

3) the time it takes console_unlock() to print kernel messages also
depends on the speed of the console -- which may not be fast at all.

4) console_unlock() is executed in the same context as printk(), so
it may be non-preemptible/atomic, which makes 1)-3) dangerous.

As a result, nobody knows how long a printk() call will take, so
it's not really safe to call printk() in a number of situations,
including atomic context, RCU critical sections, interrupt context,
etc.

This patch introduces a '/sys/module/printk/parameters/atomic_print_limit'
sysfs param, which sets the limit on the number of seconds a given process
can print from console_unlock(). Value 0 corresponds to the current behavior
(no limitation). The printing offloading is happening from console_unlock()
function and, briefly, looks as follows:
as soon as process prints for more than `atomic_print_limit' seconds
it attempts to offload printing to another process.

Since nothing guarantees that there will another process sleeping on the
console_sem or calling printk() on another CPU simultaneously, the patch
also introduces auxiliary kernel thread - printk_kthread, the main
purpose of which is to take over printing duty. The workflow is, thus,
turns into:
as soon as process prints for more than `atomic_print_limit' seconds
it wakes up printk_kthread and unlocks the console_sem.

The wakeup part is also a bit tricky, since scheduler is in position to
decide that printk_kthread should run on the very same CPU with the process
that is currently doing printing. This means that offloading potentially
may never take place. That's why we try to play games with printk_kthread
affinity mask and basically want to wake it up on a foreign CPU, so
printing take over has more chances to succeed.

There are, however, cases when we can't (or should not) offload. For
example, we can't call into the scheduler from panic(), because this
may cause deadlock. Therefore printk() has a new 'emergency mode': in
this mode we never attempt to offload printing to printk_kthread.
There are places, where printk switches to printk_emergency mode
automatically: for instance, once a EMERG log level message appears
in the log buffer; in others - user must explicitly forbid offloading.
For that purpose we provide two new functions:

-- printk_emergency_begin()
Disables printk offloading. All printk() calls (except for deferred
printk) will attempt to lock the console_sem and, if successful,
flush kernel log messages.

-- printk_emergency_end()
Enables printk offloading.

Offloading is not possible yet, it will be enabled in a later patch.

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@xxxxxxxxx>
---
Documentation/admin-guide/kernel-parameters.txt | 10 ++
include/linux/console.h | 3 +
kernel/printk/printk.c | 184 ++++++++++++++++++++++--
3 files changed, 185 insertions(+), 12 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 35a31ea1d709..b5b6d5d24596 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3205,6 +3205,16 @@
printk.time= Show timing data prefixed to each printk message line
Format: <bool> (1/Y/y=enable, 0/N/n=disable)

+ printk.atomic_print_limit=
+ Specify the number of seconds a particular task
+ can spend printing messages to the console before
+ it offloads its printing duty to a special printk
+ kthread. Normally the limit should not be larger
+ than watchdog lockup detection threshold. Setting
+ this parameter to 0 disables printing offloading.
+
+ default: off.
+
processor.max_cstate= [HW,ACPI]
Limit processor to maximum C-state
max_cstate=9 overrides any DMI blacklist limit.
diff --git a/include/linux/console.h b/include/linux/console.h
index b8920a031a3e..07005db4c788 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -187,6 +187,9 @@ extern bool console_suspend_enabled;
extern void suspend_console(void);
extern void resume_console(void);

+extern void printk_emergency_begin(void);
+extern void printk_emergency_end(void);
+
int mda_console_init(void);
void prom_con_init(void);

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 8777003d292e..a86767d4d619 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -48,6 +48,7 @@
#include <linux/sched/clock.h>
#include <linux/sched/debug.h>
#include <linux/sched/task_stack.h>
+#include <linux/kthread.h>

#include <linux/uaccess.h>
#include <asm/sections.h>
@@ -400,15 +401,16 @@ DEFINE_RAW_SPINLOCK(logbuf_lock);
printk_safe_exit_irqrestore(flags); \
} while (0)

-#ifdef CONFIG_PRINTK
/*
- * Delayed printk version, for scheduler-internal messages:
+ * Used both for deferred printk version (scheduler-internal messages)
+ * and printk_kthread control.
*/
#define PRINTK_PENDING_WAKEUP 0x01
#define PRINTK_PENDING_OUTPUT 0x02

static unsigned long printk_pending;

+#ifdef CONFIG_PRINTK
DECLARE_WAIT_QUEUE_HEAD(log_wait);
/* the next printk record to read by syslog(READ) or /proc/kmsg */
static u64 syslog_seq;
@@ -444,6 +446,127 @@ static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
static char *log_buf = __log_buf;
static u32 log_buf_len = __LOG_BUF_LEN;

+static struct task_struct *printk_kthread;
+static cpumask_var_t printk_offload_cpus;
+/*
+ * We can't call into the scheduler (wake_up() printk kthread) during
+ * suspend/kexec/etc. This temporarily switches printk to old behaviour.
+ */
+static atomic_t printk_emergency __read_mostly;
+/*
+ * Disable printk_kthread permanently. Unlike `oops_in_progress'
+ * it doesn't go back to 0.
+ */
+static bool printk_enforce_emergency __read_mostly;
+/*
+ * The number of seconds a task can print before it offloads printing
+ * duty to printk_kthread. 0 indicates 'no limit'.
+ */
+static unsigned int atomic_print_limit;
+
+module_param_named(atomic_print_limit, atomic_print_limit, uint, 0644);
+MODULE_PARM_DESC(atomic_print_limit,
+ "the number of seconds task is allowed to print messages");
+
+static inline bool printk_offloading_enabled(void)
+{
+ return atomic_print_limit &&
+ printk_enforce_emergency == 0 &&
+ printk_kthread &&
+ atomic_read(&printk_emergency) == 0;
+}
+
+static inline bool is_printk_offloading_safe(void)
+{
+ return system_state == SYSTEM_RUNNING && !oops_in_progress;
+}
+
+static inline bool printk_kthread_should_stop(void)
+{
+ if (current != printk_kthread)
+ return false;
+ /* An emergency mode */
+ return (atomic_read(&printk_emergency) != 0);
+}
+
+/*
+ * This disables printing offloading and instead attempts
+ * to do the usual console_trylock()->console_unlock().
+ *
+ * Note, this does not wait for printk_kthread to stop (if it's
+ * already printing logbuf messages).
+ */
+void printk_emergency_begin(void)
+{
+ atomic_inc(&printk_emergency);
+}
+EXPORT_SYMBOL_GPL(printk_emergency_begin);
+
+/* This re-enables printk_kthread offloading. */
+void printk_emergency_end(void)
+{
+ atomic_dec(&printk_emergency);
+}
+EXPORT_SYMBOL_GPL(printk_emergency_end);
+
+/*
+ * Under heavy printing load or with a slow serial console (or both)
+ * console_unlock() can stall CPUs, which can result in soft/hard-lockups,
+ * lost interrupts, RCU stalls, etc. Therefore we attempt to limit the
+ * amount of time a process can print from console_unlock().
+ *
+ * This function must be called from 'printk_safe' context under
+ * console_sem lock.
+ */
+static inline bool console_offload_printing(void)
+{
+ static struct task_struct *printing_task;
+ static unsigned long printing_start_ts;
+ unsigned long now = local_clock() >> 30LL; /* seconds */
+
+ if (printk_kthread_should_stop())
+ return true;
+
+ if (!printk_offloading_enabled())
+ return false;
+
+ if (!is_printk_offloading_safe())
+ return false;
+
+ /* A new task - reset the counters. */
+ if (printing_task != current) {
+ printing_start_ts = local_clock() >> 30LL;
+ printing_task = current;
+ return false;
+ }
+
+ /* Once we offloaded to printk_ktread - keep printing */
+ if (current == printk_kthread)
+ return false;
+
+ if (!time_after_eq(now, printing_start_ts + atomic_print_limit))
+ return false;
+
+ /*
+ * We try to set `printk_kthread' CPU affinity to any online CPU
+ * except for this_cpu. Because otherwise `printk_kthread' may be
+ * scheduled on the same CPU and offloading will not take place.
+ */
+ cpumask_copy(printk_offload_cpus, cpu_online_mask);
+ cpumask_clear_cpu(smp_processor_id(), printk_offload_cpus);
+
+ /*
+ * If this_cpu is the one and only online CPU, then try to wake up
+ * `printk_kthread' on it.
+ */
+ if (cpumask_empty(printk_offload_cpus))
+ cpumask_set_cpu(smp_processor_id(), printk_offload_cpus);
+
+ set_cpus_allowed_ptr(printk_kthread, printk_offload_cpus);
+ wake_up_process(printk_kthread);
+ return true;
+}
+
/* Return log buffer address */
char *log_buf_addr_get(void)
{
@@ -1763,6 +1886,15 @@ asmlinkage int vprintk_emit(int facility, int level,

printed_len = log_output(facility, level, lflags, dict, dictlen, text, text_len);

+ /*
+ * Emergency level indicates that the system is unstable and, thus,
+ * we better stop relying on wake_up(printk_kthread) and try to do
+ * a direct printing.
+ */
+ if (level == LOGLEVEL_EMERG)
+ printk_enforce_emergency = true;
+
+ set_bit(PRINTK_PENDING_OUTPUT, &printk_pending);
logbuf_unlock_irqrestore(flags);

/* If called from the scheduler, we can not call up(). */
@@ -1880,6 +2012,14 @@ static size_t msg_print_text(const struct printk_log *msg,
bool syslog, char *buf, size_t size) { return 0; }
static bool suppress_message_printing(int level) { return false; }

+void printk_emergency_begin(void) {}
+EXPORT_SYMBOL_GPL(printk_emergency_begin);
+
+void printk_emergency_end(void) {}
+EXPORT_SYMBOL_GPL(printk_emergency_end);
+
+static inline bool console_offload_printing(void) { return false; }
+
#endif /* CONFIG_PRINTK */

#ifdef CONFIG_EARLY_PRINTK
@@ -2161,8 +2301,17 @@ void console_unlock(void)
unsigned long flags;
bool wake_klogd = false;
bool do_cond_resched, retry;
+ bool did_offload;

if (console_suspended) {
+ /*
+ * Here and later, we need to clear the PENDING_OUTPUT bit
+ * in order to avoid an infinite loop in printk_kthread
+ * function when console_unlock() cannot flush messages
+ * because we suspended consoles. Someone else will print
+ * the messages from resume_console().
+ */
+ clear_bit(PRINTK_PENDING_OUTPUT, &printk_pending);
up_console_sem();
return;
}
@@ -2191,6 +2340,7 @@ void console_unlock(void)
* console.
*/
if (!can_use_console()) {
+ clear_bit(PRINTK_PENDING_OUTPUT, &printk_pending);
console_locked = 0;
up_console_sem();
return;
@@ -2202,6 +2352,9 @@ void console_unlock(void)
size_t len;

printk_safe_enter_irqsave(flags);
+ /* Must be called under printk_safe */
+ did_offload = console_offload_printing();
+
raw_spin_lock(&logbuf_lock);
if (seen_seq != log_next_seq) {
wake_klogd = true;
@@ -2219,7 +2372,7 @@ void console_unlock(void)
len = 0;
}
skip:
- if (console_seq == log_next_seq)
+ if (did_offload || console_seq == log_next_seq)
break;

msg = log_from_idx(console_idx);
@@ -2267,14 +2420,18 @@ void console_unlock(void)
up_console_sem();

/*
- * Someone could have filled up the buffer again, so re-check if there's
- * something to flush. In case we cannot trylock the console_sem again,
- * there's a new owner and the console_unlock() from them will do the
- * flush, no worries.
+ * Someone could have filled up the buffer again, so re-check
+ * if there's something to flush. In case when trylock fails,
+ * there's a new owner and the console_unlock() from them will
+ * do the flush, no worries.
*/
- raw_spin_lock(&logbuf_lock);
- retry = console_seq != log_next_seq;
- raw_spin_unlock(&logbuf_lock);
+ if (!did_offload) {
+ raw_spin_lock(&logbuf_lock);
+ retry = console_seq != log_next_seq;
+ if (!retry)
+ clear_bit(PRINTK_PENDING_OUTPUT, &printk_pending);
+ raw_spin_unlock(&logbuf_lock);
+ }
printk_safe_exit_irqrestore(flags);

if (retry && console_trylock())
@@ -2698,8 +2855,11 @@ late_initcall(printk_late_init);
#if defined CONFIG_PRINTK
static void wake_up_klogd_work_func(struct irq_work *irq_work)
{
- if (test_and_clear_bit(PRINTK_PENDING_OUTPUT, &printk_pending)) {
- /* If trylock fails, someone else is doing the printing */
+ if (test_bit(PRINTK_PENDING_OUTPUT, &printk_pending)) {
+ /*
+ * If trylock fails, someone else is doing the printing.
+ * PRINTK_PENDING_OUTPUT bit is cleared by console_unlock().
+ */
if (console_trylock())
console_unlock();
}
--
2.14.1