[RFC][PATCHv3 2/5] printk: introduce printing kernel thread

From: Sergey Senozhatsky
Date: Tue May 09 2017 - 04:31:44 EST


printk() is quite complex internally and, basically, it does two
slightly independent things:
a) adds a new message to a kernel log buffer (log_store())
b) prints kernel log messages to serial consoles (console_unlock())

while (a) is guaranteed to be executed by printk(), (b) is not, for a
variety of reasons, and, unlike log_store(), it comes at a price:

1) console_unlock() attempts to flush all pending kernel log messages
to the console and it can loop indefinitely.

2) while console_unlock() is executed on one particular CPU, printing
pending kernel log messages, other CPUs can simultaneously append new
messages to the kernel log buffer.

3) the time it takes console_unlock() to print kernel messages also
depends on the speed of the console -- which may not be fast at all.

4) console_unlock() is executed in the same context as printk(), so
it may be non-preemptible/atomic, which makes 1)-3) dangerous.

As a result, nobody knows how long a printk() call will take, so
it's not really safe to call printk() in a number of situations,
including atomic context, RCU critical sections, interrupt context,
etc.

This patch introduces a '/sys/module/printk/parameters/atomic_print_limit'
sysfs param, which sets the limit on number of lines a process can print
from console_unlock(). Value 0 corresponds to the current behavior (no
limitation). The printing offloading is happening from console_unlock()
function and, briefly, looks as follows: as soon as process prints more
than `atomic_print_limit' lines it attempts to offload printing to another
process. Since nothing guarantees that there will another process sleeping
on the console_sem or calling printk() on another CPU simultaneously, the
patch also introduces an auxiliary kernel thread - printk_kthread, the
main purpose of which is to take over printing duty. The workflow is, thus,
turns into: as soon as process prints more than `atomic_print_limit' lines
it wakes up printk_kthread and unlocks the console_sem. So in the best case
at this point there will be at least 1 processes trying to lock the
console_sem: printk_kthread. (There can also be a process that was sleeping
on the console_sem and that was woken up by console semaphore up(); and
concurrent printk() invocations from other CPUs). But in the worst case
there won't be any processes ready to take over the printing duty: it
may take printk_kthread some time to become running; or printk_kthread
may even never become running (a misbehaving scheduler, or other critical
condition). That's why after we wake_up() printk_kthread we can't
immediately leave the printing loop, we must ensure that the console_sem
has a new owner before we do so. Therefore, `atomic_print_limit' is a soft
limit, not the hard one: we let task to overrun `atomic_print_limit'. But,
at the same time, the console_unlock() printing loop behaves differently
for tasks that have exceeded `atomic_print_limit': after every printed
logbuf entry (call_console_drivers()) such a process wakes up printk_kthread,
unlocks the console_sem and attempts to console_trylock() a bit later (if
there any are pending messages in the logbuf, of course). In the best case
scenario either printk_kthread or some other tasks will lock the console_sem,
so current printing task will see failed console_trylock(), which will
indicate a successful printing offloading. In the worst case, however,
current will successfully console_trylock(), which will indicate that
offloading did not take place and we can't return from console_unlock(),
so the printing task will print one more line from the logbuf and attempt
to offload printing once again; and it will continue doing so until another
process locks the console_sem or until there are pending messages in the
logbuf. So if everything goes wrong - we can't wakeup printk_kthread and
there are no other processes sleeping on the console_sem or trying to down()
it - then we will have the existing console_unlock() behavior: print all
pending messages in one shot.

We track the number of unsuccessful offloading attempts and after some
point we declare a `printk_emergency' condition and give up trying to
offload. `printk_emergency' is a new name for printk mode in which printk
does not attempt to offload printing, but instead flushes all the pending
logbuf messages (basically, the existing behaviour).

There also other cases when we can't (or should avoid) offloading. For
example, we can't call into the scheduler from panic(), because this may
cause deadlock. Therefore printk() has some additional places where it
switches to printk_emergency mode: for instance, once a EMERG log level
message appears in the log buffer. We also provide two new functions
that can be used when a path needs to declare a temporal
`printk_emergency' mode:

-- printk_emergency_begin()
Disables printk offloading. All printk() calls will attempt
to lock the console_sem and, if successful, flush kernel log
messages.

-- printk_emergency_end()
Enables printk offloading.

Offloading is not possible yet, it will be enabled in a later patch.

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@xxxxxxxxx>
---
include/linux/console.h | 3 +
kernel/printk/printk.c | 203 ++++++++++++++++++++++++++++++++++++++++++++++--
2 files changed, 198 insertions(+), 8 deletions(-)

diff --git a/include/linux/console.h b/include/linux/console.h
index b8920a031a3e..07005db4c788 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -187,6 +187,9 @@ extern bool console_suspend_enabled;
extern void suspend_console(void);
extern void resume_console(void);

+extern void printk_emergency_begin(void);
+extern void printk_emergency_end(void);
+
int mda_console_init(void);
void prom_con_init(void);

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 2cb7f4753b76..a113f684066c 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -48,6 +48,7 @@
#include <linux/sched/clock.h>
#include <linux/sched/debug.h>
#include <linux/sched/task_stack.h>
+#include <linux/kthread.h>

#include <linux/uaccess.h>
#include <asm/sections.h>
@@ -402,7 +403,8 @@ DEFINE_RAW_SPINLOCK(logbuf_lock);
} while (0)

/*
- * Delayed printk version, for scheduler-internal messages:
+ * Used both for deferred printk version (scheduler-internal messages)
+ * and printk_kthread control.
*/
#define PRINTK_PENDING_WAKEUP 0x01
#define PRINTK_PENDING_OUTPUT 0x02
@@ -445,6 +447,53 @@ static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
static char *log_buf = __log_buf;
static u32 log_buf_len = __LOG_BUF_LEN;

+static struct task_struct *printk_kthread __read_mostly;
+/*
+ * We can't call into the scheduler (wake_up() printk kthread) during
+ * suspend/kexec/etc. This temporarily switches printk to old behaviour.
+ */
+static atomic_t printk_emergency __read_mostly;
+/*
+ * Disable printk_kthread permanently. Unlike `oops_in_progress'
+ * it doesn't go back to 0.
+ */
+static bool printk_enforce_emergency __read_mostly;
+/*
+ * The number of lines a task can print before offloading printing
+ * job to printk_kthread. 0 indicates 'no limit'.
+ */
+static unsigned int atomic_print_limit;
+
+module_param_named(atomic_print_limit, atomic_print_limit, uint, 0644);
+MODULE_PARM_DESC(atomic_print_limit,
+ "max lines to print before offloading to printk kthread");
+
+static inline bool printk_offloading_enabled(void)
+{
+ return atomic_print_limit &&
+ printk_kthread &&
+ !printk_enforce_emergency &&
+ atomic_read(&printk_emergency) == 0;
+}
+
+/*
+ * This disables printing offloading and instead attempts
+ * to do the usual console_trylock()->console_unlock().
+ *
+ * Note, this does not stop the printk_kthread if it's already
+ * printing logbuf messages.
+ */
+void printk_emergency_begin(void)
+{
+ atomic_inc(&printk_emergency);
+}
+
+/* This re-enables printk_kthread offloading. */
+void printk_emergency_end(void)
+{
+ atomic_dec(&printk_emergency);
+}
+
/* Return log buffer address */
char *log_buf_addr_get(void)
{
@@ -1765,6 +1814,15 @@ asmlinkage int vprintk_emit(int facility, int level,

printed_len += log_output(facility, level, lflags, dict, dictlen, text, text_len);

+ /*
+ * Emergency level indicates that the system is unstable and, thus,
+ * we better stop relying on wake_up(printk_kthread) and try to do
+ * a direct printing.
+ */
+ if (level == LOGLEVEL_EMERG)
+ printk_enforce_emergency = true;
+
+ set_bit(PRINTK_PENDING_OUTPUT, &printk_pending);
logbuf_unlock_irqrestore(flags);

/* If called from the scheduler, we can not call up(). */
@@ -1882,6 +1940,9 @@ static size_t msg_print_text(const struct printk_log *msg,
bool syslog, char *buf, size_t size) { return 0; }
static bool suppress_message_printing(int level) { return false; }

+void printk_emergency_begin(void) {}
+void printk_emergency_end(void) {}
+
#endif /* CONFIG_PRINTK */

#ifdef CONFIG_EARLY_PRINTK
@@ -2155,6 +2216,85 @@ static inline int can_use_console(void)
return cpu_online(raw_smp_processor_id()) || have_callable_console();
}

+/*
+ * Under heavy printing load or with a slow serial console (or both)
+ * console_unlock() can stall CPUs, which can result in soft/hard-lockups,
+ * lost interrupts, RCU stalls, etc. Therefore we attempt to limit the
+ * number of lines a process can print from console_unlock().
+ *
+ * There is one more reason to do offloading - there might be other processes
+ * (including user space tasks) sleeping on the console_sem in uninterruptible
+ * state; and keeping the console_sem locked for long time may have a negative
+ * impact on them.
+ *
+ * This function must be called from 'printk_safe' context under
+ * console_sem lock.
+ */
+static inline bool console_offload_printing(void)
+{
+ static struct task_struct *printing_task;
+ static unsigned long long lines_printed;
+ static unsigned long saved_csw;
+
+ if (!printk_offloading_enabled())
+ return false;
+
+ if (system_state != SYSTEM_RUNNING || oops_in_progress)
+ return false;
+
+ /* A new task - reset the counters. */
+ if (printing_task != current) {
+ lines_printed = 0;
+ printing_task = current;
+ saved_csw = current->nvcsw + current->nivcsw;
+ return false;
+ }
+
+ lines_printed++;
+ if (lines_printed < atomic_print_limit)
+ return false;
+
+ if (current == printk_kthread) {
+ /*
+ * Reset the `lines_printed' counter, just in case if
+ * printk_kthread is the only process left that would
+ * down() console_sem and call console_unlock().
+ */
+ lines_printed = 0;
+ return true;
+ }
+
+ /*
+ * A trivial emergency enforcement - give up on printk_kthread if
+ * we can't wake it up. This assumes that `atomic_print_limit' is
+ * reasonably and sufficiently large.
+ */
+ if (lines_printed > 10 * (unsigned long long)atomic_print_limit &&
+ saved_csw == (current->nvcsw + current->nivcsw)) {
+ printk_enforce_emergency = true;
+ pr_crit("Declaring printk emergency mode.\n");
+ return false;
+ }
+
+ /*
+ * Wakeup `printk_kthread' but don't reset the `lines_printed'
+ * counter, let task overrun the limit. The idea is that
+ *
+ * a) woken up printk_kthread (if succeeded)
+ * or
+ * b) concurrent printk from another CPU or a task from console_sem
+ * wait list (if any)
+ *
+ * will take over, change `printing_task' and reset the counter.
+ * If neither a) nor b) happens - we continue printing from
+ * current process. Which is bad and can be risky, but since
+ * we couldn't wake_up() printk_kthread, the things already
+ * don't look normal.
+ */
+ wake_up_process(printk_kthread);
+ return true;
+}
+
/**
* console_unlock - unlock the console system
*
@@ -2177,8 +2317,17 @@ void console_unlock(void)
unsigned long flags;
bool wake_klogd = false;
bool do_cond_resched, retry;
+ bool did_offload;

if (console_suspended) {
+ /*
+ * Here and later, we need to clear the PENDING_OUTPUT bit
+ * in order to avoid an infinite loop in printk_kthread
+ * function when console_unlock() cannot flush messages
+ * because we suspended consoles. Someone else will print
+ * the messages from resume_console().
+ */
+ clear_bit(PRINTK_PENDING_OUTPUT, &printk_pending);
up_console_sem();
return;
}
@@ -2200,6 +2349,7 @@ void console_unlock(void)
do_cond_resched = console_may_schedule;
again:
console_may_schedule = 0;
+ did_offload = 0;

/*
* We released the console_sem lock, so we need to recheck if
@@ -2207,6 +2357,7 @@ void console_unlock(void)
* console.
*/
if (!can_use_console()) {
+ clear_bit(PRINTK_PENDING_OUTPUT, &printk_pending);
console_locked = 0;
up_console_sem();
return;
@@ -2235,7 +2386,7 @@ void console_unlock(void)
len = 0;
}
skip:
- if (console_seq == log_next_seq)
+ if (did_offload || console_seq == log_next_seq)
break;

msg = log_from_idx(console_idx);
@@ -2267,9 +2418,38 @@ void console_unlock(void)
stop_critical_timings(); /* don't trace print latency */
call_console_drivers(ext_text, ext_len, text, len);
start_critical_timings();
+
+ /*
+ * Sometimes current task may re-acquire console_sem ahead of
+ * just woken up printk_kthread, because we let processes to
+ * compete for console_sem. In this case current task will jump
+ * to `again' label (if there are any pending messages), print
+ * one more line, break out of printing loop (we don't reset
+ * `lines_printed' after wake_up_process() exactly for this
+ * purpose) and do up_console_sem(); this will be repeated as
+ * long as current task successfully console_trylock()-s.
+ *
+ * Eventually and hopefully, either printk_kthread or some
+ * other task will acquire console_sem and current process will
+ * be able to leave console_unlock(). Until this happens we
+ * cannot be sure that anyone will take over and flush the
+ * `logbuf' messages. So in the worst case scenario we,
+ * basically, have old printk()/console_unlock() behaviour:
+ * a task prints all pending logbuf messages.
+ *
+ * If printk_kthread never wakes up (which may indicate that
+ * the system is unstable or something weird is going on),
+ * then we keep jumping to `again' label printing one message
+ * from the logbuf each time. This is a bit ugly, but at least
+ * we will print out the logbuf. If such condition occurs,
+ * console_offload_printing() should declare `printk_emergency'
+ * at some point and we will stop all attempts to offload
+ * printing.
+ */
+ did_offload = console_offload_printing();
printk_safe_exit_irqrestore(flags);

- if (do_cond_resched)
+ if (!did_offload && do_cond_resched)
cond_resched();
}
console_locked = 0;
@@ -2284,12 +2464,16 @@ void console_unlock(void)

/*
* Someone could have filled up the buffer again, so re-check if there's
- * something to flush. In case we cannot trylock the console_sem again,
- * there's a new owner and the console_unlock() from them will do the
- * flush, no worries.
+ * something to flush. Try to lock the console_sem even if we
+ * `did_offload', because we need to know for sure that some other task
+ * has acquired the console_sem. In case we cannot trylock the
+ * console_sem again, there's a new owner and the console_unlock() from
+ * them will do the flush, no worries.
*/
raw_spin_lock(&logbuf_lock);
retry = console_seq != log_next_seq;
+ if (!retry)
+ clear_bit(PRINTK_PENDING_OUTPUT, &printk_pending);
raw_spin_unlock(&logbuf_lock);
printk_safe_exit_irqrestore(flags);

@@ -2719,8 +2903,11 @@ late_initcall(printk_late_init);
#if defined CONFIG_PRINTK
static void wake_up_klogd_work_func(struct irq_work *irq_work)
{
- if (test_and_clear_bit(PRINTK_PENDING_OUTPUT, &printk_pending)) {
- /* If trylock fails, someone else is doing the printing */
+ if (test_bit(PRINTK_PENDING_OUTPUT, &printk_pending)) {
+ /*
+ * If trylock fails, someone else is doing the printing.
+ * PRINTK_PENDING_OUTPUT bit is cleared by console_unlock().
+ */
if (console_trylock())
console_unlock();
}
--
2.12.2