Re: [PATCH v10 1/2] printk: Make printk() completely async

From: Andrew Morton
Date: Mon Apr 04 2016 - 18:51:54 EST


On Tue, 5 Apr 2016 01:57:27 +0900 Sergey Senozhatsky <sergey.senozhatsky@xxxxxxxxx> wrote:

> From: Jan Kara <jack@xxxxxxx>
>
> Currently, printk() sometimes waits for message to be printed to console
> and sometimes it does not (when console_sem is held by some other
> process). In case printk() grabs console_sem and starts printing to
> console, it prints messages from kernel printk buffer until the buffer
> is empty. When serial console is attached, printing is slow and thus
> other CPUs in the system have plenty of time to append new messages to
> the buffer while one CPU is printing. Thus the CPU can spend unbounded
> amount of time doing printing in console_unlock(). This is especially
> serious problem if the printk() calling console_unlock() was called with
> interrupts disabled.
>
> In practice users have observed a CPU can spend tens of seconds printing
> in console_unlock() (usually during boot when hundreds of SCSI devices
> are discovered) resulting in RCU stalls (CPU doing printing doesn't
> reach quiescent state for a long time), softlockup reports (IPIs for the
> printing CPU don't get served and thus other CPUs are spinning waiting
> for the printing CPU to process IPIs), and eventually a machine death
> (as messages from stalls and lockups append to printk buffer faster than
> we are able to print). So these machines are unable to boot with serial
> console attached. Another observed issue is that due to slow printk,
> hardware discovery is slow and udev times out before kernel manages to
> discover all the attached HW. Also during artificial stress testing SATA
> disk disappears from the system because its interrupts aren't served for
> too long.
>
> This patch makes printk() completely asynchronous (similar to what
> printk_deferred() did until now). It appends message to the kernel
> printk buffer and wake_up()s a special dedicated kthread to do the
> printing to console. This has the advantage that printing always happens
> from a schedulable context and thus we don't lock up any particular CPU or
> even interrupts. Also it has the advantage that printk() is fast and
> thus kernel booting is not slowed down by slow serial console.
> Disadvantage of this method is that in case of crash there is higher
> chance that important messages won't appear in console output (we may
> need working scheduling to print message to console). We somewhat
> mitigate this risk by switching printk to the original method of
> immediate printing to console if oops is in progress. Also for
> debugging purposes we provide printk.synchronous kernel parameter which
> resorts to the original printk behavior.
>
> printk() is expected to work under different conditions and in different
> scenarios, including corner cases of OOM when all of the workers are busy
> (e.g. allocating memory), thus printk() uses its own dedicated printing
> kthread, rather than relying on workqueue (even with WQ_MEM_RECLAIM bit
> set we potentially can receive delays in printing until workqueue
> declares a ->mayday, as noted by Tetsuo Handa).

The whole idea remains worrisome. It definitely makes printk() less
reliable in the vast majority of cases: what happens if the scheduler
is busted, or random memory has been scribbled on, etc.?

All this downside to handle (afaict) one special case. Surely there is
another way? For example (but feel free to suggest other approaches!)
can we put some limit on the number of extra characters which the
printing task will print? Once that limit is hit, new printk callers
will spin until they can get in and do some printing themselves. Or
something else?
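
To make the "limit on extra characters" idea concrete, here is a rough
userspace model, not kernel code. The names PRINT_BUDGET_CHARS and
drain_with_budget() are made up for illustration, and the budget value is
arbitrary. The current console owner prints queued records until the budget
would be exceeded, then reports that someone else should take over:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cap on how many characters one console owner may flush. */
#define PRINT_BUDGET_CHARS 64

/*
 * Userspace model: pending[] holds the lengths of queued log records.
 * "Prints" (here: only counts) records in order until the next record
 * would push the total past the budget. Returns the number of records
 * consumed; *handoff is set when records remain, meaning the next
 * printk caller should get in and continue the printing.
 */
static size_t drain_with_budget(const size_t *pending, size_t n, bool *handoff)
{
	size_t used = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		/* Always emit at least one record so progress is guaranteed. */
		if (i > 0 && used + pending[i] > PRINT_BUDGET_CHARS)
			break;
		used += pending[i];
	}
	*handoff = (i < n);
	return i;
}
```

In a real implementation the handoff would have to be coordinated with
console_sem, which this sketch does not attempt to model.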

> index 3d28b50..c23a5bd 100644
> --- a/Documentation/kernel-parameters.txt
> +++ b/Documentation/kernel-parameters.txt
> @@ -3122,6 +3122,16 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
> printk.time= Show timing data prefixed to each printk message line
> Format: <bool> (1/Y/y=enable, 0/N/n=disable)
>
> + printk.synchronous=
> + By default kernel messages are printed to console
> + asynchronously (except during early boot or when oops
> + is happening). That avoids kernel stalling behind slow
> + serial console and thus avoids softlockups, interrupt
> + timeouts, or userspace timing out during heavy printing.
> + However for debugging problems, printing messages to
> + console immediately may be desirable. This option
> + enables such behavior.

Well, it's good that we have this.

It would be better if it was runtime-controllable - changing boot
parameters is a bit of a pain. In fact with this approach, your
zillions-of-scsi-disks scenario becomes less problematic: do the async
offloading during the boot process then switch back to the more
reliable sync printing late in boot.
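
A runtime switch would presumably mean that the decision is made per
printk() call from a writable flag, rather than once at init time from
whether the kthread exists. A small userspace model of that decision
follows. model_printk_sync and choose_print_path() are hypothetical names,
and how the flag gets exposed (for instance by making the module parameter
writable) is an assumption, not something the patch does:

```c
#include <assert.h>
#include <stdbool.h>

enum print_path { PRINT_SYNC, PRINT_ASYNC };

/* Hypothetical runtime-writable flag, standing in for printk_sync. */
static bool model_printk_sync;

/*
 * Pick the printing path for one message. Synchronous printing wins
 * whenever a panic/oops is in progress, sync mode has been requested,
 * or no printing thread is available to offload to.
 */
static enum print_path choose_print_path(bool in_panic, bool have_kthread)
{
	if (in_panic || model_printk_sync || !have_kthread)
		return PRINT_SYNC;
	return PRINT_ASYNC;
}
```

With something like this, boot could run with the flag clear and flip it
once device discovery has settled. Note that the kthread would then have to
be created unconditionally, since the patch as posted skips creating it
when printk_sync is set.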

> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -46,6 +46,7 @@
> #include <linux/utsname.h>
> #include <linux/ctype.h>
> #include <linux/uio.h>
> +#include <linux/kthread.h>
>
> #include <asm/uaccess.h>
> #include <asm-generic/sections.h>
> @@ -284,6 +285,19 @@ static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
> static char *log_buf = __log_buf;
> static u32 log_buf_len = __LOG_BUF_LEN;
>
> +/*
> + * When true, printing to console will happen synchronously.
> + * The default value on UP systems is 'true'.

That's rather obvious from the code. Comments should explain "why",
not "what".

> + */
> +static bool __read_mostly printk_sync = !IS_ENABLED(CONFIG_SMP);
> +module_param_named(synchronous, printk_sync, bool, S_IRUGO);
> +MODULE_PARM_DESC(synchronous, "make printing to console synchronous");
> +
> +/* Printing kthread for async printk */
> +static struct task_struct *printk_kthread;
> +/* When `true' - printing thread has messages to print */
> +static bool printk_kthread_need_flush_console;
> +
> /* Return log buffer address */
> char *log_buf_addr_get(void)
> {
> @@ -1608,6 +1622,8 @@ asmlinkage int vprintk_emit(int facility, int level,
> const char *dict, size_t dictlen,
> const char *fmt, va_list args)
> {
> + /* cpu currently holding logbuf_lock in this function */
> + static unsigned int logbuf_cpu = UINT_MAX;
> static bool recursion_bug;
> static char textbuf[LOG_LINE_MAX];
> char *text = textbuf;
> @@ -1617,8 +1633,7 @@ asmlinkage int vprintk_emit(int facility, int level,
> int this_cpu;
> int printed_len = 0;
> bool in_sched = false;
> - /* cpu currently holding logbuf_lock in this function */
> - static unsigned int logbuf_cpu = UINT_MAX;
> + bool in_panic = console_loglevel == CONSOLE_LOGLEVEL_MOTORMOUTH;
>
> if (level == LOGLEVEL_SCHED) {
> level = LOGLEVEL_DEFAULT;
> @@ -1757,12 +1772,29 @@ asmlinkage int vprintk_emit(int facility, int level,
> if (!in_sched) {
> lockdep_off();
> /*
> - * Try to acquire and then immediately release the console
> - * semaphore. The release will print out buffers and wake up
> - * /dev/kmsg and syslog() users.
> + * By default we print message to console asynchronously so

Nit: this comment down here shouldn't know what the default is. That
should be documented up at the printk_sync definition site.

> + * that kernel doesn't get stalled due to slow serial console.

s/kernel/the kernel/

> + * That can lead to softlockups, lost interrupts, or userspace
> + * timing out under heavy printing load.
> + *
> + * However we resort to synchronous printing of messages during
> + * early boot, when synchronous printing was explicitly
> + * requested by kernel parameter, or when console_verbose() was

s/kernel/a kernel/


> + * called to print everything during panic / oops.

We're missing a description of *why* console_verbose() is handled
specially.

> */
> - if (console_trylock())
> - console_unlock();
> + if (!in_panic && printk_kthread) {

We don't really need the local variable in_panic. I guess it has some
documentary value.

> + /* Offload printing to a schedulable context. */
> + printk_kthread_need_flush_console = true;
> + wake_up_process(printk_kthread);
> + } else {
> + /*
> + * Try to acquire and then immediately release the
> + * console semaphore. The release will print out
> + * buffers and wake up /dev/kmsg and syslog() users.
> + */
> + if (console_trylock())
> + console_unlock();
> + }
> lockdep_on();
> }
>
> @@ -2722,6 +2754,47 @@ static int __init printk_late_init(void)
> late_initcall(printk_late_init);
>
> #if defined CONFIG_PRINTK
> +static int printk_kthread_func(void *data)
> +{
> + while (1) {
> + set_current_state(TASK_INTERRUPTIBLE);
> + if (!printk_kthread_need_flush_console)
> + schedule();
> +
> + __set_current_state(TASK_RUNNING);
> + /*
> + * Avoid an infinite loop when console_unlock() cannot
> + * access consoles, e.g. because console_suspended is
> + * true. schedule(), someone else will print the messages
> + * from resume_console().
> + */
> + printk_kthread_need_flush_console = false;
> +
> + console_lock();
> + console_unlock();
> + }
> +
> + return 0;
> +}
> +
> +static int __init init_printk_kthread(void)
> +{
> + struct task_struct *thread;
> +
> + if (printk_sync)
> + return 0;
> +
> + thread = kthread_run(printk_kthread_func, NULL, "printk");

This gets the normal scheduling policy, so a spinning userspace SCHED_FIFO
task will block printk forever. This seems bad.

> + if (IS_ERR(thread)) {
> + pr_err("printk: unable to create printing thread\n");
> + printk_sync = true;
> + } else {
> + printk_kthread = thread;
> + }
> + return 0;
> +}
> +late_initcall(init_printk_kthread);

Could do with a comment explaining why late_initcall was chosen.