[PATCH] printk: print KERN_CONT buffer in sync printk mode

From: Sergey Senozhatsky
Date: Wed Jun 29 2016 - 01:10:03 EST


One of my boxen has revealed that there is a race condition now between
pr_cot() output and printk_kthread wake_up() time. Schematically, the CPU
that triggered the race conditions was doing
something like this:

for (...) {
for (....) {
...
pr_cont(...);
...
}
pr_cont("\n");
}

the resulting output was:

$ dmesg

00000001 00000002 00000003 00000004 00000005 00000006 00000007
00000008
00000009 0000000a 0000000b 0000000c 0000000d 0000000e
0000000f 00000010 00000011 00000012 00000013 00000014 00000015
0000002b 0000002c 0000002d 0000002e 0000002f 00000030 00000031
....


KERN_CONT output is heavily relying on the fact there will be no other
CPU(-s)/process(-es) doing any sort (KERN_CONT etc.) of printk-s simultaneously.
This basically means that pr_cont() CPU is expected to be the one that actually
does the printing and log_store() of the entire cont buffer:

pr_cont()
console_unlock()
console_cont_flush()
log_store() /* cont buffer */
pr_cont()
console_unlock()
console_cont_flush()
log_store() /* cont buffer */
....

Async printk breaks this contract, because console_cont_flush() is now deferred.
Thus pr_cont() will see a not fully flushed cont buffer, which will force it to
append a new KERN_CONT not to cont buffer, but to buffer log as a separate log
entry via log_store(). Due to the deferred nature of console_cont_flush() the
very next pr_cont() may see cont buffer being flushed, so it will keep the
message in the cont buffer and log_store() it later. So the cont line will split
across several log entries -- printk_kthread can take some time to wkeup,
during which pr_cont() will keep splitting cont line:

$ dmesg

00000001 00000002 00000003 00000004 00000005 00000006 00000007
00000008
00000009
0000000a
0000000b
0000000c
0000000d
0000000e
....

To fix the issue we need to switch to sync printk mode for KERN_CONT messages
and when we cont_flush() the cont buffer.


CPU0 CPU1

vprintk_emit(KERN_CONT "00000001")
if (!(lflags & LOG_NEWLINE))
cont_add()
if (!cont.len)
starts a new cont line
wake_up() printk_kthread
...

vprintk_emit(KERN_CONT "0000004") entering console_unlock()
if (!(lflags & LOG_NEWLINE))
cont_add()
wake_up() printk_kthread

vprintk_emit(KERN_CONT "0000005")
if (!(lflags & LOG_NEWLINE))
cont_add()
wake_up() printk_kthread console_cont_flush()
cont_print_text()

vprintk_emit(KERN_CONT "0000006")
if (!(lflags & LOG_NEWLINE))
cont_add()
wake_up() printk_kthread
leaving console_unlock()

vprintk_emit(KERN_CONT "0000007")
if (!(lflags & LOG_NEWLINE)) entering console_unlock()
cont_add()
wake_up() printk_kthread console_cont_flush()
cont_print_text()
leaving console_unlock()
vprintk_emit(KERN_CONT "\n")
if (lflags & LOG_NEWLINE)
if (cont.len) {
cont_add()
cont_flush()
if (cont.cons) {
cont.flags = flags;
cont.flushed = true;
^^^^^^^^^^^^^^^^^^^^
}
log_store() /* "0000001 .. 0000007" */
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
}
wake_up() process leaving console_unlock()

vprintk_emit(KERN_CONT "0000008")
if (!(lflags & LOG_NEWLINE)) entering console_unlock()
cont_add()
{
if (cont.len && cont.flushed)
^^^^^^^^^^^ not flushed yet!
return false;
...
}

log_store() /* "0000008" */
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ !
wake_up() printk_kthread
console_cont_flush()
^^^^^^^^^^^ now flushed!
cont_print_text()
call_console_drivers('\n');

vprintk_emit(KERN_CONT "0000009")
if (!(lflags & LOG_NEWLINE))
cont_add()
wake_up() printk_kthread

msg_print_text()
call_console_drivers("0000008")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
leaving console_unlock()
...

vprintk_emit(KERN_CONT "\n")
if (lflags & LOG_NEWLINE)
log_store() /* "00000009 ... 0000000e" */
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ !

and so on.

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@xxxxxxxxx>
---
kernel/printk/printk.c | 13 ++++++++++---
1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index eeafd73..bbb4180 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1705,6 +1705,8 @@ asmlinkage int vprintk_emit(int facility, int level,
int printed_len = 0;
int nmi_message_lost;
bool in_sched = false;
+ /* do not print the KERN_CONT buffer asynchronously */
+ bool has_cont_msg = false;

if (level == LOGLEVEL_SCHED) {
level = LOGLEVEL_DEFAULT;
@@ -1804,6 +1806,8 @@ asmlinkage int vprintk_emit(int facility, int level,
lflags |= LOG_PREFIX|LOG_NEWLINE;

if (!(lflags & LOG_NEWLINE)) {
+ has_cont_msg = true;
+
/*
* Flush the conflicting buffer. An earlier newline was missing,
* or another task also prints continuation lines.
@@ -1834,6 +1838,7 @@ asmlinkage int vprintk_emit(int facility, int level,
stored = cont_add(facility, level, text,
text_len);
cont_flush(LOG_NEWLINE);
+ has_cont_msg = true;
}

if (stored)
@@ -1859,8 +1864,9 @@ asmlinkage int vprintk_emit(int facility, int level,
*
* However we resort to synchronous printing of messages during
* early boot, when synchronous printing was explicitly
- * requested by a kernel parameter, or when console_verbose()
- * was called to print everything during panic / oops.
+ * requested by a kernel parameter, when we have a KERN_CONT
+ * buffer to print, or when console_verbose() was called to
+ * print everything during panic / oops.
* Unlike bust_spinlocks() and oops_in_progress,
* console_verbose() sets console_loglevel to MOTORMOUTH and
* never clears it, while oops_in_progress can go back to 0,
@@ -1868,7 +1874,8 @@ asmlinkage int vprintk_emit(int facility, int level,
* operate in sync mode once panic() occurred.
*/
if (console_loglevel != CONSOLE_LOGLEVEL_MOTORMOUTH &&
- can_printk_async()) {
+ can_printk_async() &&
+ !has_cont_msg) {
/* Offload printing to a schedulable context. */
printk_kthread_need_flush_console = true;
wake_up_process(printk_kthread);
--
2.9.0.rc1