[RFC PATCH v1 00/25] printk: new implementation

From: John Ogness
Date: Tue Feb 12 2019 - 09:30:55 EST


Hello,

As probably many of you are aware, the current printk implementation
has some issues. This series (against 5.0-rc6) makes some fundamental
changes in an attempt to address these issues. The particular issues I
am referring to:

1. The printk buffer is protected by a global raw spinlock for readers
and writers. This restricts the contexts that are allowed to
access the buffer.

2. Because of #1, NMI and recursive contexts are handled by deferring
logging/printing to a spinlock-safe context. This means that
messages will not be visible if (for example) the kernel dies in
NMI context and the irq_work mechanism does not survive.

3. Because of #1, when *not* using features such as PREEMPT_RT, large
latencies exist when printing to slow consoles.

4. Because of #1, when _using_ features such as PREEMPT_RT, printing
to the consoles is further restricted to contexts that can sleep.
This can cause large latencies in seeing the messages.

5. Printing to consoles is the responsibility of the printk caller
and that caller may be required to print many messages that other
printk callers inserted. Because of this there can be enormous
variance in the runtime of a printk call.

6. The timestamps on the printk messages are generated when the
message is inserted into the buffer. But the call to printk may
have occurred long before that. This is especially true for
messages in the printk_safe context. This means that printk timing
information (although neatly "sorted") is neither accurate nor
reliable.

7. Loglevel INFO is handled the same as ERR. There seems to be an
endless effort to get printk to show _all_ messages as quickly as
possible in case of a panic (i.e. printing from any context), but
at the same time try not to have printk be too intrusive for the
callers. These are conflicting requirements that lead to a printk
implementation that does a sub-optimal job of satisfying both
sides.

To address these issues this series makes the following changes:

- Implements a new printk ringbuffer that supports lockless multiple
readers. Writers are synchronized per-cpu with support for all
contexts (including NMI). (This implementation was inspired by a
post[0] from Peter Zijlstra.)

- The new printk ringbuffer uses the initialized data segment of the
kernel for its data buffer so that it is available on early boot.

- Timestamps are captured at the beginning of the printk call.

- A dedicated kernel thread is created for printing to all consoles in
a fully preemptible context.

- A new (optional) console operation "write_atomic" is introduced that
console drivers may implement. This function must be NMI-safe. An
implementation for the 8250 UART driver is provided.

- The concept of "emergency messages" is introduced that allows
important messages (based on a new emergency loglevel threshold) to
be immediately written to any consoles supporting write_atomic,
regardless of the context. This allows non-emergency printk calls
(i.e. INFO) to run in nearly constant time, with their console
printing taking place in a separate fully preemptible context. And
emergency messages (i.e. ERR) are printed immediately for the user.

- Individual emergency messages are written contiguously and a CPU-ID
field is added to all output to allow for sorting of messages being
printed by multiple CPUs simultaneously.

Although the RFC series works, there are some open issues that I chose
to leave open until I received some initial feedback from the community.
These issues are:

- The behavior of LOG_CONT has been changed. Rather than using the
current task as the "cont owner", the CPU ID is used. NMIs have
their own cont buffer so NMI and non-NMI tasks can safely use
LOG_CONT.

- The runtime resizing features of the printk buffer are not
implemented.

- The exported vmcore symbols relating to the printk buffer no longer
exist and no replacements have been defined. I do not know all the
userspace consequences of making a change here.

- The introduction of the CPU-ID field not only changes the usual
printk output, but also defines a new field in the extended console
output. I do not know the userspace consequences of making a change
here.

- console_flush_on_panic() currently is a NOP. It is pretty clear how
this could be implemented if atomic_write was available. But if no
such console is registered, it is not clear what should be done. Is
this function really even needed?

- Right now the emergency messages are set apart from the
non-emergency messages using '\n'. There have been requests that
some special markers could be specifiable to make it easier for
parsers. Possibly as CONFIG_ and boot options?

- Be aware that printk output is no longer time-sorted. Actually, it
never was, but now you see the real timestamps. This seems strange
at first.

- The ringbuffer API is not very pretty. It grew to be what it is
due to the varying requirements of the different aspects of printk
(syslog, kmsg_dump, /dev/kmsg, console) and the complexity of
handling lockless reading, which can fall behind at any moment.

- Memory barriers are not my specialty. A critical eye on their
usage (or lack thereof) in the ringbuffer code would be greatly
appreciated.

The first 7 patches introduce the new printk ringbuffer. The
remaining 18 go through and replace the various components of the
printk implementation. All patches are against 5.0-rc6 and each
yield a buildable/testable system.

John Ogness

[0] http://lkml.kernel.org/r/20181017140044.GK3121@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

John Ogness (25):
printk-rb: add printk ring buffer documentation
printk-rb: add prb locking functions
printk-rb: define ring buffer struct and initializer
printk-rb: add writer interface
printk-rb: add basic non-blocking reading interface
printk-rb: add blocking reader support
printk-rb: add functionality required by printk
printk: add ring buffer and kthread
printk: remove exclusive console hack
printk: redirect emit/store to new ringbuffer
printk_safe: remove printk safe code
printk: minimize console locking implementation
printk: track seq per console
printk: do boot_delay_msec inside printk_delay
printk: print history for new consoles
printk: implement CON_PRINTBUFFER
printk: add processor number to output
console: add write_atomic interface
printk: introduce emergency messages
serial: 8250: implement write_atomic
printk: implement KERN_CONT
printk: implement /dev/kmsg
printk: implement syslog
printk: implement kmsg_dump
printk: remove unused code

Documentation/printk-ringbuffer.txt | 377 +++++++
drivers/tty/serial/8250/8250.h | 4 +
drivers/tty/serial/8250/8250_core.c | 19 +-
drivers/tty/serial/8250/8250_dma.c | 5 +-
drivers/tty/serial/8250/8250_port.c | 154 ++-
fs/proc/kmsg.c | 4 +-
include/linux/console.h | 6 +
include/linux/hardirq.h | 2 -
include/linux/kmsg_dump.h | 6 +-
include/linux/printk.h | 30 +-
include/linux/printk_ringbuffer.h | 114 +++
include/linux/serial_8250.h | 5 +
init/main.c | 1 -
kernel/kexec_core.c | 1 -
kernel/panic.c | 3 -
kernel/printk/Makefile | 1 -
kernel/printk/internal.h | 79 --
kernel/printk/printk.c | 1895 +++++++++++++++++------------------
kernel/printk/printk_safe.c | 427 --------
kernel/trace/trace.c | 2 -
lib/Kconfig.debug | 17 +
lib/Makefile | 2 +-
lib/bust_spinlocks.c | 3 +-
lib/nmi_backtrace.c | 6 -
lib/printk_ringbuffer.c | 583 +++++++++++
25 files changed, 2137 insertions(+), 1609 deletions(-)
create mode 100644 Documentation/printk-ringbuffer.txt
create mode 100644 include/linux/printk_ringbuffer.h
delete mode 100644 kernel/printk/internal.h
delete mode 100644 kernel/printk/printk_safe.c
create mode 100644 lib/printk_ringbuffer.c

--
2.11.0