Re: printk meeting at LPC

From: Daniel Vetter
Date: Fri Sep 13 2019 - 10:49:02 EST

On Fri, Sep 13, 2019 at 3:26 PM John Ogness <john.ogness@xxxxxxxxxxxxx> wrote:
> On 2019-09-09, Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
> > printk meeting at LPC Meeting Room - SAFIRA on Tuesday Sept 10. from
> > 2PM to 3PM.
>
> The meeting was very effective in letting us come to decisions on the
> direction to take. Thanks for the outstanding attendance! It certainly
> saved hundreds of hours of reading/writing emails!
>
> The slides[0] from my printk talk served as a _rough_ basis for the
> discussion. Here is a summary of the decisions:
>
> 1. As a new ringbuffer, the lockless state-based proof of concept
> posted[1] by Petr Mladek will be used. Since it has far fewer memory
> barriers in the code, it will be simpler to review. I posted[2] a patch
> to hack my RFCv4 into a fully functional version of Petr's PoC. So we
> know it will work. With this, printk() can be called from any context
> and the message will be put directly into the ringbuffer.
>
> 2. A kernel thread will be created for each registered console, each
> responsible for being the sole printer to its respective
> console. With this, console printing is _fully_ decoupled from printk()
> callers.

Is the plan to split the console_lock up into a per-console thing? Or
is that postponed for later?

> 3. Rather than defining emergency _messages_, we define an emergency
> _state_ where the kernel wants to flush the messages immediately before
> dying. Unlike oops_in_progress, this state will not be visible to
> anything outside of the printk infrastructure.
>
> 4. When in emergency state, the kernel will use a new console callback
> write_atomic() to flush the messages in whatever context the CPU is in
> at that moment. Only consoles that implement the NMI-safe write_atomic()
> will be able to flush in this state.
>
> 5. LOG_CONT message pieces will be stored as individual records in the
> ringbuffer. They will be "assembled" by the ringbuffer reader (in
> kernel) before being copied to userspace or printed on the
> console. Since each record in the ringbuffer has its own sequence
> number, this has the effect for userspace that sequence numbers will
> appear to be skipped. (i.e. if there were LOG_CONT pieces with sequence
> numbers 4, 5, 6, the fully assembled message will appear only as
> sequence number 6 (and will have the timestamp from the first piece)).
>
> 6. A new may-sleep function pr_flush() will be made available to wait
> for all previously printk'd messages to be output on all consoles before
> proceeding. For example:
>
>     pr_cont("Running test ABC... ");
>     pr_flush();
>     do_test();
>     pr_cont("PASSED\n");
>     pr_flush();
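
[Illustration] The reader-side assembly described in item 5 could be
sketched roughly as follows. This is a standalone userspace C sketch, not
kernel code: struct rec, struct assembled, the "more" flag and
assemble_next() are all hypothetical names made up for this example, and
the real ringbuffer reader may well look different.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record: one ringbuffer entry per printk()/pr_cont() call. */
struct rec {
	uint64_t seq;		/* sequence number of this record */
	uint64_t ts_nsec;	/* timestamp of this piece */
	bool more;		/* assumption: set if another piece follows */
	const char *text;
};

/* What a reader would hand to userspace or to a console. */
struct assembled {
	uint64_t seq;		/* sequence number of the last piece */
	uint64_t ts_nsec;	/* timestamp of the first piece */
	char text[128];
};

/*
 * Assemble the message starting at recs[*pos] and advance *pos past it.
 * Returns false once all n records have been consumed.
 */
static bool assemble_next(const struct rec *recs, size_t n, size_t *pos,
			  struct assembled *out)
{
	size_t used = 0;

	assert(recs && pos && out);
	if (*pos >= n)
		return false;

	out->ts_nsec = recs[*pos].ts_nsec;
	out->text[0] = '\0';

	for (;;) {
		const struct rec *r = &recs[(*pos)++];
		size_t room = sizeof(out->text) - 1 - used;
		size_t len = strlen(r->text);

		if (len > room)
			len = room;	/* truncate overlong messages */
		memcpy(out->text + used, r->text, len);
		used += len;
		out->text[used] = '\0';
		out->seq = r->seq;

		if (!r->more || *pos >= n)
			return true;
	}
}
```

With pieces 4, 5 and 6 forming one line, the sketch returns a single
message carrying sequence number 6 and the timestamp of piece 4, so a
userspace reader sees 4 and 5 as skipped.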

Just crossed my mind: Could/should we lockdep-annotate pr_flush (take
a lockdep map in there that we also take around the calls down into
console drivers in each of the console printing kthreads or something
like that)? Just to avoid too many surprises when people call pr_flush
from within gpu drivers and wonder why it doesn't work so well.
Although with this nice plan I hope we'll take the modeset paths fully
out of the printk paths (even for normal output), so this should be a
lot more reasonable.

> 7. The ringbuffer raw data (log_buf) will be simplified to only consist
> of alignment-padded strings separated by a single unsigned long. All
> record meta-data (timestamp, loglevel, caller_id, etc.) will move into
> the record descriptors, which are located in an extra array. The
> appropriate crash tools will need to be adjusted for this. (FYI: The
> unsigned long in the string data is the descriptor ID.)
>
> 8. A CPU-reentrant spinlock (the so-called cpu-lock) will be used to
> synchronize/stop the kthreads during emergency state.
>
> 9. Support for printk dictionaries will be discontinued. I will look
> into who is using this and why. If printk dictionaries are important for
> you, speak up now!
>
> (There was also some talk about possibly discontinuing kdb, but that is
> not directly related to printk. I'm mentioning it here in case anyone
> wants to pursue that.)
>
> If I missed (or misunderstood) anything, please let me know!
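
[Illustration] The data layout from item 7 might look something like the
sketch below: the raw buffer holds only a descriptor ID followed by the
padded string, and everything else sits in a separate descriptor array.
This is a userspace C sketch under assumptions, not the actual kernel
layout: struct desc and its fields, PAD_UL(), data_push(), data_id_at()
and desc_find() are hypothetical names, the string length is assumed to
live in the descriptor, and wrapping of the buffer is left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Round n up to a multiple of sizeof(unsigned long). */
#define PAD_UL(n) \
	(((n) + sizeof(unsigned long) - 1) & ~(sizeof(unsigned long) - 1))

/* Hypothetical descriptor: all meta-data lives here, not in log_buf. */
struct desc {
	unsigned long id;
	uint64_t ts_nsec;
	int level;
	size_t text_len;	/* assumption: length is kept in the descriptor */
};

/*
 * Append "<id><padded text>" at offset off. Returns the offset of the
 * next free byte, or 0 if the data does not fit (no wrapping here).
 */
static size_t data_push(char *buf, size_t size, size_t off,
			const struct desc *d, const char *text)
{
	size_t padded = PAD_UL(d->text_len);
	size_t need = sizeof(d->id) + padded;

	assert(off % sizeof(unsigned long) == 0);
	if (off > size || need > size - off)
		return 0;

	memcpy(buf + off, &d->id, sizeof(d->id));
	memset(buf + off + sizeof(d->id), 0, padded);
	memcpy(buf + off + sizeof(d->id), text, d->text_len);
	return off + need;
}

/* Read the descriptor ID stored in front of the string at offset off. */
static unsigned long data_id_at(const char *buf, size_t off)
{
	unsigned long id;

	memcpy(&id, buf + off, sizeof(id));
	return id;
}

/* Look up a descriptor by ID; a linear search keeps the sketch short. */
static const struct desc *desc_find(const struct desc *descs, size_t n,
				    unsigned long id)
{
	for (size_t i = 0; i < n; i++)
		if (descs[i].id == id)
			return &descs[i];
	return NULL;
}
```

A crash tool walking such a buffer would read the ID at the current
offset, find the matching descriptor for timestamp, loglevel and length,
and then skip ahead by sizeof(unsigned long) plus the padded length.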

From a gpu perspective this all sounds extremely good, and like the
first realistic plan that might lead us to an actually working bsod on
linux. But we'll make it pink w/ yellow text or something like that
ofc :-)

Thanks, Daniel

> John Ogness
> [0]
> [1]
> [2]

Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48