Re: [RFC PATCH] printk: Introduce "store now but print later" prefix.

From: Sergey Senozhatsky
Date: Mon Mar 04 2019 - 07:09:12 EST


On (03/04/19 20:40), Tetsuo Handa wrote:
> On 2019/03/04 12:22, Sergey Senozhatsky wrote:
> > On (02/23/19 13:42), Tetsuo Handa wrote:
> > [..]
> >> This patch tries to address "don't lockup the system" with minimal risk of
> >> failing to "print out printk() messages", by allowing printk() callers to
> >> tell printk() "store $body_text_lines lines into logbuf but start actual
> >> printing after $trailer_text_line line is stored into logbuf". This patch
> >> is different from existing printk_deferred(), for printk_deferred() is
> >> intended for scheduler/timekeeping use only. Moreover, what this patch
> >> wants to do is "do not try to print out printk() messages as soon as
> >> possible", for accumulated stalling period cannot be decreased if
> >> printk_deferred() from e.g. dump_tasks() from out_of_memory() immediately
> >> prints out the messages. The point of this patch is to defer the stalling
> >> duration to after leaving the critical section.
> >
> > We can export printk deferred, I guess; but I'm not sure if it's going
> > to be easy to switch OOM to printk_deferred - there are lots of direct
> > printk callers: warn-s, dump_stacks, etc; it might even be simpler to
> > start re-directing OOM printouts to printk_safe buffer.
>
> I confirmed that printk_deferred() is not suitable for this purpose, for
> it suddenly stalls for seconds at random locations flushing pending output
> accumulated by printk_deferred().

You are right. printk_deferred() is usually bad news. It may kill the
system; it doesn't care that much. If there is no other printk() caller
to hand printing off to, then we can get stuck in the console_unlock()
printing loop from IRQ context. printk_safe() is basically the same thing.
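
For illustration only, the "store now, print later" behaviour described
in the quoted patch text could be modelled in userspace C roughly like
this. It is not the patch and not how logbuf is implemented; struct
deferred_log and the log_*() helpers are made-up names, and the sizes
are arbitrary.

```c
#include <assert.h>
#include <stdio.h>

#define LOG_LINES 16	/* arbitrary capacity for the sketch */
#define LINE_LEN  80

/* Hypothetical userspace stand-in for logbuf; not a kernel structure. */
struct deferred_log {
	char lines[LOG_LINES][LINE_LEN];
	int stored;	/* lines stored so far */
	int printed;	/* lines already emitted */
};

/* Store one body line without printing. Returns -1 if the buffer is full. */
static int log_store(struct deferred_log *log, const char *text)
{
	assert(log && text);
	if (log->stored >= LOG_LINES)
		return -1;
	snprintf(log->lines[log->stored], LINE_LEN, "%s", text);
	log->stored++;
	return 0;
}

/* Emit every pending line. Returns the number of lines printed. */
static int log_flush(struct deferred_log *log)
{
	int n = 0;

	while (log->printed < log->stored) {
		puts(log->lines[log->printed++]);
		n++;
	}
	return n;
}

/*
 * Store the trailer line and only then start printing. If the buffer
 * is full the trailer is dropped, but pending lines are still flushed.
 */
static int log_store_trailer(struct deferred_log *log, const char *text)
{
	(void)log_store(log, text);
	return log_flush(log);
}
```

The caller would use log_store() for the body lines inside the critical
section and log_store_trailer() after leaving it, so the time spent
printing is paid outside the section.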

> Stalling inside critical section (e.g. RCU read lock held) is what I
> don't like.

I see. Yes, we might hold off grace periods when an RCU read-side
section gets interrupted and we then get stuck in the printing
loop from IRQ.

> dump_tasks() is the OOM critical section from RCU perspective.
> We can minimize RCU critical section by just getting a refcount on possible
> candidates and then printing information and putting that refcount after
> leaving RCU critical section.

Can do, I guess.
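
A rough userspace sketch of that two-phase idea, just to make the shape
concrete: pin the candidates inside the critical section, print and
unpin after leaving it. Everything here is a stand-in. struct task,
task_get()/task_put() and critical_enter()/critical_exit() are
hypothetical names; real code would use the task refcount helpers and
rcu_read_lock()/rcu_read_unlock(), with atomic refcounts.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_CANDIDATES 8	/* arbitrary cap for the sketch */

/* Hypothetical stand-in for a task; only what the sketch needs. */
struct task {
	int pid;
	int refcount;	/* non-atomic here; a real refcount is atomic */
};

/* Flag modelling "we are inside the RCU read-side critical section". */
static int in_critical;

static void critical_enter(void) { in_critical = 1; }
static void critical_exit(void)  { in_critical = 0; }

static void task_get(struct task *t) { t->refcount++; }

static void task_put(struct task *t)
{
	assert(t->refcount > 0);
	t->refcount--;
}

/* Phase 1: inside the critical section, only pin candidates. No printing. */
static int collect_candidates(struct task *tasks, int ntasks,
			      struct task **out, int max)
{
	int n = 0;

	critical_enter();
	for (int i = 0; i < ntasks && n < max; i++) {
		task_get(&tasks[i]);
		out[n++] = &tasks[i];
	}
	critical_exit();
	return n;
}

/* Phase 2: outside the critical section, print and drop the references. */
static int report_candidates(struct task **cand, int n)
{
	int printed = 0;

	assert(!in_critical);	/* printing must not happen inside the section */
	for (int i = 0; i < n; i++) {
		printf("candidate pid=%d\n", cand[i]->pid);
		task_put(cand[i]);
		printed++;
	}
	return printed;
}
```

The cost is a bounded array of pinned tasks and one extra get/put per
candidate; the gain is that any stall in printing no longer holds off a
grace period.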

[..]
> > Note, logbuf size is limited - 2G. Might not be as large as people
> > would want it to be.
>
> Are "machines which want to use 2GB logbuf" hosting millions of threads such
> that even 2GB is not enough for holding SysRq-t output? If yes, then I guess
> that tasklist traversal under RCU read lock would lockup even without printk().

640K^W... 2G is probably enough.

-ss