Re: [RFC][PATCH 6/7] printk: use alternative printk buffers

From: Petr Mladek
Date: Wed Oct 05 2016 - 05:50:19 EST


On Wed 2016-10-05 10:27:14, Sergey Senozhatsky wrote:
> On (10/04/16 16:52), Petr Mladek wrote:
> > >
> > > Or is there any other catch that I do not see at the moment?
> >
> > And there is :-( The above logic looked at the problem only from
> > one side. It was about errors starting from the printk()
> > code itself, for example:
>
> yes, like I said - printk recursion and printk deadlock are different
> things. and recursion cases are a subset of deadlock cases.

I see.

> > The only thing that might help here is to call
> > alt_printk_enter()/exit() in wake_up_process() itself. Otherwise,
> > we still would need to keep the printk_deferred() stuff.
>
> yes.
> or
> - combine alt_printk and DEFERRED_WARN/etc.

The question is if alt_printk brings any win after all, see below.


> or
> - rewrite printk() to be lock-less by default (for all invocations).

We already have it and it is called trace_printk(). But it is very
tricky and have some limitations. For example, it does not support
random parallel readers. Also printing the trace log is noticeably
slow.


> > By other words, we might need to put alt_printk_enter()/exit()
> > into the scheduler and timekeeping code. In theory it might
> > be easier to maintain than the separated printk_deferred() calls.
> > But there might be some catches because we need to disable
> > the interrupts, ...
>
> right. and I have some doubts that people will be willing to put
> alt_printk_enter/exit into those hot paths.

I have the same doubts.


> > Sigh, this 2nd scenario is much more likely than the 1st one.
> > I guess that warnings in the scheduler/timekeeping code
> > will be triggered outside printk() most of the time.
>
> hm. may be. but the reports we received so far starts from printk()
> and end up in printk() - IOW, recursion.

My statement might have been too strong. Well, my thinking was
the following:

I am not aware of any real life bug reports caused by recursion
inside locbuf_lock garded section. I guess that it is because
the most sensitive one is guarded by that logbuf_cpu check.

I am not aware of any generic recursions inside the console code.
In fact, it is not easily possible because we console_trylock().
It means that we do not call console if it is already being handled.

We are basically down to the recursion/deadlock caused by the
wake_up_process() call. And there are much more such calls outside
printk().

In fact, I am aware only about one report. It was related to the
async printk patchset, the added wake_up_process(), and used
RT scheduler. This one started from printk() almost by definition.


> > It means that this approach might be much harder to sell
> > after all :-(
>
> well, it solves a number of problems that the existing implementation
> cannot handle.

Please, provide a summary. I wonder if these are real life problems.

Note that we need to put aside all problems that are solvable
with printk_deferred(). It seems that printk_deferred() will
need to stay because it avoids the deadlock caused by
scheduler/timekeeping code locks. By other words, if there
is a missing printk_deferred() we need to put it there
anyway because the same code might get first called
outside printk().

Or do I miss something?

Best Regards,
Petr