Re: [RFC][PATCH 6/7] printk: use alternative printk buffers

From: Petr Mladek
Date: Thu Oct 06 2016 - 07:33:14 EST

On Thu 2016-10-06 13:22:48, Sergey Senozhatsky wrote:
> On (10/05/16 11:50), Petr Mladek wrote:
> [..]
> > > well, it solves a number of problems that the existing implementation
> > > cannot handle.
> >
> > Please, provide a summary. I wonder if these are real life problems.
> 1) some pathces/reports from Byungchul Park
> 2) a report from Viresh Kumar.
> 4) sleeping function called from inside logbuf lock
> 5) ARM specific
> 6) logbuf_lock corruption

It is great that you have such a list in hands. It might help
to push this solution.

I actually have one more reason for this approach:

It seems that we will need to keep printk_deferred()/WARN_*DEFERRED().
We do not know about a better solution for the deadlocks caused
by scheduler/timekeeping/console_drivers locks.

The pain is that the list of affected locations is hard to maintain.
It would definitely help if such problems are reported by lockdep
in advance. But lockdep is disabled because it creates the deadlock
on its own.

The alternative printk() allows to enable lockdep and effectively
hunt other possible bugs.

Also the printk context per-CPU variable perfectly fits the
lockdep approach. It will allow to monitor in which printk context
all the other locks are taken and detect possible problems.

> can we somehow transparently for the rest of the system (just in
> printk()) detect that we are in a potentially risky situation? hmm,
> I don't know...
> something *very* radical?
> vprintk_func()
> {
> if (this_cpu_read(alt_printk_ctx) & ALT_PRINTK_NMI_CONTEXT_MASK)
> return vprintk_nmi(fmt, args);
> if (in_atomic() ||
> /* ^^^^^^^^^^^^ */
> this_cpu_read(alt_printk_ctx) & ALT_PRINTK_CONTEXT_MASK)
> return vprintk_alt(fmt, args);
> return vprintk_default(fmt, args);
> }

This would affect too many messages. If the error is too serious,
there is a risk that the messages from the alternative per-CPU
buffers will not appear in the main log buffer and the console.
IMHO, this is acceptable for printk-related errors. But people
would complain if other bugs are harder to debug because
the error messages were hidden.

I am going to continue reviewing v2 of the patch set.

BTW: I would like to ask you to slow down a bit. More versions
of such a non-trivial patchset, that are sent within few days,
are far too much. I have some other tasks that I need to work on.
Also I would like to hear opinion from other people. Note that
many people are busy with the merge window at the moment.

Best Regards,