Re: [RFC][PATCHv2 2/8] printk: introduce printing kernel thread

From: Sergey Senozhatsky
Date: Fri Apr 07 2017 - 01:13:06 EST


Hello,

On (04/06/17 19:14), Pavel Machek wrote:
[..]
> > @@ -1765,17 +1803,40 @@ asmlinkage int vprintk_emit(int facility, int level,
> >
> > printed_len += log_output(facility, level, lflags, dict, dictlen, text, text_len);
> >
> > + /*
> > + * Emergency level indicates that the system is unstable and, thus,
> > + * we better stop relying on wake_up(printk_kthread) and try to do
> > + * a direct printing.
> > + */
> > + if (level == LOGLEVEL_EMERG)
> > + printk_kthread_disabled = true;
> > +
> > + set_bit(PRINTK_PENDING_OUTPUT, &printk_pending);
>
> Messages lower then _EMERG may be important, too.. and usually are,
> for debugging.
>
> And you keep both code paths, anyway, so they have to work. So you did
> not really "fix" issues you are pointing out -- they still remain
> there for _EMERG and above.

we don't drop messages of lower levels. we just print then from a
schedulable context. once the things go off the rails, and EMERG
is a good hint, I think, we stop being optimismitcs and switch to
a "best effort" mode. that is sort of reasonable. if there is a
flood of EMERG messages that are not actually important and,
basically, are result of a coding error, then, I think, the we
must fix that coding error. I mean, there should be some common
sense, and doing
while (1)
printk(KERN_EMERG "hello\n");
is probably not.


> I agree that printing too much is a problem. Could you just print
> "(messages delayed)" in that case, then wake a kernel thread to [rint
> the rest?

sorry, but what difference would it make?

it's really unclear at what point we should offload printing if we begin
that "we will offload sometimes". for example, I've seen many spin-lock
lockups where printk was involved.

CPU0 CPU1 CPU2 CPU3

spin_lock(&lock) spin_lock(&lock) spin_lock(&lock)
printk("foo") // grabs the console_sem
printk("bar") printk("a")
printk("b")
printk("c")
...
printk("z")
spin_dump() spin_dump()
call_console_drivers() printk() printk()
serial_driver_write() printk() printk()
spin_lock_irqsave(port->lock) ... ...
uart_console_write(...) trigger_all_cpu_backtrace() trigger_all_cpu_backtrace()
serial_driver_putchar()
while (!txrdy(...))
cpu_relax()


spin_dump() and trigger_all_cpu_backtrace() result in a bunch of
additional printk()-s so CPU0 has even more job to do in console_unlock(),
while it still holds the contended spin_lock. and so on; there are
many other examples.

so should we declare a "we can spend only 2 seconds in direct printk()
and then must offload printing" rule? I don't think it's much better
than a simpler "we always offload, as long as we think it's safe".

-ss