Re: [PATCH 2/2] printk: always report lost messages on serial console
From: Petr Mladek
Date: Wed Jan 04 2017 - 05:53:03 EST
On Wed 2017-01-04 11:46:50, Sergey Senozhatsky wrote:
> On (01/03/17 17:53), Petr Mladek wrote:
> > On Wed 2017-01-04 00:47:45, Sergey Senozhatsky wrote:
> > > On (01/03/17 15:55), Petr Mladek wrote:
> > > [..]
> > > > This causes the opposite problem. We might print a message that was supposed
> > > > to be suppressed.
> > >
> > > so what? yes, we print a message that otherwise would have been suppressed.
> > > not a big deal. at all. we are under high printk load and the best thing
> > > we can do is to report "we are losing the messages" straight ahead. the
> > > next 'visible' message may be seconds/minutes/forever away. think of a
> > > printk() flood of messages with suppressed loglevel coming from CPUA-CPUX,
> > > big enough to drain all 'visible' loglevel messages from CPUZ. we are
> > > back to problem "a".
> > >
> > > thus I want a simple bool flag and a simple rule: we see something - we say it.
> >
> > So, you prefer to print some random debug message instead of an
> > emergency one?
>
> yep. because console_unlock() is not the right place to address printk()
> abuse, and console_unlock() cannot address it. the only way to fix it is
> to reduce the amount of printk() calls (well, there is one more thing
> probably and may be but not really *).
Yes and no. Please note that writing into logbug is fast while
writing to the serial console is slow. This is why we have
console_level. It allows to filter only the critical messages
for the slow output.
> > It will always drop a message because you always process only one
> > and many new appear in the meantime. While with my solution,
> > you should see:
> >
> > ** 1324 printk messages dropped ** <alert: random message>
> > ** 523 printk messages dropped ** <emerg: random message>
> > ** 324 printk messages dropped ** <emerg: random message>
> > ** 345 printk messages dropped ** <alert: random message>
>
> once you started losing the messages because of printk() flood you will
> lose them no matter what you do. console_unlock() cannot force printk() CPUs
> to add less messages to the logbuf, so we magically can have enough time to
> print the message on the consoles. any call to console drivers leads to lost
> messages. there is no difference.
>
> this is from the real serial logs I'm looking at right now. we attach
> "bad news" to 'critical' messages only:
>
> ...
> [ 32.941061] bc00: b65dc0d8 b65dc6d0 ae1fbc7c b65c11c5 b563c9c4 00000001 b65dc6d0 b0a62f74
> ** 150 printk messages dropped ** [ 32.941614] cee0: 00000081 ae1fcef0 00000038 00000000 b5369000 ae1fcf00 ae1fd0f0 00000000
> ** 75 printk messages dropped ** [ 32.941892] d860: 00056608 af203848 00000000 0004088c 000000d0 00000000 00000000 00000000
> ** 12 printk messages dropped ** [ 32.941940] ..
> ** 2 printk messages dropped ** [ 32.941951] ..
> ** 10 printk messages dropped ** [ 32.941992] ..
> ** 1 printk messages dropped ** [ 32.941999] ..
> ...
Do you see how useless the above messages are, please?
Are they really printed with KERN_EMERG or KERN_ALERT prefix?
IMHO, the point of the log levels and console_level is
to have a chance to see the more informative messages
on the slow medium (serial console).
BTW: It is questionable if messages with LOGLEVEL_DEFAULT are
always printed but this is another story.
> note how many critical messages I lost in consecutive console drivers calls
> because console_unlock() was printing other critical messages.
> no matter what we filter-out in console_unlock() we are still far-far-far
> behind the CPUs that flood the printk buffer. and those CPUs will be happy
> to drain any critical/visible message from the logbuf.
>
> * may be can do something like this:
>
> two logbuf buffers. one for `everything' -- both suppressed and visible
> loglevels. this one is also used by dmesg. the other one if for messages
> that won't be suppressed. we call_console_drivers() on that buffer.
> so vprintk_emit() becomes
>
> vprintk_emit()
> {
> spin_lock logbuf_lock
>
> text = sprintf(...)
>
> log_store(logbuf)
> if (!suppress_message_printing(level))
> log_store(printing_lofbuf)
>
> spin_ulock logbuf_lock
> }
>
> and console_unlock() reads printing_logbuf only. but we still can lose
> messages even from filtered printing_logbuf.
Please, do not do this. IMHO, it will not improve the situation much.
Let me repeat. Writing into logbuf and filtering messages is
relatively fast. The slow thing is printing to the serial console.
Also writing and filtering is done under logbuf_lock
which blocks other writers. This is a natural throttling of
other writers. On the other hand, the console handling is done
without logbuf_lock and other writers might do anything in the
meantime.
It might make sense to allow filtering messages that are
stored to the logbuffer. But this is orthogonal to the
console output filtering.
Also we should improve the API to make it easier to throotle
the same messages. Or allow to throotle all messages.
Anyway, your patch breaks console output filtering. This is
why I am against it and propose another solution.
Best Regards,
Petr