Re: [PATCH 2/2] printk: always report lost messages on serial console

From: Sergey Senozhatsky
Date: Wed Jan 04 2017 - 08:39:24 EST


On (01/04/17 11:52), Petr Mladek wrote:
[..]
> > this is from the real serial logs I'm looking at right now. we attach
> > "bad news" to 'critical' messages only:
> >
> > ...
> > [ 32.941061] bc00: b65dc0d8 b65dc6d0 ae1fbc7c b65c11c5 b563c9c4 00000001 b65dc6d0 b0a62f74
> > ** 150 printk messages dropped ** [ 32.941614] cee0: 00000081 ae1fcef0 00000038 00000000 b5369000 ae1fcf00 ae1fd0f0 00000000
> > ** 75 printk messages dropped ** [ 32.941892] d860: 00056608 af203848 00000000 0004088c 000000d0 00000000 00000000 00000000
> > ** 12 printk messages dropped ** [ 32.941940] ..
> > ** 2 printk messages dropped ** [ 32.941951] ..
> > ** 10 printk messages dropped ** [ 32.941992] ..
> > ** 1 printk messages dropped ** [ 32.941999] ..
> > ...
>
> Do you see how useless the above messages are, please?

what... these lost messages were of extreme importance. I can't tell
the exactly the loglevel, but I'm sure it was at least pr_err() level.
these were like really important messages, unlike the ones that got
suppressed/filtered-out.

along with these lost messages that were supposed to be printed, I have
regions of lost kernel messages (?) with no reports of lost messages from
console_unlock() (!). and that's the only thing I'm fixing here. and the
only thing we can fix. no permutation of console_unlock() lines will make
the buffer bigger or printk flooding CPUs nicer or serial console driver
faster.

once we unlocked the logbuf lock in console_unlock() we lost the race
against the printk() flooding CPUs. all, or some, of the remaining
messages, no matter how important and critical, will be drained. the
next time we lock the logbuf again the logbuf will not be the same.
the only meaningful thing we can print now is "XXX printk messages
dropped". we don't know what was in those messages and no one will ever
find out. it's gone.

and the options here are
"print 1 random message out of XXX or XXXX lost messages"
vs
"print 1 random message out of XXX or XXXX lost messages"

and that's really it. except that in the second case we shuffle
console_unlock() lines with no gain. and I'm not sure I understand
why we even discuss it. both options are absolutely equally terrible,
because the root cause of the problem is in completely different
place.

-ss