Re: [PATCHv6 5/7] printk: report lost messages in printk safe/nmi contexts

From: Sergey Senozhatsky
Date: Fri Dec 23 2016 - 10:09:48 EST


Hello,

On (12/23/16 11:54), Petr Mladek wrote:
[..]
> There is a potential race:
>
> CPU0 CPU1
>
> printk_safe_log_store()
> len = atomic_read(&s->len);
>
> __printk_save_flush()
>
> atomic_cmpxchg(&s->len, len, 0)
>
> report_message_lost(s);
>
> if (len >= sizeof(s->buffer) - 1) {
> atomic_inc(&s->message_lost);
> return 0;
>
> We check the outdated len, account lost message, but it will not
> be reported until some other message appears in the log buffer.
>
> > +
> > out:
>
> I would make sense to move report_message_lost(s) here, after
> the out: label.

hm, ok. to flush from another CPU we first need to have printk-safe/nmi
messages on that CPU, then return from printk-safe/nmi on that CPU, execute
per-CPU irq_wor, and then have concurrent printk-safe/nmi messages on current
CPU, in addition happening frequent enough to hit this case. I may be wrong,
but that's quite unlikely. I can move report_message_lost() to `out' label,
no problem. thanks for the report.

at some point I was actually considering turning ->message_lost into
'bool' -- "we lost your messages, we are sorry". the precise number of
lost messages doesn't help that much: the messages are gone, go and
increment CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT; that's all we can say now.

and speaking of lost messages. I think I found a regression in
console_unlock(). so I'll send out a fix ahead of this series.

and, besides, the logs I had a pleasure to look at today contained numerous
"%d printk messages dropped" with very accurate numbers, but those numbers
meant pretty much nothing to me - the messages were lost.

-ss