Re: Would you help to tell why async printk solution was not taken to upstream kernel ?

From: Sergey Senozhatsky
Date: Mon Mar 05 2018 - 20:52:35 EST


Hello Steven,

Let me Cc Tejun.

On (03/05/18 15:58), Steven Rostedt wrote:
> On Mon, 5 Mar 2018 11:14:16 +0900
> Sergey Senozhatsky <sergey.senozhatsky.work@xxxxxxxxx> wrote:
>
> > But I still think that it makes sense to change that "print it all" approach.
> > With clearer, explicit watchdog-dependent limits, we would print directly
> > for 1/2 (or 2/3) of the current watchdog threshold and offload if there
> > is more left in the logbuf. The implicit "logbuf size / console
> > throughput" limit is harder to understand. Disabling the watchdog because
> > of printk is probably too much of a compromise.
>
> If you know the baud rate, logbuf size / console throughput is actually
> trivial to calculate.
>
> Let's see. CONFIG_LOG_BUF_SHIFT defaults to 18 (2^18 = 262144).
> Let's say we have a slow 9600 baud serial console, which would give us:
>
> 262144 * 8 / 9600 = 219 (rounded up).
>
> Thus, the worst-case scenario would be 219 seconds to output the entire
> buffer. Add 10 seconds for extra overhead, and you have a 229-second
> watchdog that should never trigger because of a very slow console.
>
> (A more common 115200 baud modem would empty the buffer in 19 seconds.)

Right. And when you register one more console (e.g. netconsole), you need
to recalculate and readjust the watchdog. When you set the log_buf_len
kernel parameter (e.g. log_buf_len=32G to store ftrace dumps from NMI),
you need to recalculate and readjust the watchdog again, and so on.

> > IOW, is a logbuf's worth of messages so critically important that we
> > are ready to jeopardize system stability?
>
> The stability is only in jeopardy if the watchdogs trigger, right?

Not only then. The watchdog threshold is at least deterministic, unlike,
for instance, this case:

    rcu_read_lock();
    printk(...);
    rcu_read_unlock();

This blocks the RCU grace period, and objects freed via call_rcu() or
kfree_rcu() cannot be reclaimed until it ends. In the worst case this
becomes a full-blown RCU stall and even an OOM. In a less dramatic case it
increases memory pressure, triggers reclaim activity, etc., which is not a
good development whether you have a small embedded device or a server
under high load, especially given that all you did was a bunch of printks.

-ss