Re: [RFC][PATCHv3 2/5] printk: introduce printing kernel thread

From: Jan Kara
Date: Mon May 29 2017 - 08:12:42 EST


On Mon 29-05-17 11:29:06, Petr Mladek wrote:
> On Wed 2017-05-10 14:59:35, Sergey Senozhatsky wrote:
> > This patch introduces a '/sys/module/printk/parameters/atomic_print_limit'
> > sysfs param, which sets the limit on number of lines a process can print
> > from console_unlock(). Value 0 corresponds to the current behavior (no
> > limitation). The printing offloading is happening from console_unlock()
> > function and, briefly, looks as follows: as soon as process prints more
> > than `atomic_print_limit' lines it attempts to offload printing to another
> > process. Since nothing guarantees that there will another process sleeping
> > on the console_sem or calling printk() on another CPU simultaneously, the
> > patch also introduces an auxiliary kernel thread - printk_kthread, the
> > main purpose of which is to take over printing duty. The workflow is, thus,
> > turns into: as soon as process prints more than `atomic_print_limit' lines
> > it wakes up printk_kthread and unlocks the console_sem. So in the best case
> > at this point there will be at least 1 processes trying to lock the
> > console_sem: printk_kthread. (There can also be a process that was sleeping
> > on the console_sem and that was woken up by console semaphore up(); and
> > concurrent printk() invocations from other CPUs). But in the worst case
> > there won't be any processes ready to take over the printing duty: it
> > may take printk_kthread some time to become running; or printk_kthread
> > may even never become running (a misbehaving scheduler, or other critical
> > condition). That's why after we wake_up() printk_kthread we can't
> > immediately leave the printing loop, we must ensure that the console_sem
> > has a new owner before we do so. Therefore, `atomic_print_limit' is a soft
> > limit, not the hard one: we let task to overrun `atomic_print_limit'.
> > But, at the same time, the console_unlock() printing loop behaves differently
> > for tasks that have exceeded `atomic_print_limit': after every printed
> > logbuf entry (call_console_drivers()) such a process wakes up printk_kthread,
> > unlocks the console_sem and attempts to console_trylock() a bit later
> > (if there any are pending messages in the logbuf, of course). In the best case
> > scenario either printk_kthread or some other tasks will lock the console_sem,
> > so current printing task will see failed console_trylock(), which will
> > indicate a successful printing offloading. In the worst case, however,
> > current will successfully console_trylock(), which will indicate that
> > offloading did not take place and we can't return from console_unlock(),
> > so the printing task will print one more line from the logbuf and attempt
> > to offload printing once again; and it will continue doing so until another
> > process locks the console_sem or until there are pending messages in the
> > logbuf. So if everything goes wrong - we can't wakeup printk_kthread and
> > there are no other processes sleeping on the console_sem or trying to down()
> > it - then we will have the existing console_unlock() behavior: print all
> > pending messages in one shot.

Actually I had something very similar in old versions of my patch set. And
it didn't work very well. The problem was that e.g. sometimes scheduler
decided that printk kthread should run on the same CPU as the process
currently doing printing and in such case printk kthread never took over
printing and the machine locked up due to heavy printing.

> First, the real time priority is questionable on its own. Logging
> is important but the real time priority is dangerous. Any "flood"
> of messages will starve all other processes with normal priority.
> It is better than a softlockup but it might cause problems as well.

Processes with real-time priority should have well bounded runtime (in
miliseconds). Printk kthread doesn't have such bounded runtime so it should
not be a real time process as it could hog the CPU it is running on...

So I think what Petr suggests below is better. Keep normal priority, print
something to console from the process doing printk() and just wake up
printk kthread and hope it can print the rest. It is not ideal but unless
there's a flood of messages there is no regression to current state.

> In addition, I think that a semi-sync-async mode might better
> handle the sudden dead. What is it? I have already talked about
> something similar last time.
>
> Normal printk() should do something like:
>
> printk()
> vprintk_emit()
> log_store()
> if (!deferred)
> console_trylock()
> console_unlock()
> while (num_lines < atomic_print_limit)
> handle message
> up_console_sem()
> if (pending_messages)
> wakeup(printk_kthread)
>
>
> and printk_kthread should do:
>
> while() {
>
> if (!pending_messages)
> sleep()
>
> console_lock()
>
> preempt_disable()
> console_unlock()
> while (pending_message && !need_resched)
> handle message
> up_console_sem()
> preempt_enable()
> }
>
>
> By other words:
>
> + printk() should try to get console and handle few lines
> synchronously. It will handle its own message in most cases.
>
> + printk_kthread is only a fallback. It is waken when there are
> unhandled messages. It handles as many messages as
> possible in its assigned time slot. But it releases
> console_sem before it goes into sleep.
>
>
> The advantages:
>
> + all messages are handled synchronously if there is not
> a flood of messages
>
> + if there is the flood of messages; kthread works only
> as a fallback; processes that produce the messages
> are involved into handling the console (nature throttling);
> the messages appear on the console even when printk_kthread
> is sleeping; but the softlockup is prevented
>
> => it should work pretty well in all situations, including the flood
> of messages and sudden dead.

Honza
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR